Text Processing (Demo)

Time
21 hours 25 minutes
Difficulty
Intermediate
CEU/CPE
21
Video Transcription
00:00
>> Hey, there Cybrarians.
00:00
Welcome back to the Linux+ course here at Cybrary.
00:00
I'm your instructor Rob Gals,
00:00
and in today's lesson,
00:00
we're going to discuss text processing.
00:00
Upon completion of today's lesson,
00:00
you're going to be able to understand
00:00
the purpose of text processing in Linux,
00:00
the things that we can do with text processing,
00:00
and we're going to work with text
00:00
processing tools and utilities,
00:00
those are things like tr,
00:00
sort, cut, wc, and paste.
00:00
But without further ado, let's go ahead and get to
00:00
it with some demo time.
00:00
Here we are back again in our demo environment,
00:00
and today we're in [inaudible].
00:00
Our first command we're going to cover
00:00
is the echo command.
00:00
Echo just simply echoes output to the screen,
00:00
which as we know from our earlier lessons
00:00
here, is standard output.
00:00
We can just type in for instance,
00:00
why hello there,
00:00
and then hit "Enter",
00:00
and it's going to echo, why hello there, to the screen.
00:00
This is silly, but echo is
00:00
really great if you wanted to put something into
00:00
a file without bothering to touch
00:00
the file and then use Vim or nano.
00:00
For instance, you could just
00:00
echo the text, why hello there,
00:00
into a file called hello file,
00:00
and we'll just do a standard
00:00
single output redirection to create this file,
00:00
single greater than sign,
00:00
and now when we do a cat on hello file,
00:00
we'll see that it has the why hello there,
00:00
echo that we sent earlier.
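As a rough sketch of the step just described: the file name hello_file and the scratch directory are assumptions for illustration, since the transcript only says "hello file".

```shell
# Work in a scratch directory so nothing in the real home directory is touched.
workdir=$(mktemp -d)

# A single > creates the file, or overwrites it if it already exists.
echo "why hello there" > "$workdir/hello_file"

# cat prints the file contents back to standard output.
hello_contents=$(cat "$workdir/hello_file")
echo "$hello_contents"
```

A double greater than sign (>>) would append to the file instead of overwriting it.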
00:00
Now, it's also really commonly used in scripting
00:00
to print a progress indicator or step information.
00:00
In my home directory here,
00:00
we can do a cat on a script
00:00
that I created called step_info,
00:00
and really all this is going to do is it's going to
00:00
run this script in bash,
00:00
and it's going to echo an output of the steps,
00:00
and then it's going to sleep for
00:00
two seconds between those,
00:00
so if we run step_info,
00:00
we will see that the echo prints the output,
00:00
we wait two seconds, it prints another output,
00:00
prints another output, and
00:00
then it prints the done message.
00:00
That's just how that works.
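The contents of step_info are not shown in the transcript, so the following is only a hypothetical stand-in that behaves the way the demo is described. The step messages and the optional delay argument are assumptions.

```shell
#!/bin/bash
# Hypothetical stand-in for the step_info script; the real file is not shown.
step_info() {
    local delay="${1:-2}"   # seconds to pause between steps (2 in the demo)
    echo "Step 1: starting"
    sleep "$delay"
    echo "Step 2: working"
    sleep "$delay"
    echo "Step 3: finishing"
    sleep "$delay"
    echo "Done"
}
```

Calling step_info with no argument pauses two seconds between messages; step_info 0 prints them without waiting.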
00:00
That's generally the best use of
00:00
echo that I could tell you about.
00:00
Just keep that in mind, mostly for the Linux+ exam,
00:00
or if you're ever doing scripting and you
00:00
need to print output to the screen.
00:00
Now, the next command we'll look at is the tr command,
00:00
which is short for translate.
00:00
Translate is used to change characters
00:00
in a file or input from the keyboard,
00:00
and tr is another great example
00:00
of using input redirection.
00:00
You do need to pass it input
00:00
to translate, such as a file.
00:00
In the following example, we can look at
00:00
a mixed case file that I have in my home directory.
00:00
If we do a cat on mixed case,
00:00
we can see that this file has a bunch of stuff
00:00
that's capitalized strangely in weird cases.
00:00
We can pass this to the tr command,
00:00
so you do a tr on A-Z,
00:00
and then put that to lowercase a-z.
00:00
What that's going to do is it's going to find
00:00
any characters that are capitalized,
00:00
and it's going to make them lowercase.
00:00
Then we can provide an input file,
00:00
again using input redirection,
00:00
to the tr command,
00:00
and it'll take all of that content
00:00
and make it lowercase.
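A minimal sketch of that translation follows. The file name mixed_case and its contents are made up for illustration.

```shell
# Create a small sample file in a scratch directory (contents are invented).
workdir=$(mktemp -d)
printf 'HeLLo WoRLD\nthis IS Mixed CASE\n' > "$workdir/mixed_case"

# tr reads only standard input, so the file is supplied with < (input redirection).
# Every character in the first set is replaced by its counterpart in the second.
lowered=$(tr 'A-Z' 'a-z' < "$workdir/mixed_case")
echo "$lowered"
```

tr writes the result to standard output and leaves the original file unchanged.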
00:00
We can also delete certain character sets
00:00
with a -d option.
00:00
Character sets have the syntax,
00:00
a square brace and a colon, then the name of a set,
00:00
and then a colon and another square brace.
00:00
For instance, we can look at a file that has
00:00
numbers in it that I created here called numbered file,
00:00
and we see that all these lines
00:00
here start with a number.
00:00
To get rid of the numbers with
00:00
the tr command, we use tr -d,
00:00
and then we can say that the character
00:00
set that we want to remove
00:00
is digit: open square brace, colon, D-I-G-I-T, colon.
00:00
Then we're going to close that with
00:00
another square brace,
00:00
and we'll do input redirection
00:00
here for the numbered file,
00:00
and now when we run this command,
00:00
we can see that all digits are removed in the output.
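Here is a small sketch of the delete option. The name numbered_file and the sample lines are assumptions.

```shell
# Sample file whose lines each start with a number (contents are invented).
workdir=$(mktemp -d)
printf '1 first line\n2 second line\n3 third line\n' > "$workdir/numbered_file"

# -d deletes every character in the given set; [:digit:] matches 0 through 9.
# Quoting the set keeps the shell from treating the brackets as a glob pattern.
stripped=$(tr -d '[:digit:]' < "$workdir/numbered_file")
echo "$stripped"
```

Only the digits are removed, so the space that followed each number is still at the start of each line.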
00:00
Now let's move on to our sort command,
00:00
and I told a little white lie,
00:00
sort doesn't actually need input redirection to run,
00:00
you can just pass a file into it.
00:00
Let's look back at our alpha
00:00
sort file that we had from earlier,
00:00
and we see that this file has a bunch of
00:00
weird organization in terms of alphabetical sorting.
00:00
We can just say sort alpha sort,
00:00
and then it'll send all that information back to
00:00
us to this screen, sorted alphabetically.
00:00
If a file has numbers,
00:00
sort can actually sort that numerically as well.
00:00
We can do a cat on num_sort,
00:00
which is a numeric sort file,
00:00
then we do a sort on num_sort,
00:00
it's going to sort it for us numerically. Or is it?
00:00
Not really. It's putting a 10 before a one.
00:00
Well, we can tell sort to work a certain way.
00:00
We can force sort to work the way we want it to.
00:00
If you want to sort it in terms of
00:00
a string numeric value, we need to do a sort -n,
00:00
and then num_sort,
00:00
and that will sort it in the format
00:00
that we're used to, so 1, 2,
00:00
3, 4, 5, 6, 7, 8, 9, 10,
00:00
11, 12 instead of 10,
00:00
11, 12, 1,
00:00
2, 3, 4, 5.
00:00
That's the kind of sorting we're used to seeing
00:00
when we're talking about sorting things numerically.
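The difference can be sketched with a single column of numbers. The file contents here are invented, and LC_ALL=C is set only so the text ordering is the same on every system.

```shell
# Sample file with one number per line (contents are invented).
workdir=$(mktemp -d)
printf '10\n2\n1\n12\n11\n3\n' > "$workdir/num_sort"

# Default sort compares character by character, so 10 lands before 2.
text_order=$(LC_ALL=C sort "$workdir/num_sort")

# -n compares by numeric value instead.
numeric_order=$(LC_ALL=C sort -n "$workdir/num_sort")

echo "$text_order"
echo "---"
echo "$numeric_order"
```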
00:00
We can also tell sort to sort by a different column.
00:00
We do that using the -k option.
00:00
The -k option stands for key,
00:00
and what it actually refers to is the column number.
00:00
For instance, all of these numbers would be
00:00
our first column and all of
00:00
these names would be our second column.
00:00
If we want to sort this file
00:00
alphabetically by the names in
00:00
the second column rather than
00:00
numerically by the numbers in the first column,
00:00
we can say sort -k2 num_sort,
00:00
and what we'll see is that it actually
00:00
sorts it alphabetically,
00:00
so it starts with Alexis and Bob,
00:00
not numerically, because now it
00:00
starts with 11 and an 8.
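A sketch of sorting by the second column follows. Alexis, Bob, 11, and 8 come from the demo; the third row is invented, and LC_ALL=C is an assumption made to keep the ordering predictable.

```shell
# Two-column sample: a number, then a name.
workdir=$(mktemp -d)
printf '8 Bob\n11 Alexis\n2 Carol\n' > "$workdir/num_sort"

# -k2 makes the sort key start at the second whitespace-separated field.
by_name=$(LC_ALL=C sort -k2 "$workdir/num_sort")

# For comparison, -n sorts by the number in the first column.
by_number=$(LC_ALL=C sort -n "$workdir/num_sort")

echo "$by_name"
echo "---"
echo "$by_number"
```

Note that -k2 means "from field 2 to the end of the line"; -k2,2 would restrict the key to the second field alone.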
00:00
The next command we'll look at is the cut command,
00:00
and the cut command for me is the easiest command to
00:00
use for getting specific fields based on delimiters.
00:00
A delimiter is just the separator between fields.
00:00
For instance, when a file has separators
00:00
between the fields using a space,
00:00
or using a comma,
00:00
or using a colon,
00:00
we can use the cut command to
00:00
separate things out and display
00:00
only the fields we want.
00:00
We say, hey, delimit by a colon,
00:00
and give me the first field.
00:00
For example,
00:00
we can actually run this on the /etc/passwd file.
00:00
What this would do is it'll just print
00:00
out the contents of /etc/passwd,
00:00
just the first field, the name.
00:00
Let's see how this works. Let's take a look
00:00
at /etc/passwd first.
00:00
When we look at this file,
00:00
we're going to see that it is colon delimited.
00:00
Every one of these fields is separated by a colon here,
00:00
and what we can do is we just want to
00:00
print out the first field,
00:00
which is the first column.
00:00
Let me clear my screen,
00:00
I'm going to run this command, cut -d with
00:00
a colon as the delimiter and
00:00
the field as one, -f1, on /etc/passwd,
00:00
and now we can see that it just outputs the names.
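To keep the result predictable, this sketch runs cut against a small made-up file in the same colon-delimited layout as /etc/passwd rather than the real file. The file name passwd_sample and its two entries are assumptions.

```shell
# Sample file in the /etc/passwd layout (entries are illustrative).
workdir=$(mktemp -d)
cat > "$workdir/passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# -d sets the delimiter, -f selects which field(s) to print.
names=$(cut -d: -f1 "$workdir/passwd_sample")
echo "$names"
```

On a real system the equivalent command would be cut -d: -f1 /etc/passwd.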
00:00
Now let's take a look at the wc command.
00:00
wc is short for Word Count,
00:00
and that's really all it does.
00:00
You pass any file name to wc,
00:00
and it will tell you the number of lines,
00:00
the number of words, and the byte count.
00:00
Let's do that for Alpha sort file again.
00:00
That tells us that we have 12 lines.
00:00
Let's go ahead and do a wc on that.
00:00
We have 12 lines with 12 words and 81 characters.
00:00
That's what wc does for us.
00:00
Now, we can get the number of
00:00
lines in there by just doing wc -l,
00:00
and that'll just display 12 lines in Alpha sort.
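A short sketch of the three counts follows, using a made-up three-line file instead of the 12-line file from the demo. The name alpha_sort and the contents are assumptions.

```shell
# Sample file: three lines, one word each (contents are invented).
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/alpha_sort"

# With no options, wc prints lines, words, and bytes, followed by the file name.
wc "$workdir/alpha_sort"

# Reading from standard input makes wc print only the number, with no file name.
line_count=$(wc -l < "$workdir/alpha_sort")
word_count=$(wc -w < "$workdir/alpha_sort")
byte_count=$(wc -c < "$workdir/alpha_sort")
echo "lines=$line_count words=$word_count bytes=$byte_count"
```

The option is a lowercase -l; in GNU wc an uppercase -L reports the length of the longest line instead.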
00:00
I most often use wc to
00:00
give me a number of occurrences in a file.
00:00
Using /etc/passwd again,
00:00
let's find out how many times root occurs.
00:00
I'm going to search for root,
00:00
I'm going to grep for root in /etc/passwd,
00:00
and if I pipe that to wc -l,
00:00
I'm going to see that it occurs two times.
00:00
If I take out wc -l,
00:00
I can see that that's true, it
00:00
only occurs on two lines.
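The same idea can be sketched against a made-up file in the /etc/passwd layout, so that the count does not depend on the machine. The file name and entries are assumptions, chosen so that root appears on two lines as in the demo.

```shell
# Sample file in the /etc/passwd layout (entries are illustrative).
workdir=$(mktemp -d)
cat > "$workdir/passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
EOF

# grep prints each matching line; wc -l then counts those lines.
matches=$(grep root "$workdir/passwd_sample" | wc -l)
echo "$matches"
```

This counts matching lines, not individual occurrences, so a line containing root twice still counts once. grep -c root would report the same line count without the pipe.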
00:00
Now let's take a look at one last command,
00:00
and that is the paste command.
00:00
The paste command is used to
00:00
join the data of two files together.
00:00
You might be asking, well,
00:00
how is that different from cat?
00:00
Cat joins files by appending
00:00
the content of one file right after the other,
00:00
and basically creating more
00:00
information at the bottom of the file.
00:00
Paste command joins two files together line-by-line.
00:00
For instance, in my home directory here,
00:00
if we cat out file 3 and file 4,
00:00
we can see it's got one line,
00:00
two lines for file 3,
00:00
one line, two lines for file 4.
00:00
If I do a paste on file 3 and file 4,
00:00
we are going to see that it puts those lines together.
00:00
Line 1 from file 3, line 1
00:00
from file 4 are on the same line,
00:00
and line 2 from file 3 and line 2
00:00
from file 4 are also on the same line.
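To see the two behaviors side by side, here is a sketch with two made-up two-line files. The names file3 and file4 follow the demo; their contents are assumptions.

```shell
# Two sample files with two lines each (contents are invented).
workdir=$(mktemp -d)
printf 'file3 line 1\nfile3 line 2\n' > "$workdir/file3"
printf 'file4 line 1\nfile4 line 2\n' > "$workdir/file4"

# cat appends the second file after the first: four lines in total.
stacked=$(cat "$workdir/file3" "$workdir/file4")

# paste joins corresponding lines, separated by a tab: two lines in total.
side_by_side=$(paste "$workdir/file3" "$workdir/file4")

echo "$stacked"
echo "---"
echo "$side_by_side"
```

paste also accepts -d to use a separator other than a tab.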
00:00
But with that, we've reached the end of the lesson,
00:00
and in this lesson,
00:00
we covered working with text processing tools
00:00
and utilities such as tr,
00:00
sort, cut, wc, and paste.
00:00
Thanks so much for being here, and I look
00:00
forward to seeing you in the next lesson.