Advanced Text Processing Part 2

Time
21 hours 25 minutes
Difficulty
Intermediate
Video Transcription
00:00
>> Hey Cybrarians. Welcome back to
00:00
the Linux+ course here at Cybrary.
00:00
I'm your instructor Rob Gells.
00:00
In today's lesson, we're going to pick up where we left
00:00
off in our second part on advanced text processing.
00:00
Now, upon completion of today's lesson,
00:00
we're going to be able to work with some more
00:00
powerful text processing tools and utilities.
00:00
Today we're actually going to talk about awk,
00:00
sed, and printf.
00:00
Let's get to it with some demo time.
00:00
Here we are back in our demo environment.
00:00
We're going to work with awk first.
00:00
Awk is an incredibly powerful and complicated command.
00:00
Mostly complicated because it has
00:00
a goofy syntax, which we'll see in a minute.
00:00
Now, awk is often used in conjunction
00:00
with sed and we do cover sed next.
00:00
Indeed a whole book has been written about sed and awk.
00:00
Definitely Google the sed
00:00
and awk book for more information.
00:00
The awk command is used to perform text
00:00
processing one line at a time.
00:00
The shortest useful awk example
00:00
is just to print a certain part of a line.
00:00
Awk uses whitespace
00:00
as a delimiter by default.
00:00
We can just use a file where
00:00
there's whitespace and I created one.
00:00
For example, let's do a cat
00:00
on the awk example file.
00:00
We can see we just have a bunch of numbers
00:00
here that are separated by whitespace.
00:00
If I were to run awk,
00:00
and this is the weird syntax I'm
00:00
talking about, we have to open the
00:00
awk program with a single quote and a curly brace.
00:00
We're going to do a print on $1.
00:00
Then let's close it with a curly brace
00:00
and a closing single quote as well.
00:00
What we're going to see here
00:00
is that this is actually going
00:00
to print the first field of each line here.
00:00
Let me just clear this up a little bit.
00:00
We need to specify the file,
00:00
so we're going to do awk example.
00:00
Sorry about that. There we go.
00:00
We see that it just prints 1 and you're like "Well,
00:00
I could just do that with the cut command."
00:00
Well, this is also really helpful for other reasons.
00:00
It's also really simple to just
00:00
specify things like different fields in here.
00:00
For example, if we
00:00
want to print $5 or we want to
00:00
print $6, we can add that as well.
00:00
Now we see 1, 5, and 6.
00:00
Those separated areas, 1,
00:00
2, 3, 4, 5, 6:
00:00
these are the fifth and sixth parts of
00:00
the line, separated by whitespace,
00:00
that get printed out.
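As a rough sketch of the field printing described above: the sample file and its contents here are assumptions made up for illustration, since the demo file itself is not shown in the transcript.

```shell
# Create a hypothetical whitespace-separated sample file (contents are assumed).
awk_sample=$(mktemp)
printf '1 2 3 4 5 6\n7 8 9 10 11 12\n' > "$awk_sample"

# Print only the first field of every line.
first_field=$(awk '{print $1}' "$awk_sample")
echo "$first_field"

# Print the first, fifth, and sixth fields of every line.
# The commas tell awk to put a space between the fields in the output.
selected_fields=$(awk '{print $1, $5, $6}' "$awk_sample")
echo "$selected_fields"

rm -f "$awk_sample"
```

Note that awk runs the print once per line, so a file with two lines produces two lines of output.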
00:00
Let's see some more cool stuff we can do with awk.
00:00
We can also specify the field
00:00
separator or delimiter.
00:00
An example of this would be
00:00
our old buddy /etc/passwd.
00:00
If we just do a less on /etc/passwd,
00:00
we can see that it's delimited by a colon;
00:00
that's its field separator or delimiter.
00:00
Awk uses the -F option,
00:00
the capital F option, to specify the delimiter.
00:00
For instance, if we wanted to print
00:00
just the users from /etc/passwd,
00:00
we can do awk with a field separator,
00:00
-F and a colon, and then
00:00
print the first column in /etc/passwd,
00:00
and we can see all the users.
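A minimal sketch of the field separator option follows. It runs against a small, made-up passwd-style file (the two entries are assumptions) so the result is predictable on any machine.

```shell
# Build a hypothetical passwd-style sample file; the entries are illustrative only.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# -F: sets the field separator to a colon, so $1 is the username column.
users=$(awk -F: '{print $1}' "$passwd_sample")
echo "$users"

rm -f "$passwd_sample"
```

Pointing the same awk command at /etc/passwd instead of the sample file should list the users on a real system.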
00:00
We can also rearrange a file a little bit with awk.
00:00
This is where it really shines,
00:00
this is where it's really helpful.
00:00
Let's go back to this command and we're going to print
00:00
anything in /etc/passwd.
00:00
We want to print all the users.
00:00
But before we do that,
00:00
let's actually get some more information in here.
00:00
Let's say we're going to print "User",
00:00
and we're going to say the user number,
00:00
and we're going to say "is",
00:00
and we're going to give it a space here,
00:00
and we'll use $1.
00:00
Now when we hit Enter and print that out,
00:00
it actually gives us a little bit more information.
00:00
We'll take a space out of here.
00:00
You don't really need that extra space. There we go.
00:00
User is 0 and then the name of the user.
00:00
We can see that when we run
00:00
this type of command through awk,
00:00
each portion that we want to print is comma separated.
00:00
The word User goes in quotation marks, comma separated,
00:00
and then we just
00:00
separate out the things we want to print that way.
00:00
Just for reference, this NR
00:00
stands for number of records.
00:00
>> I'm just printing out the numbers
00:00
associated with each username.
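The exact command typed in the demo is not captured in the transcript, so the following is only a guess at its shape: comma-separated print items, quoted literal text, and NR for the record number. The sample file is again made up.

```shell
# Hypothetical passwd-style sample file (entries are assumed for illustration).
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# Each comma-separated item is printed with a single space between items.
# NR is the number of the record (line) awk is currently processing.
labeled=$(awk -F: '{print "User", NR, "is", $1}' "$passwd_sample")
echo "$labeled"

rm -f "$passwd_sample"
```

Because the commas already add a space between items, the quoted strings do not need spaces of their own.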
00:00
As I said, awk is incredibly powerful.
00:00
This is just a taste of what we can do.
00:00
It's a bit beyond the scope of the exam,
00:00
but knowing how to use awk is incredibly powerful.
00:00
If there's one bit of advice I can give
00:00
you about working in Linux, it's this:
00:00
learn how to use Bash,
00:00
awk, and sed together. They're awesome.
00:00
That brings us to our next command, which is sed.
00:00
Now sed, S-E-D is short for stream editor.
00:00
It's used to perform actions on text.
00:00
Sed is most often used
00:00
to search for a string of text and replace it.
00:00
While sed can do a lot more,
00:00
the most common action is going
00:00
to be one of the following:
00:00
Either to replace a single occurrence
00:00
of a string in each line of a file,
00:00
or replace every occurrence of
00:00
a string in each line of a file.
00:00
Let's take a look at this.
00:00
Let's go ahead and let's just do a grep
00:00
for root in /etc/passwd,
00:00
and then we can see that these are
00:00
the lines in the /etc/passwd file
00:00
where the string root occurs.
00:00
We can do sed with s for root.
00:00
Let's make
00:00
every first occurrence of root into uppercase.
00:00
We're going to do that on /etc/passwd.
00:00
Let's go ahead and grep that with
00:00
-i /etc/passwd, searching for root.
00:00
Then what it'll do is it'll actually
00:00
display all of the occurrences,
00:00
where the first occurrence of root is in uppercase.
00:00
We can see that that is the case
00:00
here. We missed something here.
00:00
We have s, root, on /etc/passwd.
00:00
We'll grep that with -i root,
00:00
and let's get rid of /etc/passwd here.
00:00
Apologies, we don't need that.
00:00
We can see the first occurrence of
00:00
root in each one of these lines is in uppercase.
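Here is a small sketch of the first-occurrence replacement, run on a made-up passwd-style file (the three entries are assumptions) rather than the real /etc/passwd. It assumes the sed output is piped into grep -i, which is how the demo appears to filter it.

```shell
# Hypothetical passwd-style sample file (entries are assumed for illustration).
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > "$passwd_sample"

# s/root/ROOT/ replaces only the first "root" on each line.
# grep -i keeps the lines that contain root in any letter case.
first_only=$(sed 's/root/ROOT/' "$passwd_sample" | grep -i root)
echo "$first_only"

rm -f "$passwd_sample"
```

Without an -i option sed only writes the result to standard output, so the file itself stays unchanged.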
00:00
But let's do that for every occurrence.
00:00
Again, if we grep for root,
00:00
we see that there are two lines here,
00:00
and we see that root occurs quite a few times on
00:00
this first line and only one time in the second line.
00:00
We can do basically the same command,
00:00
we do sed, and then we specify
00:00
we're doing a search for root.
00:00
Every time we find root,
00:00
we want to make it uppercase.
00:00
Now instead of stopping right there,
00:00
we specify g, which is the global option.
00:00
Anytime we find it, not just the first time we find it,
00:00
we're going to make this conversion.
00:00
We're going to edit the string,
00:00
and we're going to do this on /etc/passwd again.
00:00
We're going to grep for
00:00
-i root to catch any occurrences in the output.
00:00
That's going to display just the
00:00
occurrences where we see
00:00
root, and root is now all in uppercase on every line.
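The global variant can be sketched the same way. The sample file below is made up for illustration, and the pipe into grep -i is an assumption about how the demo filters the output.

```shell
# Hypothetical passwd-style sample file (entries are assumed for illustration).
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > "$passwd_sample"

# The trailing g makes the substitution apply to every "root" on each line,
# not just the first one.
every_match=$(sed 's/root/ROOT/g' "$passwd_sample" | grep -i root)
echo "$every_match"

rm -f "$passwd_sample"
```

The only difference from the first-occurrence form is the g flag after the final slash.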
00:00
The final command we're going to look at
00:00
today is the printf command.
00:00
The printf command can be used to
00:00
format printed output.
00:00
It's helpful if you want to print
00:00
a string with arguments or variables,
00:00
but you don't want to have to break it up.
00:00
Instead, what you can do is you can
00:00
use format specifiers to
00:00
accept arguments in the string that you're trying to write.
00:00
For example, you can use %d.
00:00
This displays the argument as a decimal integer.
00:00
You can use %s.
00:00
This displays any argument as a string.
00:00
Then also you can use \n,
00:00
which is used for a new line.
00:00
The best way I can really show
00:00
you this is just to give you an example.
00:00
If I do a printf,
00:00
and I say "The %s
00:00
barks %d times",
00:00
and then
00:00
I do the new line with \n,
00:00
I can just provide it with a string, because
00:00
the first thing we're looking for is the %s,
00:00
the string, here. Then I
00:00
can provide it with an integer, the number 5,
00:00
and now when I hit Enter, it's
00:00
going to say that the dog
00:00
barks five times and it's going to give us a new line.
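A minimal sketch of that printf call is below. The leading word "The" in the format string is a guess, since the transcript garbles it; the %s, %d, and \n pieces follow the description above.

```shell
# %s is filled by the first argument (a string) and %d by the second (an integer).
# \n ends the output with a new line.
printf 'The %s barks %d times\n' dog 5

# Capture the same output in a variable; command substitution drops the trailing newline.
message=$(printf 'The %s barks %d times\n' dog 5)
echo "$message"
```

Arguments are consumed in order, so swapping dog and 5 would hand a non-number to %d and printf would complain.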
00:00
That's really all you need to
00:00
know about the printf command.
00:00
It does come in handy when you're
00:00
trying to format output,
00:00
especially when you're running
00:00
a command or writing a script.
00:00
With that, we've reached the end of this lesson.
00:00
In this lesson we covered working with
00:00
advanced text processing tools and utilities,
00:00
awk, sed and printf.
00:00
Thanks so much for being here. I look
00:00
forward to seeing you in the next lesson.