Advanced Text Processing Part 1

Time
21 hours 25 minutes
Difficulty
Intermediate
CEU/CPE
21
Video Transcription
00:00
>> Hey, cybrarians. Welcome back to
00:00
the Linux plus course here at Cybrary.
00:00
I'm your instructor Rob Gills.
00:00
In today's lesson, we're going to be discussing
00:00
advanced text processing
00:00
>> Part 1 in our two-part lesson.
00:00
>> Upon completion of today's lesson,
00:00
you're going to understand the need for
00:00
advanced text processing and where we can use it.
00:00
We're also going to work
00:00
>> with a couple of powerful text
00:00
>> processing tools and utilities such as grep,
00:00
egrep, and fgrep.
00:00
Let's go ahead and get to it with some demo time.
00:00
Here we are
00:00
back in our CentOS environment.
00:00
We're going to use this because this is where I have
00:00
everything set up for our demos
00:00
for this particular module.
00:00
We're going to go ahead and use
00:00
the GNU regular expression parser, or grep for short.
00:00
We covered this back in Lesson 9.3.
00:00
Grep is used to search for
00:00
a specific string of characters within a file.
00:00
The common syntax when you're talking
00:00
about grep is the grep command, then any options,
00:00
then the string you're searching for,
00:00
and then the file you want to search in.
00:00
We're going to grep using any options we want to use
00:00
to do a search for a string in a file.
00:00
I'll give you an example here.
00:00
For instance, we can search with
00:00
no case sensitivity when we're looking
00:00
for content in /etc/passwd.
00:00
For instance, I'm going to search for the string, Rob,
00:00
my name, in /etc/passwd.
00:00
If I hit "Enter" here,
00:00
we're going to see that it returns
00:00
my name in /etc/passwd.
00:00
You may be thinking, great, who cares.
00:00
This looks exactly the same as
00:00
other greps that we ran previously.
00:00
Well, the nice thing here is that with
00:00
a case-insensitive search,
00:00
I can type the string I want to search
00:00
for with any type of capitalization,
00:00
it'll return the same result.
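A minimal sketch of the case-insensitive search described above. It assumes the option being demonstrated is grep's -i flag, and it uses a small hypothetical passwd-style sample file rather than the real /etc/passwd so the result is predictable.

```shell
# Build a small passwd-style sample (hypothetical entries, not a real system file).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'rob:x:1000:1000:Rob:/home/rob:/bin/bash' > "$sample"

# -i ignores case, so "ROB" and "rOb" both match the same line.
upper=$(grep -i 'ROB' "$sample")
mixed=$(grep -i 'rOb' "$sample")
printf '%s\n' "$upper"

rm -f "$sample"
```

Whatever capitalization is typed for the search string, the same line comes back.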
00:00
Another great option that you could
00:00
use with grep is to do an inverse search.
00:00
We could do a grep -v on Rob in /etc/passwd,
00:00
and what this is going to do instead is return
00:00
any occurrences in this file that are not my name,
00:00
so that's everybody else in this file.
00:00
Every other line in this file
00:00
that doesn't contain the string Rob.
00:00
This is a silly example.
00:00
A more helpful example is when you have
00:00
a file that has a lot of comments.
00:00
For instance, if we look at
00:00
the file /etc/ssh/sshd_config.
00:00
I'm going to need to put in my sudo password
00:00
because it's a protected file.
00:00
Now we can see that this file has
00:00
just a bunch of comment lines,
00:00
but it has very little actual meat in it,
00:00
nothing that we can actually see.
00:00
You're reading through this and you're just
00:00
like comment after comment.
00:00
You're like, "Why do I care? I just want to
00:00
see the meat of the file."
00:00
You can actually specify grep
00:00
-v and then give the character you want to exclude.
00:00
In this case, we want to exclude
00:00
the # sign, or the pound sign,
00:00
and then we see only the lines
00:00
>> that contain information.
00:00
>> In other words, nothing that starts with
00:00
a pound sign or a hash symbol.
00:00
All the comments are excluded from the file.
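Here is a rough sketch of the comment-stripping idea. The file contents are a hypothetical stand-in for /etc/ssh/sshd_config, so no sudo is needed to try it.

```shell
# Hypothetical stand-in for /etc/ssh/sshd_config.
conf=$(mktemp)
printf '%s\n' \
  '# This is a comment' \
  'Port 22' \
  '#PermitRootLogin yes' \
  'PasswordAuthentication no' > "$conf"

# -v inverts the match: print only the lines that do NOT contain "#".
active=$(grep -v '#' "$conf")
printf '%s\n' "$active"

rm -f "$conf"
```

Note that this pattern drops any line containing a # anywhere, including a setting with a trailing comment. Anchoring the pattern as '^#' would exclude only lines that begin with #.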
00:00
Couple of other good options
00:00
that you can have with grep is that you can do
00:00
grep -n, and that shows the line number of each match.
00:00
You can do a grep -n for my name again in /etc/passwd.
00:00
Now we can see that my name or
00:00
the string for Rob occurs on line 47 in /etc/passwd.
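A small sketch of the line-number option, again on a hypothetical three-line sample rather than the real /etc/passwd, so the match lands on line 3 here instead of line 47.

```shell
# Hypothetical passwd-style sample; "rob" is on the third line.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'rob:x:1000:1000:Rob:/home/rob:/bin/bash' > "$sample"

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n 'rob' "$sample")
printf '%s\n' "$numbered"

rm -f "$sample"
```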
00:00
If we wanted to find any place
00:00
where the string Rob
00:00
occurs in /etc, we could do a search;
00:00
we can do a sudo grep
00:00
-l. It's going to list any file where
00:00
Rob lives in the /etc directory
00:00
and we can see any place that it comes in.
00:00
We're also going to see some information about
00:00
unreadable files because they're directories;
00:00
we can suppress that by using the -s option.
00:00
You can combine the two of these and now we
00:00
see all the places that we can actually
00:00
read and the places that contain
00:00
>> the string Rob in them.
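The -l and -s combination might look something like the sketch below. It assumes the demo globbed over the files in the directory, and it uses a throwaway directory with hypothetical file names in place of /etc so it runs without sudo.

```shell
# Throwaway directory standing in for /etc (all names here are hypothetical).
dir=$(mktemp -d)
mkdir "$dir/subdir"
printf 'owner=rob\n' > "$dir/a.conf"
printf 'owner=root\n' > "$dir/b.conf"
printf 'backup_user=rob\n' > "$dir/c.conf"

# -l prints only the names of files that contain a match.
# -s hides the error messages grep would otherwise print for "subdir".
# grep can still exit non-zero because of the directory, so tolerate that.
matches=$(grep -ls 'rob' "$dir"/*) || true
printf '%s\n' "$matches"

rm -rf "$dir"
```

Only a.conf and c.conf are listed, because b.conf has no match and the complaint about subdir is suppressed.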
00:00
>> Let's take a look now at a little bit more about grep.
00:00
Grep uses something called basic regular expressions.
00:00
Basic regular expressions only
00:00
use a handful of metacharacters.
00:00
Metacharacters are used to find
00:00
strings based on character matching,
00:00
the number of characters,
00:00
string positioning, or a range of
00:00
characters; they can do a lot of great things.
00:00
But the most common ones are
00:00
the asterisk, which we've already seen.
00:00
The asterisk matches zero
00:00
or more occurrences of
00:00
the preceding character.
00:00
If we were to do a grep for ro
00:00
asterisk in /etc/passwd,
00:00
this is going to find anything that matches
00:00
an r followed by zero or more o's.
00:00
We'll see ripple, we'll also see anything that has r
00:00
or ro, and you also see proc;
00:00
we see a bunch of stuff here.
00:00
The period metacharacter is used
00:00
to match one character at the position.
00:00
This will match any one character at this position.
00:00
ro and then any one character, in /etc/passwd.
00:00
That returns more of the stuff that we're looking for.
00:00
We see the root user,
00:00
we see root mentioned here again,
00:00
but we also see the word trousers,
00:00
we see proc,
00:00
chrony, setroubleshoot,
00:00
and just any place that has ro and then
00:00
one other character is returned by the period metacharacter.
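To contrast the asterisk and the period, here is a minimal sketch on a hypothetical list of names (one per line) rather than the full /etc/passwd.

```shell
# Hypothetical word list, one name per line.
names=$(mktemp)
printf '%s\n' root rpc proc rob daemon > "$names"

# "ro*" means: an r, then zero or more o characters.
# Any line containing an r matches, including "rpc".
star=$(grep 'ro*' "$names")

# "ro." means: r, then o, then exactly one more character of any kind.
dot=$(grep 'ro.' "$names")

printf 'ro*:\n%s\n\nro.:\n%s\n' "$star" "$dot"
rm -f "$names"
```

The asterisk pattern returns root, rpc, proc, and rob, while the period pattern is stricter and drops rpc because there is no "ro" in it.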
00:00
Now some really helpful metacharacters
00:00
that I find are helpful
00:00
to use are the caret and the dollar symbol.
00:00
What these are is that a caret
00:00
searches at the beginning of a line.
00:00
It searches for any string at the
00:00
very beginning of a line.
00:00
It's helpful to find usernames.
00:00
For instance, if we want to find
00:00
any user names that start
00:00
with ro and then
00:00
have any other character at the end of them,
00:00
we can do a caret ro dot in /etc/passwd,
00:00
so that's the caret character,
00:00
then ro dot, and that returns root and Rob.
00:00
If we want to find anybody who uses the bash shell,
00:00
we can search at the end of the line because you can
00:00
see here it says bin bash is being used.
00:00
We could search with the end of the line.
00:00
We could search for bash, dollar symbol.
00:00
That's going to indicate search at
00:00
the very end of the line for the word bash,
00:00
and it returns any users that use
00:00
bash as their default shell in the system.
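The two anchors can be sketched as follows. The sample entries are hypothetical (including the extra user alice), chosen so that the two searches return different sets of lines.

```shell
# Hypothetical passwd-style sample.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'chrony:x:995:992::/var/lib/chrony:/sbin/nologin' \
  'rob:x:1000:1000:Rob:/home/rob:/bin/bash' \
  'alice:x:1001:1001:Alice:/home/alice:/bin/bash' > "$sample"

# ^ anchors at the start of the line: user names beginning with "ro" plus one character.
starts=$(grep '^ro.' "$sample")

# $ anchors at the end of the line: accounts whose last field ends in "bash".
bash_users=$(grep 'bash$' "$sample")

printf 'Start with ro:\n%s\n\nUse bash:\n%s\n' "$starts" "$bash_users"
rm -f "$sample"
```

The chrony line contains "ro" but not at the start, so the caret search skips it; the dollar search picks up root, rob, and alice.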
00:00
If you have to search for a metacharacter itself,
00:00
you have to escape that character using a backslash.
00:00
This negates the special meaning of the character.
00:00
For instance, if you're looking for an
00:00
asterisk and you need to
00:00
find an occurrence of it without
00:00
actually having the special character interpreted,
00:00
you can escape it.
00:00
Let's use sudo because we need to go into
00:00
the shadow file,
00:00
so we'll do sudo grep, and then we're searching
00:00
this file for an occurrence of backslash asterisk in /etc/shadow.
00:00
We can see any place that asterisk occurs in this file.
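A rough sketch of escaping the asterisk. The shadow-style lines below are hypothetical placeholders (the hash values are not real), so this runs without sudo.

```shell
# Hypothetical shadow-style sample; "*" in the second field marks a locked account.
shadow=$(mktemp)
printf '%s\n' \
  'root:$6$examplehash:19000:0:99999:7:::' \
  'bin:*:18353:0:99999:7:::' \
  'daemon:*:18353:0:99999:7:::' \
  'rob:$6$examplehash:19000:0:99999:7:::' > "$shadow"

# The backslash removes the asterisk's special meaning, so grep looks for a literal "*".
# Single quotes keep the shell from touching the backslash or the asterisk.
literal=$(grep '\*' "$shadow")
printf '%s\n' "$literal"

rm -f "$shadow"
```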
00:00
If we were trying to, for instance,
00:00
search for either of two
00:00
strings in one file, for instance,
00:00
let's say that I wanted to search for
00:00
my name and root in /etc/passwd, we could do a grep,
00:00
and we're going to do a search for
00:00
root and we're going to do
00:00
a backslash pipe, and then my name.
00:00
In GNU grep's basic regular expressions, that backslash gives
00:00
the pipe its special "or" meaning. Then /etc/passwd,
00:00
and now we can see any places that
00:00
the string root or the string Rob occurs.
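The two-string search might look like this sketch. It assumes GNU grep, where backslash pipe acts as "or" inside a basic regular expression (this is a GNU extension, not something POSIX guarantees), and it uses a hypothetical sample file.

```shell
# Hypothetical passwd-style sample.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'rob:x:1000:1000:Rob:/home/rob:/bin/bash' > "$sample"

# With GNU grep, \| between two strings matches lines containing either one.
either=$(grep 'root\|rob' "$sample")
printf '%s\n' "$either"

rm -f "$sample"
```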
00:00
That's pretty much what you need to know for grep.
00:00
Now, egrep and fgrep are
00:00
commands that get mentioned on the exam as well.
00:00
The command egrep is used for
00:00
extended regular expressions.
00:00
The only major difference with
00:00
extended regular expressions is that
00:00
some characters don't need to be escaped,
00:00
such as curly braces or pipe characters.
00:00
In the last example where we had to escape
00:00
this pipe with egrep, we don't have to do that.
00:00
We can just type in egrep,
00:00
and then we'll know that that is
00:00
a special character that doesn't
00:00
need to be escaped because it's
00:00
an extended regular expression.
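A minimal sketch of the extended-regular-expression version of the same search. It uses grep -E, which is the equivalent of egrep; recent GNU grep releases mark the egrep command itself as obsolescent in favor of grep -E. The sample file is hypothetical.

```shell
# Hypothetical passwd-style sample.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'rob:x:1000:1000:Rob:/home/rob:/bin/bash' > "$sample"

# In an extended regular expression the pipe means "or" with no backslash needed.
extended=$(grep -E 'root|rob' "$sample")
printf '%s\n' "$extended"

rm -f "$sample"
```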
00:00
Now the command fgrep can be useful when you want to
00:00
find a metacharacter and not have to escape it.
00:00
The other example that we used where
00:00
we had to escape this,
00:00
we can instead just do fgrep,
00:00
and then we're going to search for
00:00
this character in that file,
00:00
and we can see all the places that
00:00
the asterisk occurs in /etc/shadow.
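The fixed-string search can be sketched like this. It uses grep -F, the equivalent of fgrep (which recent GNU grep releases also mark as obsolescent), on a hypothetical shadow-style sample with placeholder hashes.

```shell
# Hypothetical shadow-style sample; the hash values are placeholders.
shadow=$(mktemp)
printf '%s\n' \
  'root:$6$examplehash:19000:0:99999:7:::' \
  'bin:*:18353:0:99999:7:::' \
  'rob:$6$examplehash:19000:0:99999:7:::' > "$shadow"

# -F treats the pattern as a plain string, so "*" needs no backslash.
fixed=$(grep -F '*' "$shadow")

# Same search with a basic regular expression, where the asterisk must be escaped.
escaped=$(grep '\*' "$shadow")

printf '%s\n' "$fixed"
rm -f "$shadow"
```

Both searches return the same line; the fixed-string form just avoids the escaping.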
00:00
With that, in this lesson,
00:00
we covered working with advanced
00:00
text processing tools and
00:00
utilities such as grep, egrep and fgrep.
00:00
Thanks so much for being here and I look
00:00
forward to seeing you in our next lesson.