Basic Static Analysis Part 1

Time
9 hours 10 minutes
Difficulty
Advanced
CEU/CPE
9
Video Description

In this module, we discuss basic static analysis, beginning with assembly code. You'll learn how to read the raw assembly code from an executable, and you'll learn about debuggers, compilers, and disassemblers. Static analysis is a slow, very detail-oriented process that requires considerable technical know-how. It is used to confirm your findings from dynamic analysis and to understand the behaviour of malware. It also helps you identify additional Indicators of Compromise (IOCs) such as encrypted strings or payloads, domain generation algorithms (DGAs), and network traffic encryption algorithms, and it helps in determining malware defences such as anti-debugging and anti-VM checks. You'll also learn how to assess malware risk and impact on a system, malware sophistication, and attribution. Further, you'll learn that assembly code is human-readable machine code for a particular chip. We cover chip architectures such as x86 and x64, the chip designers and manufacturers such as Intel, AMD, ARM, and MIPS, and where their chips are used. Next, we look at x86 and x64 assembly. x86 is the most common architecture, and we discuss it in some detail because most malware is written for it. x86 is known as a complex instruction set (CISC) architecture and has many instructions. As an aspiring assembly coder, or someone trying to read and understand assembly, you'll need programming knowledge such as loops, functions, local variables, and application programming interfaces (APIs), plus some math know-how: binary, hexadecimal, and decimal, and how to convert between them. We then move on to demonstrations with compilers (GCC under Cygwin and Microsoft's CL compiler from Visual Studio), the IDA Pro disassembler, and the OllyDbg 2.0 and Visual Studio debuggers.

Video Transcription
00:03
>> Hello, welcome to Cybrary, my name is Sean Pierce.
00:03
I'm subject matter expert for
00:03
introduction to malware analysis.
00:03
Today we are going to be covering
00:03
the basics of static analysis.
00:03
We're going to go over Part 1 assembly.
00:03
What exactly is static analysis?
00:03
Well, we're going to read the raw assembly code
00:03
from an executable.
00:03
We use tools such as debuggers and disassemblers.
00:03
Now, when I say debugger,
00:03
you probably think, you're going to step
00:03
through each line of the source code.
00:03
That's not quite true.
00:03
If you have an IDE,
00:03
an integrated development environment,
00:03
debuggers are useful for finding bugs.
00:03
You step through each line of the source code say,
00:03
this is the if statement and this is the value
00:03
I'm comparing this variable against, etc.
00:03
Well, we don't usually have the source code,
00:03
in fact it's exceedingly rare that we do.
00:03
We step through each assembly instruction,
00:03
we're going to see some examples of that in a bit.
00:03
But several disassemblers do have debugging components
00:03
in them so you can run
00:03
the malware and if you ever do run the malware,
00:03
even with a debugger attached where you think
00:03
you can control each instruction,
00:03
I highly suggest you keep it in
00:03
a virtual machine or a VM,
00:03
like one we built in the earlier videos.
00:03
The characteristics of static analysis
00:03
are usually that it's very slow,
00:03
very detail oriented,
00:03
and with a lot of technical knowledge
00:03
required. I like it.
00:03
I don't think it's super hard once you get
00:03
the hang of it after
00:03
a few weeks of digging into something,
00:03
you get pretty comfortable with it pretty fast.
00:03
Especially if you're executing
00:03
some code over and over and over again,
00:03
you can see exactly what the instructions are doing.
00:03
But it does take time to sort
00:03
through rather large executables.
00:03
I wouldn't suggest starting with something like
00:03
Visual Studio or another large application,
00:03
like Photoshop or something like that.
00:03
Because those are huge programs,
00:03
a megabyte worth of code is a lot of code.
00:03
Generally, when we're looking at code
00:03
statically and we are
00:03
looking at the individual instructions,
00:03
we're trying to confirm our dynamic analysis.
00:03
We saw there it created a file in
00:03
this directory and it had this weird file name.
00:03
Is that name really hard-coded into the binary?
00:03
Will it use that name every single time?
00:03
Or did it choose that name based
00:03
>> on the date or the time,
00:03
>> or the version of the operating system?
00:03
With static analysis, we can really
00:03
understand the behavior of the malware and we can
00:03
understand what conditional behavior would
00:03
come about in a different version
00:03
of the operating system or
00:03
whether or not it had access to
00:03
the Internet or different things
00:03
about the family of
00:03
malware we're looking for
00:03
because that's very helpful too.
00:03
If we look at indicators like constants that we find,
00:03
or certain strings in the code you can just Google
00:03
for and you can see that someone else
00:03
has already done some analysis,
00:03
you can find out a lot of information.
00:03
Some strings in some malware are meant to be jokes.
00:03
They know that analysts are going to look at this
00:03
later and they try to make things harder,
00:03
like encrypting strings, they encrypt payloads.
00:03
If a piece of malware drops another file,
00:03
another executable and then executes that,
00:03
it's frequently, just encrypted in
00:03
the first dropping malware, the dropper.
00:03
It'd be interesting from a static analysis standpoint
00:03
to know when it would drop that.
00:03
I see that, it's going
00:03
to look at all the processes that are running.
00:03
If there are no processes named
00:03
whatever debugger, or that have
00:03
the debugger keyword in the process title,
00:03
then I'll drop this other executable and kick that off.
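The check described above can be sketched from the analyst's side. This is a minimal illustration, not code from any real sample: the keyword list, the function names, and the idea of passing process names in as a plain list are all assumptions made for the example.

```python
# Hypothetical keyword list; a real sample would carry its own strings,
# which is exactly what you would look for in the disassembly.
DEBUGGER_KEYWORDS = ("debug", "ollydbg")


def debugger_running(process_names):
    """Return True if any process name contains a debugger keyword."""
    for name in process_names:
        lowered = name.lower()
        if any(keyword in lowered for keyword in DEBUGGER_KEYWORDS):
            return True
    return False


def would_drop_payload(process_names):
    """Mirror the conditional: the second stage only runs when no
    debugger-like process is seen."""
    return not debugger_running(process_names)


if __name__ == "__main__":
    print(would_drop_payload(["explorer.exe", "svchost.exe"]))
    print(would_drop_payload(["explorer.exe", "OLLYDBG.EXE"]))
```

Seeing a loop over process names followed by string comparisons like these is a strong hint that the sample behaves differently when it is being analyzed.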
00:03
We can also understand domain generation algorithms.
00:03
Some malware families out there
00:03
don't have hard coded domains or
00:03
IP addresses as command and control
00:03
servers to report back to but instead,
00:03
it gets its updates by generating
00:03
new malicious domains every single day
00:03
or every single second or every single minute.
00:03
It will use a time
00:03
or some other value like from the Internet,
00:03
like the weather or something like that.
00:03
It'll use that as a seed to
00:03
generate a domain name and then
00:03
check that domain name to see if
00:03
the malware author has put anything there.
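A minimal sketch of a date-seeded domain generation algorithm, to make the idea concrete. The hashing scheme, the label length, and the `.example` suffix are assumptions chosen for illustration; real families each use their own algorithm, which is what static analysis lets you recover.

```python
import hashlib
from datetime import date


def generate_domain(seed_date, suffix=".example"):
    """Derive a deterministic domain name from a date.

    Hypothetical scheme: hash the ISO date and keep the first
    12 hex characters as the domain label.
    """
    seed = seed_date.strftime("%Y-%m-%d").encode("ascii")
    digest = hashlib.sha256(seed).hexdigest()
    return digest[:12] + suffix


if __name__ == "__main__":
    # The same date always yields the same domain, so the malware and
    # its author can both compute today's rendezvous point.
    print(generate_domain(date.today()))
```

Because the output depends only on the seed, anyone who recovers the algorithm can compute the upcoming domains in advance, which is why analysts care about finding it.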
00:03
Of course, a malware author usually signs this stuff,
00:03
so security researchers can't
00:03
just take over the botnet by
00:03
planting self-destruct configurations on
00:03
fake command and control servers.
00:03
We can also easily figure out, well,
00:03
not so easily sometimes,
00:03
but network traffic encryption.
00:03
Several families of malware will
00:03
encrypt their network traffic
00:03
to their command and control server.
00:03
Sometimes it's really easy to figure
00:03
out the encryption method they're using.
00:03
Usually it's a single byte XOR operation,
00:03
which is just one instruction
00:03
the CPU needs to run in
00:03
order to get its original traffic.
00:03
Sometimes if you look at string encryption,
00:03
you'll frequently see that it goes through
00:03
each byte and just XORs it against a value.
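The single-byte XOR scheme is small enough to sketch in full. The function names and the sample URL below are made up for the example; the point is only that applying the same key twice returns the original bytes, and that with 256 possible keys a known plaintext fragment is enough to recover the key.

```python
def xor_bytes(data, key):
    """XOR every byte of data against a single-byte key."""
    return bytes(b ^ key for b in data)


def find_single_byte_keys(ciphertext, known_plaintext):
    """Try all 256 keys and return those whose output contains
    the known plaintext fragment."""
    return [
        key
        for key in range(256)
        if known_plaintext in xor_bytes(ciphertext, key)
    ]


if __name__ == "__main__":
    # Hypothetical string an analyst might expect to see once decoded.
    secret = xor_bytes(b"http://c2.example/gate", 0x5A)
    print(secret)                        # unreadable bytes
    print(xor_bytes(secret, 0x5A))       # same key restores the text
    print(find_single_byte_keys(secret, b"http"))
```

In a disassembly this usually shows up as a short loop containing one XOR instruction with a constant operand.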
00:03
We can figure that stuff out statically without
00:03
running the executable and we can determine defenses.
00:03
A lot of malware out there has
00:03
anti-VM or anti-debugging defenses like I mentioned
00:03
earlier and if we dropped
00:03
an executable in our VM and
00:03
we double-clicked it and nothing happened,
00:03
or we can see in the log that it just started and
00:03
stopped and we don't really know why it's doing that,
00:03
we can throw it into a disassembler like IDA Pro,
00:03
and just look through
00:03
the code and see if anything jumps out at us
00:03
as being some kind of
00:03
anti-virtual machine technology
00:03
or anti-virtual machine code.
00:03
It's also important when
00:03
we really dig into a piece of malware,
00:03
what its capabilities are,
00:03
and what code wasn't
00:03
executed during dynamic analysis and really,
00:03
what is the risk and impact to your organization?
00:03
For example, I was given a piece of
00:03
malware and when I ran it in a VM,
00:03
all it did was pop up and say,
00:03
you've been hacked by a hacker group name here,
00:03
and Explorer did a quick little refresh.
00:03
Within the first 30 seconds of static analysis,
00:03
I opened it up in IDA and I
00:03
was looking at it and I was just like, oh,
00:03
it looks like they just took a BAT file and
00:03
they ran a tool on it called BAT to EXE,
00:03
and made an EXE file.
00:03
I was able to recover the original BAT file,
00:03
which was just a few lines of code.
00:03
The first two lines ran a command to
00:03
disable the mouse and then run
00:03
another command to disable the keyboard,
00:03
and then it ran a command to take
00:03
the file and place it in the startup programs folder.
00:03
Then it would pop up,
00:03
you have been hacked,
00:03
then it would kill explorer.exe.
00:03
I noticed my mouse and keyboard were
00:03
working during this dynamic analysis
00:03
and I Googled around and I
00:03
found out those commands haven't worked since Windows 95.
00:03
Furthermore, I guess,
00:03
in the older versions of the operating system,
00:03
when explorer.exe dies, it doesn't come back.
00:03
But on all modern operating systems,
00:03
it does, or rather explorer.exe
00:03
respawns as soon as it's no longer running.
00:03
It also occurred to me that
00:03
key.bat was probably the original file name.
00:03
It was trying to copy key.bat instead of key.exe,
00:03
which is what it's originally named.
00:03
I was able to determine pretty quickly that this
00:03
offered very little risk and very low impact.
00:03
It was very low impact to
00:03
anyone that was actually infected with it.
00:03
With static analysis,
00:03
after you do it a little while,
00:03
you can get a feel for how
00:03
sophisticated the programmer was.
00:03
I was able to look at that piece of malware and
00:03
determine that the actors were not very sophisticated.
00:03
There's obviously not very much testing.
00:03
They didn't realize that their
00:03
persistence mechanism failed because
00:03
they renamed the file.
00:03
They didn't realize that
00:03
the keyboard and mouse were still working after they
00:03
ran these commands and that
00:03
explorer.exe would just restart after it was killed.
00:03
Based on this and
00:03
other indicators from the region
00:03
this piece of malware came from I
00:03
was able to determine that
00:03
>> these attackers were not very
00:03
>> sophisticated and they were a lot of hype.
00:03
They talked a lot, but they didn't actually know a lot.
00:03
I was able to determine all this just really quickly
00:03
within 30 seconds of static analysis.
00:03
Dynamic analysis in this case
00:03
really did not help very much.
00:03
While I was looking at
00:03
this group, they had made another piece of malware that
00:03
deleted all files on
00:03
the operating system or
00:03
that deleted all the files on the computer.
00:03
I looked at it again in static analysis
00:03
and saw that it was just a BAT to EXE file
00:03
running one command which was delete
00:03
everything recursively under the C drive forcefully.
00:03
I looked at another piece of
00:03
malware from the same region,
00:03
and it was also a wiper malware.
00:03
It used a driver,
00:03
a benign, well-known signed driver,
00:03
to get direct access to the hard drive
00:03
>> and then overwrite
00:03
>> the first several chunks of memory of
00:03
the hard drive, called the MBR
00:03
>> or the master boot record.
00:03
>> It was much faster,
00:03
and it was a much more sophisticated piece of malware.
00:03
I was able to attribute
00:03
one group's malware and
00:03
another group's malware to two different actors.
00:03
Earlier I was talking about we're going to
00:03
read the assembly of an executable.
00:03
What exactly is that?
00:03
Simple answer is it's human-readable machine code,
00:03
for a particular chip.
00:03
Each line of assembly usually corresponds
00:03
to one machine instruction that the CPU circuitry will execute.
00:03
Intel invented this 8086 chip several,
00:03
>> several years ago.
00:03
>> It was originally 8-bit and then 16-bit
00:03
>> and then 32-bit.
00:03
>> When I say 32-bit,
00:03
those are the size of the registers and thusly,
00:03
how much memory it could address.
00:03
With 32 bits, when you address memory via a WORD or DWORD,
00:03
you could only maximally address
00:03
four gigs or gigabytes of memory.
00:03
The upper limit for
00:03
early 32-bit operating systems was four gigs of RAM.
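The four-gigabyte figure follows directly from the register width: a 32-bit address can take 2^32 distinct values, one per byte. A quick check of the arithmetic (the function name is just for this example):

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    total = addressable_bytes(32)
    print(total)                 # 4294967296 bytes
    print(total // 1024 ** 3)    # 4 (GiB)
    print(hex(total - 1))        # 0xffffffff, the highest 32-bit address
```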
00:03
We moved on to another architecture,
00:03
64-bit architecture also called
00:03
AMD 64 because they're the ones who made the standard.
00:03
Then Intel made its own 64-bit
00:03
standard which was almost completely identical.
00:03
Most all 64-bit chips can run X86 code.
00:03
Like the chips have the circuitry
00:03
to run both sets of code.
00:03
If you ever look at X64 assembly,
00:03
we have in the lower left-hand corner,
00:03
it looks pretty similar to
00:03
the assembly on the right-hand side
00:03
which is X86 assembly.
00:03
We also have ARM architectures and MIPS architectures.
00:03
ARM is a lot more common than MIPS.
00:03
ARM is usually used in phones and tablets and
00:03
MIPS is in printers and in some tablets.
00:03
This is really just to show you
00:03
that there is lots of architectures out there,
00:03
lots of different chips out there.
00:03
The most common architecture found in
00:03
the world is X86 architecture.
00:03
We should really get to know
00:03
X86 assembly because that's
00:03
what most malware is written in.
00:03
Malware, like any other piece of software,
00:03
usually tries to keep with compatibility and
00:03
wants to infect as many machines as possible.
00:03
I was saying, down the left-hand side,
00:03
there is 64-bit code,
00:03
in the middle is ARM assembly,
00:03
and on the right is MIPS,
00:03
and on the far right and up half the page is
00:03
X86 and that's what we're going to dig into more here.
00:03
X86 uses a lot of different instructions.
00:03
I can't remember exactly how many there are
00:03
but it's not what we call a RISC architecture.
00:03
RISC is reduced instruction set
00:03
and X86 is known as CISC,
00:03
a complex instruction set.
00:03
That was because Intel wanted just to
00:03
have lots of instructions to do
00:03
>> lots of different things.
00:03
>> Because like I said, each line,
00:03
each instruction here is
00:03
physically executed on the circuitry.
00:03
If you could combine a whole bunch of
00:03
operations into one instruction,
00:03
your code theoretically will be faster.
00:03
But the 14 most common instructions
00:03
make up 90 percent of all code out there.
00:03
The first 14, the most common
00:03
14 are the ones I've listed here on the right,
00:03
and you're going to see these a lot
00:03
so I would suggest memorizing them.
00:03
When you're reading assembly,
00:03
there's two different ways to read X86 and that's
00:03
usually with the Intel syntax and the AT&T syntax.
00:03
With the Intel syntax,
00:03
it's generally right to left.
00:03
Here we see mov EAX, 5.
00:03
That means we're moving the value
00:03
five into the EAX register.
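To make the difference between the two syntaxes concrete, here is a toy converter from Intel to AT&T notation. It is only a sketch under narrow assumptions: two-operand instructions, 32-bit registers, and immediate values, nothing else. In AT&T syntax the source comes first and the destination second, registers get a `%` prefix, immediates get a `$` prefix, and the mnemonic carries a size suffix (`l` for 32-bit operands).

```python
REGISTERS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def att_operand(operand):
    """Format one Intel-syntax operand the AT&T way."""
    operand = operand.strip().lower()
    if operand in REGISTERS_32:
        return "%" + operand
    # Anything else is treated as an immediate; base 0 accepts 5 or 0x10.
    return "$" + str(int(operand, 0))


def intel_to_att(line):
    """Convert 'mnemonic dest, src' (Intel) to 'mnemonicl src, dest' (AT&T)."""
    mnemonic, operands = line.strip().split(None, 1)
    dest, src = [part.strip() for part in operands.split(",")]
    return f"{mnemonic.lower()}l {att_operand(src)}, {att_operand(dest)}"


if __name__ == "__main__":
    print(intel_to_att("mov eax, 5"))     # movl $5, %eax
    print(intel_to_att("xor eax, eax"))   # xorl %eax, %eax
```

Memory operands, which look quite different in the two syntaxes, are deliberately left out of this sketch.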
00:03
A register is the little bit of memory
00:03
>> that's in the CPU.
00:03
>> There's a couple of different registers.
00:03
We'll go over them in a minute.
00:03
But just know that when I
00:03
want to manipulate information in the CPU,
00:03
we do it with registers and they are super fast.
00:03
It's really slow to get memory or to
00:03
get something from RAM and it's
00:03
really slow to get something from the hard drive.
00:03
If you can do it with registers, you really should.
00:03
If you're worried about optimizing your code,
00:03
compilers do a great job of that.
00:03
We're going to see some of the different
00:03
compilers here in a minute.
00:03
Different compilers will produce different assembly.
00:03
We can take the same program and
00:03
use 10 different compilers on it
00:03
and it'll almost always be
00:03
different between each compiler.
00:03
If you do enough static analysis,
00:03
you'll get to know the output of certain compilers.
00:03
You can just look at it and say, "Oh,
00:03
this was produced by Visual Studio," or, "Oh,
00:03
this was produced by Delphi," or, "Oh,
00:03
this was produced by whatever Visual Basic."
00:03
Well, I named languages there, but you can usually tell that too.
00:03
You really need to know programming knowledge
00:03
to get into this like
00:03
loops and functions and
00:03
local variables and APIs,
00:03
application programming interfaces.
00:03
You can Google stuff as you go along
00:03
but if you're looking at
00:03
the assembly for a program and you see,
00:03
"Oh, there is this jump instruction and then there's
00:03
an increment instruction and then there's
00:03
another jump instruction and it does
00:03
some stuff and another jump instruction."
00:03
If you're a knowledgeable programmer,
00:03
you could spot that as like,
00:03
"Oh, that's a for-loop.
00:03
It's incrementing some counter value."
00:03
Then it's checking a constant to see if it's over or under that.
00:03
Local variables are useful because we're
00:03
going to talk about the stack in the second video.
00:03
Application programming interfaces are
00:03
useful because you can usually tell the sophistication
00:03
of an actor or you can usually
00:03
tell the programming methodology.
00:03
If they like to create
00:03
their own sockets and read [NOISE] and
00:03
write information directly from them,
00:03
that's a different way to transfer
00:03
information over a network rather than
00:03
using the HTTP or the
00:03
WinINet libraries that Microsoft also provides.
00:03
I would also suggest that you would know a bit of math,
00:03
particularly binary, hexadecimal and decimal,
00:03
and how to convert between them.
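Converting between the three bases is quick to practice in any scripting language. A small sketch using Python's built-in `hex`, `bin`, and `int` (the helper name is just for this example):

```python
def show_bases(value):
    """Return the same integer written in decimal, hexadecimal, and binary."""
    return {
        "decimal": str(value),
        "hexadecimal": hex(value),
        "binary": bin(value),
    }


if __name__ == "__main__":
    print(show_bases(255))      # 255, 0xff, 0b11111111
    # Going the other way: parse text in a given base back to an integer.
    print(int("0x0a", 16))      # 10
    print(int("1010", 2))       # 10
```

One hexadecimal digit covers exactly four binary digits, which is why disassemblers and debuggers show almost everything in hex.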
00:03
We're going to do a little demonstration.
00:03
We're going to take some C code and compile it with
00:03
GCC and Cygwin which
00:03
is what I prefer and what
00:03
>> I suggested in earlier videos.
00:03
>> We're also going to do it in
00:03
the native Visual Studio compiler CL.
00:03
I'm going to start up a Cygwin terminal here.
00:03
The Cygwin shell here and do ls and there's a hello.c.
00:03
I'm going to display it on
00:03
the screen by saying cat hello.c.
00:03
We can see it is a simple program,
00:03
simple C program where it just includes the
00:03
library stdio.h, which is standard I/O.
00:03
It uses the printf function
00:03
to print hello world to the screen.
00:03
We can say gcc hello.c.
00:03
It will compile it and then we can execute it.
00:03
The final output by default is a.exe.
00:03
We can execute it and there is hello world.
00:03
With gcc, you can also specify
00:03
the -S option and compile it.
00:03
We will see that we now have the hello.s.
00:03
I'm going to run cat hello.s
00:03
and this is what we call a listing file.
00:03
This is the assembly,
00:03
the X86 assembly that the compiler
00:03
had to generate in order to fully
00:03
compile an EXE file. At .file "hello.c",
00:03
these dots are basically
00:03
saying that these lines are just metadata.
00:03
They don't have any code equivalent
00:03
that is output. They're actually sections of the file.
00:03
We can see that there is indeed some metadata,
00:03
included by this compiler, and
00:03
we can see it's using the AT&T syntax.
00:03
It's doing that with pushl, movl,
00:03
andl, subl, call ___main.
00:03
That's pretty interesting. Oh,
00:03
okay, there's no clear command.
00:03
Just a reset there. Now we're
00:03
going to compile it with
00:03
the Visual Studio compiler
00:03
that Microsoft provides for
00:03
free which is very nice of them.
00:03
Under Visual Studio, you can
00:03
go to the Developer Command Prompt.
00:03
We're going to browse, we're going to change directory.
00:03
Let me increase the font here.
00:03
>> Cygwin/home/Sean. We see the a.exe,
00:03
the hello.c and the hello.s.
00:03
We're going to compile with CL,
00:03
the Microsoft supplied C compiler,
00:03
we can do hello.c.
00:03
It created an object file and then created hello.exe.
00:03
We're going to execute hello.exe.
00:03
Hello World. Very simple.
00:03
Now I'll press up
00:03
and specify the /FA parameter,
00:03
and now the Microsoft C compiler
00:03
will have produced a hello.ASM file.
00:03
We can look at that,
00:03
in Notepad++ because it's nicely colored.
00:03
Go to Cygwin,
00:03
home, Sean,
00:03
hello.ASM and we'll also open hello.s.
00:03
This is hello.ASM.
00:03
This compiler produced this assembly file,
00:03
this listing file and it was
00:03
nice enough to go line by line and
00:03
specify the assembly instructions
00:03
that were produced from that line of code.
00:03
Since we only had two lines of code,
00:03
it's not that big a deal,
00:03
but it could be pretty interesting.
00:03
I highly suggest that you experiment with this.
00:03
We can see that this is the Intel syntax,
00:03
XOR EAX, EAX.
00:03
That zeros out EAX,
00:03
no matter what value is in EAX, it is now zero.
00:03
It's just a very quick way of doing it,
00:03
and we can see some push or pop instructions
00:03
here and that deals with
00:03
the stack which we'll talk about next video.
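The XOR EAX, EAX idiom works because any value XORed with itself is zero: every bit is compared against an identical bit. A small model of that on 32-bit values (the mask and function name are just for this sketch):

```python
MASK_32 = 0xFFFFFFFF  # keep results within a 32-bit register width


def xor32(a, b):
    """Bitwise XOR of two values, truncated to 32 bits."""
    return (a ^ b) & MASK_32


if __name__ == "__main__":
    for value in (0, 5, 0xDEADBEEF, 0xFFFFFFFF):
        # Whatever was in the register, XORing it with itself gives 0.
        print(hex(value), "->", xor32(value, value))
```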
00:03
We can see that it pushes an offset
00:03
aka it pushes an address
00:03
onto the stack and then it calls printf,
00:03
and we can see that
00:03
the address has to be somewhere in this file.
00:03
There it is. Hello World
00:03
and there's a 0a for the backslash n for the new line,
00:03
and then zero to mark the end
00:03
>> of the string. And let's see,
00:03
>> you can color this.
00:03
In the listing file produced by GCC,
00:03
we see that it has
00:03
somewhat fewer instructions and that it's in AT&T syntax.
00:03
I'm sure we could have put
00:03
out Intel syntax if we wanted to.
00:03
We can see that it's produced some different assembly.
00:03
If you're looking at both these files in
00:03
a disassembler, with enough practice you can
00:03
see what was produced by what compiler.
00:03
Here we're going to use a disassembler called IDA Pro.
00:03
I downloaded the demo for version 6.8.
00:03
I'm just going to hit
00:03
''Go'' here, and I'm going to pull
00:03
over my GCC executable,
00:03
and IDA is going to look at it
00:03
and try to load various debugging information.
00:03
Here it'll give me a graphical view
00:03
of the assembly that it's found.
00:03
This is where the mainCRTStartup function
00:03
is, better known as main.
00:03
Then there's also a main function
00:03
here that will eventually
00:03
pass off control or be the actual main function.
00:03
This is a good bit of code that
00:03
the GCC compiler for Cygwin
00:03
has inserted into our executable.
00:03
If you want to reverse this some more, you could,
00:03
an easy way would be to look at
00:03
the imports and exports and we
00:03
can even go to strings because Hello World was a string.
00:03
Hello World, and we can see where that was accessed
00:03
by pressing ''X'' or just going to
00:03
this X reference over here,
00:03
and we can see our code was
00:03
basically, in Intel syntax, push
00:03
EBP, mov EBP,
00:03
ESP. These are stack registers.
00:03
We'll talk about that in the next video.
00:03
It calls the main function here and
00:03
moves the pointer into
00:03
a certain location and calls puts,
00:03
which is a better function than printf.
00:03
For many reasons, printf is very old,
00:03
but it will make the adjustments.
00:03
You can see up here that this blue area,
00:03
this is actually code,
00:03
and these areas are where IDA either couldn't figure
00:03
out what it was or it's just data.
00:03
We can press the spacebar and actually
00:03
see that this code is
00:03
just a small part of what this file is,
00:03
that there's lots of other code there,
00:03
library code, boilerplate code,
00:03
data, just blank data.
00:03
Sometimes it's just references,
00:03
sometimes it's just zeros.
00:03
But the point is, the compiler took that bit of
00:03
code and created this executable.
00:03
With the same code,
00:03
Visual Studio created
00:03
>> a completely different executable.
00:03
>> Functionally they are the same.
00:03
But as we'll see, it looks completely different.
00:03
It's calling these functions,
00:03
and IDA didn't know what their names were,
00:03
it just gave it a name based
00:03
on the address it found it at in memory.
00:03
We're going to use the same technique to
00:03
find the actual small bit of code that we wrote.
00:03
I'm going to go to View strings.
00:03
>> You see a lot of strings were included.
00:03
I'm going to find Hello World somewhere,
00:03
Control F, Hello, Hello World\n.
00:03
We found it here.
00:03
We can press X to see where it's referenced.
00:03
We see in this function,
00:03
so 401260, that there is a push EBP, mov EBP,
00:03
ESP, like the other function
00:03
>> that we saw on the previous,
00:03
>> a.exe, and we can
00:03
see it's pushed and then this function is called.
00:03
If we scroll over this function and then scroll,
00:03
or if we hover over this function,
00:03
then scroll with our mouse wheel down,
00:03
it has another function and then that calls
00:03
two other functions, it looks
00:03
like, and provides some arguments.
00:03
A compiler like Visual Studio will do this
00:03
mainly because of security and stability.
00:03
It will put in certain checks
00:03
between functions that it knows are vulnerable.
00:03
printf is vulnerable to a certain type of exploit
00:03
called a format string
00:03
or a printf format string vulnerability.
00:03
It will surround the printf function
00:03
or Microsoft's implementation of
00:03
the printf function with
00:03
more code. Boilerplate code
00:03
is sometimes what they call it.
00:03
This shouldn't be alarming, and like I said,
00:03
once you get used to analyzing a few pieces of code,
00:03
programs that you write yourself
00:03
or publicly available programs,
00:03
you'll see that compilers will
00:03
insert their own code or add their own stuff.
00:03
Like Visual Studio will add in
00:03
the call stack canaries to see if there's
00:03
a certain type of exploit called
00:03
a buffer overflow exploit,
00:03
which we will talk about when we talk about stacks.
00:03
IDA is the best disassembler,
00:03
but it's not necessarily the best debugger.
00:03
>> That's okay.
00:03
>> But most reverse engineers
00:03
prefer the debugger called OllyDbg.
00:03
A debugger that a lot of
00:03
reverse engineers like to use is OllyDbg.
00:03
It's currently at 2.01 version wise,
00:03
and a lot of people still use
00:03
1.1 because a lot
00:03
of old plug-ins were written for
00:03
it for reverse engineering.
00:03
We're going to take our Hello.exe.
00:03
We can execute this and we see that it
00:03
pops up and says, Hello World.
00:03
If we wanted to step through
00:03
the program, we can just take it,
00:03
drop it into OllyDbg,
00:03
and we can see that we can step over each instruction.
00:03
Now, if OllyDbg scares you with
00:03
all this stuff, it's okay.
00:03
Just take it one step at a time,
00:03
one instruction at a time, if you will.
00:03
The shortcut for this step over instruction is F8,
00:03
so we can just single step through
00:03
the program as it's working.
00:03
We can see as the instructions are executed,
00:03
changes are being made to certain registers.
00:03
We'll get into this,
00:03
push instruction here in a bit,
00:03
but we can easily step
00:03
over the next call instruction here,
00:03
or we can step into it and we can
00:03
actually follow the program flow.
00:03
I'm just going to step over.
00:03
If we see this push instruction execute and
00:03
push onto the lower right-hand pane.
00:03
There is what we call the stack,
00:03
which we'll talk about in just a minute.
00:03
You step over and we can see that
00:03
something was pushed onto the stack.
00:03
Olly looks at it and knows it's
00:03
an argument to the next function,
00:03
which is a call here,
00:03
which we can step over again and pop the stack again,
00:03
which we'll talk about in just a minute.
00:03
There's a TEST AL, AL, which is testing to see if AL is 0,
00:03
and we'll talk about AL in a bit.
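Why TEST AL, AL checks for zero: TEST performs a bitwise AND of its two operands, throws the result away, and sets the zero flag when that result is zero. ANDing a value with itself gives the value back, so the zero flag ends up set exactly when AL is 0. A tiny model of just that flag (the function name is made up for this sketch, and the other flags TEST affects are ignored):

```python
def zero_flag_after_test(a, b):
    """Model the zero flag after TEST on two 8-bit operands:
    set when (a AND b) is zero."""
    return (a & b & 0xFF) == 0


if __name__ == "__main__":
    print(zero_flag_after_test(0, 0))        # True: AL was 0
    print(zero_flag_after_test(0x01, 0x01))  # False: AL was non-zero
```

The conditional jump that usually follows reads this flag to decide which branch to take.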
00:03
We can keep doing this until
00:03
>> we get to the point where it
00:03
>> calls the function that
00:03
actually prints Hello World to the screen.
00:03
We can see it's pushing two parameters,
00:03
it's calling another function.
00:03
It does a jump, pushes two more parameters, calls
00:03
another function. Visual Studio
00:03
has added in a lot of code.
00:03
Somewhere here, probably this last function call
00:03
resulted in printf being called at some point,
00:03
and Hello World being displayed on the screen.
00:03
We can also do a similar thing with Visual Studio.
00:03
I'm going to end Olly here,
00:03
I'm going to grab this source code and copy it.
00:03
Now
00:03
that Visual Studio
00:03
has loaded I'm going to start a new project.
00:03
I'm going to choose empty project.
00:03
I'm just going to call it Project Hello.
00:03
I'm going to add a new file called hello.c,
00:03
I'll create it, I'll paste
00:03
in the Hello World code and save it.
00:03
Now I'm going to put a break point over here.
00:03
I'm going to say debug or Local Windows Debugger.
00:03
It's going to say, do you want to build this?
00:03
Yes, of course I do.
00:03
It will build the assembly.
00:03
It'll compile it,
00:03
it will produce an executable,
00:03
and begin to run it just like Olly
00:03
did and stop it at a certain spot.
00:03
Now here's a really cool thing about Visual Studio.
00:03
You can right-click and say Go To Disassembly.
00:03
Over here, it'll actually show
00:03
you for each line of code that you wrote,
00:03
the assembly that it created.
00:03
Just like the OllyDbg program,
00:03
we can step over each instruction and then we can see
00:03
the resulting registers being changed.
00:03
If you want to look at all the registers, go to Debug,
00:03
under Windows, under Registers.
00:03
Up here, we can see the EAX Register,
00:03
we can see EBX Register,
00:03
we can see the EIP Register being changed,
00:03
and we can do the same operation
00:03
where we step over each instruction,
00:03
and we can see exactly what it does
00:03
>> and when it does it.
00:03
>> We can see Hello World was printed.
00:03
This line of code was executed.
00:03
I highly encourage you to write
00:03
some source code and see the assembly that it produces,
00:03
because that's really the best way
00:03
to get to know compilers,
00:03
it's really the best way to understand
00:03
what high level source code
00:03
produces what assembly.
00:03
If you're feeling adventurous,
00:03
you can always change the architecture from
00:03
X86 to X64 and do the same thing.
00:03
Right-click. Or you can go over here,
00:03
see build this assembly.
00:03
You can see that they're
00:03
very similar type of instructions.
00:03
Instead of mov EAX,
00:03
it will move a different value
00:03
>> into EAX or it'll produce
00:03
>> a different function call or
00:03
might do something slightly differently,
00:03
like instead of a push instruction,
00:03
it's using an LEA instruction
00:03
and doing something called a fastcall,
00:03
so it doesn't actually push or pop
00:03
>> anything on the stack.
00:03
>> It just places something in
00:03
the register because working
00:03
with registers is very fast.
00:03
But we'll get into calling conventions later.