Hello. Welcome to Cybrary. I'm Sean Pierce, subject matter expert for Introduction to Malware Analysis.
Today we're gonna be covering the basics of static analysis,
and we're gonna go over part one: assembly. So what exactly is static analysis? Well, we're going to read the raw assembly code
from an executable,
and we use tools such as debuggers and disassemblers.
When I say debugger, you probably think, "Oh, you're gonna step through each line of the source code."
And that's not quite true. If you have an IDE, an integrated development environment, debuggers are useful for finding bugs. You step through each line of the source code and say, "Oh, this is the if statement, and this is the value I'm comparing this variable against," et cetera. Well, we don't
usually have the source code for malware.
In fact, it's exceedingly rare that we do, so
we step through each instruction,
each assembly instruction. We're going to see some examples of that in a bit. But several
disassemblers do have debugging components in them, so you can run the malware. And if you ever do run the malware,
even with a debugger attached where you think you can control each instruction, I highly suggest you keep it in a virtual machine, a VM like the one we built in the earlier videos. So the characteristics of static analysis are usually that it's very slow and very detail oriented, and a lot of technical knowledge is required.
I like it. I don't think it's super hard once you get the hang of it. After a few weeks of digging into something,
you get comfortable with it
pretty fast, especially if you're executing some code over and over and over again. You can see exactly what the instructions are doing,
but it does take time to sort through rather large executables. So I wouldn't suggest starting with something like Visual Studio or another large application
like Photoshop or something like that,
because those are huge programs. A megabyte worth of code is a lot of code.
So generally, when we're looking at code statically and we are looking at the individual instructions, we're trying to confirm our dynamic analysis. Say we saw it create a file in this directory, and it had this weird file name. So
is that name really hard coded into the binary?
Like, will it use that name every single time, or does it choose that name based on the date or the time or
the version of the operating system? So with static analysis, we can really understand the behavior of the malware, and we can understand what kind of conditional behavior would come about in a different version of the operating system, or whether or not it had access to the Internet, or
different things about
the family of malware we're looking for, because that's very helpful too. If we look at indicators like constants that we find, or certain strings in the code, you can just Google for them, and if you see that someone else has already done some analysis, you can find out a lot of information.
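As a rough illustration of pulling searchable strings out of a binary, here is a minimal Python sketch. The function name `extract_strings` and the sample bytes are made up for this example; real tools such as the `strings` utility or the strings view in a disassembler also handle other encodings.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical bytes standing in for a slice of an executable.
sample = b"\x00\x01MZ\x90\x00Hello world\n\x00\xff\xfekey.bat\x00\x03ab\x00"

for s in extract_strings(sample):
    print(s)
```

Short runs such as the two-byte "MZ" fall under the minimum length and are skipped, which keeps the output down to strings worth searching for.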
Some strings in some malware are meant to be jokes. The authors know that analysts are gonna look at this later, and they try to make things harder by, like, encrypting strings. They encrypt payloads, like when a piece of malware drops another file, another executable, and then executes that. Frequently
it's just encrypted in the first
dropping malware, or the dropper,
and it'd be interesting from a static analysis standpoint to know when it would drop that. It's like, "Okay, I see that it's going to
look at all the processes that are running, and if there are no processes named,
you know, whatever debugger,
or that have the debugger keyword in the process title, then I'll drop this other executable and kick that off." We can also understand
domain generation algorithms. Some malware families out there don't have hard-coded domains or IP addresses as command and control servers
to report back to. Instead, the malware gets its updates by generating new malicious
domains every single day or every single second or every single minute. It will use the time or some other value, like something from the Internet,
the weather or something like that, and use it as a seed to generate a domain name, and then check that domain name to see if the malware author has put anything there.
And, of course, the malware author usually signs that stuff, so security researchers can't just take over the botnet by,
you know, putting self-destruct configurations on
fake command and control servers.
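To make the idea of a seeded domain generation algorithm concrete, here is a minimal Python sketch. It is not taken from any real malware family: the function name `generate_domains`, the use of SHA-256, the 12-character label, and the `.example` suffix are all assumptions chosen for illustration.

```python
import hashlib
from datetime import date

def generate_domains(day, count=3, tld=".example"):
    """Derive a short list of pseudo-random domain names from a date seed."""
    domains = []
    for i in range(count):
        # The seed mixes the date with a counter, so every day yields a new list.
        seed = f"{day.isoformat()}-{i}".encode("ascii")
        digest = hashlib.sha256(seed).hexdigest()
        domains.append(digest[:12] + tld)
    return domains

for name in generate_domains(date(2020, 1, 1)):
    print(name)
```

Because the output depends only on the seed, anyone who recovers the algorithm statically can compute the same list for a given day, which is why knowing the algorithm is useful to a defender.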
We can also easily figure out,
well, not so easily sometimes,
network traffic encryption. Several families of malware will encrypt their network traffic to their command and control server. Sometimes it's really easy to figure out the encryption method they're using. Usually it's like a single-byte XOR
operation, which is just one instruction
that the CPU needs to run in order to get the original traffic.
And sometimes, if you look at string encryption,
you'll frequently see that it goes through each byte and just XORs it against a value. So we can figure that stuff out statically without running it.
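The single-byte XOR scheme described here can be sketched in a few lines of Python. The key value 0x5A, the sample text, and the helper names `xor_bytes` and `find_single_byte_key` are assumptions for illustration only.

```python
def xor_bytes(data, key):
    """XOR every byte with the same one-byte key (the same call encodes and decodes)."""
    return bytes(b ^ key for b in data)

def find_single_byte_key(data, crib):
    """Try all 256 keys and return the first one whose output contains the crib."""
    for key in range(256):
        if crib in xor_bytes(data, key):
            return key
    return None

encoded = xor_bytes(b"Hello world", 0x5A)
print(encoded)                                  # unreadable bytes
print(xor_bytes(encoded, 0x5A))                 # XOR with the same key restores the text
print(find_single_byte_key(encoded, b"world"))  # recover the key from a known word
```

With only 256 possible keys, a guess at one word of the plaintext is usually enough to recover the key, which is part of why this scheme is easy to spot and undo.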
And we can determine defenses. A lot of malware out there has anti-VM or anti-debugging defenses like I mentioned earlier.
And if we dropped an executable in our VM and we double-clicked it and nothing happened, or we can see in the log that it just started and stopped,
and we don't really know why, we can
throw it into a disassembler like IDA Pro and just kind of look through the code to see if anything jumps out at us as being some kind of anti-virtual-machine technology or anti-virtual-machine code.
And it's also important,
when we really dig into a piece of malware, to know what its capabilities are and what code
wasn't executed during dynamic analysis. And really, what
is the risk and impact to our organization?
For example, I was given a piece of malware, and when I ran it in a VM,
all it did was pop up and say, "You've been hacked by," you know, insert hacker group name here, and
did a quick little refresh.
Within the first 30 seconds of static analysis, you know, I opened it up in IDA and I was looking at it, and I was just like, "Oh, it looks like they just took a bat file and
ran a tool called Bat to Exe to make it an EXE file."
And so I was able to recover the original bat file, which was just a few lines of code. The first two lines ran a command to disable the mouse and ran another command to disable the keyboard, and then it ran a command to take the file key.bat
and place it in the startup programs folder, and then it would pop up "You have been hacked,"
and then it would kill explorer.exe,
and I noticed my mouse and keyboard were still working during this dynamic analysis, and I googled around, and I found out those commands hadn't worked since Windows 95.
I guess in older versions of the operating system, when explorer.exe dies, it doesn't come back, but on all modern operating systems
it does: explorer.exe respawns as soon as it's no longer running.
It also occurred to me that key.bat was probably the original file name,
and it was trying to copy key.bat instead of key.exe, which is what the file was actually named. So
I was able to determine pretty quickly that this offered very little risk. It was very low impact to anyone that was actually infected with it. And
with static analysis,
after you do it a little while, you can get a feel for how sophisticated the programmer was. So I was able to look at that piece of malware and determine that
the actors were not very sophisticated. There was obviously not very much testing. They didn't realize that their persistence mechanism
failed because they renamed the file. They didn't realize that the keyboard and mouse were still working after they ran these commands, and that explorer.exe would just restart after it was killed.
Based on this and other indicators from the region this piece of malware came from, I was able to determine that these attackers were not very sophisticated, and they were a lot of hype. They talked a lot, but they didn't actually know a lot. And
I was able to determine all of this really quickly, within 30 seconds of static analysis, and the dynamic analysis in this case really did not help very much. And,
when I looked at this group, they had made another piece of malware that deleted all the files on the computer.
I looked at it again with static analysis and saw it was just a Bat to Exe file running one command, which was delete everything recursively under the C drive, forcefully.
And I looked at another piece of malware from the same region, and it was also wiper malware.
It used something benign and well known
to get direct access to the hard drive and then overwrite the first several chunks
of the hard drive, called the MBR or the master boot record. And that
was much faster, and it was a much more sophisticated piece of malware.
So I was able to attribute
this malware and the other group's malware to two different actors.
Earlier I was talking about how
we're going to read the assembly of an executable. So what exactly is that? The simple answer is it's human-readable machine code for a particular chip.
Each line of assembly usually corresponds to one instruction that the CPU circuitry will execute.
So Intel invented this 8086 chip many years ago. The 8086 was 16 bit, and later chips in the family were 32 bit.
And when I say 32 bit, that's the size of the registers and thus how much memory the chip could address. With 32 bits, whether you address it via word or dword, you could only maximally address four gigs, or gigabytes, of memory.
And so the upper limit for early 32-bit operating systems was four gigs of RAM.
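The four-gigabyte figure falls straight out of the arithmetic: a 32-bit address can take 2 to the 32nd power distinct values, and each value names one byte. A small Python check follows; the function name `addressable_bytes` is made up for this example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits

GIB = 2 ** 30  # bytes in one gigabyte (binary)

print(addressable_bytes(32))         # 4294967296 bytes
print(addressable_bytes(32) // GIB)  # 4 gigabytes
print(addressable_bytes(64) // GIB)  # the 64-bit limit is vastly larger
```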
So we moved on to another architecture, the 64-bit architecture, also called AMD64, because AMD are the ones who made the standard, and Intel made its own 64-bit standard, which was almost completely identical.
And almost all 64-bit chips
can run x86 code; the chips have the circuitry to run both sets of code. And
if you ever look at x64
assembly like we have in the lower left-hand corner, it looks pretty similar to the assembly on the right-hand side, which is x86 assembly. We also have ARM architectures and MIPS architectures. ARM is a lot more common than MIPS.
ARM is usually used in phones and tablets, and MIPS is in, like, printers and some tablets. And this is really just to show you that there are lots of architectures out there, lots of different chips out there. The most common architecture found in the world is the x86 architecture,
so we should really get to know x86
assembly, because that's what most malware is compiled for. Malware, like any other piece of software, usually tries to keep compatibility and wants to infect as many machines as possible.
Like I was saying, on the left-hand side there's 64-bit code. In the middle is ARM assembly, and on the right it's MIPS. And on the far right, on the upper half of the page, is x86. And that's what we're going to dig into more here. So
there are a lot of different instructions. I can't remember exactly how many there are, but
it's not what we call a RISC architecture. RISC is a reduced instruction set, and x86 is known as CISC, a complex instruction set. And that was because Intel wanted to have lots of instructions to do lots of different things.
Like I said, each instruction here is physically executed on the circuitry. So if you could combine a whole bunch of operations into one instruction, your code theoretically will be faster. But
the 14 most common instructions make up 90% of all code.
The most common 14 are the ones I've listed here on the right, and you're going to see these a lot, so I would suggest memorizing them.
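One way to see for yourself how a handful of instructions dominates is to count mnemonics in a listing. The Python sketch below does that. The listing text is a small hypothetical fragment written for this example, not the slide's list or real compiler output, and `mnemonic_counts` is a made-up helper name.

```python
from collections import Counter

def mnemonic_counts(listing):
    """Count the first word (the mnemonic) of every instruction line."""
    counts = Counter()
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        counts[line.split()[0].lower()] += 1
    return counts

# Hypothetical Intel-syntax fragment for illustration only.
listing = """
main:
    push ebp
    mov  ebp, esp
    push offset msg   ; address of the string
    call printf
    xor  eax, eax
    pop  ebp
    ret
"""

for mnemonic, n in mnemonic_counts(listing).most_common():
    print(mnemonic, n)
```

Running the same count over the listing file of a larger program you compiled yourself gives a feel for which instructions are worth memorizing first.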
And when you're reading assembly, there are two different ways to read x86: the Intel syntax and the AT&T syntax. With the Intel syntax,
it's generally right to left, so
here we see mov eax, 5. That means we're moving the value five into the EAX register.
A register is a little bit of memory that's in the CPU.
There are a couple of different registers; we'll go over them in a minute. But just know that when we want to manipulate information in the CPU,
we do it with registers, and they are super, super fast.
It's really slow to get something from memory,
or rather to get something from RAM. It's really, really slow to get something from the hard drive. So if you can do it with registers, you really should.
And if you're worried about optimizing your code,
compilers do a great job of that. We're going to see some of the different compilers here in a minute,
and different compilers will produce different assembly.
So we can take the same program and use 10 different compilers on it, and the output will almost always be different
between each compiler. And if you do enough static analysis, you'll get to know the output of certain compilers. You could just look at it and say, "Oh, you know, this was produced by Visual Studio," "Oh, this was produced by Delphi," or "Oh, this was produced by, you know, whatever, Visual Basic."
All right, I named languages there, but you can usually tell that too.
You really need some programming knowledge to get into this, like
loops and functions and local variables and APIs, application programming interfaces. You can Google stuff as you go along.
But if you're looking at the assembly for a program and you see,
you know, there's this jump instruction, and then there's an increment instruction, and then there's another jump instruction, and it does some stuff, and then another jump instruction, well, as a
knowledgeable programmer, you could spot that as, like, "Oh, that's a for loop. It's incrementing some counter value
and then checking it against a constant
to see if it's over or under that."
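The shape being described (initialize a counter, compare it against a constant, do some work, increment, jump back) can be spelled out in Python by writing the same loop twice. This is only an analogy for what compiled code tends to look like, not actual compiler output, and both function names are made up for the example.

```python
def sum_with_for(limit):
    """The loop as a programmer would normally write it."""
    total = 0
    for i in range(limit):
        total += i
    return total

def sum_like_assembly(limit):
    """The same loop broken into the steps you tend to see in a disassembly."""
    total = 0
    i = 0                 # initialize the counter
    while True:
        if i >= limit:    # compare against the constant, conditional jump out
            break
        total += i        # loop body
        i += 1            # increment instruction
        # falling off the end here is the unconditional jump back to the compare
    return total

print(sum_with_for(5), sum_like_assembly(5))
```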
Local variables are useful because we're gonna talk about the stack in the second video,
and application programming interfaces are useful because you can usually tell the sophistication of the actor, or you can usually tell the
programming methodology.
If they like to create their own sockets and read and write information directly from them, that's a different way to transfer information over a network than using
HTTP or the WinINet libraries that Microsoft also provides.
I would also suggest that you know a bit of math, particularly binary, hexadecimal and decimal and how to convert between them.
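For practicing those conversions, Python's built-in `int`, `hex` and `bin` are enough. The helper name `to_bases` and the sample values below are just for illustration.

```python
def to_bases(n):
    """Show one integer in decimal, hexadecimal and binary notation."""
    return {"dec": str(n), "hex": hex(n), "bin": bin(n)}

print(to_bases(10))    # {'dec': '10', 'hex': '0xa', 'bin': '0b1010'}
print(to_bases(255))   # {'dec': '255', 'hex': '0xff', 'bin': '0b11111111'}

# Going the other way: parse text in a given base back into an integer.
print(int("1010", 2))  # 10
print(int("ff", 16))   # 255
print(int("0x41", 16)) # 65, the ASCII code for 'A'
```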
So we're gonna do a little demonstration. We're gonna take some C code
and compile it with GCC in Cygwin,
which is what I prefer and what I suggested in earlier videos. And we're also going to do it in the
Visual Studio compiler, cl.
I'll start up a Cygwin terminal here
and do ls, and there's a hello.c.
I'm gonna display it on the screen by saying cat hello.c, and we can see it is a simple program,
a simple C program where it just
includes the library
stdio.h, which is standard I/O,
and it uses the printf function
to print hello world to the screen.
So we can say gcc hello.c.
Then we can execute it. The file that's output by default is a.exe. We can execute it, and there's hello world.
With GCC, you can also specify the -S option, and
we will see that we now have a hello.s.
So I'm gonna run cat on it.
And this is what we call the listing file. This is the assembly, the x86 assembly,
that the compiler had to generate
to fully compile an EXE file.
There's .file "hello.c". These dots are basically saying that these lines are just
metadata. They don't have
any code equivalent.
They're actually sections of the file,
so we can see that there is indeed some metadata
that's included by this compiler.
We can see it's using the AT&T syntax,
and it's doing pushl. There's
underscore, underscore, underscore main.
So that's pretty interesting. And we can,
well, you know, let's clear,
just to reset that.
So now we're going to compile it with the compiler
that Microsoft provides for free, which is very nice of them.
So under Visual Studio I'm gonna go
to the Developer Command Prompt.
We're gonna change directory. Let me increase the font here.
It's Cygwin, slash home, slash sean.
We see the a.exe, the hello.c, the hello.s.
We're gonna compile with cl, the
supplied C compiler. So we could do cl hello.c.
It created an object file and then created
hello.exe.
So we're gonna execute hello.exe.
With the /FA parameter, the
Microsoft C compiler will produce a hello.asm file,
which I'll open here because it's nice and colored.
So this is hello.asm. This compiler produced this assembly file, this listing file,
and it was nice enough to go line by line and specify
the assembly instructions that were produced from that line of code.
Since we only had two lines of code, it's not that big a deal, but it could be pretty interesting. I highly suggest you experiment with this.
So we can see that this is the Intel syntax.
There's xor eax, eax. That zeroes out EAX no matter what value is in EAX.
It's just a very quick way of doing it.
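The reason this works is that XORing any value with itself clears every bit. A quick Python check, modeling a 32-bit register with a mask (the name `xor_self` is made up for this example):

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like a 32-bit register

def xor_self(register_value):
    """Model of XORing a 32-bit register with itself."""
    return (register_value ^ register_value) & MASK32

for value in (0, 5, 0xDEADBEEF, MASK32):
    print(hex(value), "->", xor_self(value))  # always 0
```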
And we can see some push and pop instructions here, and that deals with the stack, which we'll talk about next video.
We can see that it pushes
an offset, a.k.a. it pushes an address
onto the stack, and then calls printf.
And we can see that the address
has to be somewhere in this file. There it is:
hello world. There's a 0x0A for the backslash n, for the new line, and then a zero to mark the end of the string.
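To see those trailing bytes for yourself, the Python sketch below builds the byte sequence a C compiler would store for a string literal ending in a newline. The exact text of the literal is an assumption here, since the source file is only shown on screen in the video.

```python
literal = "Hello world\n"                   # assumed text of the C string literal
stored = literal.encode("ascii") + b"\x00"  # C appends a zero byte as the terminator

# Print each byte as two hex digits, the way a listing file or hex view shows it.
print(" ".join(f"{b:02x}" for b in stored))
# 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a 00
```

The last two bytes, 0a and 00, are the newline and the terminator described above.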
And in the listing file produced by GCC, we see that it has
somewhat fewer instructions and that it's in AT&T syntax. I'm sure we could have it put out
Intel syntax if we wanted to,
and we can see that it's produced some different assembly.
So if you're looking at both these files,
with enough practice, you can see what was produced by what compiler.
Now we're gonna use a disassembler.
I downloaded the IDA demo, version 6.8. I'm just gonna hit Go here,
and I'm gonna pull over my GCC executable,
and IDA is going to look at it
and load various debugging information.
And here it'll give me kind of a graphical view of the assembly that it's found.
The mainCRTStartup function is,
you know, better known as main. And then there's also a main function here that it will eventually
pass off control to, or be
the actual main function.
And this is a good bit of code that the GCC compiler for Cygwin has inserted into our executable.
And if you want to reverse this some more, an easy way would be to look at the imports and exports. And we can even go to strings and look for hello world.
Hello world. And we can see where that was accessed
by pressing X or just going to this xref over here.
In Intel syntax it's push
ebp, mov ebp, esp. Those are stack registers; we'll talk about that in the next video. It calls
the main function here,
moves something into a certain location, and then calls a function
which is a better function than printf
for many reasons. printf is very old.
But the compiler will make the adjustments.
You can see up here that
this blue area is actually code.
For these other areas, IDA either couldn't figure out what they were or it's just data. So I can press the space bar and actually see that
this code is just a small part of what this file is. There's lots of other code there: library code, boilerplate code, data, just blank data.
Sometimes it's just
references. Sometimes it's just zeros.
So one compiler took that bit of code
and created this executable,
and with the same code the other compiler
created a completely different executable. Functionally, they are the same,
but it looks completely different.
So it's calling these functions,
and IDA didn't know what their names were, so it just gave each one a name based on the address where it found it in memory.
We're going to use the same technique to find the actual small bit of code that we wrote.
I'm gonna go to the strings view, and we can
see a lot of strings were included.
I'm gonna find hello world somewhere. Control-F, hello.
Hello world, backslash n.
I'll press X to see where it's referenced, and we see it's
in this function, sub_401260,
and that there is a push ebp, mov ebp, esp like the other
function that we saw in the previous a.exe.
The string is pushed, and then this function is called. If we hover over this function and scroll down with our scroll wheel, we see it calls
another function, and then that calls two other functions, it looks like,
and provides some arguments.
Visual Studio will do this mainly because of security.
So it will put in certain checks between functions, and it knows printf is vulnerable to a certain type of exploit,
a format string vulnerability. And so it will surround
the printf function, or Microsoft's implementation of the printf function,
with more code,
boilerplate code, as they sometimes call it.
So this shouldn't be alarming. And
like I said, once you get used to analyzing a few pieces of code, programs that you write yourself or publicly available programs, you'll see that
compilers will insert their own code or add their own stuff. Like, Visual Studio will add in what they call stack canaries to detect a certain type of exploit called a buffer overflow exploit, which we will talk about when we talk about stacks.
IDA is the best disassembler, but it's not necessarily the best debugger,
and most reverse engineers prefer something else for debugging.
A debugger that a lot of reverse engineers like to use is OllyDbg. It's currently at 2.01
version-wise, and a lot of people still use 1.10,
because a lot of old plugins were written for it for reverse engineering. So we can take our hello.exe
and we can execute it, and we see that it pops up and says hello world.
And if we want to step through the program, we can just take it, drag and drop it into OllyDbg, and
we can see that we could step over each instruction.
Now, if OllyDbg scares you with all this stuff,
it's okay. Just take it one step at a time, one instruction at a time, if you will.
The shortcut for step over instruction is F8. So we can just single-step through the program as it's working,
and we can see, as the instructions are executed, changes are being made to certain registers,
and we'll get into, like,
this push instruction here in a bit.
But we can easily step over,
like, the next call instruction here, or we can step into it and we can actually follow the program flow.
I'm just gonna step over.
If we watch this push instruction execute, in the lower right-hand pane there is what we call the stack,
which we'll talk about in just a minute. You step over and we can see that something was pushed onto the stack.
OllyDbg looks at it and knows it's an argument to the next function, which is a call here,
which we can step over again.
And pop ecx has to do with the stack again,
which we'll talk about in just a minute. There's a test al, al. It's testing to see if
AL is zero; we'll talk about that in a little bit.
We'll keep doing this until we get to the point where it calls the function
that actually prints hello world to the screen.
You can see it's pushing two parameters and calling another function.
There's a jump. It's pushing two more parameters and calling another function. You can see that Visual Studio has added a lot of code.
And somewhere here, probably in this last function call, we'd see
printf being called at some point and hello world being displayed on the screen.
We can also do a similar sort of thing with Visual Studio.
So I'm gonna end Olly here.
Okay, now that Visual Studio has loaded, I'm gonna start a new project,
and I'm gonna choose Empty Project. I'm just gonna call this project hello.
I'm gonna add a new file.
I will paste in the hello world code and save it.
Now I'm gonna put in a breakpoint,
and I'm gonna say Debug, or Local Windows Debugger,
and it's going to say, "Do you want to build this?" Yes, of course I do.
It will build the assembly, it'll compile, it will produce an executable
and begin to run it just like Olly did and stop it at a certain spot.
Now, here's a really cool thing about Visual Studio.
You can right-click and say Go To Disassembly,
and it'll actually show you,
for each line of code that you wrote, the assembly that it created.
And just like in OllyDbg,
we can step over each instruction and then we can see the resulting registers being changed.
If we look at all the registers under Debug, under Windows,
we can see the EAX register, the EBX register and the EIP register being changed,
and we can do the same sort of operation where we step over each instruction and we can see exactly what it does and when it does it.
Hello world was printed as soon as this line of code was executed. And so I highly encourage you to write some source code and see the assembly that it produces, because that's really the best way to get to know compilers. It's really the best way to understand what high-level source code produces what low-level
code, or what assembly.
And if you're feeling adventurous, you can always change the architecture
from, like, x86 to, like, x64
and do the same sort of thing:
right-click and go to the disassembly view, and you can see that there are very similar types of instructions. And so,
you know,
it'll move a different value into EAX, or it'll produce a different function call, or it might do something slightly differently,
like instead of a push instruction it's using an lea instruction and doing something called a fastcall, so it doesn't actually push or pop anything off the stack. It just places something in a register, because working with registers is very fast.
But we'll get into calling conventions later.