Welcome to Cybrary. Hi, my name is Sean Pierce, the subject matter expert for Introduction to Malware Analysis. Today we're gonna be talking about part two.
So how do we go from raw assembly? Well, the compiler created raw executable code and then included it in this container, which is called a PE file. Windows uses these PE files to know what section of the code to begin executing, what libraries it needs, and what version of what compiler made it, along with all this other metadata about the executable code.
If you want to go explore that, I highly suggest it, because there's a wealth of information that we can use in there, such as timestamps that say when this file was made. It usually says what its target platform and architecture are, what function calls it's going to call, and other information like that.
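As a rough sketch of the kind of metadata mentioned here, the Python below reads the target machine and the build timestamp out of a PE header. The sample bytes are a made-up, minimal header built in memory rather than a real executable, and the names parse_pe_header and build_sample_header are hypothetical helpers used only for illustration.

```python
import struct
from datetime import datetime, timezone

# Two common values of the COFF "Machine" field.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def parse_pe_header(data: bytes) -> dict:
    """Pull a few fields out of the DOS header and the COFF file header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4).
    machine, sections, stamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": sections,
        "compiled": datetime.fromtimestamp(stamp, tz=timezone.utc),
    }


def build_sample_header() -> bytes:
    """Hypothetical stand-in for a real file: just enough bytes to parse."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)             # PE header right after the DOS header
    coff = struct.pack("<HHI", 0x014C, 3, 1262304000)   # x86, 3 sections, 2010-01-01 UTC
    return bytes(dos) + b"PE\x00\x00" + coff


if __name__ == "__main__":
    info = parse_pe_header(build_sample_header())
    print(info["machine"], info["sections"], info["compiled"].isoformat())
```

To try it on a real file, the same function could be fed the first few hundred bytes read from disk. Keep in mind that the timestamp is just a field in the file and can be forged.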
And there are several PE parsers and PE file format explorers: PE Explorer, CFF Explorer (that's CFF, not COFF), PEiD, and PEStudio.
010 Hex Editor is really good if you use a PE binary template. And, of course, you can make your own. It's not super hard to do, and there's a lot of work already done for that.
If you look at the Malware Analyst's Cookbook, they have a lot of scripts to help you with that, and you get to know the file format very well.
What uses the PE file format? Simply, .exe files are the most common. DLL files are the exact same format with just a few flags changed and a few more functions exported, DllMain being the only one that a DLL file needs in order for it to be used and loaded up by the Windows loader for execution.
SCR files are screen saver files, but they execute code.
As a demonstration, we'll open up our Cygwin terminal here.
When we were talking last, we had a program that prints hello world. I'll just cat the hello world file. You can see it's a very simple program. It can be renamed to any one of these things.
If we list, we see that there is a.out, and we're gonna execute that, and it prints hello world. It's the same file format. Sometimes malware will use these other extensions, but it gets to be highly suspicious. The most I've seen is SCR.
So this is a.scr, and we can execute a.scr, and it still executes just like it would any other .exe file.
So it's good to be aware of this.
So we're talking about assembly, and we're talking about actually executing it. I mentioned the flags register earlier. Why it's so important is because it keeps state of the instructions.
And if you look over on the far right here, we have some x86 assembly.
The first line is move one into EAX.
The next line, a compare instruction, compares EAX to the number two.
The instruction below that is JZ; jump if zero is what it stands for. It jumps if the above operation resulted in zero.
So the compare instruction does the same thing as a subtract instruction, except it doesn't store the result back to the register. All it does is change the flags register.
So when we compare EAX to two, it would do a subtraction operation, one minus two. This would result in negative one, but it would not result in zero.
If you see in the middle picture the zero flag, and right above it the sign flag, the sign flag would be flipped. So it would be a one in the sign flag, and the zero flag would be zero.
So false is zero, true is one.
So the zero flag would be zero and the sign flag would be one. And so since zero was not flagged, this jump will not be taken. It will not jump to the label is_two, which is on line six. It will just simply fall through to the label is_not.
Labels don't actually do anything. There's no instruction behind them. They're just placeholders. And so what will happen is line five would be executed, where two will be moved into the EAX register. Then line six is reached, but since a label doesn't actually equate to any assembly code, line seven will be executed: zero will be moved into EAX.
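To make the flag bookkeeping concrete, here is a small Python sketch that mimics the compare and the conditional jump from this walkthrough. The listing itself is not in the transcript, so the line numbers in the comments are a reconstruction from the narration, and cmp_flags and run_snippet are hypothetical names.

```python
def cmp_flags(left: int, right: int, bits: int = 32) -> dict:
    """Mimic cmp: subtract, throw the result away, keep only the flags."""
    mask = (1 << bits) - 1
    result = (left - right) & mask
    return {
        "ZF": int(result == 0),             # zero flag: the result was zero
        "SF": (result >> (bits - 1)) & 1,   # sign flag: top bit of the result
    }


def run_snippet() -> int:
    """Step through the seven narrated lines and return the final EAX."""
    eax = 1                      # line 1: mov eax, 1
    flags = cmp_flags(eax, 2)    # line 2: cmp eax, 2
    if not flags["ZF"]:          # line 3: jz is_two (taken only when ZF is 1)
        eax = 2                  # lines 4-5: is_not: mov eax, 2
    eax = 0                      # lines 6-7: is_two: mov eax, 0
    return eax


if __name__ == "__main__":
    print(cmp_flags(1, 2))   # 1 - 2 is negative and not zero
    print(run_snippet())
```

Comparing 1 with 2 leaves the zero flag clear and the sign flag set, so the jump is not taken and execution falls through both labels.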
So with these conditional jumps, compilers will build if statements and loops and other control flow mechanisms: spaghetti code, or switch statements, or whatever.
There are other flags that you should be aware of, but the biggest are the sign flag and the zero flag.
The overflow flag will generally tell you if you tried to multiply or do something else and you lost some information (x86 has no separate underflow flag; the overflow flag covers both directions).
And sometimes code is good enough to check for that.
There's also the carry flag, to see if an operation had a carry, you know, when you added some bytes to some register.
But like I said, the most important ones are zero and sign.
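The difference between the carry flag and the overflow flag is easier to see with numbers. The sketch below models an 8-bit add and reports the flags an x86 add would set; add8 is a hypothetical helper, and 8 bits are used only to keep the values small.

```python
def add8(a: int, b: int) -> dict:
    """Add two 8-bit values and report the flags an x86 add would set."""
    full = a + b
    result = full & 0xFF

    def sign(v: int) -> int:
        return (v >> 7) & 1

    # Signed overflow: both inputs share a sign but the result's sign differs.
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return {
        "result": result,
        "CF": int(full > 0xFF),   # unsigned carry out of the top bit
        "OF": int(overflow),
        "ZF": int(result == 0),
        "SF": sign(result),
    }


if __name__ == "__main__":
    print(add8(0xFF, 0x01))   # unsigned wraparound: carry, no signed overflow
    print(add8(0x7F, 0x01))   # 127 + 1 as signed bytes: overflow, no carry
```

The first call wraps around as an unsigned value, so only the carry flag is set. The second crosses from positive to negative as a signed value, so only the overflow flag is set.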
I talked about the push and the pop instructions and the ESP and the EBP registers, and I said they had to do with the stack.
Here we're gonna look at how x86 uses this data structure, which is almost always at the top of memory, where the operating system is down at the bottom of memory, along with the heap, where malloc and calloc get their memory.
The stack is slowly growing down from a really high memory address. Every time you do a push, it'll decrease ESP, which points to the top of the stack.
EBP is always above it. So the base pointer is always pointing at the bottom of the stack frame, which is at the highest memory address. This might be kind of hard to wrap your head around, and a lot of people get confused by this.
Take care to just draw it out on a piece of paper every once in a while, and that really helps. It helps me, at least.
So the push instruction adds to the stack. It's kind of like you have a stack of plates at an all-you-can-eat buffet or something, and every time you put a plate on top, the whole set kind of moves down a little bit.
If you want to grab a plate, you have to pop off the one on top.
And if you pop something, then you will take the data from that stack, and the ESP value will increment by four bytes. There are push and pop instructions that can do 16 bits, which is two bytes, but the most common is push four bytes and pop four bytes.
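A toy model can help with the "grows down" idea. The sketch below tracks ESP through a few four-byte pushes and a pop; the starting address 0x0012FFF0 is a made-up value, and ToyStack is a hypothetical class, not how any real tool represents the stack.

```python
class ToyStack:
    """Toy 32-bit stack: a dict mapping addresses to 4-byte values."""

    def __init__(self, top: int = 0x0012FFF0):
        self.esp = top     # ESP points at the current top of the stack
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4                            # pushing moves ESP to a LOWER address
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.mem[self.esp]               # stale data stays in memory, as on real hardware
        self.esp += 4                            # popping moves ESP back up
        return value


if __name__ == "__main__":
    stack = ToyStack()
    for v in (1, 2, 3):
        stack.push(v)
        print(f"push {v}: esp = {stack.esp:#010x}")
    print("pop ->", stack.pop(), f"esp = {stack.esp:#010x}")
```

Each push lowers ESP by four and each pop raises it by four, and the last value pushed is the first one popped.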
And the call instruction will also affect the stack, because each stack frame, you know, between the ESP and the EBP pointer, is the local scope for that function. So all local variables will be between ESP and EBP.
And if this sounds confusing, don't worry. You can look at some assembly of what the functions do, and you will see how the stack is being manipulated for each function call.
When you call something, you push the EIP value. So wherever your return place in memory is, or rather the next instruction to be executed, is pushed onto the stack, and then you can jump to another location in memory.
And when you reach the ret instruction, or the return instruction, EIP is going to be repopulated with the value that the call instruction had pushed onto the stack, and it is going to resume execution in that place that you called from.
Yes, it sounds confusing. Don't worry.
You're gonna see some assembly, and it's gonna make a bit more sense.
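Before looking at real assembly, here is a minimal Python sketch of what call and ret do to EIP and ESP. The addresses are made up, the five-byte call length assumes the common near-call encoding (one opcode byte plus a four-byte offset), and ToyCpu is a hypothetical class for illustration only.

```python
class ToyCpu:
    """Just enough state to show call and ret: EIP, ESP and stack memory."""

    def __init__(self):
        self.eip = 0x00401000    # made-up address of the call instruction
        self.esp = 0x0012FFF0    # made-up top of the stack
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target: int, insn_len: int = 5) -> None:
        # The return address is the instruction right after the call.
        self.push(self.eip + insn_len)
        self.eip = target

    def ret(self) -> None:
        # ret pops whatever is on top of the stack straight into EIP.
        self.eip = self.pop()


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.call(0x00402000)
    print(f"inside callee: eip = {cpu.eip:#010x}, return address = {cpu.mem[cpu.esp]:#010x}")
    cpu.ret()
    print(f"after ret:     eip = {cpu.eip:#010x}, esp = {cpu.esp:#010x}")
```

Because ret blindly trusts the value on top of the stack, anything that changes that value between the call and the ret changes where execution resumes.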
The stack grows downwards.
So when you look at assembly and there are local variables, and we'll see an example in a little bit, you'll see them usually addressed from EBP plus a certain value, or, more appropriately, EBP minus a certain value. So if I want to access a local variable, it'll be EBP minus four.
If I do a printf and pass it a single parameter, hello world, or rather a pointer to the string hello world, the function will reference that parameter relative to EBP, and it accesses that pointer.
I'm gonna let that sink in for just a minute.
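As a sketch of where things usually sit relative to EBP, the Python below lays out one stack frame. It assumes the conventional 32-bit prologue (push ebp, then mov ebp, esp) and four-byte slots; compilers are free to do something different, and frame_layout and the EBP value are hypothetical.

```python
def frame_layout(ebp: int, num_args: int, num_locals: int) -> dict:
    """Addresses of arguments and locals in a conventional 32-bit frame."""
    layout = {}
    for i in range(num_args):
        layout[f"arg{i}"] = ebp + 8 + 4 * i        # arguments sit above the return address
    layout["return_address"] = ebp + 4             # pushed by the call instruction
    layout["saved_ebp"] = ebp                      # pushed by the function prologue
    for i in range(num_locals):
        layout[f"local{i}"] = ebp - 4 * (i + 1)    # locals sit below EBP
    return layout


if __name__ == "__main__":
    for name, addr in frame_layout(0x0012FF80, num_args=1, num_locals=2).items():
        print(f"{name:15s} {addr:#010x}  (ebp{addr - 0x0012FF80:+d})")
```

Under these assumptions a single pointer argument, like the one passed to printf, shows up at EBP plus eight, and the first local at EBP minus four.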
So just a little miscellaneous information. There is something called a NOP instruction. That stands for no operation, and it's actually an alias for the exchange EAX, EAX instruction. So it moves EAX into EAX, which results in nothing.
The NOP instruction is very useful if you wanna manipulate malware. You can say, oh, I see it's doing a function call to check to see if there's any debugger, and it dies if there's a debugger. So you NOP it out: you can just put NOP instructions in there, and that will result in the CPU (not the compiler) just completely doing nothing for those instructions. I can show you an example of NOPing out some instructions.
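Patching with NOPs comes down to overwriting bytes with 0x90, the single-byte x86 NOP. The sketch below does that to a made-up byte string that stands in for a five-byte call followed by a test instruction; nop_out is a hypothetical helper, and a real patch would be applied at an offset found in a disassembler.

```python
NOP = 0x90  # opcode of the single-byte x86 NOP


def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Return a copy of code with length bytes at offset replaced by NOPs."""
    if offset < 0 or offset + length > len(code):
        raise ValueError("patch range falls outside the code")
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


if __name__ == "__main__":
    # Hypothetical bytes: call rel32 (E8 plus a 4-byte offset), then test eax, eax (85 C0).
    sample = bytes.fromhex("e810000000" "85c0")
    print(sample.hex(), "->", nop_out(sample, 0, 5).hex())
```

Note that removing a call also removes whatever it would have left in EAX, so the code after the patch may still need attention.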
I showed you the flags register. A bit mask is good to know, because it's basically doing a logical AND to isolate the piece of memory that it wants.
So here are some examples where we want to get to a certain bit or a certain byte to see what the value is. And so we do a logical AND, the Boolean logical AND.
If you are curious about how this works, if you look at enough assembly, you'll eventually come across it.
But it's not that big a deal. You should just be aware of it.
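Here is a short sketch of masking in Python. The zero flag is bit 6 and the sign flag is bit 7 of the x86 flags register; the sample flags value 0x246 and the helper names flag_set and byte_at are assumptions for the example.

```python
ZF_MASK = 1 << 6   # zero flag is bit 6 of EFLAGS
SF_MASK = 1 << 7   # sign flag is bit 7 of EFLAGS


def flag_set(eflags: int, mask: int) -> bool:
    """AND with the mask keeps only the bit we care about."""
    return (eflags & mask) != 0


def byte_at(value: int, index: int) -> int:
    """Isolate byte index (0 = least significant) with a shift and a mask."""
    return (value >> (8 * index)) & 0xFF


if __name__ == "__main__":
    eflags = 0x246   # sample value with the zero flag set and the sign flag clear
    print("ZF:", flag_set(eflags, ZF_MASK), "SF:", flag_set(eflags, SF_MASK))
    print(hex(byte_at(0x12345678, 0)), hex(byte_at(0x12345678, 3)))
```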
Endianness, as I pointed out earlier, is when bytes are swapped around when in storage.
When they're in registers, they look normal, where the most significant value is on the left hand side, since we typically read from left to right in our culture.
You should know that, and we're gonna cover that here in a little bit.
You should also know the nomenclature surrounding data types such as WORD, DWORD, and QWORD. In academia, word means the base unit of memory in architecture terms. So if I'm talking about a 32-bit computer, a word is generally 32 bits.
But when Microsoft were making their programming languages, they had to keep compatibility, and the original word size on the original 16-bit processors made word synonymous with 16 bits.
So you'll often see in Microsoft APIs and websites and code the term DWORD, which is double word, which is pretty much synonymous with 32 bits or four bytes.
QWORD is quad word, so it's double that: 64 bits, or eight bytes.
This is the difference between industry standard and academia.
Just be aware that when you see WORD, DWORD, or QWORD, it means 16, 32, and 64 bits, unless you're reading a textbook.
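For reference, the three sizes can be checked with Python's struct module. The format codes H, I and Q with the "<" prefix are struct's standard-size unsigned 2, 4 and 8 byte integers; the mapping to the Windows names is the one described here, and the SIZES table is just an illustration.

```python
import struct

# Standard-size unsigned integers matching the Windows names.
FORMATS = {"WORD": "<H", "DWORD": "<I", "QWORD": "<Q"}
SIZES = {name: struct.calcsize(fmt) for name, fmt in FORMATS.items()}

if __name__ == "__main__":
    for name, size in SIZES.items():
        largest = (1 << (8 * size)) - 1
        print(f"{name:5s} {size} bytes, {8 * size} bits, max {largest:#x}")
```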
One's complement is something you should be familiar with. It basically means you just flip all the bits. So if it was 0010, then you flip all of those and it will be 1101.
And you might go, OK, it's very simple. Two's complement is where you flip all the bits and then add one. So if it was 001 and you flip all the bits, it would be 110, and if you add one to that, it would be 111.
So I'm talking about all this in terms of binary, and you might think, OK, that's kind of weird. Why would you ever use that or need that? And it's used for negative numbers.
It turns out that if you store negative numbers as two's complement, you can use the same circuitry for addition and subtraction and other operations.
If a negative number is in two's complement, that's pretty nifty. It's a shortcut that the hardware designers took to make computers really fast and not have to have extra circuitry for both negative and positive numbers.
You'll probably never see it directly, though, but it is something to be aware of.
You might look at the output of something, like the output from a print statement, and see negative zero. That's just like, how do you have a negative zero?
It's like, well, technically, it's possible to flip the sign bit and not have one added to it. So it's negative zero, and then negative one would be one with the sign bit set.
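The two complements are easy to try out in Python. The sketch below works on a fixed number of bits, since Python integers are otherwise unbounded; the function names are hypothetical.

```python
def ones_complement(value: int, bits: int) -> int:
    """Flip every bit within a fixed width."""
    return ~value & ((1 << bits) - 1)


def twos_complement(value: int, bits: int) -> int:
    """Flip every bit, then add one, staying within the width."""
    return (ones_complement(value, bits) + 1) & ((1 << bits) - 1)


def as_signed(raw: int, bits: int) -> int:
    """Read a raw bit pattern as a two's complement signed number."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


if __name__ == "__main__":
    print(format(ones_complement(0b0010, 4), "04b"))   # flip 0010
    print(format(twos_complement(0b001, 3), "03b"))    # flip 001, then add one
    print(as_signed(0b111, 3))                         # read 111 as a signed value
    print(format(ones_complement(0, 8), "08b"), format(twos_complement(0, 8), "08b"))
```

The last line shows where a negative zero can come from: flipping every bit of zero gives a second, all-ones pattern, while two's complement of zero is just zero again.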
I mentioned earlier that endianness is important, and it's kind of strange and rather confusing. Intel is really the only one that does it that I'm aware of.
It means that it swaps the bytes.
The reference is from Gulliver's Travels: he came across the land of small people who were fighting viciously over which end to crack their egg.
And little endian does exactly what you think, where the value is stored little end first. So little endian is the strange one, in that the least significant byte is in the lowest address.
So an example is 12345678. The bytes are swapped, so it's stored as 78 56 34 12.
Intel is little endian, and it's the main one I've seen that is. AMD's x86 processors are little endian as well.
Big endian is exactly what you would think it is. So when the value of a number or a spot in memory is 12345678, it's stored like that. This is how network traffic is sent across: whatever the device, it's sent as big endian. If that's the value, then that's how it's stored in memory.
And Intel does this because at some point they found it to be more efficient, and they could do operations faster.
So a visual representation of this would be something like this: little endian's on the left, big endian's on the right.
If you're still a little confused, I'm gonna do an example.
So if we take this number, I'm gonna split it up into two 8-bit segments. The second value would be 0x (for hexadecimal) followed by its two digits, and then the other value would be 0x followed by the other two. We swap the bytes, and then we combine them back again.
That is what would be stored, in an Intel little endian system, as this value.
I suggest just practicing it once or twice.
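One way to practice is to let Python's struct module do the byte swapping and compare it with your answer on paper. The value 0x12345678 is just the usual teaching example, and to_little and to_big are hypothetical helpers.

```python
import struct


def to_little(value: int) -> bytes:
    """Bytes as they sit in memory on a little endian machine."""
    return struct.pack("<I", value)


def to_big(value: int) -> bytes:
    """Bytes in big endian (network) order."""
    return struct.pack(">I", value)


if __name__ == "__main__":
    value = 0x12345678
    print("little:", to_little(value).hex(" "))
    print("big:   ", to_big(value).hex(" "))
    # Reading the little endian bytes back gives the original value.
    print(hex(int.from_bytes(to_little(value), "little")))
```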
So just some notes for the paranoid.
As I said earlier, disassembly can be wrong. It's an unsolvable problem: without actually executing the code, it's impossible to know exactly what the instructions will do. Sometimes malware will do things that will break disassemblers or trick them.
One good trick that I've seen is switching from x86 assembly code to x64 code, which you can only do on a 64-bit system. No disassembler or debugger right now can handle it. They just get the disassembly wrong because they make certain assumptions about the code they're processing.
There's also jumping into the middle of other instructions. So the authors very purposefully take advantage of the assumptions made by disassemblers. The disassembler will disassemble some code so that it looks like a bunch of move instructions, but what actually happens is that a jump instruction jumped into the middle of one of those instructions and is actually executing something else.
So when you statically disassemble something, you can't make an assumption that you're reading what is gonna be executed. Malware will very frequently change its own code. As it's executing, it will change the code ahead of it to jump somewhere else, or to decrypt something and then jump into that.
And we can see an example of that in the next few videos.
It's pretty interesting, because we might have to dump it out of memory and then use a combination of static and dynamic analysis to figure out what's going on.
Some malware will statically compile the libraries it's using into itself. Instead of relying on the printf function being in a library that we can access, malware authors will include the whole C standard library; it will include the entire stdio library.
So what was a five-instruction or ten-instruction analysis just turned into, you know, millions of instructions.
IDA Pro and others will try to identify frequently used libraries and then try to label them and just say, oh, this is string length, or, oh, this is some other function, but it doesn't always catch it.
There's a lot of malware out there that will have what is called junk code, which functionally does nothing.
One of the first pieces of malware that I was looking at for a good, long while, I just spent a week on.
It had a lot of junk code, where it would make function calls like get system time or get system information and then jump into functions that just had a lot of move instructions and a lot of jump instructions. And they didn't really do anything.
But buried somewhere deep in one of the calls that it made, it would unfold, decrypt some other code of the payload, and use another function call to jump into that.
That's not uncommon, but once you look at it for a while, you start to see some of the patterns that those kinds of tools will produce. Like, one thing would go into an executable and just kind of insert random instructions like mov eax, eax, or exchange EDX and EAX and then exchange them back a few instructions later. So it's code that guarantees that the function of the program doesn't change. And you can see that it's pretty easy to associate meaningless instructions with certain programs, like packers or crypters.
When we're talking about push instructions loading up parameters, keep in mind that the compiler's the one generating these instructions. Compilers don't necessarily have to follow those conventions, but they do when they have to interface with APIs.
So internally it can load up, like we saw with 64-bit code, the parameters in registers. But when it comes to actually calling system libraries, it has to have the stack in a good state if you want that API to work. And we will take a look at some malware which corrupts its own stack, and it's malware that we've already seen before in prior videos: that little IRC bot malware.
So we'll take a look at that, and it's gonna be interesting. And there are many, many ways to do this stuff. Luckily, most malware does not try to be that tricky.
Most malware authors aren't very sophisticated, and most malware authors, frankly, just don't care.
If they develop a new anti-analysis technique or an anti-debugging technique, or they spend a week developing a new way to push and pop parameters on and off the stack, it might be better in defensive measures, but it would take them a week or two, and it would just take you, like, 15 minutes to figure out. So it's really not a great time tradeoff for the malware author, especially if their malware has already been analyzed. That means their operation is probably already blown.
Depending on the malware author's motives and depending on the purpose of the malware, it's usually not advantageous for malware authors to put a whole lot of defenses into their malware.
So just to recap, and then a list of good resources that I suggest you check out.
We went over the goals of static analysis. We want to understand what's going on underneath the hood, we want to find out more information and get IOCs and confirm dynamic analysis, and we really wanna gauge sophistication and maybe eventually attribution of the malware and the intent of the authors.
We went over a lot of technical details in x86 assembly. I highly suggest you check out how different control flow structures and different data structures compile down and what the resulting assembly is like: how a switch statement compiles down, or how nested if statements compile down, and what methodology the compiler is using for optimization.
And there are a lot of different ideas about how to do this. If you are interested in that, I would suggest taking a look at The Art of Assembly.
That is a pretty good book, and it's relatively neutral in terms of looking at x86 and how to do things versus looking at ARM and how to do something.
Also Reversing: Secrets of Reverse Engineering. That is a fantastic book. Some people think it's dry and boring. I think it's fantastic for what you want to do.
And third, the IDA Pro book. Let's see, it's the unofficial IDA Pro book, but it's the only IDA Pro book, and it's pretty good, especially if you're just beginning.
And I would also suggest checking out websites that host what are called crackmes.
They're meant for reverse engineers and crackers to figure out key generation algorithms or how to manipulate a program into giving the information that you want.
There's also a fantastic intro to x86 at the opensecuritytraining.info website, and they have all the materials up there: the slides, the challenges, and the videos up on YouTube as a posted playlist. I also suggest checking out x86 assembly language and the x86 calling conventions. Calling conventions is something we didn't quite cover, but we will go over it in the future.
They describe what happens to the stack when you enter a function call.
Sometimes the function, the callee, will clean up the stack, like in standard call (stdcall). It says, okay, 500 bytes of arguments were pushed for me, and then at the end it cleans up those 500 bytes off the stack.
And then in other calling conventions, like cdecl, the caller cleans up the stack. It's like, OK, this function is going to need 500 bytes of arguments. I'm gonna give it 500 bytes on the stack and call the function. After it gets back, I'll clean up the 500 bytes; I'll clean up the stack.
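To see who does the cleanup, here is a sketch that tracks ESP through one call under two common 32-bit conventions: cdecl, where the caller removes the arguments after the call returns, and stdcall, where the callee removes them with a ret that takes a byte count. The starting address and the simulate_call helper are hypothetical.

```python
def simulate_call(convention: str, arg_bytes: int, esp: int = 0x0012FFF0) -> list:
    """Return (who, action, esp) steps for one call under cdecl or stdcall."""
    if convention not in ("cdecl", "stdcall"):
        raise ValueError("unknown convention")
    steps = []
    esp -= arg_bytes
    steps.append(("caller", "push arguments", esp))
    esp -= 4
    steps.append(("caller", "call pushes return address", esp))
    if convention == "stdcall":
        esp += 4 + arg_bytes      # ret N pops the return address and the arguments
        steps.append(("callee", f"ret {arg_bytes}", esp))
    else:
        esp += 4                  # plain ret pops only the return address
        steps.append(("callee", "ret", esp))
        esp += arg_bytes          # the caller then removes the arguments itself
        steps.append(("caller", f"add esp, {arg_bytes}", esp))
    return steps


if __name__ == "__main__":
    for conv in ("cdecl", "stdcall"):
        print(conv)
        for who, action, esp in simulate_call(conv, 8):
            print(f"  {who:6s} {action:28s} esp = {esp:#010x}")
```

Either way ESP ends up where it started; the difference is only which side runs the last adjustment, which is what to look for in a disassembly.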
So that's just what that is. Thank you for watching.
I highly suggest you go explore on your own if you're interested in this stuff. I would suggest checking out malwarestats.org, where you can see how many malware samples they have doing anti-VM or anti-disassembly techniques.
And you can see what function calls they're using. I highly suggest you also check out Corkami.
He did a great job breaking down the PE and displaying it in a very understandable way. He takes a lot of the values in the PE file format, and he messes with them, and he says, okay, this one does this, this one does this, this one doesn't do anything, even though Microsoft says it does.
And he shows how much you can mess with a portable executable file before the operating system refuses to try to execute it. Malware will use a lot of those same types of tricks against automated analysis systems, and we'll take a look at that stuff in the future as well.
Thank you for watching, and I hope you follow along to the next video.