3 hours 41 minutes
Hello, everyone, and welcome to static analysis. In this session, we're going to look at suspicious files without executing them and understand the different areas in which we should focus our code analysis.
Before we start analyzing code, let's talk about where code analysis fits into our static analysis process. If you remember, we've got a simple static analysis process. This is where we hash the file and research it, so we can see if anyone else has analyzed it before or if there's any reporting in open source.
We can look at the strings to get an idea of what the malware sample will do, for example by looking at API function calls.
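The hashing and strings steps can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the course: the function names and the sample bytes are made up, and it uses just the standard library.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the sample as a hex string."""
    return hashlib.sha256(data).hexdigest()


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical sample bytes. In practice you would read the suspicious
# file with open(path, "rb").read() inside your analysis VM.
sample = b"\x00\x01GetProcAddress\x00\xff\x02ab\x00LoadLibraryA\x00"

print(sha256_hex(sample))          # hash to research in open source
print(printable_strings(sample))   # API names hint at what the sample does
```

The hash is what you would search for in open-source reporting, and API names that show up in the strings output give a first idea of the sample's behavior.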
Then we'll want to identify any malware challenges, like packers or obfuscation. We're going to look at those more in depth in the next section.
Then you can use some of the techniques that we've previously reviewed, such as looking at the PE header to understand when the file was compiled and what its imports, exports, base address, and other file properties are. It's important to note, though, that sometimes you're not going to be dealing with a Windows executable, so you'll need to make sure that you do a proper file header analysis so you can better focus your analysis efforts.
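As a rough sketch of file header analysis, the Python below checks a few well-known magic values and, for a PE file, reads the compile timestamp from the COFF header. The function names and the hand-built test header are assumptions for illustration; a real workflow would normally rely on a dedicated PE parser.

```python
import struct

# A few well-known magic values; the list is illustrative, not exhaustive.
MAGIC = {
    b"MZ": "Windows PE/DOS executable",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def identify(data: bytes) -> str:
    """Guess the file type from its leading bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def pe_timestamp(data: bytes):
    """Return the COFF TimeDateStamp of a PE file, or None if not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of "PE\0\0"
    if len(data) < e_lfanew + 12:
        return None
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    # Signature (4) + Machine (2) + NumberOfSections (2) precede the stamp.
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return stamp


# Hypothetical, hand-built minimal header used only to exercise the code.
fake_pe = (b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
           + b"PE\x00\x00" + struct.pack("<HHI", 0x8664, 1, 0x5E000000))
print(identify(fake_pe), pe_timestamp(fake_pe))
```

The timestamp is seconds since 1970 and can be forged by the malware author, so treat it as a hint rather than proof of when the file was compiled.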
We can also employ dynamic analysis to understand what a piece of malware is doing. But when you've completed these steps and you need a greater understanding of what the suspicious file is doing, this is where we use code analysis.
We can perform code analysis by executing our program in a debugger, or without executing the binary. In those instances where we don't execute the binary, a disassembler can help us.
Now, the most popular disassembler currently is IDA Pro. But because IDA Pro is rather expensive, people in the analysis community are switching to another disassembler called Ghidra, which is free and open source. In this course, we'll use IDA Pro.
IDA Pro is an interactive, programmable, extensible disassembler that you can run on Mac, Linux, or Windows.
Using IDA Pro, you can translate your program's machine code into assembly language, as well as look at strings, imports, and exports. You can debug programs, create pseudocode, and also run Python scripts, depending on the version you have. It's really fully featured, stable, reliable, and well documented.
All right, so let's go ahead and hop into our lab, and we can look at the IDA Pro interface.
Now, to load a binary in IDA, you can click and drag it onto the IDA executable, or you can double-click the shortcut and select a new file to disassemble.
For the file that you give it, IDA is going to act like a Windows loader. It does this based on the file headers, and it looks at the processor type that should be used during the disassembly process. By default, IDA Pro doesn't load the PE headers or the resource section into the disassembly, but you can select those options, and you can manually specify the base address where the executable code should be loaded.
After you click OK, IDA Pro loads the file into memory, and the disassembly engine disassembles the machine code.
The IDA Pro desktop gives you a lot of common static analysis tools, and we begin, of course, in the main window, where we have the disassembly view.
Included in this view are some available tabs. These include ways to look at our program in hex, we can look at imports and exports, and we've got some other helpful tabs.
On the left, we can view user-created and compiled library functions. To navigate into a function, we can simply double-click the function name.
Our main view consists of a graphical representation of the disassembly. However, there is also a text view, and you can toggle between the two by pressing the space bar.
In the graph view, you can display one function at a time in a flowchart style, and the function is broken down into blocks. This is a great view if you want to quickly recognize branching and looping statements. Here in our example application, we can see that IDA Pro has given us the disassembly for a function.
So let's take a moment to examine what's going on here. Our first instruction is push rbp.
This pushes the base pointer onto the stack. The RBP register is a special-purpose register: it's the base pointer, and it points to the base memory address of the current stack frame. Remember, the stack is just a data structure that items are put on and then taken off.
The next instruction moves the address in the RSP register into the RBP register. RSP is the stack pointer and RBP is the base pointer, as we just said.
Now, it's important to remember that you aren't losing the RBP value you just pushed. You're simply using the stack to keep track of it. This is setting up your stack frame because, as you can see on the next line, this is where some space is added to the stack. Here, the stack pointer is decreased by 20 in hex, which is 32 in decimal.
Remember, the stack grows down, so if you look a few lines down, you can see that the 20 hex is being added back to the stack pointer.
You're going to see this commonly in functions: the stack space is allocated and then deallocated once the function is complete. We call the beginning of the function the function prologue, and we call the end of the function the function epilogue. This is, in essence, the function from start to finish.
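To make the prologue and epilogue concrete, here is a small Python simulation of how RSP and RBP change. It is a sketch under assumptions: the starting register values are invented, and the closing pop rbp is the typical counterpart of push rbp rather than something read off the lab's disassembly.

```python
def simulate_frame(rsp: int, rbp: int, local_bytes: int = 0x20) -> list:
    """Trace RSP and RBP through a typical x64 prologue and epilogue."""
    stack = {}   # address -> saved 8-byte value
    trace = []

    rsp -= 8                  # push rbp: stack grows down, old RBP is saved
    stack[rsp] = rbp
    trace.append(("push rbp", rsp, rbp))

    rbp = rsp                 # mov rbp, rsp: RBP now marks the new frame base
    trace.append(("mov rbp, rsp", rsp, rbp))

    rsp -= local_bytes        # sub rsp, 20h: allocate 32 bytes
    trace.append(("sub rsp, 20h", rsp, rbp))

    rsp += local_bytes        # add rsp, 20h: deallocate the same 32 bytes
    trace.append(("add rsp, 20h", rsp, rbp))

    rbp = stack[rsp]          # pop rbp (assumed): restore the caller's RBP
    rsp += 8
    trace.append(("pop rbp", rsp, rbp))
    return trace


# Starting values are arbitrary, chosen only for the demonstration.
for step, rsp, rbp in simulate_frame(rsp=0x1000, rbp=0x2000):
    print(f"{step:<14} rsp={rsp:#x} rbp={rbp:#x}")
```

After the last step both registers hold the values they started with, which is the whole point of pairing a prologue with an epilogue.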
Next, you can see that we've got a few calls to functions. You can navigate into your calls by double-clicking them in IDA Pro.
The first function is __main. Now, from your programming days, you know the main function is where your code and function calls occur. So when performing analysis to understand program flow, you'll want to know where the main function is.
However, this __main function is a result of the compiler I used, which is GCC. With GCC, this function is necessary when I compile C code, and it allows me to link C and C++ object code together. This function is pretty irrelevant in our case, and we can basically skip it.
The Exports window, which is one of our tabs here, can show you which functions can be used outside the file, of which main would be one.
Because I've used GCC, I have the mainCRTStartup function, which is our program's entry point.
Along those same lines, we also have the ability to see all of the imports. Using the Imports tab, we can see all of our Windows APIs and the libraries those functions come from.
As we navigate back to our IDA view, we have a few more functions. Now, as we analyze malware, we have a saying: follow the calls. This means that if we find an interesting function that we want to analyze, we can navigate to that function, and we can follow how the function is called to understand the functionality of the program.
To navigate into functions, you can simply double-click the function name, and IDA Pro will navigate you into that function. So if we navigate into our if example function, we can see that we've got a similar output to our main function. It's got a function prologue, which sets up the stack frame, and if you look at the bottom of our first block in the graph, you can see that we have a JNZ instruction, which indicates we have a conditional branching statement. This could be an if statement, a select statement, or a loop, but loops are a little bit different, and we'll look at those later.
What I really like about IDA Pro is that conditional jumps are displayed with green and red arrows. The green arrow shows where execution goes when the condition is true and the jump is taken, and the red arrow shows where it goes when the jump is not taken, which is the normal flow of the program.
If we examine this function after the prologue, we can see that 5 is being moved into the address at RBP+var_4. var_4 indicates a local variable, and the local variables that a function requires are listed above the function prologue. From this, you can see that this function defines two variables to use.
After the 5 is moved into RBP+var_4, you can see that the address of a string is loaded into RCX. If you remember from earlier, RCX and RDX are general-purpose registers; here they carry the first two arguments to the function being called.
Then another function named puts is called. If you're unfamiliar with puts, it's a function that essentially writes a string to the console. Next, our second variable is loaded into RAX. But notice that instead of using the MOV instruction, we're using the LEA instruction. Now, if you remember, when we use LEA we aren't actually loading the value of var_8, but the memory address of var_8. So instead of this being a value, this is a pointer.
So now we have our pointer. Then we move that pointer address into the RDX register, and then we load the format string for the scanf function.
Now, notice that this is how parameters are passed to other functions: the parameters are moved into different registers, and then the function uses those as it executes. scanf is a function that takes input from the console.
After scanf does its work, the number it read is moved into EAX, which is compared to var_4, which we know is 5.
As we follow down through the branches, if the number that the user enters matches 5, a string is displayed to the console, and if the number the user enters is not 5, the console prints the string "Try again".
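The logic recovered from this disassembly can be written back out as a short Python sketch. The original program is C compiled with GCC, so this is a paraphrase rather than the author's source. The names var_4 and var_8 mirror IDA's labels, and the success message is a placeholder because the walkthrough doesn't quote it.

```python
def if_example(user_input: str) -> str:
    """Mirror the branch: compare the entered number against the constant 5."""
    var_4 = 5                   # mov [rbp+var_4], 5
    var_8 = int(user_input)     # scanf writes through the pointer to var_8
    if var_8 == var_4:          # cmp followed by the conditional jump
        return "<success string>"   # placeholder: actual text not shown
    return "Try again"


print(if_example("5"))
print(if_example("7"))
```

Reading disassembly back into this kind of pseudocode is essentially what a decompiler view does for you automatically.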
Now, in a real-world example, when you're looking at malware, the functions probably aren't going to be as conveniently named as they are in our example program. However, as you analyze your code statically, you have options to rename variables or functions to help you organize your disassembly, so that you can come back to it later and know what you did, or so that when other analysts look at your databases, they know what you did.
Once we know what we want to call them, we can rename them. For instance, we know that 5 is a function variable that doesn't change, so we could just call this "int five", and we know the other variable is the user's number, so we can highlight it, press N, and call it "user number".
When we're all done analyzing a function, we can navigate backwards by pressing the Escape key on the keyboard or by pressing the back arrow. Now let's go ahead and take a look at one more quick example. If you double-click the loop example function, you can see that it's got a similar graphical view to our if example function.
The stack is set up by the function prologue, and then you can see that we move 0 into RBP+var_4. This is going to be our local variable.
The next block takes 9 and compares it to var_4, which in this case is 0.
If the value of var_4 is not equal to 9, then the green branch is taken.
The value of var_4 is then moved into EDX, and the string for the console output message is loaded as well.
Next, these two parameters are passed to printf, which is then called to write to the console.
Now, the next instruction is going to add 1 to var_4. Maybe this looks familiar to you, if the name of the function didn't give it away, but you're looking at a control structure. This is a loop. The blue arrow is used for an unconditional jump, and a loop is indicated by the blue arrow pointing backwards, up the graph.
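Written back as Python, the loop described above looks roughly like this. It is a sketch of the control flow only: the message text is a placeholder, since the format string isn't quoted, and var_4 follows IDA's label.

```python
def loop_example() -> list:
    """Collect what the loop would print, one entry per iteration."""
    output = []
    var_4 = 0                   # mov [rbp+var_4], 0
    while var_4 != 9:           # compare var_4 with 9; green branch if not equal
        output.append(f"<message> {var_4}")   # printf(message, var_4)
        var_4 += 1              # add 1 to var_4
        # the unconditional jump (blue arrow) goes back to the comparison
    return output


for line in loop_example():
    print(line)
```

The counter runs from 0 through 8, so the body executes nine times before the comparison against 9 finally ends the loop.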
Now, this is pretty important, because becoming familiar with how to recognize these control structures is pretty critical to performing reverse engineering, as a lot of malware will loop through a string or other types of data to perform its decryption routines.
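As one illustration of the kind of loop you might run into, here is a single-byte XOR decoder in Python. This is a generic example, not taken from the course's sample; the key and the encoded bytes are invented for the demonstration.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """Loop over every byte and XOR it with a one-byte key."""
    decoded = bytearray()
    for byte in data:            # in disassembly: a counter, a compare, a back jump
        decoded.append(byte ^ key)
    return bytes(decoded)


# Hypothetical obfuscated string and key, chosen only for the example.
encoded = bytes([0x29, 0x24, 0x2D, 0x2D, 0x2E])
print(xor_decode(encoded, 0x41))
```

Because XOR is its own inverse, the same routine both encodes and decodes, which is one reason simple malware favors it.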
Okay, so I hope you're still with me here. I know this is a rather long session, but I wanted to hit on a few more points.
Now, in the Intel architecture, parameters that are passed to functions are pushed onto the stack, and the return value is placed in the EAX or RAX register. In order to understand this process, let's take a look at our sample program on the left. When the program is executed, of course, the main function is going to call each function and execute the instructions as called. If you look at the third function, add me, you can see that we've declared an integer named total, and total holds the return value of the sum function, where the two numbers 5 and 6 are added together.
Inside the sum function, the values of the arguments are copied to local variables a and b, these are subsequently added together, and the sum function returns the value of c. Knowing how this function works, let's try to visualize what's happening with the stack as these instructions are being executed.
So here in our add me function, the first thing that happens after our function prologue is that some space is allocated on the stack for local variables and for the arguments to the sum function.
At this point, let's say that the top of the stack points to the hex address FF9C. Now, as you can see, the sum function is then called as a line in the add me function.
To execute this function, before we call it, the two arguments are pushed onto the stack in reverse order, so from right to left: 6 gets pushed onto the stack, and then 5 gets pushed onto the stack. Once those are pushed on, a return address is pushed onto the stack as well.
As the arguments and return address are pushed onto the stack, our stack pointer also moves to the top of the stack.
Remember, the stack pointer, SP, always points to the top of the stack.
Now, I just said that we push our return address on top of the stack. So what's our return address? Well, in this instance, it's the address of the call to the printf function.
All the return address says is: hey, when you're done with all the instructions and all your calls, et cetera, come back to this address.
So after the return address is pushed, the sum function is called. The CPU passes control to this function, and it runs a new function prologue for the sum function.
The old base pointer, that is, the base pointer of the add me function, is pushed onto the stack and saved, and the stack pointer is copied into the base pointer. This sets up the stack frame for our sum function. So now that execution lies within our sum function, we can set up some space for the int c variable.
From here, and this is pretty cool, we can use the EBP pointer as a reference to access arguments or variables as they relate to the sum function.
To access the arguments to perform the sum of 6 and 5, we can use the value stored at EBP+12, which is equal to 6, and the value at EBP+8, which is equal to 5.
If we need to access a local variable in the sum function like int c, and of course we'll have to, because we need to return this value, we can access it by using the address EBP-4.
So typically, local variables are stored at EBP minus some value, and arguments are usually stored at EBP plus some value.
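The stack layout just described can be checked with a small Python model of a 32-bit stack. It is a sketch under assumptions: the starting stack pointer 0xFF9C comes from the walkthrough, while the caller's EBP value and the return address are invented placeholders.

```python
class Stack32:
    """A toy 32-bit stack: a dict of address -> value plus an ESP register."""

    def __init__(self, esp: int):
        self.esp = esp
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4            # the stack grows toward lower addresses
        self.mem[self.esp] = value


def call_sum(a: int, b: int, esp: int = 0xFF9C,
             caller_ebp: int = 0xFFB0, return_address: int = 0x401560):
    """Model calling sum(a, b); caller_ebp and return_address are made up."""
    stack = Stack32(esp)
    stack.push(b)                # arguments go on right to left: 6 first
    stack.push(a)                # then 5
    stack.push(return_address)   # where execution resumes in the caller
    stack.push(caller_ebp)       # sum's prologue saves the old base pointer
    ebp = stack.esp              # and copies ESP into EBP
    stack.esp -= 4               # room for the local variable c
    stack.mem[ebp - 4] = stack.mem[ebp + 8] + stack.mem[ebp + 12]
    eax = stack.mem[ebp - 4]     # the return value travels back in EAX
    return eax, ebp, stack


eax, ebp, stack = call_sum(5, 6)
print(hex(ebp), stack.mem[ebp + 8], stack.mem[ebp + 12], eax)
```

The saved base pointer sits at EBP and the return address at EBP+4, which is why the first argument shows up at EBP+8 and locals start at EBP-4.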
When we disassemble code, we're looking at different functions in assembly, which load different arguments and variables onto the stack during execution. Now, depending on how malware authors compile their code, they may be using any number of development environments, which can make malware look different.
Function calling conventions govern how functions are called, how parameters are pushed onto the stack or placed into registers, and whether the calling function is responsible for cleaning up the stack or whether the function that was called, known as the callee, is responsible for cleaning up the stack.
cdecl is one of the most popular calling conventions in C. With cdecl, all arguments passed as parameters to the function are pushed onto the stack from right to left, and the caller cleans up the stack when the function is complete.
As you can see in our sample assembly code, at the end of our function, 12 is added to the stack pointer, which is the caller, the main function, cleaning up the stack.
On 64-bit systems, generally a fastcall convention is used. With 32-bit fastcall, the first two arguments are passed in registers, the most common being ECX and EDX; the 64-bit Windows convention passes the first four in RCX, RDX, R8, and R9.
Additional arguments are then placed on the stack from right to left, and the calling function is responsible for cleaning up the stack. This calling convention is called fastcall because it's typically more efficient to use registers, as the code doesn't need to involve the stack as much.
As you can see in our program, our three variables are moved into the EAX, ECX, and EDX registers, and in the function epilogue, the caller is cleaning up the stack.
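The difference between the two conventions can be summarized with a small Python model of where each argument ends up. This is a simplification for illustration: the function and convention names are made up, "win64" stands for the Microsoft x64 convention with its four argument registers, and details such as shadow space and floating-point registers are left out.

```python
# Register lists per convention; cdecl passes everything on the stack.
ARG_REGISTERS = {
    "cdecl": [],
    "win64": ["rcx", "rdx", "r8", "r9"],
}


def place_arguments(args: list, convention: str):
    """Return (register assignments, stack placement order) for a call."""
    registers = ARG_REGISTERS[convention]
    in_registers = dict(zip(registers, args))
    # Whatever does not fit in registers goes on the stack right to left.
    on_stack = list(reversed(args[len(registers):]))
    return in_registers, on_stack


def cdecl_cleanup_bytes(args: list) -> int:
    """Bytes a 32-bit cdecl caller adds back to ESP after the call."""
    return 4 * len(args)


print(place_arguments([1, 2, 3], "cdecl"))
print(place_arguments([1, 2, 3, 4, 5, 6], "win64"))
print(cdecl_cleanup_bytes([1, 2, 3]))
```

With three 4-byte arguments, a cdecl caller adds 12 back to the stack pointer, which matches the cleanup seen in the sample assembly.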
Okay, so I know there was a lot of review material there, so let's take a quick break. In part two, we'll review more static analysis concepts.
Advanced Malware Analysis: Redux
In this course, we introduce new techniques to help speed up analysis and transition students from malware analyst to reverse engineer. We skip the malware analysis lab setup and put participants hands-on with malware analysis.