Static Analysis Part 2

Video Transcription
Hello, everyone, and welcome to Static Analysis, Part 2. In this session, we're going to learn and review how to recognize code constructs in assembly.
The first bit of code involves global and local variables. When we talk about global variables, these are any variables that are accessible outside of the scope of local functions. These variables exist for the lifetime of the program and are available to any function throughout the program.
Local variables are only accessible within the scope of the function in which they are declared.
Now, in C, the declarations of these variables look similar. However, when you look at them in the disassembler, they look totally different. If we inspect our main code, you can see that we're declaring two local variables, x and y, and then we're performing an addition on the global variables a and b.
In our assembly code, you can see how 100 and 200 are being placed onto the stack as local variables. These have memory addresses of RBP minus four and RBP minus eight, respectively. Next, the global variables a and b are accessed as well. However, notice that the two variables a and b are in brackets, meaning that this is the memory location of these variables in our static data section.
Now, in this particular instance, our disassembler decided to display the variable name in square brackets. In other instances, I've seen references to the actual address of the variable in the static data section. So just be on the lookout for both of these display methods.
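As a rough sketch of the kind of C source being described, the snippet below declares two globals and two locals. The function name add_globals and the initial values of a and b are assumptions made for illustration; only the locals 100 and 200 come from the walkthrough.

```c
#include <stdio.h>

/* Globals: stored in the program's data section and visible to every
   function. The initial values are assumptions, not from the lesson. */
int a = 1;
int b = 2;

/* Hypothetical function name. The locals x and y live in this function's
   stack frame, which a disassembler shows as offsets such as [rbp-4]. */
int add_globals(void)
{
    int x = 100; /* local variable */
    int y = 200; /* local variable */

    /* Global access: typically displayed as [a] and [b], or as raw
       addresses in the data section. */
    a = a + b;

    printf("x=%d y=%d a=%d\n", x, y, a);
    return a;
}
```

Calling add_globals() twice shows that the globals keep their state between calls, while x and y are recreated on every call.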
In C, we can perform arithmetic in a lot of different ways. On the left, in our example C code, we've got a couple of variables and some different arithmetic operations.
Two of them are the increment and decrement operations, which you typically see in looping statements. We also have the percent sign operation. This performs a modulo between two variables, which is the remainder after you divide two numbers.
In our example assembly code, you can see that 100 and 200 are placed onto the stack and the add instruction adds 300 to the variable x, which is located at RBP minus four. In our subtract output, you can see that the sub instruction is used to compute 100 minus 200.
In our decrement assembly output, you can see that instead of using the dec instruction, the compiler used the add instruction with minus one. You'll typically see the add or sub instruction substituted for inc and dec because compilers generally treat add and sub as the faster choice on modern processors.
This is also true in our increment output, where the compiler chose to add one to the variable.
The final set of instructions implements a modulo. When performing the idiv instruction, you're dividing EDX:EAX by the operand and storing the result in EAX and the remainder in EDX.
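The operations above can be sketched in C roughly as follows. The struct, the function name arith_demo, and the modulo divisor 7 are assumptions for illustration; the comments name the instruction a compiler would typically emit for each line.

```c
/* Hypothetical container for the intermediate results. */
struct arith_result {
    int sum;
    int diff;
    int after_dec;
    int after_inc;
    int quotient;
    int remainder;
};

struct arith_result arith_demo(void)
{
    int x = 100;
    int y = 200;
    struct arith_result r;

    r.sum = x + 300;      /* add */
    r.diff = x - y;       /* sub */

    x--;                  /* often emitted as add/sub with 1 rather than dec */
    r.after_dec = x;
    x++;                  /* often emitted as add with 1 rather than inc */
    r.after_inc = x;

    /* One idiv yields both values: quotient in EAX, remainder in EDX.
       The divisor 7 is an assumed value. */
    r.quotient = y / 7;
    r.remainder = y % 7;
    return r;
}
```

With these values, 200 divided by 7 gives a quotient of 28 and a remainder of 4, which is what the percent sign operator returns.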
Previously, we looked at how to recognize if statements. If statements alter program execution based on whether a certain condition exists. Now, it's important to remember that these are really common in C code and disassembly, and as a malware analyst, you definitely want to be able to recognize these constructs.
To understand our C code, you can take the if statements and break them apart. So let's look at our first if statement.
In this statement, everything hinges on the evaluation of a equal to b, or nine equal to eight.
Obviously it is not. So we look at the next line and we ask ourselves, is nine equal to eight? No.
Is c equal to zero? No, it's not. So we move on to the next line and we ask ourselves, okay, should we print this? Well, c is not zero, but nine is not eight either. So we move to the next line.
Now we go back to our original evaluation: is nine equal to eight?
It is not, so we know we're in the right branch. This is the else branch.
So here we have another if, and we ask ourselves again, is c equal to zero?
Well, it's not, so in this case, we print that c is not equal to zero and a is not equal to b.
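A minimal sketch of a nested if with this shape is shown below. The values a = 9 and b = 8 come from the walkthrough; the function name, the message text, the numeric return codes, and any particular nonzero value for c are assumptions.

```c
#include <stdio.h>

/* Hypothetical reconstruction: returns a branch number (1 to 4) so the
   path taken is easy to check, and prints the matching message. */
int nested_if(int a, int b, int c)
{
    if (a == b) {
        if (c == 0) {
            puts("c is zero and a is equal to b");
            return 1;
        } else {
            puts("c is not zero and a is equal to b");
            return 2;
        }
    } else {
        if (c == 0) {
            puts("c is zero and a is not equal to b");
            return 3;
        } else {
            puts("c is not zero and a is not equal to b");
            return 4;
        }
    }
}
```

Calling nested_if(9, 8, 1) follows the same path as the walkthrough: the outer test fails, the inner test fails, and the last message is printed.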
To better understand our assembly for these nested if statements, in this example we'll use the graph functionality of IDA Pro.
So let's go over to our lab and we will open up our program.
Now, because we're already in the main function, to view the graph for these nested if statements we can use the View, Flow chart menu option.
Okay, so our graph opens up, and let's make this a little bit bigger.
Now, here in our graph, you can see that we've got the main if statement, which is the top block, with the two other if statements branching from the main if. From these two other blocks here, you can immediately tell that this is a nested if statement.
In our top block, you can see that first we move our variables onto the stack, and the variable a is moved into EAX, which, of course, is nine.
Then it's comparing EAX to 8, which is b. This is our first if evaluation.
Now remember, the red arrows indicate normal line-by-line execution flow, which means that this branch would be taken if the variables were the same. They're not, so we follow the green branch. Then we're asking ourselves, does c equal zero?
It does not. So again, we take the green branch, which prints our message to the console.
So one last note here: to fully understand if statements, remember what the cmp instruction is doing.
It's subtracting the rightmost operand from the leftmost operand. If the value of that subtraction is zero, execution moves along the red line, because our jnz instruction does not jump.
jnz is jump if not zero. So in this case, we'll move along the green execution line, as indicated by our graph.
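To make the cmp and jnz pairing concrete, here is a small C model of the two instructions. It is only an illustration: the function names are made up, and a real cmp also sets other flags that are ignored here.

```c
#include <stdbool.h>

/* Models `cmp left, right`: subtract right from left, discard the result,
   and report only whether it was zero (the zero flag). Unsigned math is
   used so the subtraction is well defined for any inputs. */
bool cmp_zero_flag(int left, int right)
{
    unsigned int difference = (unsigned int)left - (unsigned int)right;
    return difference == 0u;
}

/* Models `jnz`: the jump (green arrow) is taken when the zero flag is
   clear; otherwise execution falls through (red arrow). */
bool jnz_taken(bool zero_flag)
{
    return !zero_flag;
}
```

For the lesson's values, jnz_taken(cmp_zero_flag(9, 8)) is true, so the green branch is followed; with equal operands it would be false and execution would fall through.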
Previously, we've looked at while loops. As a reminder, loops are repetitive tasks, which are pretty common in any type of software that's written these days.
Here we have a basic for loop. Loops always have four components: initialization, comparison, the instructions that are executed, and then the increment or decrement.
In this example, the initialization sets i to 0, and then we check to see if i is less than 10.
If i is less than 10, the printf function is called and it prints the string in the parentheses to the console. Next, the increment part of the loop adds one to i, and the process checks again whether i is less than 10.
The loop continues to execute until i is greater than or equal to 10. Here in our assembly code, you can see that zero is moved into var_4 in the initialization section.
Then our code executes a jump to the compare portion of our code, where nine is compared to the value in var_4, which is zero.
The next line says that if the result of the comparison is less than or equal, then go to our print and increment section, print the status message, and then add one to the zero that, of course, is located in var_4. Then our comparison section is run again, and one is compared to nine.
The loop continues its execution until i is greater than or equal to 10, at which point the program exits.
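The sketch below shows the loop twice: once as an ordinary for loop, and once rewritten with goto so the layout mirrors the assembly just described (initialize, jump to the compare, then body and increment). The function names, the printed text, and the returned iteration count are assumptions added for illustration.

```c
#include <stdio.h>

/* The loop as it would be written in C. Returns the iteration count. */
int loop_for(void)
{
    int count = 0;
    for (int i = 0; i < 10; i++) {
        printf("i equals %d\n", i);
        count++;
    }
    return count;
}

/* The same loop laid out the way the disassembly reads. */
int loop_goto(void)
{
    int count = 0;
    int i = 0;                 /* initialization: mov [var_4], 0 */
    goto compare;              /* jump straight to the comparison */
body:
    printf("i equals %d\n", i);
    count++;
    i = i + 1;                 /* increment: add [var_4], 1 */
compare:
    if (i <= 9)                /* cmp [var_4], 9 followed by jle */
        goto body;
    return count;
}
```

Both functions run the body ten times. Note that the source test i < 10 shows up as a comparison against 9 with a less-than-or-equal jump.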
Switch statements are commonly found in backdoors, and they're used by programmers and malware authors alike to make a decision based on a character or integer.
Switch statements are compiled in two ways. The first uses the if style, which we've already looked at, and the second implements a jump table. Jump tables are commonly found with really large, contiguous switch statements, and they are typically implemented as an optimization feature of the compiler.
When the compiler chooses to use the if style for switch statements, the disassembly will look similar to what you've seen already.
It's going to be hard to tell if the malware author used a switch statement in these cases. But in instances where a jump table is used, an offset indicates that there are additional memory locations in use.
The offset seen in our code image uses the switch variable as an index into the jump table.
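To illustrate the jump table idea, the sketch below pairs a small switch with a hand-built table of function pointers that does the same dispatch. Everything here (names, case values, return values) is hypothetical, and a compiler's real jump table holds code addresses rather than C function pointers, but the indexing pattern is the same.

```c
/* A contiguous switch that a compiler may lower to a jump table. */
int dispatch_switch(int cmd)
{
    switch (cmd) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    default: return -1;
    }
}

static int handler0(void) { return 10; }
static int handler1(void) { return 20; }
static int handler2(void) { return 30; }
static int handler3(void) { return 40; }

typedef int (*handler_fn)(void);

/* Hand-written stand-in for the jump table. */
static const handler_fn jump_table[] = {
    handler0, handler1, handler2, handler3
};

int dispatch_table(int cmd)
{
    /* A bounds check comes first, as it does before an indexed jump. */
    if (cmd < 0 || cmd > 3)
        return -1;
    /* The switch variable is the index into the table. */
    return jump_table[cmd]();
}
```

Both functions return the same value for every input, including out-of-range ones that fall to the default case.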
An array is a data list consisting of similar data types. Now, the nice thing about array elements is that they are stored in contiguous locations in memory, which makes it pretty easy to access different elements.
Malware authors will use arrays that contain pointers to strings as a way to indicate options for different malicious host names the malware might contain. If you look at our C code, we've got two arrays. The first is a global array named nums with three elements, and the second is a local array named z, also with three elements. To access elements of the array, you use the array name along with the index.
The for loop simply updates each element of each array with the numbers one through three. Now, just to note here, we're using the same variable i to do the counting and the updating.
In our assembly output, you can see that we are moving zero into var_4. This is our initialization. Now we jump down to our loop, which compares 2 to 0. This is our comparison, of course.
Then we jump to the access and update section of our program, which moves our initialization variable into EDX. The array elements are accessed using three things:
the base address of the array, which in our first case, for our z array, is var_10; the index, indicated by RAX; and the size of the element. Now, because we're using an integer data type, the size of the element is four. So that's why we've got RAX times four.
When we see assembly like this, it's a dead giveaway that the elements of an array are being accessed.
The resulting value is added to the base address of the array to access the proper array element.
Our second array is being accessed and updated using a data segment register.
This is because our array has been defined as a global variable.
Once the for loop updates both arrays, the program will complete and exit.
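A sketch of the two arrays and of the address arithmetic follows. The array name nums is a best reading of the transcript, the stored value i + 1 is an assumption chosen to match "one through three", and the helper names are made up.

```c
#include <stddef.h>

int nums[3];   /* global array: lives in the data section */

/* Fills both arrays and returns the sum of the local one. */
int fill_arrays(void)
{
    int z[3];  /* local array: lives on the stack */
    int i;

    for (i = 0; i < 3; i++) {
        z[i] = i + 1;      /* stack access: base + index * 4 */
        nums[i] = i + 1;   /* data-section access: base + index * 4 */
    }
    return z[0] + z[1] + z[2];
}

/* Spells out what indexing does: base address + index * element size. */
int *element_address(int *base, size_t index)
{
    return (int *)((char *)base + index * sizeof(int));
}
```

The pointer returned by element_address(nums, 2) is the same as &nums[2], which is the calculation the scaled-index operand performs in the disassembly.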
A string is just an array of characters. When you define a string, keep in mind that a null terminator is added to the very end of the string.
Each element of the string array occupies one byte of memory, because the char data type is one byte.
In our example program, the string named domain is a pointer that points to the first character in the string, meaning that it points to the base address of the domain character array.
You can access each character similarly to how you would any other array element. In this example, you can see that we set up a string array equal to badsite.com, and in the second line we set an integer to the length of that string.
Next, our loop iterates over every element of the character array, using the length as a condition.
The printf command will then print out every letter in the string domain. This would output the entire badsite.com domain to the console.
As you can see in our assembly code, the memory address of our string is loaded into RAX, and then RAX and RCX are used to get the value of the string length.
The return value from the string length call is placed into EAX, which is then used as our comparison variable and placed into var_14.
Then we take zero and make it our initialization variable, which is moved into var_4. Then we jump to our loop statement, which takes zero and compares it to var_14. Next, we jump to the iteration block, which accesses the bytes of our string from memory to memory using the movzx and movsx instructions.
Then our character is moved from EAX to ECX, and putchar, a.k.a. the printf function, is called and prints the character to the console. Then one is added to our increment variable, var_4.
Then we continue down. We start the loop over again until we've iterated over every character in the string badsite.com, and when that's complete, the program will exit.
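The string loop can be sketched as below. The function name print_domain and its return value are assumptions; putchar is used directly because the walkthrough notes that the print call appears as putchar in the disassembly.

```c
#include <stdio.h>
#include <string.h>

/* Prints the string one character at a time and returns how many
   characters were printed. */
int print_domain(const char *domain)
{
    int length = (int)strlen(domain);  /* result comes back in EAX */
    int i;

    for (i = 0; i < length; i++) {
        /* Each char is one byte, so element i sits at domain + i. */
        putchar(domain[i]);
    }
    return i;
}
```

For the string "badsite.com" the function prints 11 characters. The array itself occupies 12 bytes, because the null terminator is stored after the last character but is not counted by strlen.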
Now, in our instance, we're moving characters from memory to memory. But the x86/x64 architecture supports moving different data types, as well as moving data from and into different types of locations.
For instance, if we're copying a value from a register to memory, we could use the stosx instruction.
If we're loading from memory into a register, we could use the lodsx instruction.
Or, for scanning through memory to look for a specific character, we could use the scasx instruction. Lastly, to compare values in memory, we can use the cmpsx instruction. These instructions are frequently coupled with the rep, or repeat, prefix to copy memory to and from different locations.
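As a loose analogy, the C library routines below perform the same kinds of work as the repeated string instructions. Whether a given compiler emits a rep-prefixed instruction or calls a library routine for each of them varies, so the instruction names in the comments are only the conceptual counterparts. The struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical result holder for the demo. */
struct string_ops_result {
    long found_at;   /* offset of the scanned-for byte, or -1 */
    int buffers_equal;
    char first_byte;
};

struct string_ops_result string_ops_demo(void)
{
    char source[16];
    char destination[16];
    struct string_ops_result r;
    const char *hit;

    memset(source, 'A', sizeof source);         /* fill: like rep stos */
    source[5] = 'x';
    memcpy(destination, source, sizeof source); /* copy: like rep movs */

    /* scan for a byte: like repne scas */
    hit = memchr(destination, 'x', sizeof destination);
    r.found_at = hit ? (long)(hit - destination) : -1;

    /* compare two buffers: like repe cmps */
    r.buffers_equal = memcmp(source, destination, sizeof source) == 0;
    r.first_byte = destination[0];
    return r;
}
```

Seeing memset, memcpy, memchr, or memcmp style behavior in disassembly, whether inlined or as a call, is a hint that the original code was filling, copying, searching, or comparing a buffer.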
All right, so that concludes our static analysis section. I hope that this session was a good review for you and that you'll be able to use it as a reference as you're performing your malware analysis or reverse engineering. With that being said, let's wrap up our module with a quick summary.