3 hours 41 minutes
Hello, everyone, and welcome to this section on assembly language. In this session, we'll learn about assembly language and how programs interact with the computer architecture.
This is a pretty critical skill if we're going to be performing static analysis on binaries. So let's get started.
As a quick review, when we write a program, we're typically writing it in a high-level language. This could be C, C++, or any other language that's compiled for the computer.
Now, there are many types of languages out there, and that's our challenge as reverse engineers: we need to figure out which language a program was written in.
But for now, let's use C or C++ as an example. During compilation, the code goes through four separate stages: preprocessing, compilation, assembly, and linking. The stage we're most concerned with is compilation, where the preprocessed code is translated into assembly instructions based on the target processor architecture.
Now, you can see that we've compiled a sample program here on the left
and output the assembly code on the right.
Now, the goal of this section is to understand assembly language well enough that we feel confident looking at the assembly code of a disassembled program. Based on the information we've covered so far, we have almost all the pieces we need to make this possible; we just need to add a few more relevant concepts.
The malware executables we analyze are all in machine code format.
Because machine code is practically impossible for us to read as humans, we use disassemblers. Disassemblers take machine code and convert it into assembly language. In this course, we use either IDA Pro or Ghidra to do that conversion for us.
So let's go ahead and open up our malware file. I already have a malware
database that I've been using, so I'm just going to double-click it.
Okay, so to understand assembly, we need to look at its format. Any time we look at assembly, it's displayed line by line with its label first. So here is the label; this is the address in hex.
The second column displays the machine code, that is, the opcode bytes, and to the right of that we have the instruction mnemonic. That's push and mov here. The mnemonic is followed by its arguments, which are known as operands. So we have two operands right here.
When we read assembly, we can get an idea of the operation
by reading the instruction. For instance, here on the second line, we are taking the data in the RSP register and moving it into the RBP register. In the Intel syntax that IDA and Ghidra display, instruction arguments are listed destination first, followed by the source.
In assembly, every instruction consists of an opcode and operands.
The opcode indicates what operation the CPU executes, and the operands are the data and values that the operation operates on.
For instance, here we've got three instructions that move data to and from different locations. Operands are grouped into three types. The first are immediate operands, which have a fixed data value.
For instance, in our first line of assembly, we're moving an immediate value of C8 into the RBX register. RBX here is called a register operand.
In addition to immediate and register operands, we also have indirect memory addresses.
Indirect memory addresses provide data values that are located at a specific memory location.
These are typically shown in square brackets, as you can see in our second and third lines of code.
However, the memory location can be supplied in a few different ways: as a fixed value, a register, or any combination of registers and fixed values.
For example, in our second line, RCX in square brackets refers to data located at the address held in RCX.
So if RCX holds the address 0x40000,
this instruction transfers the value held at that address into RDX.
In our third line,
EBX plus four refers to the data located at the address held in EBX plus four. So if EBX holds the address 0x40000, the instruction operates on the data located at 0x40004.
In assembly language, there are some common instructions you're going to see when you disassemble programs. They can typically be broken down into five categories, and the first ones we'll look at facilitate copying and accessing data.
For example, we've got the mov instruction, which copies values between registers, memory addresses, and so on. In addition to mov, there is also the load effective address instruction, lea, which is used to get data in the form of a memory address.
lea calculates the source operand in the same way as mov does, but rather than loading the contents of that address into the destination operand, it loads the address itself.
As a note, you might also see the lea instruction used for general-purpose arithmetic. The next type of instructions are addition, subtraction, multiplication, and division, indicated by the mnemonics add, sub, mul, and div.
The arithmetic instructions add and sub take two operands: a destination and a source.
The destination can be a register or a memory location, and the source can be a memory location, a register, or a constant, though the source and destination can't both be memory locations.
add and sub add or subtract the source and destination, and the result is stored in the destination.
As a note, we also have the increment and decrement instructions, inc and dec, which add or subtract one from a register.
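When we want to perform multiplication in assembly, we use the mul instruction. It takes only one operand, which is multiplied by the contents of the AX family register (AL, AX, EAX, or RAX, depending on the operand size). The result is then stored in the AX and DX family registers; for a 64-bit mul, for example, the high half of the product goes in RDX and the low half in RAX.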
When we want to perform division in assembly, we use the div instruction. It takes only one operand, the divisor, while the number to divide is stored in the EDX and EAX registers.
In division, the dividend is split so that the most significant doubleword is held in EDX and the least significant doubleword in EAX. After the division executes, the quotient is stored in the EAX register, and the remainder is stored in the EDX register.
Bitwise instructions are binary logic instructions that operate on individual bits. They're used because they're fast, they can be combined to perform higher-level math like multiplication or division, and they're commonly used in cryptographic, obfuscation, and decoding algorithms.
Here we've got a binary number, a byte with every bit set to zero, represented as 0 in hex.
Even though we're working with a byte, the same applies to words, dwords, and so on. Bits have positions numbered from the least significant bit on the right, moving left toward the most significant bit, so 0 through 7, respectively.
The not instruction is our first bitwise operation. It takes one operand and simply inverts the bits, so 00000000 (eight zeros) becomes 11111111 (eight ones). The result is then stored in the same location. It's very useful for inverting values.
The and, xor, and or
instructions perform operations on the source and destination, and they store the results in the destination.
These operations are similar to AND, XOR, and OR in C or Python.
Shift and rotate instructions operate on a destination and a count, which we'll see in a second.
The and instruction compares the binary forms of two integers and returns a new integer.
The new integer is formed by looking at each bit position and setting the new bit to one if both bits are one; otherwise, it sets the bit to zero.
A useful application of the and instruction is checking whether a number is even or odd: ANDing a number with 1 isolates bit 0, which is 0 for even numbers and 1 for odd numbers.
The or instruction compares the binary forms of two integers. The new integer is formed by looking at each bit position and setting the new bit to one if either of the bits is one.
Otherwise, it sets the bit to zero.
The xor instruction compares the binary forms of two integers, and it sets the new bit to zero if both bits are equal; if not, it sets the bit to one. This is commonly used to clear a register to zero, for example with xor eax, eax.
The shl instruction shifts the bits of a binary number to the left by the count operand. This is the same as multiplying it by 2 to the power of n, where n is the count.
So if the contents of the AL register is 20, which is 00010100 in binary, and the count operand is 3, then we shift the bits to the left by three.
The bits on the left fall off, and the new bits on the right are filled in with zeros, so our new binary number becomes 10100000, or 160 in decimal (20 times 2 to the power of 3). The shr instruction shifts bits to the right by the count operand.
This works like shift left, but instead
it's the least significant bits that fall off on the right-hand side, and for unsigned values this is the same as dividing by 2 to the power of n. Lastly, the rotate left and rotate right instructions, rol and ror, are similar to the shift instructions, but instead of falling off, the shifted bits are rotated around to the other end.
Okay, so that was a lot of assembly language. I hope you're still with me. In the next session, we're going to wrap up our computer architecture and assembly language discussion by examining control flow and the stack.
Advanced Malware Analysis: Redux
In this course, we introduce new techniques to help speed up analysis and transition students from malware analyst to reverse engineer. We skip the malware analysis lab setup and get participants hands-on with malware analysis.