By: Shimon Brathwaite
October 12, 2021
Assembly Language Basics
Assembly language is a low-level programming language intended to communicate directly with a machine’s hardware. Programming languages can be high-level or low-level. The primary difference is that a high-level language is more programmer-friendly: it is much easier to read and write than a low-level language. However, because it operates at a higher level of abstraction, it is usually less memory efficient. It is not designed to interact directly with a computer’s hardware and requires a compiler or interpreter for translation.
In comparison, assembly language is more machine-friendly: it is harder for humans to understand, but it offers more control, which allows programs to be more memory efficient. It needs an assembler for translation rather than a compiler or interpreter. Today, high-level programming languages are far more popular. However, cybersecurity is one of the few fields where learning a low-level language such as assembly is still extremely valuable, primarily in malware analysis. This is where cybersecurity professionals take malware samples and break them down into assembly language to determine what the malware does and, ideally, how to stop it.
Why Is Assembly Language Important For Malware Analysis?
Whenever malware is written, the author typically uses a process called obfuscation to prevent other people from reading the malware’s code and understanding how it works. When a malware sample is found, analysts use tools such as OllyDbg or IDA Pro to break the program down into its assembly language components. From there, they can work out what the program does and eventually how to stop it from spreading or detect it on a machine. This is done by identifying indicators of compromise (IOCs), which are artifacts unique to that program that can be used to detect it. Examples of IOCs are file names, IP addresses that the program contacts, the directories where it saves itself, etc.
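As a rough illustration of what hunting for IOCs can look like, the sketch below pulls printable strings out of a blob of bytes and picks out anything shaped like an IPv4 address. The sample bytes, the file path, the address, and the function names are all made up for this example; real analysis relies on a disassembler and much more careful filtering.

```python
import re

# Hypothetical bytes standing in for a file's contents (not real malware).
SAMPLE = (
    b"\x00\x01MZ\x90\x00"
    b"C:\\Users\\Public\\updater.exe\x00"
    b"\xff\xfe\x10"
    b"connect 203.0.113.7:443\x00"
    b"\x02\x03"
)

# Loose pattern for dotted-quad IPv4 addresses; it does not validate ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def candidate_ips(strings: list) -> list:
    """Collect every IPv4-looking substring from a list of strings."""
    found = []
    for text in strings:
        found.extend(IPV4.findall(text))
    return found


strings = printable_strings(SAMPLE)
for text in strings:
    print("string:", text)
print("possible IP IOCs:", candidate_ips(strings))
```

Short runs such as the two-byte "MZ" marker are skipped by the minimum length, which is a common way to cut down on noise when scanning binaries for readable text.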
How Does Assembly Language Work?
As stated earlier, low-level languages provide little or no abstraction, meaning abstraction from the processor. When you write something in assembly language, you are giving commands directly to the computer’s processor. Programs written in assembly are converted by an assembler into machine code, which is the only form of code that a machine understands without further translation. This is an example of what machine code looks like in binary form:
[Image: machine code in binary form. Source: secjuice]
Let us look at some of the components of assembly language and how they get converted to machine code.
What Are Mnemonics?
A mnemonic is a short name or abbreviation for an operation in assembly language. Each mnemonic represents a machine instruction. In the example above, add is one of these machine instructions. Other examples include mul, lea, cmp, and je.
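To show how a mnemonic turns into machine code, here is a toy encoder for a handful of 32-bit x86 register-to-register instructions. It is only a sketch: the function names are made up for this example, it handles nothing but the "mnemonic destination, source" form with two registers, and a real assembler covers far more cases.

```python
# Register numbers used by the x86 instruction encoding (32-bit registers).
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Opcode bytes for the "destination register, source register" form.
OPCODES = {"add": 0x01, "sub": 0x29, "cmp": 0x39, "mov": 0x89}


def assemble(line: str) -> bytes:
    """Encode one instruction such as 'add eax, ebx' into two bytes."""
    mnemonic, operands = line.strip().lower().split(None, 1)
    dst, src = [part.strip() for part in operands.split(",")]
    # Second byte: top two bits set (register mode), then source, then destination.
    modrm = 0b11000000 | (REGISTERS[src] << 3) | REGISTERS[dst]
    return bytes([OPCODES[mnemonic], modrm])


def as_binary(code: bytes) -> str:
    """Render machine code as space-separated 8-bit binary strings."""
    return " ".join(f"{byte:08b}" for byte in code)


code = assemble("add eax, ebx")
print(code.hex(" "))    # 01 d8
print(as_binary(code))  # 00000001 11011000
```

The human-readable add becomes the opcode byte 01, and the two register names are packed into the byte that follows it, which is all the processor actually sees.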
What Are Registers?
Registers in assembly can be compared to global variables used in higher-level programming languages like Python or C. There are three primary types of registers:
- General-purpose: EAX, EBX, ESP, EBP
- Segment: CS, DS
- Control: EIP
These registers are capable of storing a certain amount of data and a certain type of data. For example:
EAX - Accumulator register - Used for storing operands and result data. Can store 32 bits of data.
EBX - Base register - Points to data. Can store 32 bits of data.
ECX - Counter register - Used for loop operations. Can store 32 bits of data.
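The 32-bit limit has practical consequences, which the following sketch models in Python by masking values to 32 bits. The helper names are invented for this example, and the AX, AH, and AL names (the lower 16 bits of EAX and its two bytes) are x86 details that go slightly beyond the list above.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit register can hold


def add32(a: int, b: int) -> int:
    """Add two values as a 32-bit register would, discarding any overflow."""
    return (a + b) & MASK32


def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its smaller named views."""
    return {
        "ax": eax & 0xFFFF,       # low 16 bits
        "ah": (eax >> 8) & 0xFF,  # high byte of AX
        "al": eax & 0xFF,         # low byte of AX
    }


# Adding 1 to the largest 32-bit value wraps around to zero.
print(hex(add32(0xFFFFFFFF, 1)))  # 0x0

views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
# {'ax': '0x5678', 'ah': '0x56', 'al': '0x78'}
```

Python integers never overflow on their own, so the explicit mask is what stands in for the fixed register width here.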
What Happens To Data That Doesn’t Fit In Registers?
In assembly language, any data that cannot fit into registers is stored in the computer’s memory. Registers allow for the quickest retrieval of data, so you want to keep as much data as possible in registers and store only what is left over in memory, where retrieval is slower. Below is a diagram outlining the memory hierarchy of a computer:
[Image: memory hierarchy of a computer. Source: secjuice]
As you can see, the fastest storage is at the top, but it also has the smallest capacity. Moving down the hierarchy, capacity increases drastically, but access becomes slower.
Multi-byte data is typically stored in memory in one of two byte orders: little-endian or big-endian. Little-endian storage places the least significant byte at the lowest address and is the order used by Intel processors, which are common in laptops and desktop computers. Big-endian storage places the most significant byte first. ARM processors, which are widely used in mobile devices because of their power efficiency, can support either order, although in practice they usually run in little-endian mode.
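To make the two byte orders concrete, this short sketch packs the same 32-bit value both ways using Python's standard struct module. The value 0x12345678 is just an arbitrary example.

```python
import struct
import sys

value = 0x12345678

little = struct.pack("<I", value)  # "<" = little-endian, "I" = unsigned 32-bit
big = struct.pack(">I", value)     # ">" = big-endian

print("little-endian:", little.hex(" "))  # 78 56 34 12
print("big-endian:   ", big.hex(" "))     # 12 34 56 78

# Reading the bytes back with the wrong order gives a different number.
misread = struct.unpack(">I", little)[0]
print("misread value:", hex(misread))     # 0x78563412

# Byte order of the machine running this script.
print("this machine is", sys.byteorder, "endian")
```

This is why analysts reading a memory dump need to know the byte order: an address or constant stored by a little-endian processor appears "backwards" when the bytes are read left to right.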
How To Learn Assembly Language
Unfortunately, assembly language doesn’t have nearly as many learning resources as high-level programming languages. This is simply because most programming is done in higher-level languages, so there isn’t a huge demand for learning assembly outside of very niche specialties. If you’re interested in learning assembly programming, there are a few good options, including Udemy, tutorialspoint, and Cybrary’s assembly language programming course. As with most programming languages, the best way to learn is through deliberate practice combined with reading and studying.
If you’re interested in learning assembly language for cybersecurity, particularly malware analysis, you will want to be familiar with the tools OllyDbg and IDA Pro. These are the two leading tools for malware analysis and for converting .exe files to assembly code. You can download these tools individually, or you can download a Windows distribution called Flare VM. This security-focused distribution was created by FireEye and designed for reverse engineers, malware analysts, incident responders, forensic investigators, and penetration testers.
As for which assembly language to begin with, I recommend x86, followed by ARM. These are the two most popular assembly languages: x86 is used for Intel processors, and ARM is primarily used in mobile devices like cell phones.
Assembly language is a low-level programming language used to give direct instructions to a device’s processor. While low-level languages are typically harder to learn, they allow you to be more memory efficient than high-level languages. Assembly is also extremely useful for malware analysis, because most .exe files can be converted to assembly language and studied to find indicators of compromise or other means of defending against that strain of malware. To be most effective as a malware analyst, you should learn how to use industry-standard tools like OllyDbg and IDA Pro rather than focusing only on learning assembly language.