I hated Assembly (until I met GDB)

To start with, I should say that Assembly language is really annoying in many ways if you are used to working with high-level languages. It literally needs to be instructed for "everything". Metaphorically speaking, it's like having to think through every detail of a single step forward: which muscles you will move, where exactly your foot goes, how it should move, where it will stop and at what speed, instead of just stepping forward. However, gaining full control over your muscles with practice, I mean, getting used to working with assembly, lets you beat those fancy high-level languages when you need to. I'm still a beginner at assembly language and I don't know how far I'll go. I'll probably use it only for special purposes such as critical memory operations or performance issues. But I want to share some of my experience here so that fewer people have to suffer.

Before I start explaining my code, I'll share some useful sources that helped me a lot; maybe you can find answers there, too. Note that I'm using Ubuntu 13.10 and NASM.

  • NASM Forums : If you get stuck and can't work it out, you can always ask your questions here. It's not very active, but the active users are willing to help, and you can learn a lot from older posts and questions about your issue.
  • NASM Manual : Don't look for straight answers to complicated problems here, but it explains everything about NASM in detail.
  • Dr. Paul Carter's Assembly Tutorials : This was one of my primary sources. It's really good and explains everything. I strongly recommend it.
  • Linux System Call Table : This lists the Linux system calls along with the registers they use. You'll need it. Save it.
  • Running NASM with GDB : Tutorials and explanations about using the almighty GNU Debugger with NASM.
  • Google : Our best friend, as always.

I'm not unaware of the importance of debugging, yet when I started learning assembly, a simple text editor (gedit in my case) and an assembler (NASM) were my only tools. For more complicated operations, though, keeping track of what the program is doing is essential, and GDB is the saviour at that point. Since assembly is not as readable as high-level languages, you can lose the thread at some point. Nobody suggested that I use it; I was desperately trying to make things work. It might be my lack of knowledge, but that is part of being a newbie. So I strongly recommend learning to use NASM with GDB if you are not familiar with it. A simple and helpful link is above. It helped me a lot.

Now, the first thing I did with assembly was to call a C function from assembly code. Both pieces are very simple, but there are some tricks to getting them to work together. Here's my C code, named callee.c :

#include <stdio.h>

extern void hello_world();

void hello_world(){
    printf("Hello World!\n");
}

As you can see, this program will not run by itself, since there is no main function to call hello_world. If you try to build an executable from it alone, the linker will complain about exactly that. So you need to add a main in your assembly to call it. The extern keyword in front of the function declaration is not mandatory in our example, since functions have external linkage by default in C, but I would like to be cautious. Now, here's the assembly code that calls the hello_world function, called caller.asm :

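section .text
global _start
global main
extern hello_world          ; defined in callee.c

_start:                     ; entry point used by ld
    call main

    mov eax, 1              ; system call number 1 = exit
    xor ebx, ebx            ; exit status 0
    int 80h                 ; invoke the kernel

main:
    call hello_world
    ret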

The basics of NASM are explained at the links above. The .text section (or segment; the terms are used interchangeably here) holds the program logic. The gcc toolchain expects main as the entry point, while the ld linker expects _start. _start is the beginning of our code: it calls main, and after main returns, 1 is assigned to the eax register and the system call is triggered with int 80h; system call 1 is exit. Registers and system calls are also covered at the links above. So the program calls the hello_world function, returns, and exits. To build and run the program, give these commands at the terminal :

$ gcc -c callee.c
$ nasm -f elf caller.asm
$ ld -s -o call callee.o caller.o -lc -I/lib/ld-linux.so.2
$ ./call

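The extra ld options (-lc to link the C library and -I to name the dynamic linker) are necessary only when you call a C function; without them you will not get a valid executable. These commands assume a 32-bit system, which is what nasm -f elf targets; on a 64-bit system the objects would also have to be built as 32-bit, for example with gcc -m32 and ld -m elf_i386.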

For a better example, I wanted to implement the Counting Sort algorithm in assembly. Here's the C code for it, called count.c :

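#include <stdio.h>

int main(){
    int array[] = {0, 2, 1, 2, 3, 5, 4, 7, 8, 9};
    int mid[16] = { 0 };
    int res[10] = { 0 };
    int range = 16;
    int length = 10;

    int i = 0;
    for(i ; i < length; i += 1){        /* print the input */
        printf("%d\n", array[i]);
    }
    i = 0;
    for (i ; i < length; i += 1){       /* count each value */
        mid[array[i]] += 1;
    }
    i = 0;
    for (i ; i < range - 1; i += 1){    /* running totals; stop at range - 1 so mid[i+1] stays in bounds */
        mid[i+1] += mid[i];
    }
    i = length;
    for(i ; i > 0 ; i -= 1){            /* place values, walking from the back */
        mid[array[i-1]] -= 1;
        res[mid[array[i-1]]] = array[i-1];
    }
    i = 0;
    for(i ; i < length; i += 1){        /* print the sorted result */
        printf("%d\n", res[i]);
    }
    return 0;
}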

I made it as simple as possible so that I could carry the logic over to my asm code: only the stdio library is used, with printf calls and for loops. Basically, counting sort counts how many times each value occurs and stores these counts in another array. Then it adds every count to the next one. After that, each entry tells us how many elements in the array are smaller than or equal to that value. By subtracting 1 from that entry, which accounts for the element we are placing, we get the position of that element in the sorted list. Here's my assembly code for Counting Sort, named count.asm :

%define STEP_SIZE 4
%define RANGE 10
%define LENGTH 15
%define STDOUT 1

section .data
global array1
array1: dd    3, 4, 1, 1, 7, 5, 3, 0, 9, 2, 1, 3, 4, 5, 5
arrayLen : equ   LENGTH
array2:     times RANGE dd 0
tmp: dd 0
spc: db    ' - '
spcLen: equ $-spc
spc2: db ' - '
spc2Len: equ $-spc2

section .bss
result resd 50
section .text
global _start

_start:
call count
mov eax, 0
push ecx
mov ecx, result
call display
pop ecx
mov eax, 1              ; system call 1 = exit
xor ebx, ebx            ; exit status 0 (ebx still holds STDOUT from display)
int 80h

count:
pushad
mov ebx, array1
mov ecx, array2
mov eax, arrayLen
call for1
mov ecx, array2
call for2
mov eax, arrayLen
mov ebx, array1
call for3
popad
ret

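for1:                                   ; count occurrences; eax = elements left
push eax                                ; save loop counter
mov eax, [ebx]                          ; eax = current element of array1
add ebx, STEP_SIZE                      ; advance to the next element
add dword [ecx + STEP_SIZE*eax], 1      ; array2[element] += 1
mov eax, [ecx + STEP_SIZE*eax]          ; load the new count into eax (not used afterwards)
pop eax                                 ; restore loop counter
sub dword eax, 1
cmp eax, 0
jnz for1                                ; repeat until every element is counted
ret

for2:                                   ; running totals; expects eax = 0 on entry (left by for1)
push eax                                ; save index i
mov dx, 0    ;for sum
add dx, [ecx + eax*STEP_SIZE]           ; dx = array2[i]
inc eax
add dx, [ecx + eax*STEP_SIZE]           ; dx += array2[i+1]
mov [ecx + eax*STEP_SIZE], dx           ; array2[i+1] = sum (low 16 bits only)
mov eax, [ecx + eax*STEP_SIZE]          ; load the new total into eax (not used afterwards)
pop eax                                 ; restore i
add eax, 1
cmp eax, RANGE-1
jne for2                                ; loop while i != RANGE-1
ret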

for3:
push eax
mov dx, 0    ;result value
sub dword eax, 1
push eax
mov eax, [ebx + eax*STEP_SIZE]        ;get first array's eax value
sub dword [ecx + eax*STEP_SIZE], 1    ;mid array - 1
mov eax, [ecx + eax*STEP_SIZE]        ;put new value to eax as index
pop edx
mov dx, [ebx + STEP_SIZE*edx]        ;put that value at array to dx
mov [result + eax*STEP_SIZE], dx    ;move dx to result array at given point
mov eax, [result + eax*STEP_SIZE]
pop eax
sub dword eax, 1
cmp eax, 0
jnz for3
ret

display:         ;array print (Version4)
push eax    ;save counter
push ecx
mov ecx, spc2
mov edx, spc2Len
mov ebx, STDOUT
mov eax, 4
int 80h

pop ecx
pop eax
push eax
mov eax, [ecx + STEP_SIZE *eax]
add eax, '0'    ; convert to ascii
mov [tmp], eax
mov ebx, STDOUT
push ecx
mov ecx, tmp
mov edx, 1      ; write only the digit byte, not the three zero bytes after it
mov eax,4
int 80h

mov ecx, spc
mov edx, spcLen
mov ebx, STDOUT
mov eax, 4
int 80h

pop ecx
pop eax
inc eax
cmp eax, arrayLen
jne display
ret

As seen from the example, the .data section holds initialized variables, while .bss is for uninitialized ones. The %define directive gives a name to a value before assembling, and that name can then be used anywhere in the code in place of the value. I also want to mention the pushad and popad instructions. These two are life savers when a block of operations is needed in the code. pushad pushes all eight general-purpose registers onto the stack and popad pops them back into the registers; let's say it creates a save point, so that the new block of code cannot break anything. Of course, stack operations with bad addressing, or changing the value of the esp register, would still cause it to misbehave, but for the registers it's a good safety check for me. This program is almost identical to the C code. I deliberately kept the loops as separate labels because I was on the edge of losing it, and this way I could convert the operations to assembly easily. To run it, give the terminal these commands :

$ nasm -f elf count.asm
$ ld -s -o count count.o
$ ./count

The last code I have for now includes recursion. For that I implemented the Newton-Raphson root-finding algorithm, used here to compute a square root. The logic is basic. You take an initial guess for the square root, knowing that dividing a number by its square root gives the root again, dividing it by something bigger than the root gives a quotient smaller than the root, and dividing it by something smaller than the root gives a quotient bigger than the root. So the true root always lies between the divisor and the quotient, and by taking their arithmetic mean we get a value closer to the root. This new value becomes the current guess, and the program calls itself recursively until a satisfying answer comes out. Here's the asm code for it, nr.asm :

section .data
number: dd 28
section .bss
result: resd 1
section .text
global _start

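_start:
call root
mov eax, [result]
add eax, '0'            ; convert to ASCII (works for a single digit only)
mov [result], eax
mov ebx, 1              ; stdout
mov ecx, result
mov edx, 1              ; write just the one digit byte
mov eax, 4              ; system call 4 = write
int 80h

mov eax, 1              ; system call 1 = exit
xor ebx, ebx            ; exit status 0
int 80h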

root:
pushad
mov esi, number
mov edi, result
mov ecx, [esi] ;ecx holds number
mov ebx, ecx ;ebx previous guess
shr ebx, 1 ;first guess is calculated by a right shift, which in binary means dividing by 2
call nr
mov edi, result
mov [edi], eax
popad
ret

nr:
mov edx, 0 ;clear edx, since div uses edx:eax as the dividend
mov eax, ecx ;Move number to eax.
div ebx ;Number is divided by guess and the quotient is stored in eax
add eax, ebx ;quotient and guess are added
shr eax, 1 ;and we get their arithmetic mean by dividing by 2
mov edx, eax
sub edx, ebx ;subtract old guess from new guess
jz finish ;jump to finish if the difference is zero
cmp edx, 1 ;or if it's 1
je finish ;jump if equal
mov ebx, eax ;new guess is moved to ebx register, which holds the current guess
call nr ;calls itself with new values
finish:
ret

Since the operations are on integer values, the result will be somewhat inaccurate if the number is not a perfect square. Using and manipulating floating-point values is another story; maybe I'll cover that later. To run this program, enter these commands at the terminal :

$ nasm -f elf nr.asm
$ ld -s -o nr nr.o
$ ./nr