The Basics of Shellcoding
{LANG_NAVORIGIN} Exploits
Angelo Rosiello
10/01/2004
Introduction
A shellcode is a group of instructions which can be executed while
another program is running. Nowadays lots of examples show how a shellcode can be executed while an application is running
and its followings is proposed us by vulnerabilities' exploits. In order to get advantage from a vulnerability it is
indispensable to inject a shellcode because we have to get the control of a running application. The goal of this article is
not to explain all the possibilities of injecting a shellcode developed during last years, but to analyze and understand its
essence.
Registers
Before analyzing the assembly code and then the binary's one, it
is necessary to give an overview of the CPU's registers in order
to understand their importance in the assembly language.
The architecture we are going to show is the Intel-x86's one.
All the registers of the Intel's platform support 32 bits which
can be divided in sub sections of 16 and 8 bits, just to let an
heuristic use of the memory.
32 bits 16 bits 8 bits (high) 8 bits (low)
EAX AX
AH
AL
EBX BX
BH
BL
ECX CX
CH
CL
EDX DX
DH
DL
EAX, AX, AH, AL These registers are said accumulators
and can be used for arithmetical and input/output operations or
to execute interrupt calls. We will see how it's indispensable to
use them when we have to realize system calls.
EBX, BX, BH, BL These registers are the base registers
and they are used as base pointers to access in the memory.
We will use these registers to pass the system calls' arguments.
Now and then they are also used to store the return value of an
interrupt. (e.g. When we call an open(), the descriptor's value
of the file is stored in the register EBX.)
ECX, CX, CH, CL These registers are said counter registers.
EDX, DX, DH, DL These registers are the data registers
and they can be used for arithmetical operations, interrupt calls
and some input/output operation.
Introducing the Assembly language
The assembly language we are going to approach is named "Inline
Assembly" and it adopts the syntax of AT&T.
The name of the registers is preceded by the symbol "%",
thus if we have to use the register eax we must type "%eax".
If we are going to refer to numerical constants, its value must
be preceded by the symbol "$".
In the following scheme, one can observe the most used in-
structions in the assembly language.
MOV - This instruction let us to move a value in a register.
mov $0x4, %al - moves 0x4 into al
mov %eax, %ebx - moves what is in eax into ebx
PUSH - Put a value in the stack.
POP - Get a value from the stack and store it in a register
or in a variable.
INT - interrupt call.
int $0x80 - it gives the control to the kernel.
Codification phase
The algorithm we are going to implement in assembly language
and then in binary code(as hexadecimal version) is the print on
the video of the string "WWW.ROSIELLO.ORG".
The solution of the problem in C language is the following
piece of code:
int main()
{
write(0, "WWW.ROSIELLO.ORG", 16);
exit(0);
}
In order to realize the write() and the exit() we have to exe-
cute their system calls.
It is possible to find in Linux the library "unistd.h" where are
stored all the system calls that one can use.
angelo@rosiello.org$ cat /usr/include/asm-i386/unistd.h
/*
* This file contains the system call numbers.
*/
#define _NR_exit 1 <- This is our exit()
#define _NR_fork 2
#define _NR_read 3
#define _NR_write 4 <- This is our write()
#define _NR_open 5
write(0, "WWW.ROSIELLO.ORG", 16);
...............................
...............................
The first argument "0" is the standard output(video) where
we have to print the string wich appears as second argument.
The last argument "16" indicates the length of the string.
Let's try to implement this instruction in assembly.
xor %eax, %eax <- It cleans the register %eax
xor %ebx, %ebx
xor %edx, %edx
push %eax <- It inserts NULL into the stack closing the
string, thus, no garbage characters will appear.
push $0x47524f2e #push GRO. into the stack
push $0x4f4c4c45 #push OLLE into the stack
push $0x49534f52 #push ISOR into the stack
push $0x2e575757 #push .WWW into the stack
The above four push insert into the stack the string "WWW.ROSIELLO.ORG"
in its hexadecimal codify.
As one can notice the string must be pushed into the stack
overturned because of the stack's working strategy.
The Standard Output's descriptor is associated with the %ebx
register wich contains at the moment the value 0 then we have
not to indicate anything else. (write(0,..)).
mov %esp, %ecx # it moves %esp into %ecx
Now the string's address is in the register %esp (remember
that esp is increased/decreased only by pop/push) and we put
it in the register %ecx, thus the CPU will be able to find the
accurate position of the string in the stack (write(0, string, ..)).
mov $0x10,%dl #size 16 bytes
Exactly as in C language we indicate that the string size is
16 bytes (write(0, string, 16)).
mov $0x4,%al #syscall for write()
We put in the register eax (in the low part: al) the number
of the write() routine.
int $0x80 #execute the syscall
Now the kernel will get the control of the application and will
execute our write() routine.
The implementation of the exit(0) is even easier.
exit(0):
xor %eax, %eax
xor %ebx, %ebx
eax and ebx registers are clean.
mov $0x1, %al #syscall for exit()
Let's insert the value of the exit into al.
int $0x80 #execute the syscall
Let's give the control to our kernel.
Compile and Execute
The last step to do is the codification in binary code. In order
to reach our purpose we will use the gnu debugger (gdb).
angelo@rosiello.org:~shellcode$ gdb rosiello
(gdb) disas main
Dump of assembler code for function main:
0x80482f4
: push %ebp
0x80482f5 : mov %esp,%ebp
0x80482f7 : sub $0x8,%esp
0x80482fa : and $0xfffffff0,%esp
0x80482fd : mov $0x0,%eax
0x8048302 : sub %eax,%esp
0x8048304 : xor %eax,%eax
0x8048306 : xor %ebx,%ebx
0x8048308 : xor %edx,%edx
0x804830a : push %eax
0x804830b : push $0x47524f2e
0x8048310 : push $0x4f4c4c45
0x8048315 : push $0x49534f52
0x804831a : push $0x2e575757
0x804831f : mov %esp,%ecx
0x8048321 : mov $0x10,%dl
0x8048323 : mov $0x4,%al
0x8048325 : int $0x80
0x8048327 : xor %eax,%eax
0x8048329 : xor %ebx,%ebx
0x804832b : mov $0x1,%al
0x804832d : int $0x80
End of assembler dump.
Our code begins at the instruction and termi-
nates at .
To gain the opcode you should adopt the following way.
(gdb) x/bx main+16
0x8048304 : 0x31 <- OPCODE
(gdb)
0x8048305 : 0xc0 <- OPCODE
(gdb)
0x8048306 : 0x31 <- OPCODE
....
and so on till .
Now it's indispensable to put anything as this pattern"x31xc0x31..".
"x31xc0x31xdbx31xd2x50x68x2ex4f"
"x52x47x68x45x4cx4cx4fx68x52x4f"
"x53x49x68x57x57x57x2ex89xe1xb2"
"x10xb0x04xcdx80x31xc0x31xdbxb0"
"x01xcdx80"
To compile and execute the shellcode you can organize it in
a C program as the following scheme.
angelo@rosiello.org:~shellcode$ cat shellcode.c
#include
char shellcode[]=
"x31xc0x31xdbx31xd2x50x68x2ex4f"
"x52x47x68x45x4cx4cx4fx68x52x4f"
"x53x49x68x57x57x57x2ex89xe1xb2"
"x10xb0x04xcdx80x31xc0x31xdbxb0"
"x01xcdx80";
main()
{
void (*routine) ();
(long) routine = &shellcode;
printf("Size: %d bytesn", sizeof(shellcode));
routine();
}
angelo@rosiello.org:~shellcode$ gcc shellcode.c -o shellcode
angelo@rosiello.org:~shellcode$ ./shellcode
Size: 44 bytes.
WWW.ROSIELLO.ORG
Conclusions
Making a shellcode isn't difficult, but you will need patience and
practice to become skilled in doing it.
Shellcoding is very important mainly in the low level appli-
cations. For example, if you want to write an exploit you will
need to write shellcode to have the exploited program execute
the code you want.
Personally I think that anyone interested in security of com-
puter science should know these basic concepts and theories wich
support research of new bugs and exploiting ways.
http://www.rosiello.org
contact: angelo@rosiello.org
E-Mail Link
Your IP address will be sent with this e-mail