CTF:GitS-2013/Writeups/RTFM

Ghost in the Shellcode: RTFM Writeup

RTFM which usually stands for "Read the fucking manual" was used here for "Rich text for Morons". The challenge supplied an executable x86 ELF file and a data file.

First thing was to figure out that the executable was used to generate the data file. The data file included the header "RTFM", which we also found to be written at an allocated memory in the disassembly. Inside this function we also found the code that actually converts the data.

We did some testing with different input data, and figured out the the output file size can be less or more than the input data. For a one byte input we got either one or 1.5 bytes output. A repeating byte in the input from the readable charset was often compressed to less than input size. A repeating 0x01 was always encoded to repeating 1.5 bytes. On 0x00 in the input the executable stopped its execution. Similiar things happened to bytes above 0x80. Actually here is the point to see that more common/used character(sequences) are encoded to less, which hints to a huffman compression.

Anyway we thought it is easier to use a different approach instead of reversing the huffman code including the tables. So we wrote a small application using bruteforce. We put a new byte to the input and then compare if it matches more bits on the output than any other input byte. We use that in a recursive functions to make sure, that it will also follow repeating bytes that get mapped to the same length as the input string one byte before.

Our none-error-tolerant code in not-best-coding-style:

#include <stdio.h>
#include <unistd.h>

const char *inputfile = "rtfm_input";
const char *outputfile = "rtfm_input.rtfm";
const char *originalfile = "rtfm_original.bak";

unsigned char org[0x100000];

void load_org ( )
{
	FILE *tmp = fopen ( originalfile, "rb" );
	fread ( org, 1, sizeof ( org ), tmp );
	fclose ( tmp );
}

unsigned int compare ( unsigned char* data, unsigned int data_len )
{
	unsigned char test[0x100000];
	unsigned int test_length = 0;
	unsigned int i;
	unsigned char b1, b2;

	pid_t bla;

	FILE *tmp = fopen ( inputfile, "wb" );
	fwrite ( data, 1, data_len, tmp );
	fclose ( tmp );

	/* save some overhead with exec */
	//system ( "./rtfm-67cc5dcb69df4244bcf2d573481e6d6a06b861a3 rtfm_input > /dev/null" );
	bla = fork();
	if ( bla == 0 )
	{
		int fd = open("/dev/null", 0 );
		dup2(fd, 1);
		close ( fd );
		execl ( "./rtfm-67cc5dcb69df4244bcf2d573481e6d6a06b861a3", "rtfm_input", "rtfm_input", NULL );
	}
	else
		wait ( &bla );

	tmp = fopen ( outputfile, "rb" );
	test_length = fread ( test, 1, sizeof ( test ), tmp );
	fclose ( tmp );

	/* now compare bitwise */
	for ( i = 4*8; i < test_length*8; i++ )
	{
		b1 = test[i/8] >> (7 - (i%8));
		b2 = org[i/8] >> (7 - (i%8));

		if ( b1 != b2 )
		{
//			printf ( "equal bits(%u) bytes(%u)\n", i, i/8 );
			return i-4*8;
		}
	}

	printf ( "all same. found: %s\n", data );
	exit ( 1 );

	return 0xFFFFFFFF;

}

void checknextbyte ( int data_len, unsigned char *data, unsigned int maxlen, unsigned int depth  )
{
	unsigned char c, d;

	unsigned int tmp = 0;

//	char set[] = "123abcest";

//	if ( depth > 10 )
//		return;

	//for ( d = 0 /*0x20*/; d < /*0x80*/ sizeof ( set )-1; d++ )
	for ( c = 1; c < 0x80;  c++ )
	{
		//c = set[d];
		data[data_len] = c;
		tmp = compare ( data, data_len+1 );

		if ( tmp > maxlen )
		{
			/* if u don't do that, it will converge to the output, but probably never find it (or after centuries) */
			maxlen = tmp;

			//if ( tmp > maxlen+3 && depth > 0 )
			//	depth--;

			data[data_len+1] = 0;

			printf ( "char %c (%d) fits, len: %d: %s\n", c, c, tmp, data );
			checknextbyte (  data_len+1, data, maxlen+1, depth+1  );
		}
	}
}

int main ()
{
	unsigned char buf[0x200000];

	printf ( "censored :)\n" );

	load_org();

	checknextbyte ( 0,  buf, 0, 0 );

	return 0;
}

During bruteforcing you see the current output. Excitement! The overhead of the "Rich Text Format" was more annoying than ever before ;-) The key "Did I have Vari-good-code?" is again a nice wordplay.

The process of decompressing using bruteforce took more than one hour. Most of the CPU power was used for system calls, since we did not copypaste the encryption routine.