Preface

In today’s adventure I like to take you with me on my journey of reverse engineering another USB device. The EZP2010 is an USB programmer for flash memory as used by mainboard manufactures to store the BIOS, or embedded devices to store the firmware (or settings). When it comes to data recovery on hard drives, or similar storage devices, these flashers can also become handy.

EZP2010 USB-Highspeed programmer

This particular programmer can be bought on many Chinese store or Amazon. The prize range varies form $8 to $24. Unfortunately the software and drivers which come with this programmer are windows only.

The micro controller unit (MCU) used is the C8051F340. It was easily identified by opening the enclosure and looking at the markings of the MCU.

Start of an adventure

Realistic representation of me, being outside and enjoining it. But without the running and the joy. I also have shoes and longer trousers.

As already mentioned, this adventure will be about reversing the USB protocol of this programmer.

To start with analyzing the programmer a virtual machine would be an easy solution. And here I came across another problem this programmer has. It is not necessarily the fault of the hardware, but of the drivers and software provided to use it.

I worked on this project on different host machines and different versions of a virtual machine. The main difference was the architecture used. One VM was a 64bit Windows 10 and the other a 32bit Windows 10. Trying to run the software on a 32bit system failed, the software was unable to connect to the programmer, but I could not find any drawback in using a 64bit Windows.

Information Gathering

Me looking for information.

The most obvious things to do are lsusb and opening up the enclosure. Lsusb will output all connected USB devices connected on a Linux machine and get the description, as well as the vendor id and product id for device identification.

The output of lsusb gives information on how to talk to the device. With the product and vendor id the verbose output of lsusb can be limited to display only the device of interest. USB devices have separate input and output “endpoints”. On this particular device the endpoint to send our data to is 0x02 EP 2 OUT and 0x81 EP 1 IN where I can receive the responses and incoming data.

$ lsusb
[...]
Bus 003 Device 027: ID 10c4:f5a0 Cygnal Integrated Products, Inc.
[...]
$ lsusb -v -d 10c4:f5a0

Bus 003 Device 027: ID 10c4:f5a0 Cygnal Integrated Products, Inc.
Couldn't open device, some information will be missing
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x10c4 Cygnal Integrated Products, Inc.
  idProduct          0xf5a0
  bcdDevice            0.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              480mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         0
      bInterfaceSubClass      0
      bInterfaceProtocol      0
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               5
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               5

In the beginning of this post, I already mentioned the used MCU for this programmer. It will not be of much of significance for this project but information is always nice to have. So the C8051F340 by Silicon Labs can do different things, but the most important ones are Universal Serial Bus (USB) Controller and the Serial Peripheral Interface (SPI). The USB connection manages the communication between the host (PC) and the device. The SPI will manage the programming of the flash memory.

VM Setup

As previously told, I have used a 64bit Windows 10 virtual machine. For capturing the USB traffic I used Wireshark and USBPcap. The workflow is the same as it is with network packet capturing. Instead of selecting a network interface, a USB root hub is selected where the device of interest is connected.

Packet Capturing

At first I setup the packet capturing with Wireshark and select the USB root hub I want to monitor – and where I connect the programmer.

The next steps include a straight forward workflow, after Wireshark was configured and running. I started the programmer software. If Wireshark is configured correctly it reports the first packets captured during device initialization and opening the device by the software.

As there are not special device specific commands transferred during device and software initialization, these steps won’t be necessary to cover here.

Now, if I press any button in the software to interact with the programmer, a corresponding reaction should be captured in Wireshark. To keep track which action resulted in which packets, Wireshark supports comments for their pcapng fileformat. When I press a button in the software, I comment the first packet captured and the last packet incoming.

Firmware version read from programmer device.

To keep it simple, I will illustrate this on the command to request the firmware version of the device. The button is labeled “Ver” in the toolbar. When clicked, two messages are transmitted between the host and the device which are captured by Wireshark.

Request and response packet content.

In the screen shot above, I commented the outgoing packet from the host to the programmer and labeled it “Request firmware”. The answer – which came immediately after the outgoing message. I labeled the packet as “Request firmware: Answer”. When I look into this dump a few days later, I will still be able to figure out what the cause of these packets were.

The firmware version request command are two bytes (Leftover Capture Data in the Wireshark window) 0x17 0x00. The answer will be the version string EZP2010 V3.0 and a few more bytes.

This way I worked through every function provided by the software to capture all possible commands. The “Read” and “Prog” ROM commands will produce a few more packets depending on the size of the flash chip. The outgoing or incoming packets either be the contents read from the flash chip or the data going to be written to the flash chip.

While capturing the packets should be almost enough, a look at the decompiled code of the program can help finding possible pitfalls.

Reversing the Programmer Windows Software

This part was not really necessary, but helped in the process of packet analysis as well in matching the captured command packets with a function in the disassembled program.

Functions interacting with the programmer.

Each command supported by the programmer has a corresponding function. Matching all functions and command codes, I could see if I had found and captured all available commands the programmer hardware supports.

Refactoring the code generated by Ghidra, the result of reversing engineering a function could look like this. I am going through the code to request the firmware version.

Example of a reversed function to query the firmware version.

The code in the screenshot above sends two bytes to the programmer and than tries to read 23 (0x17) bytes as answer. The while loop at the bottom copies the contents of the bufferIn variable into the output argument variable version.

Just to clarify, the variables are pointers to a memory location at this point. the variable will store the address where the value (version string) is stored.

Writing a POC

Actual video of me coding a POC.

Python POC

I often try to build my first proof of concept, or to validate my ideas, with a small Python program.

The code is based on the tutorial example provided by PyUSB. I just added the other endpoint to read the responses from the programmer.

#!/usr/bin/env python
import usb.core
import usb.util


dev = usb.core.find(idVendor=0x10c4, idProduct=0xf5a0)
dev.set_configuration()

cfg = dev.get_active_configuration()
intf = cfg[(0, 0)]

epo = usb.util.find_descriptor(intf, custom_match=lambda e:
                               usb.util.endpoint_direction(e.bEndpointAddress)
                               == usb.util.ENDPOINT_OUT)
epi = usb.util.find_descriptor(intf, custom_match=lambda e:
                               usb.util.endpoint_direction(e.bEndpointAddress)
                               == usb.util.ENDPOINT_IN)

epo.write([0x17, 0x00])
s_buf = ''.join([chr(c) for c in epi.read(256)])
print(s_buf)

Executing the test program will give something like “EZP2010 V3.0” and a view more additional bytes. Which should be expected, as the receiving buffer is defined with a size of 0x17 (23 bytes).

At the moment I have not looked into the remaining bytes and their purpose. The original software does not display any more information, but the version string.

C POC

Doing it in C requires a bit more work. Below is the whole POC program source I have used to verify my work. The main part of the program is the USB device tree traversal to open and configure our target device. There is an easier to use function to do this with libusb.

But the function libusb_open_device_with_vid_pid() gave me a bit of a trouble, as I was not able to configure the device properly to write and read data.

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#include <libusb-1.0/libusb.h>

#define EZP2010_VID 0x10c4
#define EZP2010_PID 0xf5a0

static libusb_device_handle *devh = NULL;

libusb_device_handle *open_ezp2010()
{
    ssize_t devc;
    libusb_device **dev_list;
    static libusb_device *dev = NULL;
    struct libusb_device_descriptor dev_desc;
    struct libusb_config_descriptor *dev_cfg = NULL;
    const struct libusb_interface *intf = NULL;
    const struct libusb_interface_descriptor *intf_desc = NULL;

    int r = 0;

    devc = libusb_get_device_list(NULL, &dev_list);
    if (devc < 1)
        return NULL;

    for (int i = 0; i < devc; i++) {
        dev = dev_list[i];
        if (libusb_get_device_descriptor(dev, &dev_desc))
            continue;

        if ((dev_desc.idVendor != EZP2010_VID || dev_desc.idProduct != EZP2010_PID))
            continue;

        r = libusb_open(dev, &devh);
        if (r < 0) {
            perror("libusb_open");
            return NULL;
        }

        for (int j = 0; j < dev_desc.bNumConfigurations; j++) {
            if (libusb_get_config_descriptor(dev, j, &dev_cfg))
                continue;

            for (int k = 0; k < dev_cfg->bNumInterfaces; k++) {
                intf = &dev_cfg->interface[k];
                for (int l = 0; l < intf->num_altsetting; l++) {
                    intf_desc = &intf->altsetting[l];
                    if (libusb_kernel_driver_active(devh, intf_desc->bInterfaceNumber))
                        libusb_detach_kernel_driver(devh, intf_desc->bInterfaceNumber);

                    libusb_set_configuration(devh, dev_cfg->bConfigurationValue);
                    libusb_claim_interface(devh, intf_desc->bInterfaceNumber);

                    int e = 0;
                    while (libusb_claim_interface(devh, intf_desc->bInterfaceNumber) \
                            && (e < 10)) {
                        sleep(1);
                        e++;
                    }
                }
            }

            libusb_free_config_descriptor(dev_cfg);
        }
        return devh;
    }

    devh = NULL;
    return NULL;
}

int main(int argc, char *argv[])
{
    int r = 0;
    int transferred = 0;
    unsigned char buf[256];

    r = libusb_init(NULL);
    if (r < 0)
        return 1;

    open_ezp2010();

    buf[0] = '\x17';
    buf[1] = '\x0';
    r = libusb_bulk_transfer(devh, 0x2, buf, 2, &transferred, 500);
    if (r < 0) {
        perror("libusb_claim_interface");
        fprintf(stderr, "Error: %s\n", libusb_strerror(r));
    } 
    printf("Bytes sent: %d\n", transferred);
    
    r = libusb_bulk_transfer(devh, 0x81, buf, 0x20, &transferred, 500);
    if (r < 0) {
        perror("libusb_claim_interface");
        fprintf(stderr, "Error: %s\n", libusb_strerror(r));
    }
    printf("Bytes received: %d\n", transferred);
    printf("Packet: %s\n", buf);

    libusb_release_interface(devh, 0);
    libusb_reset_device(devh);
    libusb_close(devh);

    libusb_exit(NULL);

    return 0;
}

Future Work

My idea is to integrate it into existing programmer software, or to implement a small standalone tool for this programmer.

There is also still some research to do on the support for different flash chips. The original software provides a database with known and supported flash chips.

Looking at the code and figuring out the database structure might be the next step in further reverse engineering and developing further support for this programmer.

TL;DR; Short the ERASE pin with VDDCORE, if ERASE == PIN_55 && VDDCORE == PIN_54

According to complains in the internet, users report bricking their Proxmark3 EASY, when they try to flash the latest firmware with the ‘flasher’ software tool.

Sometimes flashing process of firmware can go wrong, but it can often be recovered with JTAG programmers, or similar programmers.

I will not about setting up the environment to build, and flash the firmware, but I will tell you what you might be missing out and why it might be not working.

If you do not know where to start with flashing your Proxmark3, than have a look here, here, here or here. The first link describes the standard way of upgrading your firmware, which can fail, if you are unlucky. The other three links describe ways to recover your Proxmark3.

Why can upgrading the firmware fail? There are quite some reasons it can go wrong.

  • bad firmware image
  • wrong parameters
  • power loss
  • chip security measurements

With the Proxmark3 EASY it seems, that some devices have the Security Bit of the AT91SAM7S512 processor set. The datasheet (see page 113, paragraph 19.2.4.5) says: “The goal of the security bit is to prevent external access to the internal bus system. […] JTAG, Fast Flash Programming and Flash Serial Test Interface features are disabled. Once set,this bit can be reset only by an external hardware ERASE request to the chip. […]”.

To unlock the chip again we can find interesting information in this document on page 20 in paragraph 2.5. Which describes the unlocking the chip by applying Vcc to the ERASE pin. Applying voltage to the pin will wipe the security bit, but also all contents of the flash!

Unfortunately the ERASE pin, which is pin number 55 on the AT91SAM7S512, has no connection. The first idea was to solder a jumper wire to Vcc. On second guess and looking at the datasheets again, reveals pin 54 is VDDCORE, which applies 1.65V to 1.95V (1.8V typical) to the CPU for operation.

To erase and reset the Proxmark, I shortened pin 54 and pin 55 with the tip of a multimeter, applied power via USB to the Proxmark3. After >300ms the flash and security bit is erased and the device can be powered off.

The JTAG interface is now enabled again. Next I flashed the bootloader, and the fullimage using the Bus Pirate v4 using as described in one of the first links mentioned above.

#hackinghackertools

#MemeFreeEdition

The Challenge

You are provided a zip file containing two files.

  • program
  • vm.py

The file program contains instructions encoded as emojis for a virtual machine called vm.py. At the bottom of vm.py I found a list, or lets call it a translation table, with emojis and their function name counter part.

OPERATIONS = {
        '🍑': add,
        '🀑': clone,
        'πŸ“': divide,
        '😲': if_zero,
        'πŸ˜„': if_not_zero,
        'πŸ€': jump_to,
        'πŸš›': load,
        'πŸ“¬': modulo,
        '⭐': multiply,
        '🍿': pop,
        'πŸ“€': pop_out,
        '🎀': print_top,
        'πŸ“₯': push,
        'πŸ”ͺ': sub,
        'πŸŒ“': xor,
        'β›°': jump_top,
        'βŒ›': exit
}

To execute program with vm.py I ran:

$ python vm.py program
http://^C

The program started printing letters to the terminal, but became horribly slow, until I decided to terminate the execution. So the challenge seemed to be, to convince program to give me its secret. With the first characters looking like an URL, the flag might be hidden within the URL or on the website the URL pointing to.

In which way I had to convince the program to reveal its secrets to me, I had yet to decide. One way could be to optimize the VM code, the other might be optimizing the code of program.

The Solution(s)

Solving the Challenge: Trail and Error

The python script vm.py defines a class named VM with the following constructor:

def init(self, rom):
self.rom = rom
self.accumulator1 = 0
self.accumulator2 = 0
self.instruction_pointer = 1
self.stack = []

The VM stores the program code in the variable self.rom. It is filled on creation by reading the source file (program) given as argument. The source is stored as a list (returned by .read().split() of a file object). Two accumulator registers, self.accumulator1 and self.accumulator2, are initialized with ‘0’. A third register is the self.instruction_pointer, it will the the VM which operation in self.rom will be executed next. It starts at ‘1’ and not at ‘0’. The list containing the rom is initialized with an empty string (['']) and then extended by the program code. The last variable is self.stack which is empty list.

Lists in Python can be used as a stack like structure. With list.append() (push) and list.pop() (pop) the behavior of these data structures is identical.

I decided to translate the emoji code of program into a more human readable version as a beginning. A dictionary of all operations is at the bottom of the code.

The first version of my script did not translate all emojis in the code. I started digging around further in vm.py to find some more emojis I had to consider. As a result a almost completely emoji free version of the program could be generated. The “final” version of my script emoji2mnemonic.py had the following dictionary to translate program.

OPERATIONS = {
    '🍑': 'add',
    '🀑': 'clone',
    'πŸ“': 'divide',
    '😲': 'if_zero',
    'πŸ˜„': 'if_not_zero',
    'πŸ€': 'jump_to',
    'πŸš›': 'load',
    'πŸ“¬': 'modulo',
    '⭐': 'multiply',
    '🍿': 'pop',
    'πŸ“€': 'pop_out',
    '🎀': 'print_top',
    'πŸ“₯': 'push',
    'πŸ”ͺ': 'sub',
    'πŸŒ“': 'xor',
    'β›°': 'jump_top',
    'βŒ›': 'exit',
    'βœ‹': ';',
    'πŸ₯‡': '[acc=1]',
    'πŸ₯ˆ': '[acc=2]',
    'πŸ’°': '@',
    '😐': 'end_if',
    'πŸ–‹': 'func_',
    'πŸ€': 'BRK<',
    'β›°': 'BRK!',

    'πŸ’ πŸ”ΆπŸŽŒπŸš©πŸ': 'Label01',
    'πŸ’ πŸπŸŽŒπŸ”ΆπŸš©': 'Label02',
    'πŸš©πŸ’ πŸŽŒπŸ”ΆπŸ': 'Label03',
    'πŸπŸš©πŸŽŒπŸ’ πŸ”Ά': 'Label04',
    'πŸ”ΆπŸŽŒπŸš©πŸ’ πŸ': 'Label05',
    'πŸŽŒπŸπŸš©πŸ”ΆπŸ’ ': 'Label06',
    'πŸ”ΆπŸš©πŸ’ πŸπŸŽŒ': 'Label07',
    'πŸš©πŸ”ΆπŸπŸŽŒπŸ’ ': 'Label08',
    'πŸŽŒπŸš©πŸ’ πŸ”ΆπŸ': 'Label09',
    'πŸŽŒπŸπŸ’ πŸ”ΆπŸš©': 'Label10',
    'πŸπŸ’ πŸ”ΆπŸš©πŸŽŒ': 'Label11',
    'πŸ’ πŸŽŒπŸπŸš©πŸ”Ά': 'Label12',

    '0️⃣ ': '0',
    '1️⃣ ': '1',
    '2️⃣ ': '2',
    '3️⃣ ': '3',
    '4️⃣ ': '4',
    '5️⃣ ': '5',
    '6️⃣ ': '6',
    '7️⃣ ': '7',
    '8️⃣ ': '8',
    '9️⃣ ': '9',
}

The language for vm.py also knows labels for the jump_to operations and a symbol to distinguish between the accumulators to use.

A look at the translated source was a little bit more helpful. The example below shows a cleaned up, but shortened version of my interpretation of the syntax after translation.

load [acc=1] 0;
push [acc=1]
load [acc=1] 17488;
push [acc=1]
load [acc=1] 16758;
push [acc=1]
load [acc=1] 16599;
push [acc=1]
load [acc=1] 16285;
...
push [acc=1]
load [acc=2] 1;

func Label01
pop [acc=1]
push [acc=2]
push [acc=1]
load [acc=1] 389;
push [acc=1]
push [acc=2]
jump_to $[ Label04
xor
print_top
load [acc=1] 1;
push [acc=1]
add
pop [acc=2]
if_not_zero
jump_to $[ Label01
end_if
...

To find out what each operation does, I looked at the source code of vm.py. The load instruction reads as follows: “Load accumulator1 with the immediate value 0”. With push the value stored in the accumulator1 is pushed to the stack. Other operations take the top two items from the stack to process them (addition, subtraction, multiplication, division, modulo, xor, …).

By only looking at the code I had no clue what was going on, but with all the jumps, I though about visualizing the program flow. I used my translated program and formatted the code to be a compatible Graphviz dot file. The resulting graph was again a little bit more of an help.

Call graph of program

The graph shows a bunch of instructions to fill the stack of the VM. The right side has some more “complicated” instructions with modulo, division, xor, ….

When I see xor instructions, I often think of encryption and with multiplications and modulo instructions right next to it this though hardens.

But starring at the graph and the translated code did not help.

Solving the Challenge: Finally

It took me a while of starring until I realized, that I should try something else.

My next approach to solve this challenge was to write a trace log of the program execution step by step.

In each instruction function in vm.py I added some additional code to store the current status of the registers self.instruction_pointer, self.accumulator1, self.accumulator2 and translated each operation into its human readable counter part. I also added (if possible) the parameters and the result of calculations.

00000002 [acc1=00000000][acc2=00000000]: load("acc1", 0)
00000006 [acc1=00000000][acc2=00000000]: push(acc1) = 0
00000008 [acc1=00017488][acc2=00000000]: load("acc1", 17488)
00000016 [acc1=00017488][acc2=00000000]: push(acc1) = 17488
00000018 [acc1=00016758][acc2=00000000]: load("acc1", 16758)
00000026 [acc1=00016758][acc2=00000000]: push(acc1) = 16758
00000028 [acc1=00016599][acc2=00000000]: load("acc1", 16599)
00000036 [acc1=00016599][acc2=00000000]: push(acc1) = 16599
00000038 [acc1=00016285][acc2=00000000]: load("acc1", 16285)
...
00000386 [acc1=00000389][acc2=00000001]: push(acc2) = 1
00000388 [acc1=00000389][acc2=00000001]: jump_to(Label04) @1040
00001041 [acc1=00000002][acc2=00000001]: load("acc1", 2)
00001045 [acc1=00000002][acc2=00000001]: push(acc1) = 2
00001048 [acc1=00000002][acc2=00000001]: jump_to(Label08) @1098
00001099 [acc1=00000002][acc2=00000001]: clone() = 2
00001100 [acc1=00000002][acc2=00000001]: load("acc1", 2)
00001104 [acc1=00000002][acc2=00000001]: push(acc1) = 2
00001107 [acc1=00000002][acc2=00000001]: sub(b=2, a=2) = 0
00001108 [acc1=00000002][acc2=00000001]: if_zero() = 0
00001109 [acc1=00000002][acc2=00000001]: pop_out()  // 0
00001110 [acc1=00000001][acc2=00000001]: load("acc1", 1)
00001114 [acc1=00000001][acc2=00000001]: push(acc1) = 1
00001115 [acc1=00000001][acc2=00000001]: end_if = πŸ€
00001115 [acc1=00000001][acc2=00000001]: …cont… if_zero()
00001116 [acc1=00000001][acc2=00000001]: jump_to(Label05) @1050
00001051 [acc1=00000001][acc2=00000001]: if_zero() = 1
00001055 [acc1=00000001][acc2=00000001]: end_if
00001057 [acc1=00000001][acc2=00000001]: pop_out()  // 1
...

There are still a few emojis left, but I did not care about them. With the trace log I could observe the program being executed and calculating the URL, or at least see the trace part where the print_top of the characters happens.

00001076 [acc1=00000002][acc2=00000001]: if_zero() = 0
00001077 [acc1=00000002][acc2=00000001]: pop_out()  // 0
00001078 [acc1=00000002][acc2=00000389]: pop(acc2) = 389
00001080 [acc1=00000002][acc2=00000389]: push(acc1) = 2
00001082 [acc1=00000002][acc2=00000389]: push(acc2) = 389
00001083 [acc1=00000002][acc2=00000389]: end_if = β›°
00001083 [acc1=00000002][acc2=00000389]: …cont… if_zero()
00001084 [acc1=00000002][acc2=00000389]: jump_top() @389
00000390 [acc1=00000002][acc2=00000389]: xor(b=106, a=2) = 104
00000391 [acc1=00000002][acc2=00000389]: print_top("h")
00000392 [acc1=00000001][acc2=00000389]: load("acc1", 1)
00000396 [acc1=00000001][acc2=00000389]: push(acc1) = 1
00000398 [acc1=00000001][acc2=00000389]: add(a=1, b=1) = 2
00000399 [acc1=00000001][acc2=00000002]: pop(acc2) = 2
00000401 [acc1=00000001][acc2=00000002]: if_not_zero() = 119
00000401 [acc1=00000001][acc2=00000002]: end_if = πŸ€
00000401 [acc1=00000001][acc2=00000002]: …cont… if_zero()
00000402 [acc1=00000001][acc2=00000002]: jump_to(Label01) @371

The print_top instruction takes the item from the top of the stack and prints it out. The stack only stores numbers, so the number is converted into a character before output. Right before the print_top is a xor.

The xor takes the two top items from the stack to get the result of ‘104’ which is the letter “h”. The number ‘106’ was pushed to the stack in the beginning, but where does the ‘2’ come from? This took be a while to find out. But only after letting the program run bit longer.

$ grep -A1 xor trace.log 
00000390 [acc1=00000002][acc2=00000389]: xor(b=106, a=2) = 104
00000391 [acc1=00000002][acc2=00000389]: print_top("h")
00000390 [acc1=00000003][acc2=00000389]: xor(b=119, a=3) = 116
00000391 [acc1=00000003][acc2=00000389]: print_top("t")
00000390 [acc1=00000005][acc2=00000389]: xor(b=113, a=5) = 116
00000391 [acc1=00000005][acc2=00000389]: print_top("t")
00000390 [acc1=00000007][acc2=00000389]: xor(b=119, a=7) = 112
00000391 [acc1=00000007][acc2=00000389]: print_top("p")
00000390 [acc1=00000011][acc2=00000389]: xor(b=49, a=11) = 58
00000391 [acc1=00000011][acc2=00000389]: print_top(":")
00000390 [acc1=00000101][acc2=00000389]: xor(b=74, a=101) = 47
00000391 [acc1=00000101][acc2=00000389]: print_top("/")
00000390 [acc1=00000131][acc2=00000389]: xor(b=172, a=131) = 47
00000391 [acc1=00000131][acc2=00000389]: print_top("/")
00000390 [acc1=00000151][acc2=00000389]: xor(b=242, a=151) = 101
00000391 [acc1=00000151][acc2=00000389]: print_top("e")
00000390 [acc1=00000181][acc2=00000389]: xor(b=216, a=181) = 109
00000391 [acc1=00000181][acc2=00000389]: print_top("m")
00000390 [acc1=00000191][acc2=00000389]: xor(b=208, a=191) = 111
00000391 [acc1=00000191][acc2=00000389]: print_top("o")
00000390 [acc1=00000313][acc2=00000389]: xor(b=339, a=313) = 106
00000391 [acc1=00000313][acc2=00000389]: print_top("j")
00000390 [acc1=00000353][acc2=00000389]: xor(b=264, a=353) = 105
00000391 [acc1=00000353][acc2=00000389]: print_top("i")
00000390 [acc1=00000373][acc2=00000389]: xor(b=344, a=373) = 45
00000391 [acc1=00000373][acc2=00000389]: print_top("-")
00000390 [acc1=00000383][acc2=00000389]: xor(b=267, a=383) = 116
00000391 [acc1=00000383][acc2=00000389]: print_top("t")

Again after a long round of starring at the output, it hit me. The argument a is always a prime number, but these are special primes. All primes are palindromes. The argument b is the value from the stack.

In my next step I wrote a script to calculate the palindrome primes and xor them with the values from the stack. I also discarded the idea to optimize the one of the components to solve this challenge.

As my final solution does not use the next code blocks anymore I will only highlight some ideas and problems I encountered.

I found a algorithm to calculate palindrome prime numbers on Stackoverflow and modified it to my needs.

def gen_pal_prime(a, b):
    for i in range(a, b):
        y = True
        if str(i) == str(i)[::-1]:
            if(i > 2):
                for a in range(2, i):
                    if i % a == 0:
                        y = False
                        break
                if y:
                    yield i

palindromes = [_ for _ in gen_pal_prime(0, 999999)]

The problem with this was, depending on the second parameter it takes a long time to calculate all primes and the other problem was, it did not work. Or it stopped working after the 39th character.

The algorithm failed to calculate the correct xor value for ‘93766’. As the string read until it broke “http://emoji-t0anaxnr3nacpt4na.web.ctfco” my guess was, that the next character has to be a ‘m’ (ctfcompetition).

Brute forcing the prime number for ‘93766’ returned ‘93739’. What left me puzzled again. The previous prime number was ‘17471’. So I split up the whole blocks of pushing numbers to the stack into three blocks.

+-----------+-----------+-----------+
| Block 1   | Block 2   | Block3    |
+-----------+-----------+-----------+
| 17488     | 98426     | 101141058 |
| 16758     | 97850     | 101060206 |
| 16599     | 97604     | 101030055 |
| 16285     | 97280     | 100998966 |
| 16094     | 96815     | 100887990 |
| 15505     | 96443     | 100767085 |
| 15417     | 96354     | 100707036 |
| 14832     | 95934     | 100656111 |
| 14450     | 94865     | 100404094 |
| 13893     | 94952     | 100160922 |
| 13926     | 94669     | 100131019 |
| 13437     | 94440     | 100111100 |
| 12833     | 93969     | 100059926 |
| 12741     | 93766     | 100049982 |
| 12533     | 99        | 100030045 |
| 11504     |           | 9989997   |
| 11342     |           | 9981858   |
| 10503     |           | 9980815   |
| 10550     |           | 9978842   |
| 10319     |           | 9965794   |
| 975       |           | 9957564   |
| 1007      |           | 9938304   |
| 892       |           | 9935427   |
| 893       |           | 9932289   |
| 660       |           | 9931494   |
| 743       |           | 9927388   |
| 267       |           | 9926376   |
| 344       |           | 9923213   |
| 264       |           | 9921394   |
| 339       |           | 9919154   |
| 208       |           | 9918082   |
| 216       |           | 9916239   |
| 242       |           | 765       |
| 172       |           |           |
| 74        |           |           |
| 49        |           |           |
| 119       |           |           |
| 113       |           |           |
| 119       |           |           |
| 106       |           |           |
| 1         |           |           |
+-----------+-----------+-----------+

The purpose of the last number of each block opened up to me, when I finally found the correct prime for the third block. To decode the correct URL I ignored these numbers (‘1′, ’99’, ‘765’) and started building special cases into my decoding function.

The code below almost worked fine. Each block would be separately being decoded, after a certain number of iterations (i). I tried to make some sense of these numbers at the bottom of each block.

print('Palindrome Primes #1')
palindromes1 = [_ for _ in gen_pal_prime(1, 18000)]
palindromes1.insert(0, 2)  # manually add the number 2.
print('Palindrome Primes #2')
palindromes2 = [_ for _ in gen_pal_prime(93766 - 99, 999999)]
print('Palindrome Primes #3') 
palindromes3 = [_ for _ in gen_pal_prime(9916239 - 765, 19916239)]

palindromes = palindromes1
stack.reverse()
for i, val in enumerate(stack):
    if i == 40:
        mode = 'stack2' 
        palindromes = palindromes2
    if mode == 'stack2':
        i -= 40
        if i == 14:
            mode = 'stack3'
            palindromes = palindromes3
    if mode == 'stack3':
        i -= 13
    c = val ^ palindromes[i]
    print(f'{i:<8d}| {val:8d} ^ {palindromes[i]:<8} = {c:3d}', chr(c))

While my program ran, I encountered yet again a problem but this time during decoding the third block of numbers. The calculation of my palindrome primes took quite some time and did not result in printable/human readable characters.

Talking to a colleague of mine, he came up with the idea to look up lists of precalculated prime numbers and I found a website which provides lists of the first 2 billion prime numbers as downloadable files.

The site http://www.primos.mat.br/2T_en.html provides individual compressed text files, with 10 million primes each file. For this challenge the prime numbers from 2 to 179424673 should suffice.

After decompressing the file I had to convert the file from columns separated by tabs to a easier to read format for my Python programs. To remove the columns and only have one prime per line I used the program tr.

$ tr '\t' '\n' < 2T_part1.txt > primes.txt

I wrote a function to read this file to only return the palindrome primes as a list. The str(_.rstrip()) == str(_.rstrip())[::-1] part of the condition will check if the number is a palindrome, by comparing the number as string against it reversed counterpart.

def load_primes(filename):
     with open(filename, 'rt') as handle:
         return [int(_) for _ in handle.readlines()
                 if str(_.rstrip()) == str(_.rstrip())[::-1]]

With this list of palindrome primes I wrote a short brute force solution to find the correct prime number to decode the third block, or to be more precise, where in this list of primes can I find it.

As a constraint I iterated through all primes until the prime p from primes is greater than the number (‘9916239’) I am looking for. Then I know, that my prime had to be within the first 765 primes.

>>> primes = solver.load_primes('primes.txt')
>>> for i, p in enumerate(primes):
...     if p > 9916239:
...         print(i)
...         break
...
765         

At this point the 3 numbers I could not figure out until now, now made sense. These are the n-th number prime from a list of consecutive palindrome primes. The first block begins at the the 1st prime from my list of primes, the second block begins at the 99th prime number and the third block at the 765th prime number from my list.

And now it also makes sense, why program becomes slower with each iteration. The code block (see the call graph above) calculates palindrome prime numbers, which becomes more and more of a problem regarding performance when it hits greater numbers.

How does my final solution look like? I wrote a script called solver.py, put all numbers from all three blocks into a single list and wrote my decryption function.

#! /usr/bin/env python
#-*- coding: utf-8 -*-

stack = [101141058, 101060206, 101030055, 100998966, 100887990,
         100767085, 100707036, 100656111, 100404094, 100160922,
         100131019, 100111100, 100059926, 100049982, 100030045,
         9989997, 9981858, 9980815, 9978842, 9965794, 9957564,
         9938304, 9935427, 9932289, 9931494, 9927388, 9926376,
         9923213, 9921394, 9919154, 9918082, 9916239,  # Block 3
         98426, 97850, 97604, 97280, 96815, 96443, 96354, 95934,
         94865, 94952, 94669, 94440, 93969, 93766,  # Block 2
         17488, 16758, 16599, 16285, 16094, 15505, 15417, 14832,
         14450, 13893, 13926, 13437, 12833, 12741, 12533, 11504,
         11342, 10503, 10550, 10319, 975, 1007, 892, 893, 660,
         743, 267, 344, 264, 339, 208, 216, 242, 172, 74, 49, 119,
         113, 119, 106]  # Block 1


def load_primes(filename):
    with open(filename, 'rt') as handle:
        return [int(_) for _ in handle.readlines()
                if str(_.rstrip()) == str(_.rstrip())[::-1]]


if __name__ == '__main__':
    palindromes = load_primes('primes.txt')
    stack.reverse()
    offset = 0
    for i, val in enumerate(stack):
        if i == 40:
            offset = 98 - 40
        if i == 54:
            offset = 764 - 54
        c = val ^ palindromes[i + offset]
        print(chr(c), end='')
    print()

Running this script will produce the URL “http://emoji-t0anaxnr3nacpt4na.web.ctfcompetition.com/humans_and_cauliflowers_network/“. Visiting this URL gives a choice for three sub pages of different people. The flag is stored in the personal page of Amber within a picture. The flag reads “CTF{Peace_from_Cauli!}“.

Lessons learned

Doing CTFs sometimes require you to think out of the box and/or ignore the first possible solutions which might seem to be the obvious ones, but might lead you nowhere or are just there to keep you side tracked.

This challenge first got me thinking about modifying and changing the behavior of the provided files program and vm.py. While working through the challenge, which kept me busy for quite some time, I found a different solution.

The previous tries to solve this challenge were not for nothing, especially the addition of my trace log addition, which finally helped me understanding the problem and getting my on the right track to the solution.