pwnies_please

Disguise these pwnies to get the flag!
http://pwnies-please.chal.uiuc.tf
author: Anusha Ghosh, Akshunna Vaishnav, ian5v, Vanilla

The Setup

When we click on the provided link, we are given links to two files: a PyTorch model and the site's source code. The page also shows a picture of a pony along with a button for uploading images. Let's take a look at some of the core logic:

Essentially, when we upload an image, it is run through two models: a "robust" model and a "non-robust" model (the one we have). Our goal is to trick the non-robust model into thinking we submitted anything but a horse, while still passing the image off as a horse to the robust model. In the three failing cases, the session variable yolo is incremented; otherwise, the session variable level is incremented. Near the end of the file there are these checks:

So we need to pass 50 horses in a row with no more than 3 errors.
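
The bookkeeping described above can be summarized in a short sketch. It is a hypothetical reconstruction, not the site's source: record_upload, has_won, the label strings, and especially the choice of the three failure conditions (the non-robust model still says horse, the robust model does not, or the image hash drifts past the limit discussed later) are assumptions. Only the yolo and level counters and the 50/3 thresholds come from the text.

```python
"""Hypothetical sketch of the site's per-upload bookkeeping (NOT the real source)."""

MAX_HASH_DIFF = 5   # from the writeup: uploads may differ from the original's hash by at most 5
LEVELS_NEEDED = 50  # from the writeup: 50 successful disguises
MAX_ERRORS = 3      # from the writeup: no more than 3 errors


def record_upload(session: dict, robust_label: str, nonrobust_label: str, hash_diff: int) -> bool:
    """Update the session counters for one upload and return True on success.

    ASSUMPTION: the three failure cases are guessed to be the three checks below.
    """
    failed = (
        nonrobust_label == "horse"      # the model we have was not fooled
        or robust_label != "horse"      # the robust model no longer sees a horse
        or hash_diff > MAX_HASH_DIFF    # the image was changed too much
    )
    if failed:
        session["yolo"] = session.get("yolo", 0) + 1
    else:
        session["level"] = session.get("level", 0) + 1
    return not failed


def has_won(session: dict) -> bool:
    """ASSUMPTION: flag condition inferred from the writeup's summary of the final checks."""
    return session.get("level", 0) >= LEVELS_NEEDED and session.get("yolo", 0) <= MAX_ERRORS
```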

Observations

We immediately noticed that this problem is very similar to the picoCTF 2018 problem "dog or frog". You can read our team's (very entertaining) writeup of that problem here. We'll adopt a very similar approach. Let's make a few reasonable assumptions:

High Level Overview

Assume we have a magic function that, given an image, outputs a "non-ponyness score" (NPS) representing how close we are to tricking the non-robust model. Also assume we have some way of randomly mutating an existing image. With these two components, we can do naive stochastic optimization (essentially greedy hill climbing) to get a solution as follows:

1. Start at generation 0 with the provided image and calculate its NPS.
2. For each generation, randomly mutate the current best image and calculate the mutant's NPS:
    a) if its NPS is higher than that of the current best image, make the mutated image the new current best;
    b) otherwise, throw out the mutated image and try again.
3. Once the NPS is high enough (i.e., the non-robust model is fooled), stop and output the current best image.
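
The steps above can be sketched as a generic loop. This is a minimal illustration, not the team's solve script: hill_climb, threshold, and max_generations are hypothetical names, and the score and mutate callables stand in for the NPS function and the mutation described next.

```python
from typing import Callable, Tuple, TypeVar

Image = TypeVar("Image")  # any image representation: list, NumPy array, PIL image, ...


def hill_climb(
    start: Image,
    score: Callable[[Image], float],
    mutate: Callable[[Image], Image],
    threshold: float = 0.0,
    max_generations: int = 10_000,
) -> Tuple[Image, float]:
    """Greedy stochastic optimization: a mutant replaces the current best only if it scores higher.

    Returns the best image found and its score; the caller checks whether the score
    cleared `threshold` (for the NPS, any positive score means the model was fooled).
    """
    best, best_score = start, score(start)  # generation 0: the provided image
    for _ in range(max_generations):
        if best_score > threshold:  # step 3: good enough, stop early
            break
        candidate = mutate(best)  # step 2: random mutation of the current best
        candidate_score = score(candidate)
        if candidate_score > best_score:  # step 2a: keep improvements
            best, best_score = candidate, candidate_score
        # step 2b: otherwise the candidate is simply dropped
    return best, best_score
```

The hash constraint discussed later fits into this loop without changes: have score return float("-inf") for any candidate whose hash drifts too far, so it can never be accepted.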

Solution

First we need a scoring function whose output represents how close we are to tricking the model. We can use the provided code to get a list of scores, one per object class, for a given image. The site's code picks the highest score from this list and returns the corresponding class. A simple scoring mechanism is therefore the difference between the highest score among the non-horse classes and the score for horse:
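
A minimal sketch of that score, assuming the class scores are available as a plain sequence (for example, a model's output tensor converted with .tolist()) and that we know the index of the horse class. If the model uses the standard CIFAR-10 label order (an assumption, so check the site's class list), horse is index 7. The name non_pony_score is hypothetical.

```python
from typing import Sequence


def non_pony_score(class_scores: Sequence[float], horse_index: int) -> float:
    """Highest non-horse score minus the horse score.

    Positive means the argmax (the site's prediction) is some class other than horse.
    """
    best_other = max(s for i, s in enumerate(class_scores) if i != horse_index)
    return best_other - class_scores[horse_index]
```

A score of exactly zero is a tie between horse and another class, so requiring a strictly positive score is the safe stopping condition.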

So if the returned score is positive, the prediction is not a horse. For our mutation function, we simply add a random change of between -50% and +50% to each individual pixel:
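
A minimal sketch of this mutation. For a dependency-free example it assumes (hypothetically) that the image is a flat list of 8-bit channel values and that the ±50% is relative to each value's current magnitude; a real solve would more likely apply the same operation to a NumPy array. mutate and its strength parameter are illustrative names.

```python
import random
from typing import List


def mutate(pixels: List[int], rng: random.Random, strength: float = 0.5) -> List[int]:
    """Scale every channel value by a random factor in [1 - strength, 1 + strength].

    ASSUMPTION: "+/-50%" is relative to each pixel's current value; results are
    rounded and clamped back into the valid 8-bit range 0..255.
    """
    return [
        min(255, max(0, round(p * (1 + rng.uniform(-strength, strength)))))
        for p in pixels
    ]
```

Because it returns a new list instead of modifying its input, the current best image stays intact when a mutant is rejected.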

There is an additional limitation we have not discussed yet: the site hashes every image we upload and checks that its hash differs from the original's by no more than 5. We also don't want radically different images, since they have a much higher chance of failing the robust check. So we add a stipulation: if the hash difference exceeds 3, we throw out the mutated image and try again. We wrap a compare_hash method and the above score_image function in a Predictor class and end up with this:
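
The hash check can be sketched as follows. The site's exact hash function isn't reproduced here, so as a stand-in this uses a simple "average hash" (each cell of an 8x8 grayscale thumbnail becomes 1 if it is brighter than the mean) and measures difference as the Hamming distance between bit strings, which is how the imagehash library defines subtracting two hashes. compare_hash and the limits 3 and 5 come from the text; HashChecker, the block-averaging downsample, and the flat grayscale image format are hypothetical.

```python
from typing import List

HASH_SIZE = 8     # 8x8 grid -> 64-bit hash
SAFETY_LIMIT = 3  # our cutoff, stricter than the site's limit of 5


def average_hash(gray: List[int], width: int, height: int, size: int = HASH_SIZE) -> List[bool]:
    """Downsample by block averaging, then threshold each cell at the mean.

    Assumes width >= size and height >= size so every block is non-empty.
    """
    cells = []
    for cy in range(size):
        for cx in range(size):
            y0, y1 = cy * height // size, (cy + 1) * height // size
            x0, x1 = cx * width // size, (cx + 1) * width // size
            block = [gray[y * width + x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [c > mean for c in cells]


def hash_difference(a: List[bool], b: List[bool]) -> int:
    """Hamming distance: number of hash bits that differ."""
    return sum(x != y for x, y in zip(a, b))


class HashChecker:
    """Hypothetical helper holding the original's hash and vetting mutated candidates."""

    def __init__(self, original_gray: List[int], width: int, height: int):
        self.width, self.height = width, height
        self.original_hash = average_hash(original_gray, width, height)

    def compare_hash(self, candidate_gray: List[int]) -> int:
        return hash_difference(self.original_hash, average_hash(candidate_gray, self.width, self.height))

    def acceptable(self, candidate_gray: List[int]) -> bool:
        """Keep a mutant only if it stays comfortably inside the site's limit."""
        return self.compare_hash(candidate_gray) <= SAFETY_LIMIT
```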

Here is an example of what the output ends up looking like:

[Figure: the "original" image and the "disguised" image]

After getting someone else to write a script to upload the processed images for me (unironically the hardest part of this challenge), we get the flag. You can find our full solve scripts here.