Disguise these pwnies to get the flag!
http://pwnies-please.chal.uiuc.tf
author: Anusha Ghosh, Akshunna Vaishnav, ian5v, Vanilla
When we click on the provided link, we are given links to two files, a PyTorch model and the site source code. There is a picture of a pony along with a button for uploading images. Let's take a look at some of the core logic:
nonrobust = get_prediction(image_bytes=img_bytes, model = model_nonrobust, curr_image = session['img'])
robust = get_prediction(image_bytes=img_bytes, model = model_robust, curr_image = session['img'])
# robust model is the "ground truth", non-robust is the "bouncer"
# cases:
# bouncer does not want to let in horses, you want to let them in anyway
# robust says horse, non-robust says horse: you have been detected
# robust says not horse, non-robust says horse: you fail extra hard
# robust says horse, non-robust says not horse: flag
# robust says not horse, non-robust says not horse: they were let in but you didn't achieve the goal
Essentially, when we upload an image it is run through two models: a "robust" model and a "non-robust" model (the one we have). Our goal is to trick the non-robust model into thinking we submitted anything but a horse, while still passing it off as a horse to the robust model. In the three failing cases, the session variable yolo is incremented; otherwise, the session variable level is incremented. Near the end of the file there are these checks:
if session['yolo'] > 3:
    session['yolo'] = 0
    session['level'] = 0
    response = "bouncer smacks you and you pass out, start over :)"
# MIN_LEVEL is hardcoded to 50
if session['level'] >= MIN_LEVEL:
    response = FLAG
So we need to sneak in 50 horses before making a fourth mistake: both counters reset only once yolo exceeds 3, so up to 3 errors are tolerated.
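To make the bookkeeping concrete, here is a minimal sketch of how one upload changes the two counters. It is reconstructed from the comments and checks quoted above, not copied from the site: the function name update_session, the return strings, and the MAX_STRIKES constant are hypothetical, and the session is modelled as a plain dict.

```python
MIN_LEVEL = 50    # flag threshold quoted in the site source
MAX_STRIKES = 3   # hypothetical name; the source resets once yolo exceeds 3


def update_session(session, robust_is_horse, nonrobust_is_horse):
    """Apply one upload's outcome to a dict holding 'yolo' and 'level'."""
    if robust_is_horse and not nonrobust_is_horse:
        # the only winning case: still a horse, but the bouncer is fooled
        session["level"] += 1
    else:
        # the three failing cases all count as a strike
        session["yolo"] += 1

    if session["yolo"] > MAX_STRIKES:
        # fourth strike: both counters start over
        session["yolo"] = 0
        session["level"] = 0
        return "reset"
    if session["level"] >= MIN_LEVEL:
        return "flag"
    return "continue"
```

Three strikes leave the level untouched; the fourth wipes it, which is why the disguise needs to be reliable rather than merely lucky.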
We immediately noticed that this problem is very similar to the picoCTF 2018 problem "dog or frog". You can read our team's (very entertaining) writeup of that problem here. We'll adopt a very similar approach. Let's make a few reasonable assumptions:
Assume we have a magic function which, given an image, outputs a "non-ponyness score" (NPS), which represents how close we are to tricking the non-robust model. Also assume we have some way of randomly mutating an existing image. With these two components, we can do naive stochastic optimization to get a solution as follows:
1. Start at generation 0 with the provided image and calculate its NPS.
2. For each generation, randomly mutate the current best image and get its NPS.
a) if its NPS is higher than the NPS of the current best image, set the mutated image as the current best
b) otherwise throw out the image and try again
3. If the NPS is high enough (we tricked the non-robust model), output the solution
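The steps above can be sketched as a generic hill-climbing loop, independent of images or models. This is only an illustration under assumptions: the name hill_climb, its parameters, and the max_generations safety cap are hypothetical and do not come from the solve script.

```python
import random


def hill_climb(start, score, mutate, target, max_generations=10_000):
    """Keep mutating the best candidate; accept a mutation only if it scores higher."""
    best, best_score = start, score(start)
    generation = 0
    while best_score < target and generation < max_generations:
        generation += 1
        candidate = mutate(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            # step 2a: the mutation improved the score, keep it
            best, best_score = candidate, candidate_score
        # step 2b: otherwise the candidate is simply discarded
    return best, best_score, generation


if __name__ == "__main__":
    # toy problem: walk a number towards 10 using random steps
    rng = random.Random(0)
    best, best_score, generations = hill_climb(
        start=0.0,
        score=lambda x: -abs(x - 10.0),
        mutate=lambda x: x + rng.uniform(-1.0, 1.0),
        target=-0.1,
    )
    print(f"best = {best:.3f}, score = {best_score:.3f}, generations = {generations}")
```

The accepted score never decreases, so the loop either reaches the target or stalls; the cap only guards against the stalling case.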
First we need a scoring function whose output represents how close we are to tricking the model. We can use the provided code to get a list of scores for each object class for a given image. In the code for the site, the highest score from this list is chosen and the corresponding class is returned. We can get a simple scoring mechanism by taking the difference between the score for horse and the second-highest score:
def score_image(self, image):
    inputs = transform_image(image)
    if torch.cuda.is_available():
        inputs = inputs.cuda()
    outputs = model_nonrobust(inputs)
    # a list of scores for each class
    class_scores = outputs.cpu().detach().numpy()[0]
    # get difference between highest non-horse score and horse score
    return max(np.delete(class_scores, 7)) - class_scores[7]
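To see the margin on its own, here is a small pure-Python sketch that works on a plain list of class scores, without the model. The function name non_horse_margin and the example scores are made up for illustration; the horse index 7 is taken from the code above.

```python
HORSE_INDEX = 7  # index of the horse class, as used in score_image above


def non_horse_margin(class_scores, horse_index=HORSE_INDEX):
    """Best non-horse score minus the horse score."""
    others = [s for i, s in enumerate(class_scores) if i != horse_index]
    return max(others) - class_scores[horse_index]


# made-up scores: horse (index 7) is clearly ahead, so the margin is negative
still_a_horse = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.0]
# made-up scores: class 8 now beats horse, so the margin is positive
disguised = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 1.0, 2.5, 0.0]

print(non_horse_margin(still_a_horse), non_horse_margin(disguised))
```

The sign of the margin flips exactly when some other class overtakes horse.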
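So if the returned score is positive, then the prediction will not be a horse. For our mutation function, we simply scale each pixel by a random factor: with LEARNING_RATE set to 1 a pixel changes by between -50% and +50%, and smaller values shrink that range proportionally. Because mutate is applied along the channel axis, all three color channels of a pixel share the same factor: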
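import random
import numpy as np

def mutate(px):
    # px is one pixel's RGB vector; LEARNING_RATE is a constant defined elsewhere in the script
    return (1 + (random.random() - 0.5) * LEARNING_RATE) * px

def evolve_image(im):
    # applies mutate to whole image and restricts pixels to RGB values between 0 and 255
    return np.clip(np.apply_along_axis(mutate, 2, im), 0, 255)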
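As a possible variation (not what we ran), the same multiplicative mutation can be written as one vectorized NumPy expression that draws an independent factor for every channel value. The function name evolve_image_per_channel and its learning_rate and rng parameters are hypothetical.

```python
import numpy as np


def evolve_image_per_channel(im, learning_rate, rng=None):
    """Scale every channel value by its own random factor in [1 - lr/2, 1 + lr/2)."""
    rng = np.random.default_rng() if rng is None else rng
    # one factor per array element instead of one per pixel
    factors = 1.0 + rng.uniform(-0.5, 0.5, size=im.shape) * learning_rate
    # keep the result inside the valid RGB range
    return np.clip(im * factors, 0, 255)
```

This avoids the Python-level call per pixel that apply_along_axis makes. Either way the mutation is multiplicative, so pure black pixels (value 0) never change.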
There is an additional limitation we have not discussed yet: the site hashes any images we input and checks that the difference from the original is no more than 5. We also don't want radically different images, since they will have a much higher chance of failing the robust check. So we can just add a stipulation that if the hash difference exceeds 3, we throw out the current iteration and try again. We wrap a compare_hash method and the above score_image function in a Predictor class and end up with this:
def disguise_pony(pony):
    predictor = Predictor(pony)
    current = np.asarray(pony)
    score = predictor.score_image(pony)
    generation = 0
    while score < 0:
        # randomly mutate the image
        evolved = evolve_image(current).astype(np.uint8)
        generation += 1
        # convert from np array to PIL object
        evolved_img = Image.fromarray(evolved)
        # check that the difference in hashes is not too big
        hash_diff = predictor.compare_hash(evolved_img)
        if hash_diff > 3:
            continue
        # if we did better set the current prediction to this one and update the score
        new_score = predictor.score_image(evolved_img)
        if new_score > score:
            current = evolved
            score = new_score
            print(f"generation {generation}: hashdiff = {hash_diff}, score = {score}")
    return Image.fromarray(current)
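The compare_hash method is not shown above, and this section does not say which perceptual hash the site uses. As a rough illustration of the idea only, here is a pure-Python sketch that assumes an average hash over an already-downscaled grayscale thumbnail, given as a 2-D list. The names average_hash, hash_difference and within_hash_budget are hypothetical.

```python
def average_hash(gray):
    """One bit per pixel: 1 if the pixel is brighter than the thumbnail's mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hash_difference(hash_a, hash_b):
    """Hamming distance: the number of bit positions where the hashes disagree."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def within_hash_budget(original, candidate, limit=3):
    """True if the candidate stays within the self-imposed limit of 3."""
    return hash_difference(average_hash(original), average_hash(candidate)) <= limit
```

Small brightness changes rarely flip a bit in a hash like this, which is why low-amplitude noise can stay under the limit while still moving the classifier's scores.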
Here is an example of what the output ends up looking like:
After getting someone else to write a script to upload the processed images for me (unironically the hardest part of this challenge), we get the flag. You can find our full solve scripts here.