pwnies_please¶

The Setup¶

In [ ]:

nonrobust = get_prediction(image_bytes=img_bytes, model = model_nonrobust, curr_image = session['img'])
robust = get_prediction(image_bytes=img_bytes, model = model_robust, curr_image = session['img'])
# robust model is the "ground truth", non-robust is the "bouncer"
# cases:
    # bouncer does not want to let in horses, you want to let them in anyway
    # robust says horse, non-robust says horse: you have been detected
    # robust says not horse, non-robust says horse: you fail extra hard
    # robust says horse, non-robust says not horse: flag
    # robust says not horse, non-robust says not horse: they were let in but you didn't achieve the goal

Essentially, when we upload an image, it is run through two models: a "robust" model and a "non-robust" model (the one we have). Our goal is to trick the non-robust model into thinking we submitted anything but a horse, while the robust model still sees a horse. In the three failing cases, the session variable yolo is incremented; otherwise, the session variable level is incremented. Near the end of the file there are these checks:

So we need to pass 50 horses in a row with no more than 3 errors.
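The four cases from the comments reduce to a small decision function. This is only a sketch: `classify_attempt` is a hypothetical helper, and only the counter names `yolo` and `level` come from the challenge code.

```python
def classify_attempt(robust_is_horse: bool, nonrobust_is_horse: bool) -> str:
    """Return which session counter an upload would bump (hypothetical helper)."""
    # Only one combination succeeds: the robust model still sees a horse
    # while the non-robust "bouncer" sees something else.
    if robust_is_horse and not nonrobust_is_horse:
        return "level"
    # The other three combinations all count as failures.
    return "yolo"


for robust in (True, False):
    for nonrobust in (True, False):
        print(robust, nonrobust, classify_attempt(robust, nonrobust))
```

Only the (robust says horse, non-robust says not horse) row maps to `level`; everything else counts against us.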

Observations¶

We immediately noticed that this problem is very similar to the picoCTF 2018 problem "dog or frog". You can read our team's (very entertaining) writeup of that problem here. We'll adopt a very similar approach. Let's make a few reasonable assumptions:

- the non-robust model is bad, so altering the image a bit can drastically change its prediction
- the robust model is good, so altering the image a bit won't throw off its prediction

High Level Overview¶

Assume we have a magic function which, given an image, outputs a "non-ponyness score" (NPS) representing how close we are to tricking the non-robust model. Also assume we have some way of randomly mutating an existing image. With these two components, we can do naive stochastic optimization to get a solution as follows:

1. Start at generation 0 with the provided image and calculate its NPS.
2. For each generation, randomly mutate the current best image and calculate its NPS.
    a) If its NPS is higher than the NPS of the current best image, set the mutated image as the current best.
    b) Otherwise, throw out the image and try again.
3. Once the NPS is high enough (we tricked the non-robust model), output the solution.
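The loop above can be sketched independently of any image model. In this minimal sketch the "image" is just a list of numbers, and `toy_score`, `toy_mutate`, `GOAL`, and the target value are made-up stand-ins for the real NPS and mutation functions.

```python
import random

RNG = random.Random(0)  # fixed seed so the toy run is repeatable


def hill_climb(start, score, mutate, target, max_generations=10_000):
    """Keep a random mutation only when it strictly improves the score."""
    best, best_score = start, score(start)
    for _ in range(max_generations):
        if best_score >= target:
            break
        candidate = mutate(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins (not the real models): the score is the negated squared
# distance to an arbitrary goal vector, so higher means closer.
GOAL = [0.2, 0.8, 0.5]


def toy_score(xs):
    return -sum((x - g) ** 2 for x, g in zip(xs, GOAL))


def toy_mutate(xs):
    return [x + RNG.uniform(-0.05, 0.05) for x in xs]


best, best_score = hill_climb([0.0, 0.0, 0.0], toy_score, toy_mutate, target=-0.01)
print(best, best_score)
```

Because a mutation is only kept when it improves the score, the score never gets worse between generations. The price is that many mutations are simply thrown away.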

Solution¶

First we need a scoring function whose output represents how close we are to tricking the model. We can use the provided code to get a list of scores, one per object class, for a given image. In the code for the site, the highest score from this list is chosen and the corresponding class is returned. We can get a simple scoring mechanism by taking the difference between the highest non-horse score and the score for horse:

In [ ]:

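import numpy as np
import torch

HORSE = 7  # index of the horse class in the model's output


# method of the Predictor class; transform_image and model_nonrobust come from the provided code
def score_image(self, image):
    inputs = transform_image(image)
    if torch.cuda.is_available():
        inputs = inputs.cuda()
    outputs = model_nonrobust(inputs)
    # a list of scores for each class
    class_scores = outputs.cpu().detach().numpy()[0]
    # get difference between highest non-horse score and horse score
    return max(np.delete(class_scores, HORSE)) - class_scores[HORSE]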
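To make the margin concrete, here is a minimal sketch of the same calculation on plain lists. The score vectors are made up purely for illustration, and `margin` is a hypothetical helper rather than part of the challenge code.

```python
def margin(class_scores, target_index):
    """Highest score among the other classes minus the target class's score."""
    others = [s for i, s in enumerate(class_scores) if i != target_index]
    return max(others) - class_scores[target_index]


HORSE = 7  # same index score_image uses for the horse class

# Made-up scores, purely for illustration.
still_horse = [0.1, 0.0, 0.3, 0.2, 0.0, 0.4, 0.1, 2.5, 0.0, 0.2]
not_horse = [0.1, 0.0, 0.3, 0.2, 0.0, 1.9, 0.1, 1.5, 0.0, 0.2]

print(margin(still_horse, HORSE) < 0)  # True: horse is still the top class
print(margin(not_horse, HORSE) > 0)    # True: another class now outranks horse
```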

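So if the returned score is positive, the non-robust model's top prediction is no longer horse. For our mutation function, we simply scale each individual pixel by a random factor: 1 plus a random value in [-0.5, 0.5) times LEARNING_RATE, so a change of between -50% and +50% corresponds to LEARNING_RATE = 1: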

In [ ]:

import random

import numpy as np


def mutate(px):
    # px is one pixel's channel vector; all of its channels share the same random factor
    return (1 + (random.random() - 0.5) * LEARNING_RATE) * px


def evolve_image(im):
    # applies mutate to every pixel and restricts channel values to between 0 and 255
    return np.clip(np.apply_along_axis(mutate, 2, im), 0, 255)
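For readers without NumPy at hand, the same mutation can be sketched in plain Python on a flat list of (R, G, B) tuples. The `LEARNING_RATE` value here is an assumption chosen for illustration, and `mutate_pixel` and `evolve` are hypothetical names.

```python
import random

LEARNING_RATE = 0.1  # assumed value for illustration; the real setting is not shown


def mutate_pixel(pixel, rng):
    """Scale all channels of one pixel by the same random factor, then clamp."""
    factor = 1 + (rng.random() - 0.5) * LEARNING_RATE
    return tuple(min(255.0, max(0.0, factor * c)) for c in pixel)


def evolve(pixels, rng):
    """Mutate every pixel of a flat list of (R, G, B) tuples."""
    return [mutate_pixel(p, rng) for p in pixels]


rng = random.Random(1)
image = [(200, 100, 50), (255, 255, 255), (0, 0, 0)]
print(evolve(image, rng))
```

One consequence of a multiplicative mutation is that a pure black pixel never changes, since any factor times 0 is still 0.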

There is an additional limitation we have not discussed yet: the site hashes any images we input and checks that the difference from the original is no more than 5. We also don't want radically different images since they will have a much higher chance of failing the robust check. So we can just add a stipulation that if the hash difference exceeds 3, we throw out the current iteration and try again. We wrap a compare_hash method and the above score_image function in a Predictor class and end up with this:

In [ ]:

def disguise_pony(pony):
    predictor = Predictor(pony)

    current = np.asarray(pony)
    score = predictor.score_image(pony)
    generation = 0

    while score < 0:
        # randomly mutate the image
        evolved = evolve_image(current).astype(np.uint8)
        generation += 1
        # convert from np array to PIL object
        evolved_img = Image.fromarray(evolved)
        # check that the difference in hashes is not too big
        hash_diff = predictor.compare_hash(evolved_img)
        if hash_diff > 3:
            continue
        # if we did better set the current prediction to this one and update the score
        new_score = predictor.score_image(evolved_img)
        if new_score > score:
            current = evolved
            score = new_score
            print(f"generation {generation}: hashdiff = {hash_diff}, score = {score}")
    return Image.fromarray(current)
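The exact hash used by the site is not shown here, so the following only illustrates the general idea behind a perceptual-hash difference. It uses a simple average hash over a tiny grayscale grid. The function names and the 2x2 grids are made up for the example, and `compare_hash` in the code above may work differently.

```python
def average_hash(gray):
    """One bit per cell: 1 if the cell is brighter than the grid's mean."""
    cells = [v for row in gray for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hash_difference(a, b):
    """Hamming distance: number of bit positions where two hashes disagree."""
    return sum(x != y for x, y in zip(a, b))


original = [[10, 200], [220, 30]]
slightly_changed = [[12, 190], [230, 28]]
very_different = [[250, 5], [3, 240]]

print(hash_difference(average_hash(original), average_hash(slightly_changed)))  # 0
print(hash_difference(average_hash(original), average_hash(very_different)))    # 4
```

Small per-pixel changes tend to leave such a hash untouched, which is why a mutation loop like the one above can keep making progress while staying under a hash-difference limit.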
