Search engines use machine learning to develop the best possible ranking algorithm. Then, search engine optimisers (SEOs) pit themselves against this hidden algorithm, experimenting as they try to increase the rankings of a site.
Only search engine employees really know which algorithms are used; in this post I will look at a known algorithm and then use techniques called “adversarial learning” to optimise against it.
Working with a known algorithm is useful because it gives a controlled environment for running tests and optimisations. The algorithm we will be using is called Inception V3 and we will be using it to rank images of cats.
The post will be in several sections:
In summary, we start by building a weird toy ranking engine (not going to call it a search engine because it doesn’t do crawling and doesn’t really have queries) then we use special knowledge of the algorithm to optimise for it. Then we make our optimisation task more realistic by pretending we don’t know about the internals of the algorithm and finally we pretend the only thing we can observe about the algorithm is the ranking output at the end.
All the code used for this investigation is included in the post. You can ignore it if python isn’t your thing and just read the text.
Comments in code blocks contain content that is useful for people running the code but there shouldn’t be any necessary explanations for understanding the “big idea” there.
Since 2010 there has been a machine learning competition called The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) where competitors try to correctly classify images into 1000 different categories. In 2015 Google performed very well in this competition with their deep learning algorithm called Inception V3.
Inception V3 is a neural network and Google have made both the shape of the network and the network weights available for anyone to use to classify their own images. This means you can run the algorithm on your own computer, against your own images and have full visibility on all the computation it does. You can even alter it or train it further if that is what you want to do.
We can use the Inception V3 algorithm to classify an image of a cat.
# I have multiple keras backends installed
# Make sure that the tensorflow one is being used for this
# Must be done before keras libraries are loaded
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
from keras.preprocessing.image import load_img, save_img, img_to_array
import numpy as np
import scipy
## inception_v3 is the pre-trained neural network
from keras.applications import inception_v3
from keras import backend as K
def preprocess_image(image_path):
# Util function to open, resize and format pictures
# into appropriate tensors.
img = load_img(image_path, target_size=(299,299))
img = img_to_array(img)
img = np.expand_dims(img, axis=0)
img = inception_v3.preprocess_input(img)
return img
def deprocess_image(y):
# Util function to convert a tensor into a valid image.
x = np.copy(y)
if K.image_data_format() == 'channels_first':
x = x.reshape((3, x.shape[2], x.shape[3]))
x = x.transpose((1, 2, 0))
else:
x = x.reshape((x.shape[1], x.shape[2], 3))
x /= 2.
x += 0.5
x *= 255.
x = np.clip(x, 0, 255).astype('uint8')
return x
# We won't be training so set learning phase to 0
K.set_learning_phase(0)
# Load the InceptionV3 model
# The model will be loaded with pre-trained ImageNet weights.
model = inception_v3.InceptionV3(weights='imagenet',
include_top=True)
# Now load and classify an image
original_img = preprocess_image('lazycat.jpg')
import matplotlib.pyplot as plt
%matplotlib inline
top_predicted_label = inception_v3.decode_predictions(model.predict(original_img), top=1)[0][0]
plt.title('Predicted Label: %s (%f)' % (top_predicted_label[1], top_predicted_label[2]))
# Add semicolon at the end of this line to prevent matplotlib
# adding a description
# https://stackoverflow.com/a/38971053
plt.imshow(deprocess_image(original_img));
The Inception V3 algorithm classifies our image as an “Egyptian Cat” with 49% certainty.
We can see what other classifications the algorithm thinks are possible in the table below.
import pandas as pd
preds = model.predict(original_img)
result = inception_v3.decode_predictions(preds,
top=10)
pd.DataFrame(result[0])
# Out[242]: |
|
|
|
|
---|---|---|---|---|
|
0 |
1 |
2 |
|
0 |
n02124075 |
Egyptian_cat |
0.488812 |
|
1 |
n02123159 |
tiger_cat |
0.096307 |
|
2 |
n02123045 |
tabby |
0.0924525 |
|
3 |
n02123394 |
Persian_cat |
0.0703125 |
|
4 |
n04399382 |
teddy |
0.00825767 |
|
5 |
n04344873 |
studio_couch |
0.00726824 |
|
6 |
n03793489 |
mouse |
0.00718327 |
|
7 |
n04152593 |
screen |
0.0058089 |
|
8 |
n03085013 |
computer_keyboard |
0.00459059 |
|
9 |
n02123597 |
Siamese_cat |
0.00341954 |
|
Egyptian cat is the most likely but there are other types of cat (e.g. tabby, siamese) that are also possibilities. We can add all these up to get the “cat score” for an image.
def cat_score(preds):
# See https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a for
# a mapping of class numbers to names
tabby = preds[0][281] #281 = tabby cat
tiger = preds[0][282]
persian = preds[0][283]
siamese = preds[0][284]
egyptian = preds[0][285]
total = tabby + tiger + persian + siamese + egyptian
return(total)
plt.title('Cat Score: %f' % (cat_score(preds)))
plt.imshow(deprocess_image(original_img));
The maximum possible cat score is one. This cat scores 0.75.
We can use the cat score to rank images of cats (or of anything!) based on how cat like Inception V3 thinks they are.
I just happen to have ten thousand cat images downloaded from the internet (I’m not joking about this). We can rank them using the cat score and see which are the most and least cat like images.
import glob
import pickle
import os.path
# Cache the results because this takes a while.
# If the cache file exists then just use this.
if os.path.isfile("cat_probs.pkl"):
f = open("cat_probs.pkl",'rb')
cat_probs = pickle.load(f)
f.close()
else:
files = glob.glob("CAT_*/*.jpg")
cat_probs = []
for f in files:
img = preprocess_image(f)
preds = model.predict(img)
score = cat_score(preds)
cat_probs.append({'file':f,
'total': score
})
f = open("cat_probs.pkl",'wb')
pickle.dump(cat_probs,f)
f.close()
# Sort the results
cat_probs = sorted(cat_probs,
key=lambda k: k['total'],
reverse=True)
top = cat_probs[0]
bottom = cat_probs[-1]
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Most cat like: %s' % (top['total']))
axs[0].imshow(deprocess_image(preprocess_image(top['file'])))
axs[1].set_title('Least cat like: %s' % (bottom['total']))
axs[1].imshow(deprocess_image(preprocess_image(bottom['file'])));
Here are the top ten most cat like cats:
fig, axs= plt.subplots(5,2, figsize=(8,10))
for i,ax in enumerate(fig.axes):
cat = cat_probs[i]
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title("Rank %s. Score %f" % (i+1,cat['total']))
ax.imshow(deprocess_image(preprocess_image(cat['file'])));
And the bottom ten in the rankings:
ncats = len(cat_probs)
fig, axs= plt.subplots(5,2, figsize=(8,10))
for i,ax in enumerate(fig.axes):
catix = ncats - i -1
cat = cat_probs[catix]
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title("Rank %s. Score %f" % (catix+1,cat['total']))
ax.imshow(deprocess_image(preprocess_image(cat['file'])));
So this is the cat engine - give it an image of a cat and it will give it a score which can be ranked against the images of the other ten thousand cats in the index.
In the next section we will look at optimising a cat image to improve the cat score so that it ranks higher in the cat engine.
Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input. This technique can be applied for a variety of reasons, the most common being to attack or cause a malfunction in standard machine learning models.
Experts in adversarial learning use machine learning to trick other machine learning algorithms into making mistakes like misclassifying a picture of a cat as something else. This isn’t very interesting if the picture is of something else but it is possible to find adversarial examples that are obviously one thing to a human but look like something else to the algorithm.
To explain how to generate adversarial images we are first going to have to take a little dive into mathematics.
Before being sent to the algorithm all our images are scaled to 300x300 pixels. And each pixel has three values; one each for red, green and blue.
So, to a computer, each image is just a point in a 300x300x3=2700 dimension space.
And then Inception V3 is just a function from a 2700 dimension input to a 1000 dimension output (because it predicts for 1000 classes).
Then we turn the 1000 dimension output into a single number (1 dimension) by using our cat_scores function that calculates the score for an image from the Inception V3 predictions.
We can use calculus to differentiate the Inception V3 function which gives us the gradient (or slope) at a given point. Which tells us which direction to go to increase or decrease the output. This tells us which pixels to change in an image in order to make it more or less like a particular class.
This lets us use a machine learning method called “gradient descent” where we constantly move “downhill” to find the minimum of a function.
Luckily for us, machine learning systems like tensorflow and keras make finding gradients and doing gradient descent (relatively) simple.
from keras import losses
# The shape of the input
image_shape = model.input
# Define the shape of the target
# Inception V3 has 1000 classes
target_class = K.placeholder(shape=(1,1000))
# Our loss is the crossentropy between the predicted class of the
# output and the target class
adversarial_loss = losses.categorical_crossentropy(target_class, model.output)
# Keras automagically calculates the gradients
grads = K.gradients(adversarial_loss, image_shape)[0]
# Clip the grads to prevent blowup
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
# Define the computation graph in keras
outputs = [adversarial_loss, grads]
fetch_loss_and_grads = K.function([image_shape,target_class], outputs)
# cls is the target class
def eval_loss_and_grads(x,cls, loss_func):
outs = loss_func([x,cls])
loss_value = outs[0]
grad_values = outs[1]
return loss_value, grad_values
# then we do gradient descent to minimise the loss
def gradient_descent(x, iterations, step, cls=None, max_loss=None, loss_func=None):
for i in range(iterations):
loss_value, grad_values = eval_loss_and_grads(x, cls, loss_func)
if max_loss is not None and loss_value > max_loss:
break
x -= step * grad_values
return x
Let’s take our original cat image and then change it to make it classify as something else.
def optimise_image(input_path,
target,
iterations = 10,
step = 0.01,
max_loss = 10
):
# Now do the gradient descent
original_img = preprocess_image(input_path)
img_copy = np.copy(original_img)
newimg = gradient_descent(img_copy,
iterations=iterations,
step=step,
max_loss=max_loss,
loss_func = fetch_loss_and_grads,
cls = target
)
f, axs= plt.subplots(1,2, figsize=(8,10))
top_predicted_label = inception_v3.decode_predictions(model.predict(original_img), top=1)[0][0]
axs[0].set_title('Predicted Label: %s (%.3f)' % (top_predicted_label[1], top_predicted_label[2]))
axs[0].imshow(deprocess_image(original_img))
top_predicted_label = inception_v3.decode_predictions(model.predict(newimg), top=1)[0][0]
axs[1].set_title('Predicted Label: %s (%.3f)' % (top_predicted_label[1], top_predicted_label[2]))
axs[1].imshow(deprocess_image(newimg))
aircraft_carrier = np.zeros(shape=(1,1000))
aircraft_carrier[0,403] = 1.
optimise_image('lazycat.jpg', aircraft_carrier, iterations=1000)
After 1000 iterations of gradient descent our cat image looks a little bit fuzzy to the human eye but to the algorithm it looks like an aircraft carrier with very high confidence (1.00 out of 1.00!).
We can use the same idea to take an image of a cat and make it more cat like so that it ranks higher in our ranking engine.
Let’s do it with the bottom ranked image:
egyptian_cat = np.zeros(shape=(1,1000))
egyptian_cat[0,285] = 1
optimise_image(bottom['file'],
egyptian_cat,
iterations=100,
max_loss=None)
Perfect! From the bottom ranked image in my cat picture collection to the top! If only real life search engine optimisation was so easy!
One of the reasons why real life is not this easy is that in real life people don’t have access to the internals of the algorithm they are optimising for. So they can’t calculate the gradients and they don’t know which are the optimal changes to make to the image.
Does this mean there is nothing we can do? Of course not! We can still improve an image (in the eyes of the algorithm) even without knowing anything about the algorithm internals - we only need to know the input image and the score the algorithm gives it. We can treat the algorithm as a total black box!
In science, computing, and engineering, a black box is a device, system or object which can be viewed in terms of its inputs and outputs without any knowledge of its internal workings. Its implementation is “opaque” (black).
For the cat engine the input is an image and the output is the cat score. The “black box” is the Inception V3 neural network that calculates the score; we are going to pretend that we don’t know how it does this.
We can feed an image into the algorithm and see the score but that is all we know. We aren’t going to use the gradient to figure out which changes to the image will be most beneficial to us so the tricky part of the challenge is figuring out how to change the image.
As mentioned earlier, there are 2700 different things we could change and the combination of these changes adds up to a lot of potential images. Far more than we could check one at a time. if we could check every possible image I guess we could also just copy whatever is the highest scoring image.
Let’s consider a much smaller search space. What if we only changed one pixel? How much difference to the score could that make?
The images on the right are a magnification of the red box in the images on the left
# Modified from https://github.com/Hyperparticle/one-pixel-attack-keras/blob/master/1_one-pixel-attack-cifar10.ipynb
def perturb_image(xs, old_img):
img = np.copy(old_img)
# If this function is passed just one perturbation vector,
# pack it in a list to keep the computation the same
if xs.ndim < 2:
xs = np.array([xs])
# Copy the image n == len(xs) times so that we can
# create n new perturbed images
tile = [len(xs)] + [1]*(xs.ndim+1)
imgs = np.tile(img, tile)
# Make sure to floor the members of xs as int types
xs = xs.astype(int)
for x,img in zip(xs, imgs):
# Split x into an array of 5-tuples (perturbation pixels)
# i.e., [[x,y,r,g,b], ...]
pixels = np.split(x, len(x) // 5)
for pixel in pixels:
# At each pixel's x,y position, assign its rgb value
x_pos, y_pos, *rgb = pixel
img[x_pos, y_pos] = rgb
return imgs
original_img = preprocess_image(bottom['file'])
old_preds = model.predict(original_img)
old_score = cat_score(old_preds)
pixel = np.array([150, 150, 0, 0, 255]) # pixel = x,y,r,g,b
image_perturbed = perturb_image(pixel, original_img)
new_preds = model.predict(image_perturbed)
new_score = cat_score(new_preds)
import matplotlib.patches as patches
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Old Score: %f' % old_score)
axs[0].add_patch(patches.Rectangle((140,140),20,20,linewidth=1,edgecolor='r',facecolor='none'))
axs[1].set_xlim(140,160)
axs[1].set_ylim(160,140)
axs[1].set_title('New Score: %f' % new_score)
axs[0].imshow(deprocess_image(image_perturbed))
axs[1].imshow(deprocess_image(image_perturbed));
pixel = np.array([150, 150, 0, 255, 0]) # pixel = x,y,r,g,b
image_perturbed = perturb_image(pixel, original_img)
new_preds = model.predict(image_perturbed)
new_score = cat_score(new_preds)
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Old Score: %f' % old_score)
axs[0].add_patch(patches.Rectangle((140,140),20,20,linewidth=1,edgecolor='r',facecolor='none'))
axs[1].set_xlim(140,160)
axs[1].set_ylim(160,140)
axs[1].set_title('New Score: %f' % new_score)
axs[0].imshow(deprocess_image(image_perturbed))
axs[1].imshow(deprocess_image(image_perturbed));
pixel = np.array([150, 150, 255, 0, 0]) # pixel = x,y,r,g,b
image_perturbed = perturb_image(pixel, original_img)
new_preds = model.predict(image_perturbed)
new_score = cat_score(new_preds)
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Old Score: %f' % old_score)
axs[0].add_patch(patches.Rectangle((140,140),20,20,linewidth=1,edgecolor='r',facecolor='none'))
axs[1].set_xlim(140,160)
axs[1].set_ylim(160,140)
axs[1].set_title('New Score: %f' % new_score)
axs[0].imshow(deprocess_image(image_perturbed))
axs[1].imshow(deprocess_image(image_perturbed));
Changing just one pixel in the centre of the image between red, green, and blue can lead to quite a big change in the cat score. This indicates two things:
“One pixel attacks” are studied in adversarial learning where people try to change one pixel of an image in order for it to get wrongly classified. There is a good example of the code for this via Dan Kondratyuk on Github from whom I have borrowed for these examples.
The trick here is not to pick a specific pixel and then optimise the colour (as in the examples above) but to change different pixels and see how this influences the score. When a few different pixels/colours have been tested the results can be combined using a genetic algorithm called “differential evolution”. I’m not going to go into depth here about why this works because I don’t fully understand it myself; I kind of get why, in general, combining two good solutions to an optimisation problem might lead to another good solution. I am at a bit of a loss how this is meant to work with pixels on opposite sides of an image.
Luckily we don’t need perfect understanding because SciPy has an implementation of the differential evolution algorithm so all we need to do is prepare some functions and link them all together.
def get_new_score(xs, img, get_score, model):
# Perturb the image with the given pixel(s) x and get the prediction of the model
imgs_perturbed = perturb_image(xs, img)
predictions = model.predict(imgs_perturbed)
score = get_score(predictions)
# Return 1-score because we need something to minimise
return 1 - score
def attack_success(x, img, get_score, threshold, model, verbose=False):
# Perturb the image with the given pixel(s) and get the prediction of the model
attack_image = perturb_image(x, img)
preds = model.predict(attack_image)
score = get_score(preds)
# If the prediction is what we want (misclassification or
# targeted classification), return True
if verbose:
print('Score:', score)
if (score>threshold):
return True
# NOTE: return None otherwise (not False), due to how Scipy handles its
# callback function
from scipy.optimize import differential_evolution
def attack(img, model, threshold, pixel_count=1,
maxiter=75, popsize=400, verbose=False):
# Define bounds for a flat vector of x,y,r,g,b values
# For more pixels, repeat this layout
bounds = [(0,299), (0,299), (0,255), (0,255), (0,255)] * pixel_count
# Population multiplier, in terms of the size of the perturbation vector x
popmul = max(1, popsize // len(bounds))
# Format the predict/callback functions for the differential evolution algorithm
def predict_fn(xs):
return get_new_score(xs, img, cat_score, model)
def callback_fn(x, convergence):
return attack_success(x, img, cat_score, threshold, model,verbose)
# Call Scipy's Implementation of Differential Evolution
attack_result = differential_evolution(
predict_fn, bounds, maxiter=maxiter, popsize=popmul,
recombination=1, atol=-1, callback=callback_fn, polish=False)
#Calculate some useful statistics to return from this function
attack_image = perturb_image(attack_result.x, img)
old_preds = model.predict(img)
old_score = cat_score(old_preds)
new_preds = model.predict(attack_image)
new_score = cat_score(new_preds)
return {'attack': attack_result,
'attacked_image': attack_image[0],
'original_image': img,
'original_score': old_score,
'new_score': new_score,
'improvement': new_score-old_score
}
original_img = preprocess_image(bottom['file'])
# This takes a while to run so cache result
if os.path.isfile("one_pixel_results.pkl"):
f = open("one_pixel_results.pkl",'rb')
result = pickle.load(f)
f.close()
else:
result = attack(original_img, model, 0.9, pixel_count=1, verbose=False)
f = open("one_pixel_results.pkl",'wb')
pickle.dump(result,f)
f.close()
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Old Score: (%.3f)' % (result['original_score']))
axs[0].imshow(deprocess_image(result['original_image']))
axs[1].set_title('New Score: (%.3f)' % (result['new_score']))
axs[1].imshow(deprocess_image(np.array([result['attacked_image']])));
These look practically identical (which they are, only a single pixel is different!) but if we zoom in we can see the change.
pixel = result['attack'].x
import math
attack_x = math.floor(pixel[1])
attack_y = math.floor(pixel[0])
x = deprocess_image(np.array([result['attacked_image']]))
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].add_patch(patches.Rectangle((attack_x-15,attack_y-15),30,30,linewidth=1,edgecolor='r',facecolor='none'))
axs[1].set_xlim(attack_x-15,attack_x+15)
axs[1].set_ylim(attack_y+15,attack_y-15)
axs[0].imshow(x)
axs[1].imshow(x);
We have found a single pixel that can increase the cat score from 0 to 0.782. This is enough to improve the ranking from 9997 to 4439 which is quite a lot. It would be totally useless in terms of search traffic (still on page 440!) but remember that we were starting with a terrible image of a cat to begin with and that we have only changed one pixel.
There is one more step we can take to make cat engine optimisation slightly more like real search engine optimisation. For the above image we treated the ranking algorithm like a black box with the image as input and the cat score as output. This is a bit unrealistic because no one sees actual search engine scores; instead you can only see where your pages rank relative to all the others.
This is an important distinction because there might be a big gap in score between your page and the page ranked above. You could be making improvements for some time before you observe a change in rankings.
To see if we can optimise with this new constraint we just need to change a few of the functions we use to configure the differential evolution algorithm so that it only receives feedback on changes in ranking rather than changes in score.
def get_new_score(xs, img, model):
# Perturb the image with the given pixel(s) x and get the prediction of the model
imgs_perturbed = perturb_image(xs, img)
predictions = model.predict(imgs_perturbed)
score = cat_score(predictions)
rank = sum([score < x['total'] for x in cat_probs]) + 1
return rank
def attack_success(x, img, target_rank, model, verbose=False):
# Perturb the image with the given pixel(s) and get the prediction of the model
attack_image = perturb_image(x, img)
preds = model.predict(attack_image)
score = cat_score(preds)
rank = sum([score < x['total'] for x in cat_probs])
if verbose:
print('Score:', score)
if (rank<target_rank):
return True
# NOTE: return None otherwise (not False), due to how Scipy handles its
# callback function
from scipy.optimize import differential_evolution
def attack(img, model, target_rank, pixel_count=1,
maxiter=75, popsize=400, verbose=False):
# Define bounds for a flat vector of x,y,r,g,b values
# For more pixels, repeat this layout
bounds = [(0,299), (0,299), (0,255), (0,255), (0,255)] * pixel_count
# Population multiplier, in terms of the size of the perturbation vector x
popmul = max(1, popsize // len(bounds))
# Format the predict/callback functions for the differential evolution algorithm
def predict_fn(xs):
return get_new_score(xs, img, model)
def callback_fn(x, convergence):
return attack_success(x, img, target_rank, model,verbose)
# Call Scipy's Implementation of Differential Evolution
attack_result = differential_evolution(
predict_fn, bounds, maxiter=maxiter, popsize=popmul,
recombination=1, atol=-1, callback=callback_fn, polish=False)
#Calculate some useful statistics to return from this function
attack_image = perturb_image(attack_result.x, img)
old_preds = model.predict(img)
old_score = cat_score(old_preds)
old_rank = sum([old_score < x['total'] for x in cat_probs])
new_preds = model.predict(attack_image)
new_score = cat_score(new_preds)
new_rank = sum([new_score < x['total'] for x in cat_probs])
return {'attack': attack_result,
'attacked_image': attack_image[0],
'original_image': img,
'original_score': old_rank,
'new_score': new_rank,
'improvement': new_score-old_score
}
original_img = preprocess_image(bottom['file'])
if os.path.isfile("one_pixel_rank_results.pkl"):
f = open("one_pixel_rank_results.pkl",'rb')
result = pickle.load(f)
f.close()
else:
result = attack(original_img, model, 10, pixel_count=1, verbose=False)
f = open("one_pixel_rank_results.pkl",'wb')
pickle.dump(result,f)
f.close()
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].set_title('Old Rank: (%.f)' % (result['original_score']))
axs[0].imshow(deprocess_image(result['original_image']))
axs[1].set_title('New Rank: (%.f)' % (result['new_score']))
axs[1].imshow(deprocess_image(np.array([result['attacked_image']])));
pixel = result['attack'].x
import math
attack_x = math.floor(pixel[1])
attack_y = math.floor(pixel[0])
x = deprocess_image(np.array([result['attacked_image']]))
_, axs= plt.subplots(1,2, figsize=(8,10))
axs[0].add_patch(patches.Rectangle((attack_x-15,attack_y-15),30,30,linewidth=1,edgecolor='r',facecolor='none'))
axs[1].set_xlim(attack_x-15,attack_x+15)
axs[1].set_ylim(attack_y+15,attack_y-15)
axs[0].imshow(x)
axs[1].imshow(x);
The eventual ranking isn’t quite as good in this case but that isn’t really surprising given that we made the optimisation task harder.
I hope you find this topic as interesting as I do. Let’s quickly summarise what has been covered:
The early topics are useful general knowledge for anyone who works with data these days; I expect that tweaking neural nets that have been pre-trained by the tech giants will become a larger and larger part of normal data work as machine learning develops and grows into new areas.
The later topics introduce a more general optimisation approach (differential evolution) and demonstrate how to use it against our toy ranking engine. This leads to two further questions about real world uses
Differential evolution can be used for any optimisation where there are defined inputs and a single score function to maximise/minimise.
I suspect that there will be some problems that meet this criteria where differential evolution will not find good results; the “shape” of the score function will have to be such that the merging/breeding of agents in the genetic algorithm is likely to lead to a better score. I get the impression that this is likely to be the case for many real world problems but I don’t really have a good grasp on why this is or what kind of problems are likely to be amenable to this approach.
The big obstacle to real life use of differential evolution for online marketing tasks is that there is rarely a situation where you can give some inputs and then get immediate feedback on the outputs. A delay in feedback will slow down the differential evolution algorithm significantly. A similar issue, particularly for SEO, is that of noisy feedback; when we see the outputs of an algorithm there is often a load of other stuff going on too so it is harder to say for sure that the output is 100% the result of the input. This can also be simulated when optimising for the cat engine; I’ll leave this as an exercise for the interested reader and might return to it in a later post.
In order to be able to use differential evolution in our toy example we narrowed down the search space to look for only single pixel changes. We would have to go through a similar step before using this algorithm to optimise a web page.
What is the web page equivalent of a pixel?
It could be a single character in the HTML. But in a lot of cases changing a single character will just lead to a badly formed page. Or it could be a single DOM element. But DOM elements form a tree rather than an array so there would have to be some kind of funky conversion going on before this could be used as an input to the differential evolution algorithm. I’m sure none of this is impossible, but it does make things more difficult.
Perhaps a more realistic optimisation scenario would be optimising content within a template. This would constrain the input a bit more but, if we’re talking about optimising text, it would still mostly lead to nonsense or malformed output.
In short, I think this is a long way from anything that you can use to directly rank higher in real life. However, there might be other things you can optimise using differential evolution and these things could (indirectly) lead to an increase in traffic, user satisfaction etc.
And finally, remember that however well the cat engine works cats do not care what we think of them.
Special thanks to Dom Woodman, Jake Tucker and Bridget Fergie for editing and advice.