Screen Reading with OpenCV and Google Vision APIs

These days I'm doing some rowing again.

When I'm not rowing in a boat on the river, I use a Concept2 rowing machine at the gym (known as "the erg"). I am a bit of a nerd in all areas of my life (even sport where, sometimes, I am almost a jock) so I like to digitally log my workouts.

I find that the more convenient logging is, the more likely I am to do it consistently. Rowing on the water is automagically logged by my watch, but for rowing on the erg I need to manually note the distances/speeds completed. There are apps that can do this, but they only connect to the more recent versions of the Concept2.

At the moment my process is a bit like this:

  1. Finish rowing
  2. Take photograph on phone of the rowing machine screen
  3. Refer to photo later when updating log

I think it should be possible to eliminate step 3 by algorithmically recognising the numbers of interest in the photo.

The workout summary screen always looks a bit like this:

My first attempt just sent the whole image to Google's Cloud Vision API to use their OCR service. The results were not ideal; with the layout of the screen not quite being a table, Google seems to struggle to know which order to present the recognised text in. For example, it might group the totals/averages together (as I want), or it might group the time in with the column headings and the logo (which I don't want).

Here is a typical example:

This does suggest a plan of attack; if I can extract just the totals/averages from the image then Google's OCR service should be able to manage this quite well.
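One way to see why the layout matters: once word boxes come back with coordinates, the row grouping I want is just clustering on the y-axis. Here is a sketch, assuming each detected word is a `(text, x, y)` tuple (a deliberate simplification of the bounding polygons Cloud Vision actually returns):

```python
# Sketch: group OCR word boxes into rows by vertical position.
# Assumes each word is a (text, x, y) tuple -- a simplification of
# the bounding polygons the Vision API actually returns.
def group_into_rows(words, row_tolerance=15):
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # within each row, order the words left to right
    return [" ".join(t for _, t in sorted(items)) for _, items in rows]

words = [("2000", 10, 100), ("8:05.3", 120, 102),
         ("time", 10, 20), ("meter", 120, 22)]
group_into_rows(words)  # → ["time meter", "2000 8:05.3"]
```

A tolerance like this is fragile on a skewed photo, which is another reason to crop and straighten the screen first.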

I'm going to do all this in Python because the OpenCV Python library seems to have better support/docs than the bindings for other languages.

We'll start with some notebook settings and imports

In [1]:
%matplotlib inline
import cv2 #the opencv library. Confusingly, on my system, this is installed as opencv3
import matplotlib.pyplot as plt #for displaying images
import imutils #convenient utils for resizing images
import numpy as np #an image is just an array of pixels

I also have a small number of photos for testing with (mostly stolen off Reddit).

In [2]:
files = [
    "data/3w5q2r_0.jpg", # this one is tricky
    # ...the rest of the test photos, elided here
]

names = [
    # ...a short label for each photo, elided here
]
I'll be doing stuff to these images and wanting to view the results so I'll make some quick functions to display all the screens in a grid.

In [3]:
def sideBySidePlot(images):
    for i, image in enumerate(images):
        plt.subplot(2, 4, i + 1) # a 2x4 grid, one cell per test photo
        plt.imshow(image, cmap="Greys") #cmap="Greys" doesn't mean grayscale!

def columnPlot(images):
    n = len(images)
    for i, image in enumerate(images):
        plt.subplot(n, 1, i + 1)
        plt.imshow(image, cmap="Greys")

# read an image and resize it to a standard height, keeping the aspect ratio
def readAndResizeImages(f):
    image = cv2.imread(f)
    image = imutils.resize(image, height = 600)
    return image

images = [readAndResizeImages(f) for f in files]
originals = [cv2.imread(f) for f in files]
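For reference, when `imutils.resize` is given only a height it derives a width that preserves the aspect ratio and then hands both to `cv2.resize`. The arithmetic is just:

```python
# What imutils.resize(height=600) works out internally (as I understand it):
# a scaling ratio from the target height, applied to the width.
def size_for_height(w, h, target_h=600):
    ratio = target_h / float(h)
    return (int(w * ratio), target_h)

size_for_height(4032, 3024)  # a typical landscape phone photo → (800, 600)
```

Standardising the height means the filter and kernel sizes used later behave consistently across photos taken at different resolutions.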


Great! Now I can see what I'm working with.

The next step was hard! I want to just extract the screen bit from the photo (i.e. not the background and not the plastic surround). I tried a few different things which I will show you here just so you don't think that I'm some kind of genius for whom everything works first time.

The first thing I thought to try was to detect the edges of the screen.

In [4]:
# Edge detection works on a grayscale image
grays = [cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) for image in images]
# A Bilateral Filter smooths things out a bit so we won't detect noise as edges
grays = [cv2.bilateralFilter(gray, 9, 17, 17) for gray in grays]
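The grayscale conversion itself is nothing mysterious: per the OpenCV docs, `COLOR_BGR2GRAY` is a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B (note OpenCV's channel order is BGR, not RGB). A NumPy equivalent:

```python
import numpy as np

# NumPy sketch of cv2.cvtColor(..., cv2.COLOR_BGR2GRAY):
# a weighted sum of the channels, with OpenCV's BGR channel order.
def bgr_to_gray(image):
    b, g, r = image[..., 0], image[..., 1], image[..., 2]
    return (0.114 * b + 0.587 * g + 0.299 * r).round().astype(np.uint8)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure blue in BGR
bgr_to_gray(pixel)  # → array([[29]], dtype=uint8): blue barely registers
```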
In [5]:
# Canny is an edge detection algorithm
edged = [cv2.Canny(gray, 30, 200) for gray in grays]

This does not look good! I hoped it would draw a rectangle around the screen.

The next thing to try is "contouring". If we can use a contour detection algorithm to find the edge of the screen this will also work.

Start by "thresholding" the image. This converts it to two-tone black and white.

In [6]:
thresholds = [cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY,11,2) for gray in grays]

For some images, this looks quite promising. For others (e.g. 1, 2 and 5) it still looks terrible.
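To make "adaptive" concrete: each pixel is compared with an average of its blocksize-by-blocksize neighbourhood, minus the constant C. Here is a toy NumPy version using a plain mean (the real `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_GAUSSIAN_C` uses a Gaussian-weighted mean and is far faster):

```python
import numpy as np

# Toy version of adaptive thresholding with a plain neighbourhood mean:
# a pixel goes white if it is brighter than its local mean minus C.
def adaptive_threshold_mean(gray, block=11, C=2):
    h, w = gray.shape
    pad = block // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    out = np.zeros_like(gray)
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + block, x:x + block].mean()
            out[y, x] = 255 if gray[y, x] > local_mean - C else 0
    return out
```

This also explains the speckle: in a uniform region every pixel sits right at its local mean, so small amounts of noise flip pixels either side of the threshold.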

Even the good images are quite "speckly". "Erode" the speckles away as if they are small islands of sand in a river.

In [7]:
kernel = np.ones((5,5),np.uint8) # how big and what shape the waves are
eroded = [cv2.erode(th,kernel,iterations = 1) for th in thresholds]

The opposite of erosion is dilation. By eroding and then dilating, speckles are removed during the first phase and "large islands" are joined together during the second. This should mean we end up with continuous lines on our thresholded images where continuous lines existed in the originals.

In [8]:
kernel = np.ones((3,3),np.uint8)
dilated = [cv2.dilate(erode,kernel,iterations = 1) for erode in eroded]
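In miniature, for a binary image and an all-ones kernel, erosion and dilation are just `all`/`any` over each pixel's neighbourhood. A plain NumPy sketch (cv2's versions are the real, fast thing):

```python
import numpy as np

# Binary morphology in miniature, with an all-ones k-by-k kernel:
# erosion keeps a pixel only if its whole neighbourhood is set,
# dilation sets a pixel if anything in its neighbourhood is set.
def erode(binary, k=3):
    pad = k // 2
    p = np.pad(binary, pad, mode="constant")
    return np.array([[p[y:y + k, x:x + k].all()
                      for x in range(binary.shape[1])]
                     for y in range(binary.shape[0])], dtype=np.uint8)

def dilate(binary, k=3):
    pad = k // 2
    p = np.pad(binary, pad, mode="constant")
    return np.array([[p[y:y + k, x:x + k].any()
                      for x in range(binary.shape[1])]
                     for y in range(binary.shape[0])], dtype=np.uint8)

speckle = np.zeros((7, 7), dtype=np.uint8)
speckle[3, 3] = 1                # a single-pixel "island"
dilate(erode(speckle)).sum()     # → 0: erode-then-dilate wipes it out
```

Erode-then-dilate is what OpenCV calls "opening" (`cv2.morphologyEx` with `cv2.MORPH_OPEN`).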

This might be the right thing to do in theory, but it isn't looking very good in practice.

Find the contours anyway and draw the 10 largest.

In [9]:
def findBigContours(i):
    _, contours, _ = cv2.findContours(i.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # keep only the ten largest contours by area
    return sorted(contours, key = cv2.contourArea, reverse = True)[:10]

cnts = [findBigContours(i) for i in dilated]
contourimages = [cv2.drawContours(i.copy(), c, -1, (0,255,0), 2) for (i,c) in zip(images,cnts)]
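The sort above leans on `cv2.contourArea`, which (per the OpenCV docs) computes the area via Green's theorem; for a simple polygon that reduces to the shoelace formula, which is easy to sanity-check by hand:

```python
import numpy as np

# The shoelace formula: the area cv2.contourArea would report for a
# simple (non-self-intersecting) polygon given as a list of points.
def shoelace_area(points):
    x = np.array([p[0] for p in points], dtype=float)
    y = np.array([p[1] for p in points], dtype=float)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

shoelace_area([(0, 0), (4, 0), (4, 3), (0, 3)])  # a 4x3 rectangle → 12.0
```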


Again, this doesn't look great but I'm going to keep going.

  1. Approximate each contour with straight edges
  2. Find the approximations that are rectangular
  3. Take the largest one of these as the screen
In [10]:
def findScreenCnt(cnts):
    # cnts arrives sorted largest-first, so the first convex
    # quadrilateral we hit is the largest one
    for c in cnts:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.04 * peri, True)
        if len(approx) == 4 and cv2.isContourConvex(approx):
            return approx
    return None

screenContours = [findScreenCnt(cnt) for cnt in cnts]
# leave an image untouched if no screen-like contour was found
screenimages = [cv2.drawContours(i.copy(), [c], -1, (0,255,0), 10) if c is not None else i.copy()
                for (i, c) in zip(images, screenContours)]
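`cv2.approxPolyDP` is the Ramer-Douglas-Peucker algorithm: keep the chord endpoints, recurse on the furthest point if it is more than epsilon away from the chord, otherwise drop everything in between. A compact sketch on plain point lists:

```python
# Ramer-Douglas-Peucker, the algorithm behind cv2.approxPolyDP,
# sketched for an open polyline given as a list of (x, y) points.
def point_line_dist(p, a, b):
    # perpendicular distance from p to the line through a and b
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    return num / den

def rdp(points, epsilon):
    dists = [point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    if not dists or max(dists) <= epsilon:
        return [points[0], points[-1]]      # everything in between is dropped
    i = dists.index(max(dists)) + 1
    # split at the furthest point and simplify each half
    return rdp(points[:i + 1], epsilon)[:-1] + rdp(points[i:], epsilon)

rdp([(0, 0), (1, 0.1), (2, 0)], 0.5)  # → [(0, 0), (2, 0)]: the bump is too small
```

The `0.04 * peri` epsilon above means "points may stray up to 4% of the perimeter from a straight edge", which is what collapses a slightly wobbly screen outline to exactly four corners.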


This isn't actually too bad! 3 out of 8 are roughly what I want.

Have a closer look at images 1 and 7 which seem like they might be bad for different reasons.

In [11]: