Advanced OCR topics [neural nets, image morphology, image transforms, etc.]

Gophering · Mar 29, 2013

Intro

Back by popular demand, another OCR tut! I've been quite busy with work recently, so I do apologize for the delay. I was originally planning to prepare some material before posting this, but there's a bit too much to cover all at once, so unless there are objections we'll work as we did previously: I'll post updates to this thread as soon as I have them (that said, it might take me a while longer right now, as I'm really quite flooded with work at the moment).

Anyways, so in my last tut we broke a pretty simple captcha. The captcha featured almost no distortion, had space-separated letters and was generally easy to break and easy to recognize. All we did was convert it to a binary image, clean up the distortion with a simple algorithm and segment the captcha by grouping black pixels into related sets. Unfortunately we won't be this lucky most of the time...
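For anyone who skipped the previous thread, here is a rough sketch of that "group black pixels into related sets" step. It is not the code from the last tut: the function names, the threshold of 128 and the 4-connected flood fill are all assumptions made for illustration, and the image is just a list of lists of grayscale values.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Turn a grayscale grid (0-255) into 1 for dark 'ink' pixels, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Group ink pixels into 4-connected sets using a breadth-first flood fill."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Found an unvisited ink pixel: collect everything connected to it
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    # Order the sets left to right so they line up with the letters
    components.sort(key=lambda pix: min(p[0] for p in pix))
    return components


# Toy image (made up): two dark vertical strokes separated by white space
gray = [
    [255, 0, 255, 255, 0],
    [255, 0, 255, 255, 0],
    [255, 255, 255, 255, 255],
]
parts = connected_components(binarize(gray))
print(len(parts), "segments found")
```

This only works while the letters don't touch each other or anything else, which is exactly the assumption that breaks down with the captchas in this thread.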

Most captcha breaking comes down to successfully cleaning and segmenting the image. Generally, segmentation is the hard part, and captcha makers/captcha algos will try to make your life hard by interconnecting letters and throwing in obstructions (randomly generated lines, for example) which make segmentation a lot harder.

Now, in this tutorial we'll be looking at several segmentation (breaking down the image into parts) algorithms, pre-processing algorithms (removing small artifacts, focusing, etc.), post-processing algorithms (adding back removed parts, image morphology) as well as a couple of recognition/classification methods (like neural nets for example).

I'll be using several open source image processing libraries to make this easier for everyone. If we need a specific algorithm I'll make sure to describe how it works in theory, but I won't be reimplementing it if I can find it in an external library. We'll make extensive use of opencv[1], a great image processing library which can be plugged into many languages (C, Python, Go, etc.).

Also, I won't be using Go here, just because I'm currently on a box without a Go install at hand... And the admin is a dick. To make all of this as readable as possible, I'll be using Python instead. The code will be commented of course, so there shouldn't be any problems.

Let's start out!

Code:

[1] opencv [.] willowgarage [.] xom

Detecting lines

One of the most fundamental tasks in computer vision involves line detection and line recognition. Once able to detect straight lines, one could expand this into detecting shapes, regions and so on. So we'll be discussing line recognition first. With captchas, you'll often find that arbitrary lines will be thrown into an image to make the task of segmentation a lot harder. Suppose letters are still separated by white space but now the letters are obstructed by a straight (or non-straight) line running through them. In this case, segmentation might become a real problem as we won't know where a letter starts and where it ends (we can't differentiate based on white space alone).

Let's have a look at the following captcha:

We can see right away that this is a much harder case than our captcha from the previous thread. Letters are not aligned, there are some lines obfuscating the letters, the letters overlap each other's space, etc. Now remember again that our task is firstly simplification/cleaning and next, segmentation (breaking the captcha into separate parts). The pre-processing step is an important one as it will always help us with segmentation and ultimately classification.

Let's start by detecting the lines and removing them from the image.

Hough & Radon Transform

The Hough[1] and the Radon[2] transform are both mathematical algorithms which can help us detect lines (or shapes) in a given image. Both algorithms are very similar, with Hough being the more commonly used one, so we'll just stick to it. Let's look into this a little.

We need to exploit a certain mathematical property, namely that a line in cartesian (x, y) space doesn't have to be described by its slope and intercept (m, c). It can equally be described in polar form by two parameters: r, the distance from the origin to the line, and theta, the angle of the perpendicular dropped from the origin onto the line, so that r = x*cos(theta) + y*sin(theta) holds for every point (x, y) on the line. The polar form is preferred because the slope m blows up for vertical lines. A single pixel lies on many possible lines, one for each theta, so it translates to a whole curve of (theta, r) pairs rather than to one point. Pixels that sit on the same straight line all share one (theta, r) pair, so finding lines is reduced to finding the (theta, r) pairs which many pixels point to. Once converted to polar space, we can define a threshold to extract the lines which really matter and then convert the coordinates of those lines back to cartesian space, giving us our straight lines.
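To make the theta/r idea a bit more concrete, here is a tiny sketch using the normal form r = x*cos(theta) + y*sin(theta). The helper name rho_for and the four sample pixels are made up for illustration; they are not part of opencv or of the captcha above.

```python
import math


def rho_for(x, y, theta):
    """r of the line through (x, y) whose perpendicular from the origin has angle theta."""
    return x * math.cos(theta) + y * math.sin(theta)


# Four pixels that all sit on the line x + y = 4 (example data)
pixels = [(0, 4), (1, 3), (2, 2), (4, 0)]

# At theta = 45 degrees every pixel yields the same r, so they agree on one line
same = [rho_for(x, y, math.radians(45)) for x, y in pixels]

# At theta = 0 degrees every pixel yields a different r, so no shared line there
different = [rho_for(x, y, math.radians(0)) for x, y in pixels]

print([round(r, 3) for r in same])
print([round(r, 3) for r in different])
```

The first list contains one repeated value and the second contains four distinct ones, which is the whole trick: collinear pixels pile up on a single (theta, r) pair.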

So to implement this we would, for example, initialize an empty 2D array (the accumulator) indexed by theta and r, loop through the foreground pixels in our image and, for each pixel, step through a range of theta values, compute the matching r and increment the corresponding array element. Pixels on the same line keep incrementing the same element, so its count grows. Once we've done this, we can apply a threshold to pick out specific lines (based on length or density, for example), extract those and map them back to the image.
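The steps above could look roughly like this in plain Python. Treat it as a minimal sketch and not as what opencv does internally: the function names, the 1 degree / 1 pixel resolution and the example points are all assumptions chosen to keep it short.

```python
import math


def hough_accumulate(points, width, height, theta_steps=180):
    """Let every foreground pixel vote for all (r, theta) lines passing through it."""
    max_r = int(math.ceil(math.hypot(width, height)))  # largest possible |r|
    # Rows are r values shifted by max_r (r can be negative), columns are theta steps
    acc = [[0] * theta_steps for _ in range(2 * max_r + 1)]
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            r = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[r + max_r][t] += 1
    return acc, max_r


def strong_lines(acc, max_r, threshold):
    """Return (votes, r, theta_in_degrees) for every cell with enough votes."""
    theta_steps = len(acc[0])
    found = []
    for r_index, row in enumerate(acc):
        for t, votes in enumerate(row):
            if votes >= threshold:
                found.append((votes, r_index - max_r, 180.0 * t / theta_steps))
    return sorted(found, reverse=True)


def to_endpoints(r, theta_degrees, half_length):
    """Map an (r, theta) pair back to two cartesian points on that line."""
    theta = math.radians(theta_degrees)
    x0, y0 = r * math.cos(theta), r * math.sin(theta)  # point closest to the origin
    dx, dy = -math.sin(theta), math.cos(theta)         # direction along the line
    return ((x0 - half_length * dx, y0 - half_length * dy),
            (x0 + half_length * dx, y0 + half_length * dy))


# Example data: a horizontal line at y = 5 plus one stray noise pixel
points = [(x, 5) for x in range(20)] + [(3, 12)]
acc, max_r = hough_accumulate(points, 20, 20)
for votes, r, theta in strong_lines(acc, max_r, threshold=20):
    print(votes, r, theta, to_endpoints(r, theta, 20))
```

Note that the neighbouring theta cells next to 90 degrees also pass the threshold for this short line, because the rounding puts all 20 pixels into the same r bucket there too. That is why real implementations usually keep only local maxima of the accumulator and don't rely on a bare threshold.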

I won't be implementing this algorithm here as I think it is fairly easy to implement in any language (have a look at rosettacode if you are a bit lost) + the algorithm is of course present in the excellent opencv library. Let's get going and write our line extraction algorithm:

Code:

import cv
from PIL import Image


def RemoveLines(img):
    dst = cv.CreateImage(cv.GetSize(img), cv.IPL_DEPTH_8U, 1) #New grayscale destination image
    cv.Copy(img, dst) #Copy our original image to the destination
    storage = cv.CreateMemStorage(0) #HoughTransform requires a memstore so we create it here
    '''
    Now we get the lines with the opencv HoughLines2 function. The function takes:
        img - our source image
        storage - memstore
        hough transform type - standard, probabilistic or multi-scale (we use probabilistic, which returns each line as a pair of end points)
        rho - distance resolution in pixels
        theta - angle resolution in radians
        threshold - ignore lines with fewer votes than this threshold
        min line length
        max gap between segments on the same line for them to be joined into one
    '''
    lines = cv.HoughLines2(img, storage, cv.CV_HOUGH_PROBABILISTIC, 1, cv.CV_PI/180, 35, 35, 3)
    for line in lines: #loop through each line
        cv.Line( dst, line[0], line[1], 255, 2, 0 ) #draw each line back onto the image, coloured white for demo purposes
    return dst

def LoadAndProcess():
    img = cv.LoadImage("captcha.png", 0) #Load our source image in grayscale mode
    clean = RemoveLines(img) #Clean image
    out = Image.fromstring("L", cv.GetSize(clean), clean.tostring()) #Turn our opencv image to a PIL image
    out.save("clean.png") #save the image


if __name__ == "__main__":
    LoadAndProcess()

Here are the results. I've run the algorithm on a couple of captchas:

As you can see it's certainly not perfect, but it does get the job done. We can further adjust the algorithm and try to capture all the lines in the image. We might also need to employ several post-processing mechanisms in order to restore some lost information and fix the damage done to our letters. Much more on this and other stuff in the next post. Stay tuned!

Code:

[1] en [.] wikipedia [.] org/wiki/Hough_transform
[2] en [.] wikipedia [.] org/wiki/Radon_transform

thetermy · Apr 11, 2013

Amazing post, very impressive and informative.

Gophering · Apr 11, 2013

thetermy said:
Amazing post, very impressive and informative.

Thanks, glad you enjoyed it. I got new content coming up here as well. Was a bit busy lately but got some time off from work right now so I'll be picking this up shortly. Stay tuned. Cheers

JettyZ · May 10, 2013

Thanks for making these tutorials. This OCR stuff is really interesting, hope you make more tutorials on this topic.
