
Advanced OCR topics [neural nets, image morphology, image transforms, etc.]

Discussion in 'General Programming Chat' started by Gophering, Mar 29, 2013.

  1. Gophering

    Gophering Junior Member Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    115
    Likes Received:
    279
    Occupation:
    Unemployed
    Location:
    EU
    Intro

    Back by popular demand, another OCR tut! I've been quite busy with work recently so I do apologize for the delay. I was originally planning to prepare some material first before posting this, however there's a bit too much to cover all at once, so unless there are objections we'll just work as we've done previously. I'll be posting updates to this thread as soon as I have them (that said, it might take me a while longer right now as I'm really quite flooded with work at the moment).

    Anyways, so in my last tut we broke a pretty simple captcha. The captcha featured almost no distortion, had space-separated letters and was generally easy to break and easy to recognize. All we did was convert it to a binary image, clean up the distortion with a simple algorithm and segment the captcha by grouping black pixels into connected sets. Unfortunately we won't be this lucky most of the time...
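
    As a quick refresher on what "grouping black pixels into sets" means, here is a minimal sketch in plain Python. It is not the code from the previous thread; the function name, the 0/1 grid layout and the choice of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque

def connected_components(grid):
    """Group black pixels (value 1) into 4-connected sets.

    grid is a list of rows; 1 = black (ink), 0 = white (background).
    Returns a list of components, each a list of (row, col) tuples.
    """
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill outwards from this unvisited black pixel
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components

# Two toy "letters" separated by a column of white space
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
parts = connected_components(toy)
print(len(parts))  # 2
```

    This only works while letters don't touch. A single line running through the whole captcha would merge everything into one component, which is exactly the problem this thread deals with.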

    Most captcha breaking comes down to successfully cleaning and segmenting the image. Segmentation is generally the hard part, and captcha makers/captcha algos will try to make your life hard by interconnecting letters and throwing in obstructions (randomly generated lines, for example) which make segmentation a lot harder.

    Now, in this tutorial we'll be looking at several segmentation (breaking down the image into parts) algorithms, pre-processing algorithms (removing small artifacts, focusing, etc.), post-processing algorithms (adding back removed parts, image morphology) as well as a couple of recognition/classification methods (like neural nets for example).

    I'll be using several open-source image processing libraries to make this easier for everyone. If we need a specific algorithm I'll make sure to describe how it works in theory, but I won't be reimplementing it if I can find it in an external library. We'll make extensive use of opencv[1], a great image processing library which can be plugged into many languages (C, Python, Go, etc.).

    Also, I won't be using Go here, just because I'm currently on a box without a Go install at hand... And the admin is a dick. To make all of this as readable as possible, I'll be using Python instead. The code will be commented of course, so there shouldn't be any problems.

    Let's start out!

    Code:
    [1] opencv [.] willowgarage [.] xom
    
    Detecting lines

    One of the most fundamental tasks in computer vision involves line detection and line recognition. Once able to detect straight lines, one could expand this into detecting shapes, regions and so on. So we'll be discussing line recognition first. With captchas, you'll often find that arbitrary lines are thrown into an image to make the task of segmentation a lot harder. Suppose letters are still separated by white space but are now obstructed by a straight (or non-straight) line running through them. In this case, segmentation might become a real problem as we won't know where a letter starts and where it ends (we can't differentiate based on white space alone).

    Let's have a look at the following captcha:
    kxpo8eo.gif

    We can see right away that this is a much harder case than our captcha from the previous thread. Letters are not aligned, there are some lines obfuscating the letters, the spaces occupied by neighbouring letters overlap, etc. Now remember again that our task is firstly simplification/cleaning and next, segmentation (breaking the captcha into separate parts). The pre-processing step is an important one as it will always help us with segmentation and ultimately classification.

    Let's start by detecting the lines and removing them from the image.

    Hough & Radon Transform

    The Hough[1] and the Radon[2] transform are both mathematical algorithms which can help us detect lines (or shapes) in a given image. Both algorithms are very similar, Hough being the more commonly used one, so we'll just stick to it. Let's look into this a little.

    We need to exploit a certain mathematical property, namely that a line in the cartesian (x, y) space can be described just as well by two parameters. The obvious pair is slope and intercept (m, c), but the slope blows up for vertical lines, so the Hough transform normally uses polar parameters instead: r, the distance from the origin to the line, and theta, the angle of the line's normal, related by r = x*cos(theta) + y*sin(theta). A single pixel doesn't pin down one line; it is consistent with every line passing through it, so it maps to a whole curve of (theta, r) pairs. Finding lines is then reduced to finding the (theta, r) pairs that many pixels agree on, since pixels lying on the same line all share that one pair. Once converted to polar space, we can define a threshold to extract the lines which really matter and then convert those (theta, r) pairs back to cartesian space, giving us our straight lines.
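
    To make the theta/r idea a bit more concrete, here's a tiny sketch, assuming the usual normal form r = x*cos(theta) + y*sin(theta). The helper name r_at and the sample points are made up for the demo. It shows that pixels on one line all give the same r at that line's angle, including a vertical line where slope/intercept wouldn't work at all.

```python
import math

def r_at(x, y, theta):
    """Signed distance from the origin to the line through (x, y)
    whose normal makes the angle theta with the x axis."""
    return x * math.cos(theta) + y * math.sin(theta)

on_diagonal = [(0, 2), (1, 3), (5, 7)]  # pixels on y = x + 2
on_vertical = [(4, 0), (4, 3), (4, 9)]  # pixels on x = 4 (slope undefined)

# At the line's own normal angle every pixel gives the same r ...
diag_r = [round(r_at(x, y, math.radians(135)), 6) for x, y in on_diagonal]
vert_r = [round(r_at(x, y, 0.0), 6) for x, y in on_vertical]
# ... while at some other angle the values spread out.
other_r = [round(r_at(x, y, 0.0), 6) for x, y in on_diagonal]

print(diag_r)   # [1.414214, 1.414214, 1.414214]
print(vert_r)   # [4.0, 4.0, 4.0]
print(other_r)  # [0.0, 1.0, 5.0]
```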

    So to implement this we would, for example, initialize an empty 2d array (the accumulator) indexed by theta and r, loop through the foreground pixels in our image and, for each pixel, step through the theta values, compute the matching r and increment the corresponding array element. Pixels on the same line keep hitting the same element, so its count grows. Once we've done this, we can apply a threshold to pick out specific lines (based on the number of votes, which roughly tracks length or density), extract those and map them back to the image.
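
    The voting scheme described above could be sketched like this in plain Python. Treat it as a rough illustration rather than what opencv does internally: the function names, the 1 degree / 1 pixel resolution and the toy input are all assumptions.

```python
import math

def hough_accumulate(pixels, width, height, n_theta=180):
    """Vote in (r, theta) space for every foreground pixel.

    pixels: iterable of (x, y) foreground coordinates.
    Returns (acc, r_offset) where acc[r + r_offset][t] counts the votes for
    distance r (rounded to whole pixels) at angle index t (1 degree steps
    when n_theta is 180).
    """
    r_max = int(math.ceil(math.hypot(width, height)))
    acc = [[0] * n_theta for _ in range(2 * r_max + 1)]
    for x, y in pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[r + r_max][t] += 1  # r can be negative, hence the offset
    return acc, r_max

def find_lines(acc, r_offset, threshold):
    """Return (r, theta_index, votes) for every cell at or above threshold."""
    return [(ri - r_offset, t, votes)
            for ri, row in enumerate(acc)
            for t, votes in enumerate(row)
            if votes >= threshold]

# A 100 pixel horizontal line at y = 5 plus two stray noise pixels
pixels = [(x, 5) for x in range(100)] + [(3, 40), (70, 90)]
acc, offset = hough_accumulate(pixels, width=100, height=100)
lines = find_lines(acc, offset, threshold=80)
print(lines)  # [(5, 90, 100)]
```

    The single result says: a line 5 pixels from the origin whose normal points at 90 degrees (so a horizontal line at y = 5), backed by 100 votes. The noise pixels never collect enough votes to pass the threshold.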

    I won't be implementing this algorithm here as I think it is fairly easy to implement in any language (have a look at rosettacode if you are a bit lost) + the algorithm is of course present in the excellent opencv library. Let's get going and write our line extraction algorithm:

    Code:
    import cv
    from PIL import Image
    
    
    def RemoveLines(img):
        dst = cv.CreateImage(cv.GetSize(img), cv.IPL_DEPTH_8U, 1) #New grayscale destination image
        cv.Copy(img, dst) #Copy our original image to the destination
        storage = cv.CreateMemStorage(0) #HoughTransform requires a memstore so we create it here
        '''
        Now we get the lines with the opencv HoughLines2 function. The function takes:
            img - our source image (non-zero pixels are treated as points)
            storage - memstore
            hough transform type - standard, probabilistic or multi-scale (we use probabilistic as it returns line segments as pairs of end points)
            rho - distance resolution in pixels
            theta - angle resolution in radians
            threshold - ignore lines with fewer accumulator votes than this
            min line length - discard segments shorter than this
            max gap - join segments on the same line if the gap between them is at most this
        '''
        lines = cv.HoughLines2(img, storage, cv.CV_HOUGH_PROBABILISTIC, 1, cv.CV_PI/180, 35, 35, 3)
        for line in lines: #loop through each line segment, given as (start point, end point)
            cv.Line( dst, line[0], line[1], 255, 2, 0 ) #put them back into the image, coloured white for demo purposes
        return dst
    
    def LoadAndProcess():
        img = cv.LoadImage("captcha.png", 0) #Load our source image in grayscale mode
        clean = RemoveLines(img) #Clean image
        out = Image.fromstring("L", cv.GetSize(clean), clean.tostring()) #Turn our opencv image to a PIL image
        out.save("clean.png") #save the image
    
    
    if __name__ == "__main__":
        LoadAndProcess()
    
    
    Here are the results, I've used the algorithm with a couple of captchas:
    cBpltuT.png

    As you can see it's certainly not perfect, but it does get the job done. We can further adjust the algorithm and try to capture all the lines in the image. We might also need to employ several post-processing mechanisms in order to restore some lost information and fix the damage done to our letters. Much more on this and other stuff in the next post. Stay tuned!

    Code:
    [1] en [.] wikipedia [.] org/wiki/Hough_transform
    [2] en [.] wikipedia [.] org/wiki/Radon_transform
    
     
  2. thetermy

    thetermy Regular Member

    Joined:
    Apr 1, 2011
    Messages:
    377
    Likes Received:
    225
    Amazing post, very impressive and informative.
     
  3. Gophering

    Thanks, glad you enjoyed it. I got new content coming up here as well. Was a bit busy lately but got some time off from work right now so I'll be picking this up shortly. Stay tuned. Cheers
     
  4. JettyZ

    JettyZ Newbie

    Joined:
    Dec 23, 2009
    Messages:
    15
    Likes Received:
    0
    Thanks for making these tutorials. This OCR stuff is really interesting, hope you make more tutorials on this topic. :)