Creating an OCR application - where should I begin?

Discussion in 'General Programming Chat' started by simpleonline1234, May 15, 2012.

  1. simpleonline1234

    simpleonline1234 Junior Member

    Joined:
    Jan 26, 2010
    Messages:
    169
    Likes Received:
    13
    I am starting to REALLY get the hang of programming (but still have a lot to learn). I am moving on up the ladder and now I'm ready to take on OCR (optical character recognition). I know there are a few apps out there, but they don't really solve any captchas, so to speak.

    Does anyone know where I can find some documentation on where to start learning OCR?

    Thanks
     
  2. Markthedude

    Markthedude Power Member

    Joined:
    Feb 26, 2010
    Messages:
    572
    Likes Received:
    267
    Occupation:
    Entrepreneur
    Location:
    United States
    I'm not a programmer, so I don't know exactly where to point you, but maybe this will be of some help:

    http://code.google.com/p/tesseract-ocr/

    On that page I clicked a link from the "Core Developers" section:

    http://code.google.com/p/ocropus/

    Maybe under "Related Projects" and/or "Resources" those links will be of some use to you.

    Ignore me if I'm wrong though :)

    Edit: Check out all these pages that might be of some use:

    https://www.google.com/webhp?sourceid=chrome-instant&ix=tea&ie=UTF-8#hl=en&sclient=psy-ab&q=programming+an+OCR+&oq=programming+an+OCR
     
  3. Paranoid Android

    Paranoid Android Jr. VIP

    Joined:
    Jun 20, 2010
    Messages:
    1,486
    Likes Received:
    2,270
    Gender:
    Male
    Occupation:
    Pantie Thief
    Location:
    Native America
    I guess you should begin by collecting images from reCAPTCHA, Yahoo, etc., getting someone to type them out manually, and then programming comparisons of how an n connects to a p, how a d connects to a b, and so on.
     
  4. simpleonline1234

    simpleonline1234 Junior Member

    Joined:
    Jan 26, 2010
    Messages:
    169
    Likes Received:
    13
    Good stuff.. thanks
     
  5. briggers

    briggers Newbie

    Joined:
    Jan 1, 2012
    Messages:
    15
    Likes Received:
    3
    Solving captchas is very, very hard. It is primarily a computer vision (computer science) problem and not a simple programming challenge. There are a bunch of Java, PHP and Python libraries that can solve simple captchas.

    Google's recaptcha is supposedly unbeatable, which is why the outsourced manual captcha solving business is booming.
     
  6. lancis

    lancis Elite Member

    Joined:
    Jul 31, 2010
    Messages:
    1,632
    Likes Received:
    2,385
    Occupation:
    Entrepreneur
    Location:
    Milky Way
    Tesseract is a rather old program, and it's not that good for captcha recognition. Plus, it's one of the standard tools for checking captcha effectiveness, i.e. once you have invented a new captcha, you use Tesseract to check how 'breakable' it is.

    I would suggest starting with academic papers on the subject in order to understand the problem you're facing.
     
  7. siteking

    siteking Junior Member

    Joined:
    Sep 16, 2011
    Messages:
    119
    Likes Received:
    10
    There are a few programs out there. If you can write a program that churns out the answers using that software as a slave, you should be good.
     
  8. lancis

    lancis Elite Member

    Joined:
    Jul 31, 2010
    Messages:
    1,632
    Likes Received:
    2,385
    Occupation:
    Entrepreneur
    Location:
    Milky Way
    That's true. I'm rolling out a built-in captcha breaker for a program in the near future, and it took two M.Sc. guys three months to get it to an 85% break rate.
     
  9. Chris22

    Chris22 Regular Member

    Joined:
    Sep 29, 2010
    Messages:
    400
    Likes Received:
    1,060
    Check out the Hough transform and the Sobel operator.
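
    For readers who have not met the Sobel operator, here is a minimal pure-Python sketch of it. It is only an illustration, not code from anyone in this thread: the function name `sobel_magnitude` and the toy image are made up for the example, and a real project would more likely use an image library. The Hough transform is not covered here.

```python
# Sobel kernels: GX responds to left-right intensity changes (vertical edges),
# GY responds to top-bottom intensity changes (horizontal edges).
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude for a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Slide the 3x3 kernels over the neighbourhood of (x, y).
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * pixel
                    gy += GY[dy][dx] * pixel
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy 5x6 image (hypothetical data): dark left half, bright right half.
toy = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(toy)
print(edges[2])  # strong response only in the two columns next to the boundary
```

    The response is large where the dark and bright halves meet and zero in the flat regions, which is why edge maps like this are a common first step before trying to separate characters.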
     
    Last edited: May 15, 2012
  10. openaidbh

    openaidbh BANNED

    Joined:
    Mar 3, 2012
    Messages:
    328
    Likes Received:
    320
    I'm actually working on a reCaptcha OCR right now. It's still a work in progress as far as training it to recognize overlapping characters goes, but it's almost finished :) If you want a good "captcha" type puzzle to start out with, try making an OCR for teabag_3D. It's pretty easy, but you still have to think creatively about the problem at hand (and there's no crazy math involved).
     
  11. briggers

    briggers Newbie

    Joined:
    Jan 1, 2012
    Messages:
    15
    Likes Received:
    3
    As an alternative, have you considered plugging in captcha trader or bypasscaptcha into your software?
     
  12. lancis

    lancis Elite Member

    Joined:
    Jul 31, 2010
    Messages:
    1,632
    Likes Received:
    2,385
    Occupation:
    Entrepreneur
    Location:
    Milky Way
    I have Death by Captcha as an option, but with an 85% success rate it's not really needed. You just run the internal solver twice and it's done.
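
    The "run it twice" reasoning can be checked with a little arithmetic. This is only a sketch: it assumes each attempt succeeds independently with the quoted 85% rate, which the thread does not actually state, and the helper name `success_within` is made up for the example.

```python
def success_within(attempts, per_attempt_rate):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_rate) ** attempts


RATE = 0.85  # break rate quoted in the thread; independence is an assumption
for n in (1, 2, 3):
    print(f"{n} attempt(s): {success_within(n, RATE):.4f}")
```

    Under that assumption, two attempts succeed about 97.75% of the time, since both have to fail (0.15 x 0.15 = 0.0225) for the run to fail.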
     
  13. simpleonline1234

    simpleonline1234 Junior Member

    Joined:
    Jan 26, 2010
    Messages:
    169
    Likes Received:
    13
    Amazing... yeah, I never knew how much went into it. But then again, with the majority of high PR sites using reCAPTCHA, they must be doing something right for so many sites to implement it.
     
  14. johndea

    johndea Regular Member

    Joined:
    Jun 23, 2011
    Messages:
    308
    Likes Received:
    35
    How will you make your captcha breaker available?
     
  15. hunter3

    hunter3 Newbie

    Joined:
    Mar 26, 2012
    Messages:
    18
    Likes Received:
    1
    You should learn image processing algorithms and the C# language first in order to make OCR apps.
     
  16. Chris22

    Chris22 Regular Member

    Joined:
    Sep 29, 2010
    Messages:
    400
    Likes Received:
    1,060
    Not meaning to sound like a troll, but how exactly is that a useful answer?