
How to make OCR decoder

Discussion in 'Black Hat SEO Tools' started by Witch, Jan 3, 2010.

  1. Witch

    Witch Newbie

    Joined:
    Dec 27, 2009
    Messages:
    7
    Likes Received:
    0
    Does anyone know how to write OCR software for decoding captchas?

    I am an IT undergraduate student. I have just learned some programming, and I intend to make an SEO tool for webmasters.

    The first thing I need to create is a captcha decoder.
    Is anyone doing this?

    If so, can you tell me what I need to focus on?
     
  2. pyronaut

    pyronaut Executive VIP

    Joined:
    Dec 9, 2008
    Messages:
    1,229
    Likes Received:
    1,422
    I think you should probably start with something other than OCR if you're an undergraduate. OCR is 10x harder than anything else I've come across.

    If you want to plow ahead, I would say avoid reCAPTCHA for now and try things like the vBulletin/SMF built-in captcha systems. Some of them are just plain words, and you should be able to make out the letters, possibly even using bitmap functions.

    It all depends on the limitations of your language, really.
     
  3. audio

    audio Junior Member

    Joined:
    Sep 27, 2008
    Messages:
    157
    Likes Received:
    115
    Look up neural networks.
     
  4. Witch

    Witch Newbie

    Joined:
    Dec 27, 2009
    Messages:
    7
    Likes Received:
    0
    Thanks, pyronaut and audio.

    Pyronaut: Yes, OCR is hard, and some captchas even need bitmap functions. Totally agreed.

    audio: What is a neural network? I searched Wikipedia but still have no idea. I've never touched this before.
     
  5. audio

    audio Junior Member

    Joined:
    Sep 27, 2008
    Messages:
    157
    Likes Received:
    115
  6. cooooookies

    cooooookies Senior Member

    Joined:
    Oct 6, 2008
    Messages:
    1,008
    Likes Received:
    216
    Writing a captcha decoder is a lot of work, and in my experience each decoder is specific to an individual captcha. Check my old threads/posts; I put some useful information about captcha decoding there, including code snippets.

    The key to captcha decoding is not the OCR element but image segmentation. Given an image, you must remove arcs, lines, background noise, etc., leaving the individual letters. Then the OCR step follows. For that you can either rely on programs like 'gocr' or 'tesseract' or write your own OCR solver based on neural networks (e.g. by using the Weka library). But honestly, if I were you, I would focus on image segmentation and let gocr and friends do the rest. This is effective.
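The segmentation step described above can be illustrated with connected-component labeling, a standard way to split a black-and-white image into separate blobs of pixels. This is a minimal sketch under stated assumptions: the image is already a clean binary grid (1 = ink, 0 = background), letters do not touch each other, and "noise" simply means very small blobs. The names `label_components`, `drop_small` and `image` are hypothetical and are not part of gocr, tesseract or Weka.

```python
from collections import deque


def label_components(grid):
    """Group 4-connected 1-pixels into components.

    Returns a list of components ordered left to right; each component
    is a sorted list of (row, col) coordinates.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                # Breadth-first flood fill starting at this unvisited ink pixel
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(sorted(pixels))
    # Order components by their leftmost column so they read left to right
    components.sort(key=lambda comp: min(x for _, x in comp))
    return components


def drop_small(components, min_size):
    """Discard components with fewer than min_size pixels (assumed to be noise)."""
    return [comp for comp in components if len(comp) >= min_size]


# Hypothetical 3x7 binary image: a "C"-like blob, one stray noise pixel, an "L"-like blob
image = [
    [1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1],
]

letters = drop_small(label_components(image), min_size=2)
print([len(comp) for comp in letters])
```

Each surviving component could then be cropped and passed to the OCR step. Real images usually need thresholding and line/arc removal before this, and letters that touch each other defeat plain connected-component labeling.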
     
  7. Witch

    Witch Newbie

    Joined:
    Dec 27, 2009
    Messages:
    7
    Likes Received:
    0
    audio: Thanks for providing the link.

    cooooookies: OK, I will check out your old threads/posts very soon. Anyway, thanks for the information you provided. Now I know about GOCR.
     
  8. alinator

    alinator Junior Member

    Joined:
    Mar 3, 2012
    Messages:
    147
    Likes Received:
    25
    Location:
    NYC
    Search for uncaptcha.coder on Gmail for recognized images and to contact the developer.
     
  9. LazySeoEr

    LazySeoEr Registered Member

    Joined:
    Feb 21, 2012
    Messages:
    95
    Likes Received:
    66
    A neural network is:

    Info -> Node -> Network -> Node -> Optimize -> Output

    They are meant to be trained on historical data so that they optimize their algorithmic patterns.

    So if you were to build a neural network for reCAPTCHA, the code logic would look a lot like this:

    Captcha -> Analyze -> Output -> Verify -> Yes -> Database Entry Correct Analyze Letters -> Letter Output
    -> No -> Database Entry Incorrect -> Analyze Letter -> Letter Output Stored -> Enter Database "Yes" -> Loop Test -> Verify -> Yes -> Database Entry -> No Database Entry Optimized Settings Set -> Array "No" New Entry Optimization Settings

    Then, when it hits another captcha that it gets wrong, it will go:

    -> Verify -> No -> analyze database -> Entry found -> New Optimization Settings input -> Analyze -> verify ->...

    The network will optimize the database through what they call "learning".
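The "learning from historical data" idea can be seen in the smallest possible network: a single artificial neuron (a perceptron). This is a hypothetical sketch, not the flow described above: the names `train_perceptron`, `predict` and `or_data` are made up, and the neuron learns logical OR instead of letters. Each wrong answer nudges the weights toward the correct output, which is all "learning" means here; real OCR networks use many neurons and layers.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum of the inputs plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=0.1):
    """Train one neuron on labelled examples: a list of (inputs, target 0/1)."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            # Nudge each weight toward the correct answer ("learning")
            for i in range(n):
                weights[i] += lr * error * inputs[i]
            bias += lr * error
    return weights, bias


# Hypothetical "historical data": inputs paired with the known correct answer (logical OR)
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b = train_perceptron(or_data)
print([predict(w, b, x) for x, _ in or_data])
```

A single neuron can only learn patterns that a straight line can separate; stacking many of them into layers is what lets a network handle harder inputs such as letter shapes.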

    The other type of network you can build is a "genetic" one, which I don't grasp at all because it's not based on historical data and is beyond my coding knowledge.
     
  10. timmytim

    timmytim Newbie

    Joined:
    Mar 27, 2010
    Messages:
    41
    Likes Received:
    15
    Ah, the good ole' neural network... I remember trying to build a forex market forecasting bot using a neural network, but my computer could not go beyond analyzing 6 months' worth of data with approx. 100 variables.