Captcha OCR solver?

Discussion in 'General Programming Chat' started by qxxxp, Jul 6, 2011.

  1. qxxxp

    qxxxp Junior Member

    Joined:
    May 3, 2009
    Messages:
    185
    Likes Received:
    82
    Occupation:
    President of Planet Earth
    Location:
    /index.php
    Hi, I need an automatic captcha solver like the one in XRumer or ZennoPoster. Does anyone know of any components for automatic captcha solving using OCR?
     
  2. algski

    algski Junior Member

    Joined:
    Nov 26, 2008
    Messages:
    130
    Likes Received:
    49
  3. qxxxp

    qxxxp Junior Member

    Joined:
    May 3, 2009
    Messages:
    185
    Likes Received:
    82
    Occupation:
    President of Planet Earth
    Location:
    /index.php
  4. blakamia

    blakamia Junior Member

    Joined:
    Jan 25, 2010
    Messages:
    162
    Likes Received:
    343
    The idea in captcha cracking is to clean up the image and then run it through an OCR engine such as Tesseract (my preference) or GOCR. You might also have to train the OCR engine to work better with whatever font the captcha uses.

    Cleaning up the image means removing things such as background noise or lines that intersect the characters.
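
    A rough sketch of that cleanup step, assuming the captcha has already been loaded as a grid of grey values (0 to 255). The threshold of 128, the neighbour rule, and the function names are illustration values only, not taken from any of the libraries mentioned in this thread. It handles isolated specks; lines crossing the characters need more work than this.

```python
# Sketch: binarize a grayscale image and drop isolated noise pixels.
# The image is a plain list of rows so the example has no dependencies.
# threshold=128 and min_neighbors=1 are assumed values; tune them per captcha.

def binarize(gray, threshold=128):
    """Return a grid of 1 (ink) / 0 (background); dark pixels count as ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_specks(binary, min_neighbors=1):
    """Clear ink pixels that have fewer than min_neighbors ink neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += binary[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


gray = [
    [255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255],
    [255,  15, 255, 255, 255],
    [255, 255, 255, 255,  30],  # lone dark pixel: treated as noise
    [255, 255, 255, 255, 255],
]
cleaned = remove_specks(binarize(gray))
for row in cleaned:
    print("".join("#" if px else "." for px in row))
```

    The three connected dark pixels survive and the lone one in the fourth row is cleared.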

    I recommend you use C#.
    Useful links that I started with (they cover almost everything you need):
    http://code.google.com/p/aforge/
    http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
    http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha

    Good tutorial on neural networks:
    http://www.ai-junkie.com/ann/evolved/nnt1.html

    Note: I started with neural networks, but I find that in a lot of cases templates work much better. Each template is based on pixel offsets within the character, with a specified margin of error per character. Be aware of the tradeoffs between the two approaches.
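
    A minimal sketch of the template idea, assuming each character has already been cut out and binarized to the same size as the templates. The tiny 3x3 templates, the per-character tolerance values, and the function names are made up for illustration; real templates would be built from the captcha's own font.

```python
# Sketch: classify a binarized glyph by comparing it with stored templates.
# Each template carries its own allowed number of mismatching pixels
# (the "margin of error"). All values here are hypothetical.

TEMPLATES = {
    # char: (rows of "0"/"1" pixels, allowed mismatches)
    "I": (["010",
           "010",
           "010"], 1),
    "L": (["100",
           "100",
           "111"], 1),
}


def mismatch(glyph, template):
    """Count the pixel positions where glyph and template differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def classify(glyph, templates=TEMPLATES):
    """Return the closest character within its margin of error, else None."""
    best_char, best_score = None, None
    for char, (template, tolerance) in templates.items():
        score = mismatch(glyph, template)
        if score <= tolerance and (best_score is None or score < best_score):
            best_char, best_score = char, score
    return best_char


noisy_l = ["100",
           "100",
           "110"]  # an "L" with one pixel missing
blob = ["111",
        "111",
        "111"]     # matches nothing closely enough

print(classify(noisy_l))
print(classify(blob))
```

    The tradeoff: templates are easy to debug and need no training data, but they break as soon as the captcha rotates, scales, or warps its characters, which is where a trained classifier tends to do better.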
     
  5. kelvin.thechamp

    kelvin.thechamp Power Member

    Joined:
    Apr 4, 2011
    Messages:
    643
    Likes Received:
    272
    Occupation:
    Account selling , Creating Multithread Bot's
    Location:
    http://spamvilla.com
    Use OpenCV, an OpenCV C# wrapper, or the AForge.NET libraries for image processing.
    If you want libraries for any other language, let me know. I have good experience in image processing and am working on reCAPTCHA OCR now.
     
  6. chatmasta

    chatmasta Junior Member

    Joined:
    Sep 1, 2007
    Messages:
    122
    Likes Received:
    38
    If you can break it down into decent-quality characters yourself, run them through GOCR (google it, I can't post links).
     
  7. awesom-o

    awesom-o Newbie

    Joined:
    Jun 17, 2011
    Messages:
    13
    Likes Received:
    0
    Occupation:
    Computer Engineering Student
    Location:
    Germany
    Is there any class to do this in Java, or is it time to learn C# now? :)
     
  8. Baybo.it

    Baybo.it Registered Member

    Joined:
    Aug 9, 2011
    Messages:
    72
    Likes Received:
    39
    Occupation:
    Founder of Baybo.it
    Location:
    San Francisco
    If you're interested in using Python, someone was doing their PhD dissertation on cracking captchas. You can find their examples, research, and findings at a website called wausita. (I can't give the full URL due to my newbie status, sorry.)

    A simple pixel-by-pixel analysis program for images can be written in Python in only a few lines using PIL (the Python Imaging Library).

    Code:
    from PIL import Image

    im = Image.open("captcha.gif")
    im = im.convert("P")   # convert to an 8-bit palette image
    print(im.histogram())  # pixel count for each palette index
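
    One way such a histogram can be used, sketched in plain Python so it runs without PIL: assume the pixels are available as a flat list of palette indices, guess that the most frequent index is the background and the next one is the text, and build a mask for the text colour. That guess, the sample data, and the function names are assumptions for illustration; many captchas will not be this tidy.

```python
# Sketch: pick out the likely text colour from palette-index counts.
from collections import Counter


def top_colors(pixels, count=2):
    """Return the `count` most frequent palette indices, most common first."""
    return [index for index, _ in Counter(pixels).most_common(count)]


def mask_for(pixels, index):
    """Return 1 where a pixel has the given palette index, else 0."""
    return [1 if px == index else 0 for px in pixels]


# Hypothetical flat pixel list: 255 = background, 220 = text, 7 = stray noise.
pixels = [255, 255, 255, 220, 220, 255, 7, 255, 220, 255]

background, text = top_colors(pixels, 2)
text_mask = mask_for(pixels, text)
print(background, text, text_mask)
```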
    
    If you aren't a programmer, I found a question on Stack Overflow about reCAPTCHA cracks. The page has an answer with 54 likes recommending a website called captchakiller, which provides a captcha-breaking API.
    This solution offers a RESTful API (meaning that instead of importing a library, you can just use curl or make requests to the API over HTTP).
     
    Last edited: Aug 10, 2011
  9. wu1239

    wu1239 Newbie

    Joined:
    Jun 4, 2011
    Messages:
    16
    Likes Received:
    0
    I love Python, so I recommend Python.
    You can refer to captchacker, an open-source project on Google Code. You can search for it.
     
  10. Xooor

    Xooor Newbie

    Joined:
    Aug 14, 2011
    Messages:
    18
    Likes Received:
    17
    I agree you should look into using Python for writing bots.

    Even if you are using another service through a RESTful interface, you can simply create a nice Python class that encapsulates it using urllib2.
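
    Roughly what such a wrapper class could look like. This sketch uses `urllib.request`, the Python 3 successor to urllib2. The base URL, the `/solve` path, the `X-Api-Key` header, and the `text` field in the JSON reply are all hypothetical; a real service defines its own endpoints and response format. The opener is injectable so the demo at the bottom runs without touching the network.

```python
# Sketch: a small class wrapping a hypothetical REST API.
import io
import json
import urllib.request


class CaptchaApiClient:
    """Thin wrapper around a made-up REST endpoint (not a real service)."""

    def __init__(self, base_url, api_key, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self._open = opener  # injectable so the class can be tested offline

    def build_request(self, image_bytes):
        """Build the POST request; path and header names are assumptions."""
        return urllib.request.Request(
            self.base_url + "/solve",
            data=image_bytes,
            headers={
                "Content-Type": "application/octet-stream",
                "X-Api-Key": self.api_key,
            },
            method="POST",
        )

    def solve(self, image_bytes):
        """Send the image and return the 'text' field of the JSON reply."""
        with self._open(self.build_request(image_bytes)) as response:
            return json.loads(response.read().decode("utf-8"))["text"]


# Offline demo: a fake opener stands in for the network call.
def fake_opener(request):
    return io.BytesIO(b'{"text": "x7Gk2"}')


client = CaptchaApiClient("https://api.example.invalid/", "my-key", opener=fake_opener)
request = client.build_request(b"raw image bytes")
answer = client.solve(b"raw image bytes")
print(request.full_url, request.get_method(), answer)
```

    Keeping the HTTP details inside one class means the rest of the bot only ever calls `solve()`, so swapping services later touches one file.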
     
  11. licorne101

    licorne101 Registered Member

    Joined:
    Aug 22, 2011
    Messages:
    88
    Likes Received:
    118
    These are very useful. Thanks for the introduction. I will try to implement them in my programs.
    I think it probably requires a lot more work than just using the API though. I want to build a learning capability so it improves itself.
     
  12. plumbum416

    plumbum416 Registered Member

    Joined:
    Mar 17, 2011
    Messages:
    93
    Likes Received:
    16
    You could also use a captcha-breaking service. I'm writing a bot in Java right now and, for example, reCAPTCHA has a great API and gives you the code to use it in nearly every programming language (well, at least the established ones, not exotic ones like "Brainfuck" :D).
     
  13. lwelch45

    lwelch45 Junior Member

    Joined:
    Mar 24, 2010
    Messages:
    135
    Likes Received:
    38
    Train a neural network to recognize binarized characters (semi-hard part) + character segmentation (really hard part) + loads of other secret things you must discover on your own because you will not be spoon-fed (very easy part) = OCR
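
    To show why segmentation gets the "really hard" label, here is the simplest possible version, assuming a binarized image stored as rows of 0/1: cut wherever a column has no ink at all. The function name and sample image are made up. This only works when characters do not touch or overlap; once they do, there is no empty column to cut at and this approach fails.

```python
# Sketch: split a binarized image into characters by empty columns.

def segment_columns(binary):
    """Return (start, end) column ranges that contain ink; end is exclusive."""
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, x))    # empty column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character runs to the right edge
    return segments


binary = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 1],
]
segments = segment_columns(binary)
print(segments)
```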