
Breaking captchas with imageMagick and gocr

Discussion in 'Black Hat SEO' started by cool5now, Feb 11, 2010.

  1. cool5now

    Hi,

    I am trying to write a module that can crack a fairly simple captcha. The one I am focusing on to start with is http://captchas.net/, a grainy monochrome captcha in which the individual letters are slightly rotated.

    I've found a way to download hundreds of different images of the same captcha text, i.e. different images but the same text.

    The plan is to use ImageMagick to clean up the image and then train gocr to convert it to text. I'll do this 500 times, check which answer gocr gives most often, and then send that answer back to the registration script.
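A minimal sketch of the "most popular answer" step, written in Ruby (the `system 'convert ' + $capfile` line in this post appears to be Ruby). It assumes the gocr output for each downloaded image has already been collected into an array of strings; `most_popular_answer` is a hypothetical name, and `Array#tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical helper: given the strings gocr returned for many different
# images of the same captcha text, return the answer that occurs most often.
def most_popular_answer(answers)
  # Strip stray whitespace/newlines and ignore empty (failed) reads.
  cleaned = answers.map(&:strip).reject(&:empty?)
  return nil if cleaned.empty?

  # tally (Ruby 2.7+) counts each distinct string; max_by picks the
  # [answer, count] pair with the highest count.
  cleaned.tally.max_by { |_answer, count| count }.first
end

reads = ["x7kq2", "x7kq2", "x7kg2", " x7kq2\n", ""]
puts most_popular_answer(reads) # prints x7kq2
```

Stripping matters because text captured from a command-line OCR run typically ends with a newline, and dropping empty reads keeps failed runs from winning the vote.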

    The problem I'm having is cleaning the image. The best I can do at present is:
    system 'convert ' + $capfile + ' -paint 0.01 -paint 0.01 -paint 0.01 -paint 0.01 -paint 0.01 -paint 0.01 -quality 100 -shave 40x20 ' + $capfile

    This removes the background, blocks up the text and trims the borders. But it is far from perfect: when I start to train gocr, even I can't tell what letters like e and a are, because their middles have been filled in. I've read about the -threshold option but can't seem to get it to work.
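A minimal Ruby sketch of a `-threshold` call, assuming the same `convert` binary as above. `-threshold` sets every channel value above the given level to the maximum and everything else to the minimum, so a grayscale image comes out pure black and white. The helper name `threshold_args` and the 50% level are illustrative assumptions to experiment with, not tuned values.

```ruby
# Hypothetical helper: build the argument list for a convert call that
# converts to grayscale, then applies -threshold. Channel values above the
# threshold become the maximum (white); all others become the minimum (black).
def threshold_args(infile, outfile, percent)
  [
    "convert", infile,
    "-colorspace", "Gray",
    "-threshold", "#{percent}%",
    outfile
  ]
end

args = threshold_args("captcha.png", "clean.png", 50)
puts args.join(" ") # prints convert captcha.png -colorspace Gray -threshold 50% clean.png

# To actually run it (returns true if convert exits with status 0):
# system(*args)
```

Comparing a few levels on the same image is the quickest way to tune it; the best value depends on how dark the grain is relative to the letters.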

    Are there any other ImageMagick commands that could help clean up this file?

    I've read the post at bluehatseo and some others on the net but still can't find a way to really clean up the image. If someone could give me a little shove in the right direction, I'd be very grateful.
     
  2. cool5now

    Getting better:
    convert blog420.png -colorspace Gray -gaussian-blur 9.0 -level 0,25%,0,1 -depth 1 -paint 0.01 test.png
     
  3. minute80

    This guy has great tutorials about the same subject:

    Code:
    http://www.bluehatseo.com/index.php?s=captcha&submit=Search
    
     
  4. cool5now

    Yeah, but programming in C makes my head spin, even on simple stuff like this. Thanks for your help though; I got it sorted in the end. If anyone else is interested, this is the command I used:
    system 'convert ' + $capfile + ' -colorspace Gray -gaussian-blur 9.0 -level 0,30%,0,1 -paint 1 -depth 1 -quality 100 -shave 40x20 ' + $capfile

    I trained gocr and have it up and running. I've tested the program around a hundred times, and so far it has a 100% success rate. Happy days.
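A sketch, in Ruby (which the `system '...' + $capfile` line appears to be), of the same command with each option passed as a separate argument. When `system` gets more than one argument it runs `convert` directly instead of through a shell, so a `$capfile` path containing spaces or shell metacharacters can't break the command. `cleanup_args` is a hypothetical name; the option values are copied unchanged from the post.

```ruby
# Hypothetical helper: the same convert options as the command above, passed
# to system as separate arguments instead of one concatenated string.
def cleanup_args(capfile)
  [
    "convert", capfile,
    "-colorspace", "Gray",     # drop colour information
    "-gaussian-blur", "9.0",   # blur to blend the grain into the strokes
    "-level", "0,30%,0,1",     # level adjustment, copied verbatim
    "-paint", "1",             # oil-paint effect with radius 1
    "-depth", "1",             # 1 bit per channel: black and white only
    "-quality", "100",         # kept from the original command
    "-shave", "40x20",         # trim 40 px left/right, 20 px top/bottom
    capfile                    # overwrite the input, as in the original
  ]
end

args = cleanup_args("captcha 01.png")
# No quoting is needed for the space in the file name, because no shell is
# involved. system returns true if convert exits with status 0.
# ok = system(*args)
```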
     
  5. dudeshane01

    Hi buddy,
    Can this captcha breaker be used with any SEO tools too? I'm really interested to know more about it if it can.
     
  6. Autumn

    Nice work!

    I used to do what you're doing back in the day, but these days I just use captcha-breaking services. If you value your time, I reckon it's cheaper to pay for captcha breaking and then crank through the HTTP part of whatever you want to automate in no time at all. It's a lot of work to break a single captcha, and captcha services are super cheap.

    Here are a few of my old ImageMagick strings; you might see an option you like:
    Code:
    convert penisbot.png -fill white -opaque '#6ecefe' -opaque '#fede6e' -opaque '#72fe6e' penisbot2.gif
    
    
    convert only4porn.jpg -shave 4x4+0+0 -trim +repage -fuzz 3000 -fill black -opaque '#000000' -black-threshold 60000 only4porn2.jpg
    
    
    convert c.gif -crop 26x26+92+2 -trim +repage -fuzz 3000 -fill white -opaque '#cdfe00' -black-threshold 60000 4.gif
    
     
  7. Autumn

    I assume you're using PHP, so you might get some value out of this too. This code picks the colours of specified pixels in an image. That's handy if you're lucky enough to find a captcha that tries to obfuscate the image with silly colours that don't appear in the text itself: you can then replace those colours with ImageMagick.

    PHP:
    // Original image
    $image = 'image.gif';

    // Get background colours from the image:
    // x and y coordinates to pick a pixel from, one for each quarter of the image
    $x_vals = array(5, 35, 65, 95);
    $y = 4;

    // Create GD image handle
    $my = imagecreatefromgif($image);

    // Collected background colours as "#rrggbb" strings
    $bg_cols = array();

    // Retrieve the pixel colour at each point
    foreach ($x_vals as $x) {
        $rgb = imagecolorat($my, $x, $y);

        // Transform the colour value into its red, green and blue components
        $trans = imagecolorsforindex($my, $rgb);
        //echo "{$trans['red']}/{$trans['green']}/{$trans['blue']}\n";

        // Convert the RGB values to hex. %02x pads each value with a leading
        // zero when needed, so 5 becomes "05" rather than "50".
        $bg_cols[] = sprintf('#%02x%02x%02x', $trans['red'], $trans['green'], $trans['blue']);
    }

    // Destroy image handler
    imagedestroy($my);
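If you go on to use picked colours such as `#6ecefe` with ImageMagick, a small Ruby sketch like this can turn them into `-fill white -opaque` arguments, in the same spirit as the `convert penisbot.png -fill white -opaque ...` string above. `whiteout_args` and the default `-fuzz 10%` are assumptions: `-fuzz` has to come before the `-opaque` options it should affect, and the right amount depends on how noisy the captcha's colours are.

```ruby
# Hypothetical helper: turn a list of background colours into convert
# arguments that repaint every pixel within the fuzz distance of each
# colour white.
def whiteout_args(infile, outfile, colours, fuzz: "10%")
  # -fuzz and -fill are settings, so they go before the -opaque operators.
  args = ["convert", infile, "-fuzz", fuzz, "-fill", "white"]
  colours.each { |c| args.push("-opaque", c) }
  args.push(outfile) # push returns the array itself
end

puts whiteout_args("image.gif", "clean.gif", ["#6ecefe", "#fede6e"]).join(" ")
# prints: convert image.gif -fuzz 10% -fill white -opaque #6ecefe -opaque #fede6e clean.gif
```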

    Good luck!

    Edit: damn, just saw the original date of the post. Oh well, hopefully this is of value to someone.
     
    Last edited: Mar 10, 2011