Logic to get a captcha and input it as a string in Python

Discussion in 'General Programming Chat' started by msimurin, Dec 29, 2009.

  1. msimurin

    msimurin Regular Member

    Joined:
    Sep 21, 2009
    Messages:
    243
    Likes Received:
    92
    I am trying to figure out what library to use for automating, let's say, form posting for a Google Gmail account, and what the logic would be for using Decaptcher, for example, together with that library. I don't get it...
     
  2. thaorius

    thaorius Junior Member

    Joined:
    Aug 19, 2008
    Messages:
    109
    Likes Received:
    33
    You are going to have to be more specific, what have you tried? What did work? What is it you don't get, exactly?
     
  3. msimurin

    I figured out form posting and submitting, and thanks for the advice you gave me before to use mechanize ;)

    My first problem is figuring out what I need to download from decaptcher.com. They claim to have a Python API, but I don't see it on their site inside the member area. Second, I am trying to hook this into mechanize: solve the captcha, input it, and submit the form normally...
     
  4. Eldalar

    Eldalar Newbie

    Joined:
    Dec 12, 2009
    Messages:
    21
    Likes Received:
    15
    Well, I just skipped the Decaptcher interface and instead used the command-line version of Decaptcher (Command Line API), launching it with the right parameters; the output string then gets saved into a text file, I think it was named "answer.txt". I use C++ rather than Python, but I think you can do it the same way in Python.
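
In Python, the same approach might look roughly like the sketch below: run the command-line tool, then read the answer back from the text file. The command and its arguments are placeholders (the thread does not give the real executable name or parameters), `solve_via_cli` is a hypothetical helper name, and `answer.txt` is only the file name as recalled above, so treat all of these as assumptions.

```python
import subprocess


def solve_via_cli(command, answer_path="answer.txt"):
    """Run an external solver command, then read its answer file.

    `command` is a hypothetical argument list for the command-line tool;
    `answer_path` is assumed to be where the tool writes the solved text.
    """
    # Raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.check_call(command)
    with open(answer_path) as handle:
        return handle.read().strip()
```

Passing the command as a list avoids shell quoting problems with image paths that contain spaces.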
     
  5. msimurin

    Thanks Eldalar, I will try to cook that up. More thoughts on Python in this matter would be appreciated.
     
  6. thaorius

    I just checked decaptcher and they indeed don't have the claimed Python module.

    Command line aside, which I find to be a dirty-hackish solution, I see 2 options:
    1) You download the source code in a different language, like PHP, and you write an implementation in Python.
    2) You write an API for decaptcher's web interface, which is described in their Downloads page.

    The HTTP interface is as simple as an HTTP POST with some variables. I'm sure you can pull that off with mechanize itself, or maybe with something like httplib or httplib2.

    Here are some examples for httplib2: http://code.google.com/p/httplib2/wiki/Examples
    And the poster module will help to encode the captcha image: http://atlee.ca/software/poster/
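
To make "an HTTP POST with some variables" concrete, here is a minimal sketch that builds a multipart/form-data body by hand using only the standard library. The field names in the example call match those used in the code later in this thread; `encode_multipart` is a hypothetical helper name, and nothing is sent over the network here.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: dict of name -> str value
    files:  dict of name -> (filename, bytes content)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            ('--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n'
             % (boundary, name, value)).encode("utf-8")
        )
    for name, (filename, content) in files.items():
        header = (
            '--%s\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
            % (boundary, name, filename)
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    # The closing delimiter marks the end of the body.
    parts.append(("--%s--\r\n" % boundary).encode("utf-8"))
    return b"".join(parts), "multipart/form-data; boundary=%s" % boundary


body, content_type = encode_multipart(
    {"function": "balance", "username": "johndoe", "password": "p4ssw0rd"},
    {},
)
```

The resulting body and Content-Type value are what any HTTP client would send as the POST payload and header; the poster module does the same job with streaming support for large files.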
     
  7. msimurin

    I really don't get it yet, but thanks for your help, thaorius.
     
  8. thaorius

    I couldn't work on anything "hard" since the family arrived a couple hours ago, so I figured I would do something at least useful.

    Very simple, based on the examples from the poster module.

    Code:
    # -*- coding: utf-8 -*-
    
    import urllib2
    
    from poster.encode import multipart_encode
    from poster.streaminghttp import register_openers
    
    USERNAME = ''
    PASSWORD = ''
    
    # Decaptcher HTTP API URL
    SERVICE_URL = 'http://decaptcher.com/poster/'
    
    # Decaptcher Constants
    ERROR_OK = 0
    ERROR_GENERAL = -1
    ERROR_STATUS = -2
    ERROR_NET_ERROR = -3
    ERROR_TEXT_SIZE = -4
    ERROR_OVERLOAD = -5
    ERROR_BALANCE = -6
    ERROR_TIMEOUT = -7
    ERROR_UNKNOWN = -200
    TIMEOUT_DEFAULT = 0
    TIMEOUT_LONG = 1
    TIMEOUT_30SECONDS = 2
    TIMEOUT_60SECONDS = 3
    TIMEOUT_90SECONDS = 4
    TYPE_UNSPECIFIED = 0
    STATUS_INIT = 1
    STATUS_LOGIN = 2
    STATUS_HASH = 3
    STATUS_PICTURE = 4
    
    register_openers()
    
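    def solve(path):
        # Upload the captcha image at `path` and return the '|'-separated
        # reply as a list; the first field is the result code.
        # The file is closed once the request has been sent.
        with open(path, 'rb') as pict:
            extra = {
                'pict' : pict,
                'pict_to' : '0',
                'pict_type' : '0'
            }
            out = _call('picture2', extra)
        out = out.split('|')
    
        return out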
    
    def bad(major_id, minor_id):
        extra = {'major_id' : major_id, 'minor_id' : minor_id}
        out = _call('picture_bad2', extra)
        out = out.split('|')
    
        return out
    
    def balance():
        return _call('balance')
    
    def _call(function, extra=None):
        # POST the credentials plus any function-specific fields as
        # multipart/form-data and return the raw response body.
        data = {
            'function': function,
            'username': USERNAME,
            'password': PASSWORD
        }
        if extra:
            data.update(extra)
    
        datagen, headers = multipart_encode(data)
        request = urllib2.Request(SERVICE_URL, datagen, headers)
        content = urllib2.urlopen(request).read()
    
        return content
    Assuming the above code is in a file named decaptcher.py, and you are using it from within the same directory, you would use it like this:
    Code:
    import decaptcher
    decaptcher.USERNAME='johndoe'
    decaptcher.PASSWORD='p4ssw0rd'
    
    # What's my balance?
    print decaptcher.balance()
    
    # Solve an image
    out = decaptcher.solve('/tmp/captcha245.jpeg')
    
    if int(out[0]) != decaptcher.ERROR_OK:
        print "Error"
    else:
        # Do something with the image here.
        # Say the image was badly recognized...
        decaptcher.bad(out[1], out[2])
    That should do it :).
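
Since the usage above relies on positional indexes, a small helper can make the reply easier to read. This is only a sketch based on what the example shows: the first field is the result code and the next two are the major and minor IDs. The names `Reply` and `parse_reply` are hypothetical, and any remaining fields are kept as an uninterpreted list because the thread does not describe them.

```python
from collections import namedtuple

Reply = namedtuple("Reply", ["code", "major_id", "minor_id", "rest"])


def parse_reply(raw):
    """Split a '|'-separated reply string into named fields."""
    parts = raw.strip().split("|")
    code = int(parts[0])
    # Shorter replies may omit the IDs, so fall back to None.
    major_id = parts[1] if len(parts) > 1 else None
    minor_id = parts[2] if len(parts) > 2 else None
    return Reply(code, major_id, minor_id, parts[3:])
```

With this, the check becomes `reply.code != 0` and the IDs can be passed on as `reply.major_id` and `reply.minor_id`.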
     
  9. msimurin

    Amazing, thanks mate!

    The '/tmp/captcha245.jpeg' path is just an example, so I would probably need a regular expression to handle random captcha URLs, isn't that right?
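
One way this could work, sketched under stated assumptions: pull the captcha image URL out of the page HTML with a regular expression, download that URL to a local file, and pass the file's path to `decaptcher.solve`. The pattern and the sample HTML below are hypothetical (a real page needs a pattern written against its own markup), and `find_captcha_url` is an illustrative name.

```python
import re

# Hypothetical pattern: an <img> tag whose src contains the word "captcha".
CAPTCHA_IMG = re.compile(
    r'<img[^>]+src=["\']([^"\']*captcha[^"\']*)["\']', re.IGNORECASE
)


def find_captcha_url(html):
    """Return the first captcha-looking image URL in `html`, or None."""
    match = CAPTCHA_IMG.search(html)
    return match.group(1) if match else None


# Made-up markup, only to show the call.
sample = '<form><img alt="" src="/img/captcha?id=8f3a21"><input name="answer"></form>'
print(find_captcha_url(sample))
```

A relative URL like the one in the sample would still need to be joined with the page address (for example with `urljoin`) before it can be downloaded.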