Automating captcha protected forms

Discussion in 'General Programming Chat' started by weedsmoker, Nov 1, 2011.

  1. weedsmoker

    weedsmoker Junior Member

    Joined:
    May 2, 2011
    Messages:
    190
    Likes Received:
    79
    I'm trying to submit a captcha-protected form with a Python bot (using the mechanize library).
    Let's say I have two pages, form.php and captcha.php. I grab the captcha image from captcha.php and solve it manually. But because the captcha changes on every reload, the submission fails: the image I retrieved from captcha.php isn't the same as the one on the opened page that contains the form.
    I found two possible solutions: saving the cookies from captcha.php and loading them into the mechanize Browser instance, and a VB.NET solution where captcha.php is loaded into the same WebBrowser instance that opened form.php.
    I don't have a clue how to implement this in Python, and I don't see how it can solve the problem, because the captcha answer is stored in a session variable on the server.
    I would appreciate any hint on how to solve this problem in Python or some other language.
     
  2. weedsmoker

    weedsmoker Junior Member

    Well, I solved this some time ago, and today I remembered this thread I started, so I'll post the solution in case someone has a similar problem.
    Code:
    import cookielib
    import mechanize
    import urllib2
    
    # Placeholder address: replace with the real page that holds the form.
    form_url = 'http://example.com/form.php'
    
    headers = [('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'), ('Connection', 'keep-alive'), ('Accept-Language', 'en-gb,en;q=0.5'), ('Accept', 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'), ('User-Agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
    
    html = urllib2.urlopen(form_url).read()
    
    # your code to parse html and extract captcha url (it must set captcha_url)
    
    # Fetch the captcha through a cookie-aware opener so the session cookie
    # that belongs to this captcha image ends up in the jar.
    req = urllib2.Request(captcha_url)
    cj = cookielib.MozillaCookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = headers
    urllib2.install_opener(opener)
    captcha_image = opener.open(req).read()
    # ignore_discard=True is needed to write session cookies to the file.
    cj.save('cookies.txt', ignore_discard=True, ignore_expires=True)
    
    # your code to save captcha image or display captcha
    
    cj = cookielib.MozillaCookieJar()
    br = mechanize.Browser()
    br.set_cookiejar(cj)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_equiv(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
    br.open(form_url)
    # Load the saved cookies after opening the form, so the captcha's session
    # cookie overrides any cookie of the same name set by this page load.
    cj.load('cookies.txt', ignore_discard=True, ignore_expires=True)
    
    # your mechanize code for form submitting
    
    br.submit()
    
    In this code, two things matter most:
    1. Open the captcha URL and save the cookies to the cookies.txt file
    Code:
    cj.save('cookies.txt', ignore_discard=True, ignore_expires=True)
    2. Load the cookies from cookies.txt after you have opened the form page
    Code:
    cj.load('cookies.txt', ignore_discard=True, ignore_expires=True)
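
    To see why ignore_discard=True matters in both calls, here is a small offline sketch. It assumes Python 3, where cookielib is named http.cookiejar, and it makes no network requests. The cookie name PHPSESSID, the value abc123, the domain example.com and the helpers make_session_cookie and roundtrip are made up for illustration. A session cookie has no expiry date and is marked as discardable, so by default MozillaCookieJar skips it on save and on load.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie: no expiry date, discard=True (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def roundtrip(cookie, ignore_discard):
    """Save one cookie to a temporary file, then load it into a fresh jar."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        saver = MozillaCookieJar()
        saver.set_cookie(cookie)
        saver.save(path, ignore_discard=ignore_discard, ignore_expires=True)

        loader = MozillaCookieJar()
        loader.load(path, ignore_discard=ignore_discard, ignore_expires=True)
        # Return what survived the trip as a plain name -> value mapping.
        return {c.name: c.value for c in loader}
    finally:
        os.remove(path)


session = make_session_cookie("PHPSESSID", "abc123", "example.com")
kept = roundtrip(session, ignore_discard=True)      # session cookie survives
dropped = roundtrip(session, ignore_discard=False)  # session cookie is skipped
print(kept, dropped)
```

    If the session cookie is missing after the load, the form is submitted under a different session than the one the captcha was generated for, and the answer will not match.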
     
  3. different46

    different46 Newbie

    Joined:
    Nov 13, 2010
    Messages:
    11
    Likes Received:
    0
    thanks