Getting captcha with HTMLUnit(Java)

Discussion in 'General Programming Chat' started by xpro, May 19, 2012.

  1. xpro

    xpro Regular Member

    Joined:
    Jan 21, 2009
    Messages:
    416
    Likes Received:
    16
    I've decided to give HTMLUnit a try since it's supposed to support JavaScript. I'm making a Hushmail account creator with it, as I know Hushmail needs JS in order to work. What I'm stuck on now is dealing with the captcha. Hushmail uses a single URL for the captcha:

    https://www.hushmail.com/signup/turingimage?hush_domain=hushmail.com

    I looked at this in Fiddler, and whenever that page is visited, cookies are set in the browser. So I'm thinking all I need to do is request that page with HTMLUnit so it gets both the cookies and the image itself.

    webClient.getPage("https://www.hushmail.com/signup/turingimage?hush_domain=hushmail.com");

    I Googled this but can't find a way to actually save the image to my hard disk. Any idea how to save that image and, at the same time, get the cookies the page responds with?
     
  2. ionutcib

    ionutcib Junior Member

    Joined:
    Feb 10, 2011
    Messages:
    116
    Likes Received:
    8
    Occupation:
    Java Programmer
    webClient.getPage(...) returns an HtmlPage object, which has a void save(File file) method that saves the HTML page, including its images. Let me know if this works in this case.
     
  3. skrode

    skrode Junior Member

    Joined:
    Nov 13, 2011
    Messages:
    103
    Likes Received:
    16
    You can use the following code to save the captcha image:

    Code:
    ImageIO.write(ImageIO.read(new URL("https://www.hushmail.com/signup/turingimage?hush_domain=hushmail.com")), "jpg", new File("C:\\captcha.jpg"));
    
    And for the cookies: webClient.getCookieManager().getCookies();
     
    Last edited: Jun 13, 2012
  4. ionutcib

    ionutcib Junior Member

    Joined:
    Feb 10, 2011
    Messages:
    116
    Likes Received:
    8
    Occupation:
    Java Programmer
    Keep in mind that with the ImageIO method you request that link twice, so the first captcha (the one loaded by webClient.getPage(...)) will be different from the one ImageIO saves. Maybe that's not a problem, though: you should probably just solve the second (saved) captcha.
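
    One way to sidestep the double request described above is to write the bytes of the response you already have straight to disk, instead of fetching the URL a second time with ImageIO. The sketch below only shows that saving step and uses nothing beyond the JDK. The class name CaptchaSaver, the method saveImage, and the stand-in bytes in main are hypothetical names made up for illustration, not part of HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CaptchaSaver {

    // Copies the raw response bytes to disk without decoding or re-encoding
    // the image, and returns the number of bytes written.
    static long saveImage(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes for illustration only. In the real program the stream
        // would be the body of the single captcha request (assumption: it is
        // obtained from the same client that received the cookies).
        byte[] fakeImage = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
        Path target = Files.createTempFile("captcha", ".jpg");
        long written = saveImage(new ByteArrayInputStream(fakeImage), target);
        System.out.println("Wrote " + written + " bytes to " + target);
    }
}
```

    With HtmlUnit, the stream could presumably be taken from the page returned by the one getPage(...) call, for example webClient.getPage(url).getWebResponse().getContentAsStream(). That way the saved image and the cookies in webClient.getCookieManager() come from the same request. This is untested against Hushmail itself.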