
Python, Selenium, CAPTCHA Download

Discussion in 'Other Languages' started by madoctopus, Oct 31, 2012.

  1. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,250
    Likes Received:
    3,515
    Occupation:
    Full time IM
    Does anybody have an idea how I can save a captcha image using Selenium (in Python or anything else)? The problem is that the image is generated only once, and if I reload its URL I get an HTTP error. So I need to save the image from the page itself (from the Firefox cache). If I fire an ActionChains on the context menu to do Save As, it pops up an OS dialog that Selenium can't handle. Is there a way to save that image with Selenium? I know I could use AutoIt, or set up a Firefox profile not to load images and load it myself in another WebDriver window, but I'm hoping there's a more elegant way.
     
  2. Grizzy

    Grizzy Senior Member

    Joined:
    Nov 11, 2008
    Messages:
    919
    Likes Received:
    1,001
    I haven't used Selenium very much at all, but can it expose XPCOM components so you can run privileged JS? I assume it can, so maybe check out a function called "internalSave" (chrome://global/content/contentAreaUtils.js).

    A while ago I rigged together a little botting framework using virtual machines, ff profiles, and mozrepl. I used this function to silently save captcha images from memory to the default downloads directory:
    Hope it helps.
     
  3. cgimaster

    cgimaster Power Member

    Joined:
    Jun 30, 2012
    Messages:
    525
    Likes Received:
    311
    Gender:
    Male
    You could take a screenshot of the page and crop it to the position of the captcha :)

    Code:
    sel.capture_entire_page_screenshot("full_page_ss.png", "")
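
With the WebDriver bindings, the same screenshot-and-crop idea might look roughly like the sketch below. It assumes a Python `driver` and an already located `element`, plus the third-party Pillow library for cropping. The names `crop_box`, `save_captcha_crop` and the `scale` parameter are made up for this example and are not part of Selenium. `scale` is only there for high-DPI screens where the screenshot has more pixels than the page has CSS pixels. The sketch also assumes the captcha is inside the captured area and that the page is not scrolled.

```python
import io


def crop_box(location, size, scale=1.0):
    """Turn an element's location/size dicts into a (left, top, right, bottom) box."""
    left = int(location["x"] * scale)
    top = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    bottom = int((location["y"] + size["height"]) * scale)
    return (left, top, right, bottom)


def save_captcha_crop(driver, element, path, scale=1.0):
    """Screenshot the page and keep only the region covered by `element`."""
    from PIL import Image  # Pillow, a third-party dependency, imported lazily

    png = driver.get_screenshot_as_png()
    box = crop_box(element.location, element.size, scale)
    Image.open(io.BytesIO(png)).crop(box).save(path)
```

Usage would be something like `save_captcha_crop(driver, captcha_element, "captcha.png")`, where `captcha_element` is whatever element lookup fits your page.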
     
  4. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,250
    Likes Received:
    3,515
    Occupation:
    Full time IM
    @Grizzy: I will look into that, but I am not sure it can do that.
    @cgimaster: I could do that, but it's not very elegant. I hoped there was a way to do this in Selenium without overcomplicating things.
     
  5. matessim

    matessim Junior Member

    Joined:
    Nov 22, 2008
    Messages:
    164
    Likes Received:
    73
    Occupation:
    Being funny and kind to puppies
    Location:
    UT 2003
    pssh, this is pretty clever. :)
     
  6. Question

    Question Registered Member

    Joined:
    Aug 14, 2011
    Messages:
    51
    Likes Received:
    32
    In case anyone needs to do the same thing, here is how we do it: find the captcha image's src attribute, fetch the contents of that URL into a local tmp folder, and there you go, you can solve your captcha.
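
A minimal sketch of that approach in Python, assuming a WebDriver `driver` and an already located `img_element`. The helper names `cookie_header` and `fetch_captcha` are hypothetical. It copies the browser's cookies and user agent onto the second request so the server sees the same session. Note that this still requests the image URL a second time, so it may not help on sites that generate the image only once and return an HTTP error on a repeat request, which is the case described in the first post.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in cookies)


def fetch_captcha(driver, img_element, path):
    """Download the image behind img_element's src, reusing the browser session."""
    src = img_element.get_attribute("src")
    request = urllib.request.Request(src, headers={
        "Cookie": cookie_header(driver.get_cookies()),
        "User-Agent": driver.execute_script("return navigator.userAgent;"),
    })
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```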
     
  7. pedrosilva

    pedrosilva Registered Member

    Joined:
    Feb 27, 2013
    Messages:
    51
    Likes Received:
    1
    That's the solution; you can find code on the web to do this. (Cannot post URLs :x sorry) On Stack Overflow -> download-image-file-from-the-html-page-source-using-python
    Selenium isn't the answer for this problem.
     
  8. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,513
    Likes Received:
    10,467
    One word: canvas.
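
One way to read the canvas hint, sketched in Python under some assumptions: the already loaded image is drawn onto a canvas inside the page, the canvas is serialised as a data URL, and Python decodes it, so the image URL is never requested again. `DRAW_TO_CANVAS_JS`, `decode_data_url` and `save_image_via_canvas` are names invented for this example. If the image comes from a different origin than the page, the browser taints the canvas and `toDataURL` throws a security error, so this only works for same-origin images (or ones served with suitable CORS headers).

```python
import base64

# JavaScript run in the page: copy the <img> pixels into a canvas and
# return them as a base64 PNG data URL.
DRAW_TO_CANVAS_JS = """
var img = arguments[0];
var canvas = document.createElement('canvas');
canvas.width = img.naturalWidth;
canvas.height = img.naturalHeight;
canvas.getContext('2d').drawImage(img, 0, 0);
return canvas.toDataURL('image/png');
"""


def decode_data_url(data_url):
    """Return the raw bytes carried by a base64-encoded data: URL."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data: URL")
    return base64.b64decode(payload)


def save_image_via_canvas(driver, img_element, path):
    """Save the pixels of an already loaded <img> without re-requesting its URL."""
    data_url = driver.execute_script(DRAW_TO_CANVAS_JS, img_element)
    with open(path, "wb") as out:
        out.write(decode_data_url(data_url))
```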
     
  9. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,250
    Likes Received:
    3,515
    Occupation:
    Full time IM
    This is a very old thread. Thanks for the replies, but I found a workaround in the meantime.
     
  10. gustawgustaw

    gustawgustaw Newbie

    Joined:
    Apr 13, 2013
    Messages:
    31
    Likes Received:
    6
    Madoctopus, please tell us what workaround you found.
     
  11. weedsmoker

    weedsmoker Junior Member

    Joined:
    May 2, 2011
    Messages:
    190
    Likes Received:
    79
    I had a problem fetching dynamic captchas from some sites, so I finished my solution for saving an "elementshot" of a given DOM element. You only need to pass a reference to the DOM element as a string (e.g. 'document.getElementById("captcha")') and the script will work out the element's position and dimensions, then fetch and save a JPEG image of the element (the captcha image). It's not very elegant but it works :)
    I'm using it with mozrepl, but it will probably work with Selenium/WebDriver with some minor changes.

    Code:
    function saveElementShot(filename, element) {
        var wm = Components.classes["@mozilla.org/appshell/window-mediator;1"].getService(Components.interfaces.nsIWindowMediator);
        var recentWindow = wm.getMostRecentWindow("navigator:browser");
        var canvas = recentWindow.document.createElementNS('http://www.w3.org/1999/xhtml', 'canvas');
        var tabbrowser = recentWindow.getBrowser();
        var browser = tabbrowser.getBrowserAtIndex(0);
        var win = browser.contentWindow;
        // "element" is a string expression that is evaluated against the content window.
        var bcr = eval("win." + element + ".getBoundingClientRect()");
        var width = parseInt(bcr.width, 10);
        var height = parseInt(bcr.height, 10);
        // getBoundingClientRect() is relative to the viewport, so add the scroll
        // offset to get the position within the document for drawWindow().
        var top = parseInt(bcr.top + win.scrollY, 10);
        var left = parseInt(bcr.left + win.scrollX, 10);
        canvas.width = width;
        canvas.height = height;
        var ctx = canvas.getContext('2d');
        // The canvas is exactly the size of the element, so clear it from its own origin.
        ctx.clearRect(0, 0, canvas.width, canvas.height);
        ctx.drawWindow(win, left, top, width, height, 'rgb(255,255,255)');
        //var dataUrl = canvas.toDataURL("image/png");
        var dataUrl = canvas.toDataURL("image/jpeg");
        var nsFile = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);
    
    
        try {
            nsFile.initWithPath(filename);
        }
        catch (e) {
            if (/NS_ERROR_FILE_UNRECOGNIZED_PATH/.test(e.message)) {
                if (filename.indexOf('/') != -1) {
                    filename = filename.replace(/\//g, '\\');
                }
                else {
                    filename = filename.replace(/\\/g, '/');
                }
                nsFile.initWithPath(filename);
            }
            else {
                throw e;
            }
        }
    
    
        var SGNsUtils = {
            dataUrlToBinaryInputStream: function(dataUrl) {
                var nsIoService = Components.classes["@mozilla.org/network/io-service;1"]
                    .getService(Components.interfaces.nsIIOService);
                var channel = nsIoService
                    .newChannelFromURI(nsIoService.newURI(dataUrl, null, null));
                var binaryInputStream = Components.classes["@mozilla.org/binaryinputstream;1"]
                    .createInstance(Components.interfaces.nsIBinaryInputStream);
                
                binaryInputStream.setInputStream(channel.open());
                return binaryInputStream;
            },
            
            newFileOutputStream: function(nsFile) {
                var writeFlag = 0x02; // write only
                var createFlag = 0x08; // create
                var truncateFlag = 0x20; // truncate
                var fileOutputStream = Components.classes["@mozilla.org/network/file-output-stream;1"].createInstance(Components.interfaces.nsIFileOutputStream);
                    
                fileOutputStream.init(nsFile,
                                      writeFlag | createFlag | truncateFlag,
                                      0664,
                                      null);
                return fileOutputStream;
            },
            
            writeBinaryInputStreamToFileOutputStream: function(binaryInputStream, fileOutputStream) {
                var numBytes = binaryInputStream.available();
                var bytes = binaryInputStream.readBytes(numBytes);
                fileOutputStream.write(bytes, numBytes);
            }
        };
    
    
        var binaryInputStream = SGNsUtils.dataUrlToBinaryInputStream(dataUrl);
        var fileOutputStream = SGNsUtils.newFileOutputStream(nsFile);
        SGNsUtils.writeBinaryInputStreamToFileOutputStream(binaryInputStream, fileOutputStream);
        fileOutputStream.close();
    }
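
For plain Selenium/WebDriver, a rough equivalent might not need privileged JavaScript at all: current Python bindings expose a `screenshot_as_png` property on elements, which asks the driver for an image of just that element. The sketch below assumes an already located `element` and a driver that supports element screenshots. `save_element_png` is a made-up helper name. Unlike the function above it writes PNG rather than JPEG.

```python
def save_element_png(element, path):
    """Write the driver's own screenshot of a single element to `path`."""
    png = element.screenshot_as_png  # PNG bytes for this element only
    with open(path, "wb") as out:
        out.write(png)
    return len(png)  # number of bytes written, handy for a quick sanity check
```

Because the image is rendered from what the browser already displays, the captcha URL is not requested a second time.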
    
     
    Last edited: Nov 7, 2013