How to automate downloading with a no gui web browser?

Discussion in 'General Programming Chat' started by foozoor, Nov 16, 2014.

  1. foozoor

    foozoor Newbie

    Joined:
    Nov 16, 2014
    Messages:
    11
    Likes Received:
    0
    Hello, blackhatworld community,

    I need a way to automate downloading files with a full-featured headless (no-GUI) browser.
    Something like PhantomJS, but without its download issues.

    A little example:

    • Log in to the Angular-based website.
    • Navigate to the page that contains the download link.
    • Get the download link with an XPath or CSS selector.
    • Click the link and save the response as a file, as a normal web browser does, without having to cut the filename out of the URL by hand and so on.
    I need a full-featured headless browser because the site is rendered with JavaScript (it is an Angular SPA), and the file is protected by cookies, so the download request must be authenticated.

    Concrete examples, such as a GitHub gist, are welcome.
     
  2. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    950
    Likes Received:
    662
    Occupation:
    Web/Bot Developer
    Take a look at CasperJS, which is built on top of PhantomJS. It has a download() method.

    From the docs:
    Code:
    http://docs.casperjs.org/en/latest/modules/casper.html#download
    Another option would be to use PhantomJS to scrape the download links from the Angular app into a text file, then have a bash script iterate through that list with wget or curl to pull down the files.
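The second option can also be sketched in Python rather than bash. This is a minimal sketch, not MrBlue's script: the function names, the link file format (one URL per line), and the optional `cookie_header` value are assumptions made for illustration. It uses only the standard library and streams each response to disk, so a large file is never held in memory.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def read_links(list_path):
    """Return the non-empty lines of the link file, in order, without duplicates."""
    links = []
    for line in Path(list_path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in links:
            links.append(url)
    return links


def name_from_url(url, fallback="download.bin"):
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    return name or fallback


def fetch_all(list_path, out_dir, cookie_header=None):
    """Download every listed URL into out_dir, streaming each one to disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in read_links(list_path):
        request = urllib.request.Request(url)
        if cookie_header:
            # Hypothetical value such as "sessionid=abc123", copied from the browser session.
            request.add_header("Cookie", cookie_header)
        target = out_dir / name_from_url(url)
        with urllib.request.urlopen(request) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)  # copies in chunks, not all at once
        saved.append(target)
    return saved
```

Assuming the scraped links were written to a file named `links.txt`, a call such as `fetch_all("links.txt", "downloads", cookie_header="sessionid=abc123")` would save each file under `downloads/`. Files that share the same last path segment would overwrite each other in this sketch.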
     
    Last edited: Nov 16, 2014
  3. foozoor

    foozoor Newbie

    Joined:
    Nov 16, 2014
    Messages:
    11
    Likes Received:
    0
    PhantomJS and CasperJS have an issue with big file downloads! :(
    Just try to download a big file, such as a Linux distribution image, and you will get a 0-byte file.
    Moreover, the download() method needs a target argument...

    I tried Awesomium and CefSharp because they are very easy to install with NuGet.
    But I can't find a tutorial or code example for what I need.
     
    Last edited: Nov 16, 2014
  4. sockpuppet

    sockpuppet Junior Member

    Joined:
    Nov 7, 2011
    Messages:
    155
    Likes Received:
    145
    You have to patch PhantomJS to support downloading files that the browser can't display.
    Look at this pull request on GitHub: https://github.com/ariya/phantomjs/pull/11557
     
  5. foozoor

    foozoor Newbie

    Joined:
    Nov 16, 2014
    Messages:
    11
    Likes Received:
    0
    Patching a Qt-based application is beyond my skills.
    This pull request is very old, which is strange! If it works, why hasn't it been merged into the master branch?

    Does anybody know of alternatives to PhantomJS/CasperJS?
     
  6. k0d3r

    k0d3r Newbie

    Joined:
    Feb 17, 2013
    Messages:
    36
    Likes Received:
    28
    Location:
    Keyboard
    I think this can be done using Python (requests + BeautifulSoup), but I would have to see the page to make sure.
    Just google requests and BeautifulSoup; there are plenty of examples.
    If you decide to use Python and have any questions, drop me a PM.
     
  7. xNotch

    xNotch Registered Member

    Joined:
    Sep 16, 2014
    Messages:
    69
    Likes Received:
    15
    I assume you're trying to avoid detection of your bot, hence the full browser. Since I don't think I have the skills to patch a Qt program either, here's how I'd go about it:

    1. Use the browser to log in to the site and grab the link needed to download what you want.
    2. Pass the cookies from the browser to the programming language of your choice.
    3. Download the file using that language's networking library.

    If you do it right, it should be indistinguishable from doing it all from a browser.

    As an example, let's say I use Python. First I'd use PhantomJS to do all the login stuff, then pass the cookies to urllib2 and download the file with it.

    Sure as hell beats having to patch PhantomJS.

    Note: remember to set the User-Agent in the networking library as well, so it matches the browser's.
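The three steps above can be sketched in Python 3, where urllib2 has become `urllib.request`. This is a minimal sketch under stated assumptions, not xNotch's code: the function names are hypothetical, and the cookies are assumed to have been exported from the headless browser as a list of dictionaries with `name` and `value` keys. The file name is taken from the server's Content-Disposition header when present, which avoids cutting it out of the URL by hand.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def cookie_header(browser_cookies):
    """Join exported cookies into the value of a single Cookie request header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in browser_cookies)


def build_request(url, browser_cookies, user_agent):
    """Create a request that carries the browser's cookies and User-Agent."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header(browser_cookies))
    request.add_header("User-Agent", user_agent)
    return request


def pick_filename(headers, url, fallback="download.bin"):
    """Prefer the Content-Disposition file name, then the last URL path segment."""
    name = headers.get_filename() or PurePosixPath(unquote(urlsplit(url).path)).name
    # Keep only the final component so a server-supplied name cannot leave out_dir.
    return PurePosixPath(name.replace("\\", "/")).name or fallback


def download(url, browser_cookies, user_agent, out_dir="."):
    """Fetch url with the browser's identity and stream the body to a file."""
    request = build_request(url, browser_cookies, user_agent)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(request) as response:
        target = out_dir / pick_filename(response.headers, url)
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)  # chunked copy, suitable for big files
    return target
```

The `user_agent` argument should be the exact string the headless browser sent during login. Whether the server also checks other headers (for example Referer) depends on the site, so this sketch may need more headers copied across.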