
Http Requests vs Selenium?

Discussion in 'Programming' started by Google Prince, Apr 26, 2017.

  1. Google Prince

    So recently I've been toying around with HTTP requests, and they're noticeably faster than other libraries like Selenium, PhantomJS, etc.

    My question is: when is it really necessary to use one over the other? I know JavaScript-heavy sites can be easier to handle with an automated browser, but they're sooo slooow that I'd rather just use requests.
     
  2. codegray

    Selenium is more "human-like" and less likely to be flagged. Yes, it is slow :(
     
  3. JerryWoodburn

    Well, I think the biggest difference is that an HTTP request cannot "click" like Selenium can, so you can't really automate everything with requests alone.
     
  4. zigzagtech

    Selenium. Browser-based automation is good.

    Not HTTP...
     
  5. codefoxdens

    There is a big difference between requests and Selenium. Each has its purpose.
    Selenium is designed for testing through browser automation (in short, for working with the rendered HTML), while Python's requests is designed for HTTP requests in general.
    So if you don't need browser automation or HTML parsing, use requests; otherwise use Selenium.
     
  6. Dan21

    The difference is that many well-built sites use JavaScript for user verification (including reCAPTCHA). That is a hurdle you can't bypass with HTTP proxies alone: you can't know the outcome on the page unless you actually run the JavaScript. What Selenium does is render the entire web page.
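
A toy illustration of that point, using only Node's built-in `vm` module. The page markup, the `token` field and the fake `document` object are all made up for this sketch; a real browser does far more than this, but the idea is the same: the value only exists once the script has run.

```javascript
const vm = require('vm');

// Hypothetical page: the token is computed by inline JavaScript,
// so it is not present in the markup the server sends.
const html = `
<form>
  <input id="token" value="">
  <script>document.getElementById('token').value = [3, 1, 4].map(n => n * 2).join('-');</script>
</form>`;

// What a plain HTTP client sees: the raw markup, where the value is still empty.
const rawValue = html.match(/id="token" value="([^"]*)"/)[1];

// What a browser sees: the script has run. A tiny fake `document`
// stands in for the real DOM here.
const elements = { token: { value: rawValue } };
const fakeDocument = { getElementById: (id) => elements[id] };
const script = html.match(/<script>([\s\S]*?)<\/script>/)[1];
vm.runInNewContext(script, { document: fakeDocument });

console.log(JSON.stringify(rawValue)); // ""
console.log(elements.token.value);     // 6-2-8
```

With plain requests you would have to reimplement whatever the script computes; a browser-based tool simply runs it.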
     
  7. Javardo69

    Always go with requests; Selenium is only a last resort if you can't work out what you need to do with requests to get where you want. I can count on one hand the JavaScript sites that there was no way to scrape without a browser or something else running JavaScript. Usually it's code that runs while you're on a login page: it goes through some obfuscated JavaScript, data gets encrypted, and you need that encrypted value to make a request (I've seen this in the Amazon and LinkedIn logins, although you can log in to LinkedIn without JavaScript enabled). But hey, you can just open that particular page in Selenium and then carry on with requests using the cookie value.
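
A minimal sketch of that hand-off, assuming the browser tool returns cookies as objects with at least `name` and `value` fields (Selenium and Nightmare both do). The cookie values, host and path below are placeholders, and `toCookieHeader` is a hypothetical helper, not part of any library.

```javascript
// Cookies as a browser automation tool might return them after the login page
// has run its JavaScript. The values here are made up for illustration.
const browserCookies = [
    { name: 'session_id', value: 'abc123', domain: '.example.com' },
    { name: 'csrf', value: 'x9z', domain: '.example.com' }
];

// Build the value of an HTTP `Cookie` request header: "name=value; name=value".
function toCookieHeader(cookies) {
    return cookies.map((c) => c.name + '=' + c.value).join('; ');
}

// Options in the shape Node's built-in https.request() accepts;
// hostname and path are placeholders.
const requestOptions = {
    hostname: 'www.example.com',
    path: '/account',
    method: 'GET',
    headers: { Cookie: toCookieHeader(browserCookies) }
};

console.log(requestOptions.headers.Cookie); // session_id=abc123; csrf=x9z
```

From here on, every request can be a plain HTTP call carrying that header, and the browser is only needed again when the session expires.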
     
  8. LostLife

    In short: for JavaScript-heavy sites use Selenium, otherwise use HttpWebRequest.
     
  9. The Doctor

    Use NightmareJS if you're going to do that. PhantomJS and others are very slow. Nightmare uses Electron, which is based on Chromium (much faster). It's also the highest-level abstraction I've seen, which makes it very easy to use. You can run it headless or show the window for debugging.
     
  10. MonkeyClaws13

    I usually side with Selenium for my web bots.
     
  11. The Doctor

    Doesn't Selenium have a ton of overhead?
     
  12. MonkeyClaws13

    It certainly has more overhead than many other ways of building web bots, but it does the job easily and works from beginning to end for all my needs.
    I also find it very useful for quick development.
     
  13. The Doctor

    Did you ever try Nightmare? It's been a dream.
     
  14. MonkeyClaws13

    I haven't tried Nightmare before.
    Does it work with C#, or just JavaScript?
     
  15. The Doctor

    Probably just JS but you could probably do some Node bundling setup with Edge. Also, there's TrifleJS. It wouldn't be hard to make some Nightmare wrapper for C#. Idk. I don't code .NET anymore and I hardly ever did (Don't like VMs).
     
  16. Repulsor

    Selenium can't really capture network traffic unless you add some intermediate setup (a proxy, for example). PhantomJS can do that on its own, which is a plus for PhantomJS.

    Both are slow compared to HTTP requests, but with HTTP requests you don't get the level of human-likeness that you get from a headless browser.
     
  17. MonkeyClaws13

    I completed some courses on JS a while ago, along with CSS, HTML and PHP. I'd like to put some of these other programming skills to use someday. Would you like to share what your coding environment for JS is like?
     
  18. The Doctor

    Sure. All you have to do is set up a package.json file for Node and use npm to install Nightmare.

    1) Install Node and npm.
    2) Make a directory for your project.
    3) Navigate to the directory and run: npm init
    4) Run: npm install --save nightmare

    Your package.json file should look something like this:

    Code:
    {
      "name": "MyApp",
      "version": "1.0.0",
      "description": "",
      "main": "server.js",
      "scripts": {
        "start": "node server.js",
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "keywords": [],
      "author": "",
      "license": "ISC",
      "dependencies": {
        "nightmare": "^2.10.0"
      }
    }
    Then edit server.js. This is your main automation script. Due to the async nature of Nightmare, you need to wrap your automation in an async function...

    Code:
    const Nightmare = require('nightmare');
    
    // Command-line arguments, in order: <proxyType> <proxyHost> <proxyPort> <proxyUser> <proxyPass>
    const [proxyType, proxyHost, proxyPort, proxyUser, proxyPass] = process.argv.slice(2);
    
    // Random integer between min and max, inclusive.
    function getRandomInt(min, max) {
        return Math.floor(Math.random() * (max - min + 1) + min);
    }
    
    async function yourBot(url, proxyHost, proxyPort, proxyUser, proxyPass, proxyType, screenshot) {
        // SOCKS proxies need a scheme prefix; plain HTTP proxies do not.
        let socks = '';
        if (proxyType === 'socks5' || proxyType === 'socks4') {
            socks = proxyType + '://';
        }
        const nightmare = Nightmare({
            show: true, // show the browser window, which helps when debugging
            switches: {
                'proxy-server': socks + proxyHost + ':' + proxyPort,
                'ignore-certificate-errors': true
            },
            waitTimeout: 400000
        });
    
        nightmare.authentication(proxyUser, proxyPass);
    
        await nightmare.cookies.clearAll();
        await nightmare.useragent("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36");
        await nightmare.goto(url);
        // Wait a random time (10 to 320 seconds); you could wait for some selector on the page instead.
        const waitTime = getRandomInt(10000, 320000);
        await nightmare.wait(waitTime);
        if (screenshot != 0) {
            await nightmare.screenshot('tmp/' + screenshot + '.png');
        }
        // Close the Electron process, then return the message for the caller to log.
        await nightmare.end();
        return 'Search bot done.';
        // Any error propagates unchanged to the .catch() on the call below.
    }
    
    // Arguments are passed in the same order as the function's parameters.
    yourBot('http://www.example.com/', proxyHost, proxyPort, proxyUser, proxyPass, proxyType, 'sshot1').then(console.log).catch(console.error);
    Now create a subdirectory called tmp/ (the screenshot is saved there). Some of these things are details people don't want to show you because they don't want to give away how they make their bots. IDC though.

    Now that you've got that set up, you can run it from the command line: npm start -- http 1.1.1.1 8080 proxyUser proxyPass123
    The arguments are, in order, the proxy type, host, port, username and password. This way, you can control your bot from the command line, from .NET or whatever language you want.
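
Because the script above reads its proxy settings positionally from `process.argv`, a wrong or missing argument silently produces a broken `proxy-server` value. One way to guard against that is a small validation step before Nightmare is started. The sketch below is only a suggestion: `parseProxyArgs` is a hypothetical helper, not part of Nightmare, and it assumes the argument order proxy type, host, port, username, password.

```javascript
// Hypothetical helper: turn positional CLI arguments into the pieces the bot needs.
// Expected order: <proxyType> <proxyHost> <proxyPort> [proxyUser] [proxyPass]
function parseProxyArgs(argv) {
    const [proxyType, proxyHost, proxyPort, proxyUser, proxyPass] = argv;
    if (!proxyType || !proxyHost || !proxyPort) {
        throw new Error('expected: <proxyType> <proxyHost> <proxyPort> [proxyUser] [proxyPass]');
    }
    const port = Number(proxyPort);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error('invalid proxy port: ' + proxyPort);
    }
    // Same rule as the script above: only SOCKS proxies get a scheme prefix.
    const scheme = (proxyType === 'socks4' || proxyType === 'socks5') ? proxyType + '://' : '';
    return {
        proxyServer: scheme + proxyHost + ':' + port,
        proxyUser,
        proxyPass
    };
}

// In the bot this would be called as parseProxyArgs(process.argv.slice(2)).
const parsed = parseProxyArgs(['socks5', '1.1.1.1', '1080', 'proxyUser', 'proxyPass123']);
console.log(parsed.proxyServer); // socks5://1.1.1.1:1080
```

The returned `proxyServer` string can be used directly as the `'proxy-server'` switch, and the credentials passed to `nightmare.authentication()`.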
     