Requests vs Selenium

Status
Not open for further replies.

JackTheRooster

Registered Member
Joined
Apr 19, 2018
Messages
50
Reaction score
9
Hello all!

I've been scripting for about a year, and have been doing it only via Python requests. It works, but I can't help but wonder if a headless browser is where it's at. The problem is that it uses a ton more resources, so it's more expensive to host.

Is this a valid concern? Should I stick with requests? How do you guys do it?
 
Always stick with requests when you can.
Faster, less resource intensive, multithreading won't be an issue, etc.
One downside of requests is that JS won't be interpreted, but it is in a browser (Selenium).
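A minimal sketch of that downside. The server and URL here are stand-ins (a tiny local page spun up just for the demo), and it uses stdlib urllib so it runs without dependencies; with requests, the fetch line would just be `requests.get(url).text`. The point: a plain HTTP fetch sees the JS source, but never the DOM that JS would have rendered.

```python
# Why plain HTTP fetching misses JS-rendered content: a tiny local server
# (stand-in for a real site) returns a page whose visible text is only
# filled in by JavaScript. The raw fetch sees the empty div and the
# <script>, but the script is never executed.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

PAGE = b"""<html><body>
<div id="content"></div>
<script>document.getElementById('content').textContent = 'Loaded!';</script>
</body></html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

html = urlopen(f"http://127.0.0.1:{server.server_port}/").read().decode()
server.shutdown()

# "Loaded!" appears only inside the <script> source, never in the div:
assert "Loaded!" in html
assert ">Loaded!<" not in html
```

Selenium, by contrast, would execute the script and hand you a DOM where the div actually contains "Loaded!".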
 
Huh!? You can't scrape a heavily JavaScript-dependent site with requests .. something like a casino site that does real-time updates.
You need a headless browser like selenium.

But in my case .. "requests" first .. selenium is a last resort.
 
I agree with the dudes above. Requests first, render page only if you have to.
 
Thank you! How do you deal with all the HTTP calls that the JavaScript fires off? Often these pull in info for the user experience. Sure, many of them are useless in terms of your end goal, but is it important to emulate them? i.e., will the site give a damn if you don't make all those (from your perspective) useless calls?

I've noticed that quite a few sites have you send a POST request with fingerprint info. Is it necessary to emulate this, or will the site simply assume you have JavaScript turned off?
 
It really depends on what your goal is.

True, usually the auxiliary stuff is rendered, but not necessarily.
The site may or may not give a damn about those useless requests. In most cases they don't, but I could build you a prototype website that cares about them, and if they're not made, gets you blocked.

For the fingerprinting, same as above: some sites care, some don't. These days, if JS is turned off in the browser, most site functionality won't work, and devs don't bother with that so much anymore since most devices can run a browser with JS.
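On the fingerprinting point, the fingerprint POST is usually just a JSON blob of navigator-type properties. A hedged sketch of emulating one (the endpoint and every field name here are hypothetical; copy the real payload shape from the site's devtools Network tab). It builds the request with stdlib urllib; with requests it would be `requests.post(url, json=payload)`:

```python
import json
from urllib.request import Request

# Hypothetical fingerprint fields -- mirror whatever the real site's JS sends.
payload = {
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "screen": {"width": 1920, "height": 1080, "colorDepth": 24},
    "timezoneOffset": -300,
    "language": "en-US",
    "plugins": [],
}

body = json.dumps(payload).encode()
req = Request(
    "https://example.com/fp",  # hypothetical endpoint -- not a real URL
    data=body,
    headers={"Content-Type": "application/json"},
)
# urlopen(req) is deliberately left out: the endpoint above is made up.
```

Whether you need to send it at all is exactly the "some care, some don't" situation above: check in devtools whether skipping the call gets you blocked.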
 
I used to do requests only. Having to use Fiddler to see what was getting passed was a pain if it was fairly elaborate, having to emulate what was sent, etc. Then I thought, why not use a headless browser (as a browser is what sites want to see coming)? It's not as fast as requests, but you can see visually what is going on better, plus it takes care of all the JavaScript loading issues. I use:

1) - https://www.katalon.com/resources-center/blog/katalon-automation-recorder/

This addon records your actions on any site, then you can export in C#/Java/Python your recorded actions which is pretty neat and saves a bit of time.
 
Yes, that's about what I figured. It seems that you don't have to be perfect, just close enough.
 
Thank you for the Katalon recommendation. Will def check it out as exporting exact actions is awesome.
 
Something I'm wondering, as I've been only using Selenium for now.
To automate anything on YouTube: since I don't see the elements I want to play with rendered in the page source, does that mean I won't be able to use requests?
 

I have not automated YouTube yet, but I would say requests are definitely out, too much JavaScript/hidden frames going on. Selenium would be the best way, I would say :)

regards
 
I don't think the script taking up "lots of resources" is a valid concern. Hardware is cheaper than it ever was, generally. If the program uses too much memory, it might be a problem with the settings/code: either there's a memory leak or there are too many threads.
 
"Lots of resources" is relative. I agree, memory leaks should be plugged before throwing your hands up and getting more hardware.

Emulating a browser is always going to take far more resources than simply doing a requests call. With requests, you can run hundreds of threads on a Raspberry Pi. With selenium, you're lucky to get 10.
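The "hundreds of threads" point in concrete terms: a requests-style worker is just a thread blocked on I/O, which costs almost nothing. A sketch with stdlib `concurrent.futures` (the fetch is stubbed with a sleep so it runs anywhere; in real use the worker body would be a `requests.get`, and the URLs here are placeholders):

```python
import concurrent.futures
import time

def fetch(url):
    # Stand-in for requests.get(url): a real call blocks on network I/O,
    # which is exactly why hundreds of such threads are cheap.
    time.sleep(0.01)
    return f"fetched {url}"

urls = [f"https://example.com/page/{i}" for i in range(200)]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# 200 x 10ms of "I/O" overlaps almost completely across threads,
# instead of the ~2s a serial loop would take.
assert len(results) == 200
```

Try running 200 Selenium instances the same way and the browser processes alone will exhaust a small box's RAM long before the threads become a problem.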
 
Yupp agreed. This is why it is important to choose the right tool for the job. A hammer can break eggs but that's not always necessary. ;)
 
I'm convinced. I'll go with a mixed approach, because I've wasted enough time deciphering minified JavaScript trying to figure out wtf is going on.
 
There is another possible solution, which is to write a browser extension. Works like a charm.
 
Browser extensions would just take the same amount of resources as Selenium, right?
Why add an additional layer when Selenium can do the job?
 
Exactly. A browser extension might be an easy solution, but it's not an effective one imho.
 
You can't run extensions when Chrome is in headless mode. Also, you can't use authed proxies easily either.
 