Selenium fingerprint spoofing

Kizianap (Newbie; joined Jul 23, 2020; 4 messages; reaction score 1)
Hello guys,

I'm currently working on a scraper (I'm a newbie). My stack is Python + Selenium with Firefox, running on Ubuntu 18, and the server is on AWS.
A website blocks me intermittently, roughly 3 out of 10 times, and sometimes reCAPTCHA doesn't load. It seems to be using Incapsula.

So far I have done the following to my bot:
- human-like behaviors
- rotating user agent
- spoofed HTTP requests
- proxy from ProxyMesh
- random delay timers
- dom.webdriver.enabled set to false
- WebRTC disabled
- etc. for the webdriver configs
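
For anyone reading along, the Firefox side of the checklist above might look roughly like the sketch below. It assumes Selenium 4 and the Firefox preference names `dom.webdriver.enabled`, `media.peerconnection.enabled` (WebRTC) and `general.useragent.override`. The user agent strings and the helper names (`stealth_prefs`, `apply_prefs`, `build_firefox_options`, `human_delay`) are made up for illustration, and whether `dom.webdriver.enabled` still hides `navigator.webdriver` depends on the Firefox version, so treat this as a starting point rather than a known-good config.

```python
import random

# Illustrative user agents only (an assumption); a real pool should match
# the Firefox build and OS that the bot actually runs on.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
]


def stealth_prefs(user_agent):
    """Return Firefox preferences matching the checklist in the post."""
    return {
        "dom.webdriver.enabled": False,         # may be ignored by newer Firefox
        "media.peerconnection.enabled": False,  # turns WebRTC off
        "general.useragent.override": user_agent,
    }


def apply_prefs(options, prefs):
    """Copy each preference onto an object exposing set_preference()."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


def build_firefox_options(rng=random):
    """Build Selenium Firefox options; selenium is imported only when called."""
    from selenium.webdriver.firefox.options import Options
    return apply_prefs(Options(), stealth_prefs(rng.choice(USER_AGENTS)))


def human_delay(low=1.5, high=4.0, rng=random):
    """Pick a random pause length in seconds (bounds are arbitrary here)."""
    return rng.uniform(low, high)
```

Usage would be something like `driver = webdriver.Firefox(options=build_firefox_options())`, with `time.sleep(human_delay())` between actions.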

Multilogin is not an option for me since it is expensive.
My current guesses are that I'm being blocked because:
- my proxy has a bad IP pool
- it detects I'm using Linux x86_64 (as shown on deviceinfo.me)
- WebGL and canvas are not spoofed
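
One way to test guesses like these is to read back what the browser actually reports and look for contradictions, for example a rotated Windows user agent on top of a `Linux x86_64` platform. The sketch below is only a rough self-check under some assumptions: the helper names are hypothetical, the OS matching is deliberately crude, and the idea that `llvmpipe` or `SwiftShader` in the WebGL renderer string marks a GPU-less server is a heuristic, not something Incapsula is known to check.

```python
# JavaScript run in the page via Selenium's execute_script(); it returns the
# values a fingerprinting script could read. WebGL info may be null.
FINGERPRINT_JS = """
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl');
const info = gl && gl.getExtension('WEBGL_debug_renderer_info');
return {
  userAgent: navigator.userAgent,
  platform: navigator.platform,
  webdriver: navigator.webdriver,
  webglRenderer: info ? gl.getParameter(info.UNMASKED_RENDERER_WEBGL) : null
};
"""


def collect_fingerprint(driver):
    """Ask a live Selenium driver for the values above (returns a dict)."""
    return driver.execute_script(FINGERPRINT_JS)


def _os_from_user_agent(ua):
    ua = ua.lower()
    if "windows" in ua:
        return "windows"
    if "mac os x" in ua:
        return "mac"
    if "linux" in ua:
        return "linux"
    return None


def _os_from_platform(platform):
    platform = platform.lower()
    if platform.startswith("win"):
        return "windows"
    if platform.startswith("mac"):
        return "mac"
    if "linux" in platform:
        return "linux"
    return None


def find_inconsistencies(fp):
    """Return human-readable warnings for a fingerprint dict."""
    problems = []
    ua_os = _os_from_user_agent(fp.get("userAgent") or "")
    platform_os = _os_from_platform(fp.get("platform") or "")
    if ua_os and platform_os and ua_os != platform_os:
        problems.append(f"user agent says {ua_os} but platform says {platform_os}")
    if fp.get("webdriver"):
        problems.append("navigator.webdriver is true")
    renderer = (fp.get("webglRenderer") or "").lower()
    if any(name in renderer for name in ("llvmpipe", "swiftshader")):
        problems.append("WebGL reports a software renderer")
    return problems
```

Running `find_inconsistencies(collect_fingerprint(driver))` on the AWS box after each config change shows whether the spoofing settings agree with each other.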

I'm working on a team right now. From some posts I've read here, I gather that we shouldn't be using Linux against a site that has a bot detector, or am I wrong? All our scrapers are built on this stack. Should I invest my time in Puppeteer instead? What do you think?
 
Have you tried other proxies? I've noticed I hit a lot more captchas when I'm using subpar proxies. Nothing against proxymesh as I have no experience with them.
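
To put a number on "subpar", it can help to track how often each proxy gets blocked and stop using the bad ones. A minimal sketch, assuming you can tell a blocked response from a good one yourself; the `ProxyPool` class, the 30% threshold and the 5-sample minimum are all made up for illustration.

```python
import random
from collections import defaultdict


class ProxyPool:
    """Track block rates per proxy and only hand out the healthy ones."""

    def __init__(self, proxies, max_block_rate=0.3, min_samples=5):
        self.proxies = list(proxies)
        self.max_block_rate = max_block_rate
        self.min_samples = min_samples
        self.requests = defaultdict(int)
        self.blocks = defaultdict(int)

    def record(self, proxy, blocked):
        """Record one request through `proxy` and whether it was blocked."""
        self.requests[proxy] += 1
        if blocked:
            self.blocks[proxy] += 1

    def block_rate(self, proxy):
        """Fraction of recorded requests that were blocked (0.0 if none yet)."""
        total = self.requests[proxy]
        return self.blocks[proxy] / total if total else 0.0

    def healthy(self):
        """Proxies with too few samples to judge, or an acceptable block rate."""
        return [
            p for p in self.proxies
            if self.requests[p] < self.min_samples
            or self.block_rate(p) <= self.max_block_rate
        ]

    def pick(self, rng=random):
        """Choose a random healthy proxy, or fail loudly if none are left."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("every proxy exceeds the block-rate threshold")
        return rng.choice(candidates)
```

Call `pool.record(proxy, blocked=True)` whenever a captcha or block page shows up; if most of a provider's proxies fall out of `healthy()` quickly, the pool itself is probably the problem.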
 
Try undetected-chromedriver.
It's a selenium.webdriver.Chrome replacement with compatibility for Brave and other Chromium-based browsers. Not triggered by CloudFlare/Imperva/hCaptcha and such. NOTE: results may vary due to many factors. No guarantees are given, except for ongoing efforts in understanding detection algorithms.
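
In case it helps, basic usage is close to plain Selenium. The sketch below assumes the `undetected-chromedriver` package and a local Chrome are installed; `undetected_chrome` and `page_title` are hypothetical helper names, and the window size is arbitrary.

```python
from contextlib import contextmanager


@contextmanager
def undetected_chrome(window_size="1280,800"):
    """Yield an undetected-chromedriver Chrome instance and always quit it."""
    # Imported lazily so this module loads even without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(f"--window-size={window_size}")
    driver = uc.Chrome(options=options)
    try:
        yield driver
    finally:
        driver.quit()


def page_title(driver, url):
    """Load a page and return its title; works with any Selenium driver."""
    driver.get(url)
    return driver.title
```

Typical use would be `with undetected_chrome() as driver: print(page_title(driver, url))`. As the project's own note says, results vary, so it is worth comparing block rates before and after switching.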
 
There was a post on Hacker News today about this. The comments are well worth reading:

https://news.ycombinator.com/item?id=27648719
The article they're referring to is a blog post on Multilogin.

The first thing I'd do is work out which anti-bot technology the site you're working on is using, and then start googling how to mitigate it.
 
I second undetected-chromedriver. I'm not sure how to stop WebRTC leaks with it yet, though; I may be missing something. Please advise if anyone knows. Thanks.
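
On the WebRTC question, one thing that may be worth trying on Chromium is restricting WebRTC to proxied traffic through a launch flag and profile preferences. The flag and preference names below are the ones I believe Chromium uses; I have not verified how undetected-chromedriver handles `add_experimental_option`, so treat this as an untested sketch and confirm the result on a WebRTC leak test page. `restrict_webrtc` is a hypothetical helper name.

```python
# Chromium profile preferences that limit WebRTC to proxied routes
# (names assumed from Chromium; verify against your Chrome version).
WEBRTC_PREFS = {
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
}

# Command-line switch that forces the same policy at launch.
WEBRTC_FLAG = "--force-webrtc-ip-handling-policy=disable_non_proxied_udp"


def restrict_webrtc(options):
    """Apply the flag and prefs to a Selenium-style ChromeOptions object.

    Note: this replaces any "prefs" experimental option set earlier.
    """
    options.add_argument(WEBRTC_FLAG)
    options.add_experimental_option("prefs", dict(WEBRTC_PREFS))
    return options
```

With plain Selenium this would be `webdriver.Chrome(options=restrict_webrtc(webdriver.ChromeOptions()))`; the same call should accept `uc.ChromeOptions()`, but that is the part that needs checking.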
 
I don't know all the specifics of why, but it's widely held that the stealthiest setup is Puppeteer or Playwright with a detection-evasion library. There are apparently some trivial ways to detect that a user is running Selenium. Puppeteer drives the browser over the Chrome DevTools Protocol (CDP), which injects input events into Chrome much as your mouse and keyboard would.
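
For what it's worth, CDP is also reachable from Python: Selenium 4's Chromium drivers expose `execute_cdp_cmd`, so a Python stack can send the same `Input.dispatchMouseEvent` messages Puppeteer uses. A rough sketch; `linear_path` and `cdp_mouse_move` are hypothetical helpers, and a straight line at a fixed pace is not human-like on its own.

```python
import time


def linear_path(start, end, steps=10):
    """Return `steps` evenly spaced (x, y) points from start to end.

    The start point itself is not included; the end point is.
    """
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


def cdp_mouse_move(driver, points, pause=0.0):
    """Send one CDP mouseMoved event per point via a Chromium driver."""
    for x, y in points:
        driver.execute_cdp_cmd(
            "Input.dispatchMouseEvent",
            {"type": "mouseMoved", "x": x, "y": y},
        )
        if pause:
            time.sleep(pause)
```

Example: `cdp_mouse_move(driver, linear_path((0, 0), (300, 200)), pause=0.02)`. This only works with Chrome or other Chromium-based drivers, not with the Firefox setup from the original post.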
 