Hello guys,
I'm currently working on a scraper (I'm a newbie). My stack is Python + Selenium with Firefox, running on Ubuntu 18, and the server is on AWS.
One website keeps blocking me from time to time, roughly 3 out of 10 runs, and sometimes the reCAPTCHA doesn't load at all. It seems to be using Incapsula.
So far I've done the following to my bot:
- Human-like behavior
- Rotating user agents
- Spoofed HTTP requests
- Using proxies from ProxyMesh
- Random delay timers
- dom.webdriver.enabled set to false
- Disabled WebRTC
- Other WebDriver config tweaks
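For reference, here is a minimal sketch of how the random delays, per-session user-agent rotation and the Firefox prefs above might be wired up. The pref names are real Firefox preferences (note the actual name is dom.webdriver.enabled), but the user-agent list and the helper names (random_delay, firefox_prefs) are hypothetical placeholders. Newer Firefox/geckodriver versions may ignore dom.webdriver.enabled, and a user agent that claims Windows while navigator.platform still reports Linux x86_64 may itself look inconsistent.

```python
import random

# Hypothetical user-agent pool; replace with real, current strings that
# match the platform the browser actually reports.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]


def random_delay(min_s=2.0, max_s=6.0, rng=random):
    """Return a random pause length in seconds (pass it to time.sleep)."""
    return rng.uniform(min_s, max_s)


def firefox_prefs(rng=random):
    """Build the Firefox preferences for one browser session.

    Apply each entry with options.set_preference(name, value) on a
    selenium.webdriver.FirefoxOptions object before starting the driver.
    """
    return {
        # One user agent per session, so rotation happens across sessions.
        "general.useragent.override": rng.choice(USER_AGENTS),
        # Asks Firefox not to expose navigator.webdriver (may be ignored
        # by newer Firefox/geckodriver versions).
        "dom.webdriver.enabled": False,
        # Disables WebRTC (RTCPeerConnection).
        "media.peerconnection.enabled": False,
    }
```

Picking the user agent once per session rather than per request keeps it stable for the lifetime of one browser, which is what a real visitor would look like.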
Multilogin is not an option for me since it's expensive.
My current guesses as to why I'm being blocked:
- My proxy pool contains some bad IPs
- The site detects that I'm on Linux x86_64 (as shown on deviceinfo.me)
- My WebGL and canvas fingerprints aren't spoofed
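One way to test the first guess is to log which exit IP each request used and whether it ended up blocked, then compare block rates per IP. If blocks cluster on a few IPs, that suggests a bad proxy pool; if the rate is roughly even across IPs, a fingerprint issue (platform, WebGL, canvas) seems more likely. This is a minimal sketch, assuming you can record the exit IP yourself (for example by loading an IP-echo page through the proxy) and decide what counts as "blocked"; block_rate_by_ip is a hypothetical helper.

```python
from collections import Counter


def block_rate_by_ip(results):
    """Compute blocked/total/rate per exit IP, sorted worst-first.

    results: iterable of (exit_ip, was_blocked) pairs logged by the scraper.
    Returns {ip: (blocked, total, rate)}.
    """
    totals = Counter()
    blocked = Counter()
    for ip, was_blocked in results:
        totals[ip] += 1
        if was_blocked:
            blocked[ip] += 1
    stats = {ip: (blocked[ip], totals[ip], blocked[ip] / totals[ip]) for ip in totals}
    # Highest block rate first, so suspicious IPs are easy to spot.
    return dict(sorted(stats.items(), key=lambda kv: kv[1][2], reverse=True))


# Made-up sample data using documentation-range IPs.
sample = [("203.0.113.5", True), ("203.0.113.5", False), ("198.51.100.7", False)]
for ip, (b, t, rate) in block_rate_by_ip(sample).items():
    print(f"{ip}: {b}/{t} blocked ({rate:.0%})")
```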
I'm working in a team right now. From some posts I've read here, I gathered that we shouldn't be using Linux against a site that has a bot detector. Or am I wrong? All our scrapers are built on this stack. Should I invest my time in Puppeteer instead? What do you think?