Scraping

Discussion in 'General Programming Chat' started by Nick1, Feb 10, 2012.

  1. Nick1

    Nick1 Junior Member

    Joined:
    Oct 16, 2009
    Messages:
    196
    Likes Received:
    45
    Hello,

    I'm in the process of writing a scraper. The HTTP requests to the site will number in the thousands. What kind of precautions should I take?

    Right now, I plan to route everything through Tor with this: http://socksify.rubyforge.org/, but I have no idea what hidden caveats there might be (like DNS leaks).

    I'm not that paranoid, and I seriously doubt that anybody is monitoring my internet connection, so is digging deeper into preventing a DNS leak a luxury I can dispense with?
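    For reference, here's roughly the setup I have in mind, assuming Tor's SOCKS proxy is listening on the default 127.0.0.1:9050. The method names are from the socksify docs as I remember them, so treat this as a sketch rather than working code:

    ```ruby
    require 'socksify/http'  # gem install socksify

    # Point all TCP sockets at Tor's local SOCKS proxy (default port 9050).
    TCPSocket.socks_server = "127.0.0.1"
    TCPSocket.socks_port   = 9050

    # To avoid the DNS leak: resolve hostnames through the SOCKS proxy too,
    # instead of letting the OS resolver send a cleartext lookup.
    ip = Socksify::resolve("example.com")

    # Requests made through Net::HTTP then go out over Tor.
    Net::HTTP.get(URI("http://example.com/"))
    ```
    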
     
    Last edited: Feb 10, 2012
  2. zohar

    zohar Newbie

    Joined:
    Jun 24, 2014
    Messages:
    44
    Likes Received:
    6
    I would route the traffic through Privoxy; it has very advanced configuration options and is multi-platform if I remember correctly.
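    If it helps, chaining Privoxy in front of Tor is a one-line change in Privoxy's config file (assuming Tor's SOCKS listener is on the default 127.0.0.1:9050 — the trailing dot is part of the directive):

    ```
    # /etc/privoxy/config — forward all traffic to Tor's SOCKS5 listener
    forward-socks5 / 127.0.0.1:9050 .
    ```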
     
    • Thanks Thanks x 1
  3. COLDEXE

    COLDEXE Junior Member

    Joined:
    Aug 29, 2013
    Messages:
    104
    Likes Received:
    24
    Location:
    UK
    Cycle your HTTP requests through a few hundred private proxies, and send randomized browser headers and referrers. I would test the limits first to set your numbers: if you get blocked after, say, 100 requests within a minute, limit yourself to around 80. This is how I'd do it anyway; hope it helps
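    A rough Ruby sketch of that rotation scheme — the proxy addresses, user-agent strings, and the `request_plan` helper are made-up placeholders, not a real library:

    ```ruby
    # Placeholder proxy pool and header values for illustration only.
    PROXIES = [
      "203.0.113.1:8080",
      "203.0.113.2:8080",
      "203.0.113.3:8080",
    ]
    USER_AGENTS = [
      "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.11",
    ]
    REFERERS = ["http://www.google.com/", "http://www.bing.com/"]

    # If testing shows blocks at ~100 requests/minute, cap safely below that.
    MAX_PER_MINUTE = 80

    # Round-robin the proxies and randomize the browser headers per request.
    def request_plan(counter)
      {
        proxy:   PROXIES[counter % PROXIES.size],
        headers: {
          "User-Agent" => USER_AGENTS.sample,
          "Referer"    => REFERERS.sample,
        },
      }
    end
    ```

    Each request would then be issued through `request_plan(n)[:proxy]` with those headers, pacing the loop to stay under `MAX_PER_MINUTE`.
    
    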