
Scraping

Discussion in 'General Programming Chat' started by Nick1, Feb 10, 2012.

  1. Nick1

    Nick1 Junior Member

    Joined:
    Oct 16, 2009
    Messages:
    196
    Likes Received:
    45
    Hello,

    I'm in the process of writing a scraper. The HTTP requests to the site will number in the thousands. What kind of precautions should I take?

    Right now, I plan to route everything through Tor with this: http://socksify.rubyforge.org/, but I have no idea what hidden caveats there might be (like DNS leaks).

    I'm not that paranoid, and I seriously doubt anybody is monitoring my internet connection, so is digging deeper into DNS-leak prevention a luxury I can dispense with?
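    Something like this is what I have in mind (untested, and it assumes Tor's default SOCKS listener on 127.0.0.1:9050):

    Code:
    require 'socksify'
    require 'net/http'

    # Tor's default SOCKS listener -- an assumption, change if yours differs.
    TCPSocket.socks_server = '127.0.0.1'
    TCPSocket.socks_port   = 9050

    # With the globals set, every TCPSocket (including Net::HTTP's) goes
    # through the SOCKS proxy, and socksify passes the hostname to the
    # proxy so Tor does the DNS lookup -- nothing touches my ISP's resolver.
    puts Net::HTTP.get(URI('http://example.com/'))

    # Explicit remote resolution through the proxy is also available:
    puts Socksify::resolve('example.com')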
     
    Last edited: Feb 10, 2012
  2. zohar

    zohar Newbie

    Joined:
    Jun 24, 2014
    Messages:
    44
    Likes Received:
    5
    I would route the traffic through Privoxy; it has very advanced configuration options and is cross-platform, if I remember correctly.
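    Chaining it in front of Tor is a one-line forward rule in Privoxy's config (the classic line is "forward-socks4a / 127.0.0.1:9050 ."), and then the scraper just talks to Privoxy as an ordinary HTTP proxy on its default port 8118. Untested sketch:

    Code:
    require 'net/http'
    require 'uri'

    # Privoxy's default listen address; everything sent here gets
    # filtered by Privoxy and forwarded to Tor per the config above.
    PROXY_HOST = '127.0.0.1'
    PROXY_PORT = 8118

    uri = URI('http://example.com/')
    Net::HTTP.start(uri.host, uri.port, PROXY_HOST, PROXY_PORT) do |http|
      puts http.get(uri.path).code
    end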
     
  3. COLDEXE

    COLDEXE Junior Member

    Joined:
    Aug 29, 2013
    Messages:
    104
    Likes Received:
    24
    Location:
    UK
    Cycle a certain amount of HTTP requests with a few hundred private proxies and send randomized browser headers and referrers. I would test the limits first to set your numbers, so if you get blocked after say 100 requests within a minute, you can limit them to around 80. This is how I'd do it anyway, hope it helps