Scraping

Discussion in 'General Programming Chat' started by Nick1, Feb 10, 2012.

  1. Nick1

    Nick1 Junior Member

    Joined:
    Oct 16, 2009
    Messages:
    196
    Likes Received:
    45
    Hello,

    I'm in the process of writing a scraper. The HTTP requests to the site will number in the thousands. What kind of precautions should I take?

    Right now, I plan to route everything through Tor with this: http://socksify.rubyforge.org/, but I have no idea what hidden caveats there might be (like DNS leaks).

    I'm not that paranoid, and I seriously doubt that anybody is monitoring my internet connection, so is digging deeper into preventing a DNS leak a luxury I can dispense with?
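    For reference, here's roughly the setup I have in mind, assuming Tor's SOCKS proxy is listening on the default 127.0.0.1:9050. The method names are from the socksify docs as I remember them, so treat this as a sketch rather than working code:

    ```ruby
    require 'socksify/http'  # gem install socksify

    # Point all TCP sockets at Tor's local SOCKS proxy (default port 9050).
    TCPSocket.socks_server = "127.0.0.1"
    TCPSocket.socks_port   = 9050

    # To avoid the DNS leak: resolve hostnames through the SOCKS proxy too,
    # instead of letting the OS resolver send a cleartext lookup.
    ip = Socksify::resolve("example.com")

    # Requests made through Net::HTTP then go out over Tor.
    Net::HTTP.get(URI("http://example.com/"))
    ```
    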
     
    Last edited: Feb 10, 2012
  2. zohar

    zohar Newbie

    Joined:
    Jun 24, 2014
    Messages:
    44
    Likes Received:
    6
    I would route the traffic through Privoxy; it has very advanced configuration options and is multi-platform if I remember correctly.
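    If it helps, chaining Privoxy in front of Tor is a one-line change in Privoxy's config file (assuming Tor's SOCKS listener is on the default 127.0.0.1:9050 — the trailing dot is part of the directive):

    ```
    # /etc/privoxy/config — forward all traffic to Tor's SOCKS5 listener
    forward-socks5 / 127.0.0.1:9050 .
    ```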
     
    • Thanks Thanks x 1
  3. COLDEXE

    COLDEXE Junior Member

    Joined:
    Aug 29, 2013
    Messages:
    104
    Likes Received:
    24
    Location:
    UK
    Cycle your HTTP requests through a few hundred private proxies, and send randomized browser headers and referrers. I would test the limits first to set your numbers: if you get blocked after, say, 100 requests within a minute, limit yourself to around 80. This is how I'd do it anyway; hope it helps
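    A rough Ruby sketch of that rotation scheme — the proxy addresses, user-agent strings, and the `request_plan` helper are made-up placeholders, not a real library:

    ```ruby
    # Placeholder proxy pool and header values for illustration only.
    PROXIES = [
      "203.0.113.1:8080",
      "203.0.113.2:8080",
      "203.0.113.3:8080",
    ]
    USER_AGENTS = [
      "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.11",
    ]
    REFERERS = ["http://www.google.com/", "http://www.bing.com/"]

    # If testing shows blocks at ~100 requests/minute, cap safely below that.
    MAX_PER_MINUTE = 80

    # Round-robin the proxies and randomize the browser headers per request.
    def request_plan(counter)
      {
        proxy:   PROXIES[counter % PROXIES.size],
        headers: {
          "User-Agent" => USER_AGENTS.sample,
          "Referer"    => REFERERS.sample,
        },
      }
    end
    ```

    Each request would then be issued through `request_plan(n)[:proxy]` with those headers, pacing the loop to stay under `MAX_PER_MINUTE`.
    
    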