
Quick Questions about Scraping Google

Discussion in 'BlackHat Lounge' started by agag2, Dec 10, 2012.

  1. agag2

    agag2 Supreme Member

    Joined:
    Feb 17, 2009
    Messages:
    1,308
    Likes Received:
    254

    Hello,


    I'm working with a programmer to create a tool that will scrape Google search results, but we have run into several problems. (I can't use ScrapeBox for this, so I'm building a custom solution.)


    1. If we use the Google API, we're limited to 100 searches per day, which isn't enough for us.


    If we scrape the raw HTML (no API), we're limited to 1,000 searches per day. At least, that's what my programmer claims. Is that true? I thought the limit was 1,000 per hour.


    I believe the only solution here is scraping with proxies. Correct?


    2. If we query Google many times simultaneously, we get a captcha.


    Is there a way around this? What is the maximum number of queries we can make without getting a captcha?


    More specifically, I've used ScrapeBox, and I can query Google tens of thousands of times per hour with a dozen proxies without ever getting a captcha. So why would we get a captcha after only a few simultaneous queries?


    How does ScrapeBox do it?


    Lastly, does ScrapeBox scrape the HTML or use the Google API? (I doubt it uses the API; I'm just confirming.) And if it scrapes the HTML, how does it do it so fast? My programmer claims that scraping the HTML will be very slow.


    BTW, I plan on hitting the server several hundred thousand times per day, if not per hour.


    Any help or insight would be greatly appreciated.


    Thanks
     
  2. agag2

    Anyone ...?