
Best way to scrape 400,000 urls from Google Search Results

Discussion in 'Black Hat SEO' started by AgentZero, Nov 25, 2014.

  1. AgentZero

    AgentZero Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 10, 2008
    Messages:
    393
    Likes Received:
    83
    Occupation:
    Self Employed
    Location:
    Playboy Mansion
    Hi, I have a list of 20,000 keywords and want to scrape the top 20 ranking URLs (US-based) for each of those keywords and compile all the URLs into one massive list. So the URL list would be around 400,000 in total.

    Do you know any programs which can do this?

    I've looked at GScraper, but it can't restrict results to US-based Google, nor does it give you the top-ranking URLs in order.

    Scrapebox is probably my best bet, but I would probably need a shit load of proxies.

    Any other suggestions on how to do this?

    Cheers,
    AZ
     
  2. rob1977

    rob1977 Power Member

    Joined:
    Mar 25, 2013
    Messages:
    773
    Likes Received:
    666
    I would go with Scrapebox, but you have hit the nail on the head: proxies are going to be your issue.
     
  3. saadad

    saadad Junior Member

    Joined:
    Feb 25, 2009
    Messages:
    168
    Likes Received:
    22
    Home Page:
    Isn't Google providing API searches? For YouTube I know there's a free daily query quota, something like 50,000,000. So maybe someone can make you software to scrape this, someone like me.
     
  4. himanuzo

    himanuzo Supreme Member

    Joined:
    Feb 10, 2010
    Messages:
    1,342
    Likes Received:
    277
    Location:
    Asia
    Home Page:
    ScrapeBox is the answer. You need to find good proxies for it.
     
  5. FBGuru

    FBGuru Senior Member

    Joined:
    Sep 22, 2013
    Messages:
    928
    Likes Received:
    1,171
    Location:
    Personality Type : ESTP
    You can get this done through SEMrush's API. I've written a detailed post for a similar task here: http://www.blackhatworld.com/blackhat-seo/white-hat-seo/701986-how-get-8-000-000-keywords-semrush.html#post7189639

    Their phrase_organic report gets you the top 20 URLs for any given keyword, and you can also filter it to the US database only.

    Code:
    http://api.semrush.com/?type=phrase_organic&key=INSERTYOURAPIKEYHERE&display_limit=20&export_columns=Dn,Ur&phrase=blackhat&database=us
    
    It's going to cost you 4 million API units (10 API units per line, 20 lines per keyword, 20,000 keywords) to scrape all 400k result URLs.

    Code:
    http://www.semrush.com/api_products.html
    
    You can buy 4x light packages for $200, which will get you 4 million API units to play with. Feel free to drop me a PM if you need any help.
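    For anyone who wants to script this, here's a minimal sketch of how you might build the request URLs for that phrase_organic call and loop over a keyword list. It only uses the parameters shown in the example URL above; the actual fetch is commented out because you'd need a real API key, and the response-parsing assumption (semicolon-separated columns, one line per ranking URL) should be verified against SEMrush's own docs.

    Code:
    ```python
    from urllib.parse import urlencode

    API_ENDPOINT = "http://api.semrush.com/"

    def build_phrase_organic_url(keyword, api_key, database="us", limit=20):
        """Build a phrase_organic request URL using the parameters from the post above."""
        params = {
            "type": "phrase_organic",
            "key": api_key,
            "display_limit": limit,          # top 20 ranking URLs per keyword
            "export_columns": "Dn,Ur",       # domain and URL columns
            "phrase": keyword,
            "database": database,            # US-only results
        }
        return API_ENDPOINT + "?" + urlencode(params)

    # Untested fetch sketch -- requires a valid API key:
    # import urllib.request
    # url = build_phrase_organic_url("blackhat", "YOURAPIKEY")
    # with urllib.request.urlopen(url) as resp:
    #     lines = resp.read().decode().splitlines()
    #     rows = [line.split(";") for line in lines[1:]]  # skip header row

    url = build_phrase_organic_url("blackhat", "INSERTYOURAPIKEYHERE")
    print(url)
    ```

    Run that in a loop over your 20,000 keywords, write each batch of 20 URLs to a file, and you end up with the 400k list without touching Google at all.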
     
    • Thanks x 1
  6. stugz

    stugz Junior Member

    Joined:
    Apr 14, 2013
    Messages:
    154
    Likes Received:
    33
    Single-threaded, at one search every 5 seconds, it will take about 28 hours. You only have 20,000 searches to do, not 400,000 as suggested above, since each search returns the top 20 URLs. You could even pause for a few random seconds after each search to help avoid Google rate-limiting you. Or get a few proxies, run single-threaded through each of them, and it will be a bit quicker.
     
    • Thanks x 1