Scrape / harvest Google search results

Discussion in 'General Programming Chat' started by jamesyboy, Sep 12, 2011.

  1. jamesyboy

    jamesyboy Regular Member

    Joined:
    Apr 4, 2011
    Messages:
    213
    Likes Received:
    21
    I'm looking for a script that will scrape URLs from searches on Google / adCenter.

    I've tried Google URL Harvester in Firefox (using Greasemonkey), but that doesn't seem to work.
     
  2. licorne101

    licorne101 Registered Member

    Joined:
    Aug 22, 2011
    Messages:
    88
    Likes Received:
    118
    A script won't work well because it will run into many limitations. You will need software with proxy support.
     
  3. kkll78

    kkll78 Junior Member

    Joined:
    May 16, 2011
    Messages:
    130
    Likes Received:
    20
    I am watching this because I have the same question and issue.
     
  4. OnlineGodfather

    OnlineGodfather Senior Member

    Joined:
    Mar 3, 2010
    Messages:
    1,117
    Likes Received:
    407
    Occupation:
    Interwebs
    Location:
    Russia
  5. jamesyboy

    jamesyboy Regular Member

    Joined:
    Apr 4, 2011
    Messages:
    213
    Likes Received:
    21
    I wasn't wanting to pay for a service.

    The closest I've got is copying and pasting the results into Excel, then running a macro to extract the URLs. It's not a great method because you have to repeat the same process for each results page, on Google and the other search engines, to get the top 1000 domains.
     
  6. xenon2010

    xenon2010 Regular Member

    Joined:
    Apr 27, 2010
    Messages:
    231
    Likes Received:
    48
    Occupation:
    web and desktop apps programmer
    Location:
    prison
    Home Page:
    If you know C#, this can be done easily. A simple WebBrowser control and some regex will give you great results. You don't need any proxies, though; a simple delay between searches could save you the trouble of getting banned from Google.
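
    The regex-plus-delay idea above can be sketched roughly as follows. This is only an illustration in PHP (the language used for the code later in this thread) rather than C#, and it is not xenon2010's code: the helper names extract_urls and polite_delay, the regex pattern, and the 5 to 15 second delay range are all assumptions, and nothing here guarantees that any particular delay avoids a ban.

```php
<?php
// Hypothetical helper: pull absolute http(s) links out of an HTML string
// with a regex. It only matches double-quoted href attributes.
function extract_urls(string $html): array
{
    preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches);
    // Drop duplicates and reindex from 0
    return array_values(array_unique($matches[1]));
}

// Hypothetical helper: sleep for a random number of seconds between two
// searches. The default range is an assumption, not a known-safe value.
function polite_delay(int $minSeconds = 5, int $maxSeconds = 15): int
{
    $seconds = random_int($minSeconds, $maxSeconds);
    sleep($seconds);
    return $seconds;
}

// Usage on a small hard-coded sample instead of a live search page
$sample = '<a href="http://example.com/a">A</a> '
        . '<a href="https://example.org/b">B</a> '
        . '<a href="http://example.com/a">A again</a>';
print_r(extract_urls($sample));
```

    A regex like this is fragile against markup changes, so parsing the HTML properly is usually the more robust choice once the quick version stops working.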
     
  7. criticalmess

    criticalmess Regular Member

    Joined:
    Feb 7, 2009
    Messages:
    237
    Likes Received:
    210
    jamesyboy, sent you a PM.

    Thanks
     
  8. LukesDad

    LukesDad Junior Member

    Joined:
    Oct 24, 2009
    Messages:
    135
    Likes Received:
    71
    Location:
    Düsseldorf
    Home Page:
    Can't this be done with ScrapeBox?
     
  9. licorne101

    licorne101 Registered Member

    Joined:
    Aug 22, 2011
    Messages:
    88
    Likes Received:
    118
    Yes, it can. Easily. Plenty of software can do this.
     
  10. Subject

    Subject Power Member

    Joined:
    Dec 26, 2007
    Messages:
    656
    Likes Received:
    303
    Location:
    Living With Articles!!
    I sometimes use AllSubmitter for scraping Google results.
     
  11. LukesDad

    LukesDad Junior Member

    Joined:
    Oct 24, 2009
    Messages:
    135
    Likes Received:
    71
    Location:
    Düsseldorf
    Home Page:
    Of course it can. Forgot the irony tag :rolleyes:
     
  12. gettinthere

    gettinthere Regular Member

    Joined:
    Apr 17, 2010
    Messages:
    364
    Likes Received:
    58
    I'm also looking for something similar: an advanced "Google Alerts" (and yes, I'm happy to pay for it, software or service!).

    I am looking for something to help me find new UK information daily (probably from google.co.uk search, like an advanced real-time version of "Google Alerts"). Ideally it would publish data to WordPress in "draft" mode. Our editor would manually check and edit each item before posting.

    The main problem I see is that the content is very different on each website; there is no real consistency. The main thing is getting the URLs, as the editor can do the rest (anything beyond this would be a bonus!).

    If you know of anything or can create something please let me know.
     
  13. weedsmoker

    weedsmoker Junior Member

    Joined:
    May 2, 2011
    Messages:
    190
    Likes Received:
    79
    Quick and dirty solution in PHP to scrape the first Google results page:

    PHP:
    // Search terms to look up
    $keywords = 'jon doe';
    $url = 'http://www.google.com/search?hl=en&q=' . urlencode( $keywords );
    $html = file_get_contents( $url );
    if( $html === false ): die( ); endif;
    $a = 0;
    $out = array( );
    $dom = new DOMDocument( );
    // @ suppresses the warnings loadHTML raises on malformed markup
    @$dom->loadHTML( $html );
    $xpath = new DOMXPath( $dom );
    // Each result title is a link inside an <h3 class="r">
    $tags = $xpath->evaluate( '/html/body//h3[@class="r"]/a' );
    foreach( $tags as $tag ):
        $out[$a]['node'] = trim( $tag->nodeValue );
        $out[$a]['href'] = trim( $tag->getAttribute( 'href' ) );
        ++$a;
    endforeach;
    echo '<pre>';
    print_r( $out );
    echo '</pre>';
    Want more? Put the code in a loop, add &start=$page*10 to the $url query to change pages, add a time delay between requests, use curl to add proxy support, use C#, Python ... to add more threads, or just use ScrapeBox ;)
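
    A rough sketch of those suggestions (paging with &start, a delay between requests, and curl with an optional proxy) might look like the following. It assumes the PHP curl extension is installed. The function names build_search_url, fetch_page and scrape_pages, the 10 second default delay, and the proxy address in the comment are all made up for illustration, not something from the snippet above.

```php
<?php
// Hypothetical helper: build the search URL for a zero-based results page,
// using the &start=$page*10 convention described above.
function build_search_url(string $keywords, int $page): string
{
    return 'http://www.google.com/search?hl=en&q=' . urlencode($keywords)
        . '&start=' . ($page * 10);
}

// Hypothetical helper: fetch a URL with curl, optionally through a proxy.
// Returns the response body as a string, or false on failure.
function fetch_page(string $url, ?string $proxy = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    if ($proxy !== null) {
        // Placeholder format: 'host:port', e.g. '127.0.0.1:8080'
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Hypothetical driver: fetch the first $pages result pages, sleeping
// between requests. The default delay is an assumption, not a safe value.
function scrape_pages(string $keywords, int $pages, int $delaySeconds = 10, ?string $proxy = null): array
{
    $results = array();
    for ($page = 0; $page < $pages; $page++) {
        if ($page > 0) {
            sleep($delaySeconds);
        }
        $results[$page] = fetch_page(build_search_url($keywords, $page), $proxy);
    }
    return $results;
}

// Only the URL builder runs here, so nothing touches the network.
echo build_search_url('jon doe', 2), PHP_EOL;
```

    Each entry returned by scrape_pages is raw HTML (or false on failure), so it still has to go through the DOMXPath extraction step to get the actual result links.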
     
    Last edited: Nov 17, 2011