Google Scraper

Discussion in 'PHP & Perl' started by WORK@HOME, Feb 7, 2014.

  1. WORK@HOME

    WORK@HOME Senior Member

    Joined:
    Apr 25, 2013
    Messages:
    805
    Likes Received:
    363
    Location:
    Right Here
    Home Page:
    Hey guys. I have this section of code for scraping Google search results, but I can't get it to work. I really need help making it work.

    Any help will be awesome.

    Code:
    function get_google($kw, $res=10, $geo='US') {
        $urls = array();
        if ($geo == '') $geo = 'US';
        $url  = 'http://www.google.com/search?hl=en&as_qdr=all&gl='.$geo.'&q='.urlencode($kw).'&num='.$res;
        
        try {
            $html = curl_fetch($url);
        
            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);
            $hrefs = $xpath->evaluate("/html/body//a");
            for ($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                if ($href->getAttribute('class') == 'l') {
                    $link = explode('?', $href->getAttribute('href'));
                    $urls[] = $link[0];
                }
            }
        } catch (Exception $e) {}
        return $urls;
    }
     
  2. WebsiteLiving

    WebsiteLiving Registered Member

    Joined:
    Feb 5, 2014
    Messages:
    99
    Likes Received:
    18
    Home Page:
    Are you trying to do it many times in a short period of time? I ask because I know they show a CAPTCHA in that case.
     
  3. mypmmail

    mypmmail Junior Member

    Joined:
    Jan 31, 2008
    Messages:
    111
    Likes Received:
    27
    Actually, depending on your search volume, if it isn't huge, you can consider using the Custom Search API
    developers.google.com/custom-search
    where the data is much better formatted for your manipulation.

    When you scrape, Google is smart enough to detect that a bot is doing it and block you.
    So, theoretically speaking, scraping is limited in the same way as Custom Search.
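
For anyone who wants to try the Custom Search route, here is a minimal sketch of what that could look like. It assumes the Custom Search JSON API endpoint at `https://www.googleapis.com/customsearch/v1`, which takes `key`, `cx`, `q`, `num` and `gl` parameters and returns JSON with an `items` array whose entries carry a `link` field. The function names `cse_build_url()` and `cse_extract_links()` are made up for this example, and the API key and engine ID are placeholders you would have to supply yourself.

```php
<?php
// Sketch only: cse_build_url() and cse_extract_links() are hypothetical
// helper names, not part of any Google library.

// Build a request URL for the Custom Search JSON API.
// $apiKey and $engineId (the "cx" value) are placeholders from your own account.
function cse_build_url($apiKey, $engineId, $kw, $res = 10, $geo = 'US') {
    $params = array(
        'key' => $apiKey,
        'cx'  => $engineId,
        'q'   => $kw,
        // The API accepts 1 to 10 results per request, so clamp the value.
        'num' => min(max((int) $res, 1), 10),
        // Country codes are listed in lowercase in the API docs.
        'gl'  => strtolower($geo),
    );
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query($params, '', '&');
}

// Pull the result URLs out of a JSON response body.
// Returns an empty array if the body is not valid JSON or has no items.
function cse_extract_links($json) {
    $urls = array();
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return $urls;
    }
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $urls[] = $item['link'];
        }
    }
    return $urls;
}
```

With a fetch helper like the `curl_fetch()` used in the first post, usage would be roughly `cse_extract_links(curl_fetch(cse_build_url($key, $cx, 'my keyword')))`. Since each request is capped at 10 results, anything beyond that would need paging.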
     
  4. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2009
    Messages:
    5,564
    Likes Received:
    4,317
    Location:
    Toronto
    Home Page:
  5. sockpuppet

    sockpuppet Junior Member

    Joined:
    Nov 7, 2011
    Messages:
    155
    Likes Received:
    145
    This works for me:
    Code:
    function get_google($kw, $res=10, $geo='US') {
        $urls = array();
        if ($geo == '') $geo = 'US';
        $url  = 'http://www.google.com/search?hl=en&as_qdr=all&gl='.$geo.'&q='.urlencode($kw).'&num='.$res;

        try {
            $html = curl_fetch($url);

            $dom = new DOMDocument();
            @$dom->loadHTML($html); // @ silences warnings from malformed HTML
            $xpath = new DOMXPath($dom);
            // Only look at links that sit inside <h3> elements.
            $hrefs = $xpath->evaluate("/html/body//h3/a");
            for ($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                // Hrefs look like /url?q=<target>&sa=...; capture the target.
                if (preg_match('/url\?q=(.*?)&sa/', $href->getAttribute('href'), $result)) {
                    $urls[] = urldecode($result[1]);
                }
            }
        } catch (Exception $e) {} // on failure, return whatever was collected
        return $urls;
    }
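
Both versions of `get_google()` in this thread call `curl_fetch()`, which is never shown. Below is a rough sketch of what such a helper might look like, built on PHP's cURL extension. The timeout, user agent string and error handling are guesses for illustration, not the original poster's code. It also includes a hypothetical `extract_target_url()` that reads the `q` parameter with `parse_str()` instead of a regex, so it does not depend on `&sa` following the URL. It still assumes the `/url?q=...` href format that the code above expects.

```php
<?php
// Hypothetical stand-in for the curl_fetch() helper the thread never shows.
// Throws on transport errors so the try/catch in get_google() has something to catch.
function curl_fetch($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 15,     // seconds; arbitrary choice
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)',
    ));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new Exception('curl_fetch failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical alternative to the regex: read the "q" parameter from an
// href such as /url?q=http://example.com/&sa=U. Returns null if the href
// is not a /url redirect or has no q value.
function extract_target_url($href) {
    $pos = strpos($href, '?');
    if ($pos === false) {
        return null;
    }
    // Only accept hrefs whose path ends in /url.
    if (substr(substr($href, 0, $pos), -4) !== '/url') {
        return null;
    }
    // parse_str() splits on & and URL-decodes each value.
    parse_str(substr($href, $pos + 1), $params);
    if (isset($params['q']) && is_string($params['q']) && $params['q'] !== '') {
        return $params['q'];
    }
    return null;
}
```

Inside the loop, the `preg_match()` check could then be swapped for something like `$link = extract_target_url($href->getAttribute('href')); if ($link !== null) $urls[] = $link;`.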
    
    
     
  6. WORK@HOME

    WORK@HOME Senior Member

    Joined:
    Apr 25, 2013
    Messages:
    805
    Likes Received:
    363
    Location:
    Right Here
    Home Page:
    Hey guys... Thanks for the help. I actually found out that I could do it with ScrapeBox.