
php google serp scraper in 3 lines of code

Discussion in 'Black Hat SEO' started by skweekykleen, Apr 15, 2009.

  1. skweekykleen

    skweekykleen Newbie

    Joined:
    Jul 8, 2008
    Messages:
    45
    Likes Received:
    68
    Was bored, so threw this together...change the value of $query to whatever you want to search for. This grabs the links of the first 100 results, but as you can see it would be easy to toss in a loop and grab the rest if needed...

    PHP:
    <?php
      
    // Grab the first 100 results from the bare-bones /ie results page and print every link
    $query = "big boobies";
    preg_match_all('/<a title=".*?" href=(.*?)>/', file_get_contents("http://www.google.com/ie?q=" . urlencode($query) . "&num=100&start=1"), $matches);
    print implode("<br>", $matches[1]);

    ?>
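The link-extraction step can be pulled out of the one-liner so it can be checked against a saved page without hitting Google at all. A minimal sketch: `extract_links` is a hypothetical helper name, and it assumes the results markup still matches the pattern used above (the /ie page markup the regex was written for).

```php
<?php
// Hypothetical helper: pulls the href value out of every <a title="..." href=...> tag,
// using the same pattern as the script above. Returns an empty array if nothing matches.
function extract_links(string $html): array
{
    // Capture group 1 is the unquoted href value, up to the closing ">".
    if (preg_match_all('/<a title=".*?" href=(.*?)>/', $html, $matches) === false) {
        return []; // regex engine error
    }
    return $matches[1];
}

// Usage with a hard-coded snippet instead of a live request:
$sample = '<a title="Example" href=http://example.com/>Example</a>'
        . '<a title="Other" href=http://example.org/page>Other</a>';
print implode("<br>", extract_links($sample)) . "\n";
// http://example.com/<br>http://example.org/page
```

Keeping the fetch and the parse separate also means the same parser can be reused whichever way the page is downloaded.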
    Also, I'm going to be sitting with nothing to do for the next couple of hours...if anyone needs help with anything programming-related, give me a shout and I'll see if I can help...

    The only requirement is that it has to be a real-world project... I'm not interested in homework assignments.
     
  2. Poison

    Poison Junior Member

    Joined:
    Feb 28, 2009
    Messages:
    103
    Likes Received:
    35
    Occupation:
    Ironically SEO and PPC manager
    Just tried it out; sadly my server does not allow URL access like this.
    However, I did see how the query was being run:
    http://www.google.co.uk/search?q=wgc+seo&num=100&start=1

    Looks good. I am not going to change my server settings to make full use of this, but it is very fast, effective code, I'll give you that.
    (Yeah, I changed the Google it was searching to UK, as it's more useful to me.)
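The quoted URL can be assembled with PHP's `http_build_query()`, which encodes each parameter exactly once and turns switching between google.com and google.co.uk into a single argument. A sketch; `build_serp_url` is a hypothetical name, and the parameters are just the three that appear in the URL above.

```php
<?php
// Hypothetical helper: builds a results-page URL like the one quoted above.
// http_build_query() encodes spaces as "+" and a colon (as in "site:") as "%3A".
function build_serp_url(string $host, string $query, int $num = 100, int $start = 0): string
{
    $params = [
        'q'     => $query, // raw keyword; encoded exactly once here
        'num'   => $num,   // results per page
        'start' => $start, // offset of the first result shown
    ];
    return 'http://' . $host . '/search?' . http_build_query($params);
}

print build_serp_url('www.google.co.uk', 'wgc seo', 100, 1) . "\n";
// http://www.google.co.uk/search?q=wgc+seo&num=100&start=1
```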


    edit: looks like this post is number 4 in google (uk) for "google serp scraper"
     
    Last edited: Apr 15, 2009
  3. markdigerati

    markdigerati Junior Member

    Joined:
    Nov 21, 2007
    Messages:
    113
    Likes Received:
    28
    search? is changing to url? shortly...
     
  4. Sippy79

    Sippy79 Junior Member

    Joined:
    Feb 13, 2009
    Messages:
    104
    Likes Received:
    26
    Works like a charm, thank you very much :)
     
  5. Poison

    Poison Junior Member

    Joined:
    Feb 28, 2009
    Messages:
    103
    Likes Received:
    35
    Occupation:
    Ironically SEO and PPC manager
    Well, your post didn't make much sense, if I'm honest here...
    I am using "search?" because that is how it looks for the .co.uk Google; I don't really care about the .com one.

    And my server's PHP settings do not let me scrape. I am sure if I installed PHP etc. on my PC it would work fine.
     
  6. affiliate.solutions

    affiliate.solutions Registered Member

    Joined:
    Apr 3, 2009
    Messages:
    97
    Likes Received:
    62
    Btw... I modified this for more usefulness, and I wasn't able to retrieve results past 1000, so if anyone can post that info, please advise. I attempted to put a sleep in the middle but it did not help.

    This is indeed quick and gives you a somewhat useful reference for your keywords.

     
  7. croakingtoad

    croakingtoad Newbie

    Joined:
    Jul 28, 2008
    Messages:
    2
    Likes Received:
    0
    I tried to modify this to scrape the query "site:mydomain" to help me create a database of links I need to 301 during a site migration, but it doesn't work. Below is what I've tried, can anyone clue me in to what I'm doing wrong?

    Okay, so I just discovered I can't post links (even in code brackets) until I've posted more here, sorry. Basically, though, I change the value of $query to site:mydomain and it doesn't work. Also, if I get rid of the $query variable and replace the argument to file_get_contents with the full URL from Google after I do the search myself, it still returns a blank page.

    What am I doing wrong? Thanks!
     
  8. Poison

    Poison Junior Member

    Joined:
    Feb 28, 2009
    Messages:
    103
    Likes Received:
    35
    Occupation:
    Ironically SEO and PPC manager
    Home Page:
    Hmm, if you are working off a server, there is a very high chance your settings default to not allowing this (like mine did), so chances are you've done nothing wrong; your server just doesn't like it!
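Rather than guessing whether the server blocks this, a short script can report the settings involved. On most hosts the relevant one is `allow_url_fopen`, which controls whether the fopen-style functions (`file_get_contents()`, `readfile()`, `fopen()`) may open URLs. A sketch; `url_fetch_capabilities` is a hypothetical name.

```php
<?php
// Hypothetical helper: reports which ways of fetching a remote page this PHP install allows.
function url_fetch_capabilities(): array
{
    return [
        // Governs URL access for file_get_contents(), readfile(), fopen(), ...
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'),
        // cURL is a separate extension and does not depend on allow_url_fopen
        'curl'            => extension_loaded('curl'),
        // Raw sockets, as used by hand-written HTTP requests
        'fsockopen'       => function_exists('fsockopen'),
    ];
}

foreach (url_fetch_capabilities() as $name => $enabled) {
    print $name . ': ' . ($enabled ? 'yes' : 'no') . "\n";
}
```

If `allow_url_fopen` comes back "no" but `curl` is "yes", the fetch can be switched to cURL without touching server settings.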
     
  9. croakingtoad

    croakingtoad Newbie

    Joined:
    Jul 28, 2008
    Messages:
    2
    Likes Received:
    0
    Well, mine let me run the first script, just not after I changed it...
     
  10. Poison

    Poison Junior Member

    Joined:
    Feb 28, 2009
    Messages:
    103
    Likes Received:
    35
    Occupation:
    Ironically SEO and PPC manager
    Well what script are you using, can you post it here?
     
  11. SEOHolicc

    SEOHolicc Newbie

    Joined:
    Jan 23, 2008
    Messages:
    33
    Likes Received:
    5
    Occupation:
    Internet Marketing
    Location:
    Colorado
    This is how you can use search operators such as site:yoursite.com.

    What I did was remove the keyword from the quotes after the urlencode function and just leave it blank. Then you have to change the Google URL to include whatever operator you want to use.

    In the example below, the URL already has site followed by "%3A", which is the URL-encoded form of the colon. You can enter whatever site you want after that.

    Code:
    <?php

    // The operator goes straight into the URL below, so $query stays empty
    $query = urlencode("");
    $t = "<br>******************************************************************<br>";

    // num stays fixed at 100 results per page; start walks through the offsets 0, 100, ..., 900
    for ($j = 0; $j < 1000; $j += 100) {
        preg_match_all('/<a title=".*?" href=(.*?)>/', file_get_contents("http://www.google.com/ie?q=site%3Ayourdomain.com" . $query . "&num=100&start=" . $j), $matches);

        print "$t start = $j $t";
        print implode("<br>", $matches[1]);
    }

    ?>
     
  12. Superdude22

    Superdude22 Registered Member

    Joined:
    Jul 10, 2008
    Messages:
    79
    Likes Received:
    31
    Location:
    A Beach
    Just started playing around with this. You have to include the TLD with your domain, i.e. yourdomain.com, not just yourdomain.

    Code:
    $site ="yourdomain.com";
    $query = urlencode("site:".$site);
    $res = 10;
    
    
    //Search for Indexed Pages
    preg_match_all('/<a title=".*?" href=(.*?)>/', file_get_contents("http://www.google.com/ie?q=".$query."&num=".$res."&start=1"), $matches);
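A site: query can also be built and encoded in one place, with a guard for the missing-TLD mistake described above. A sketch; `site_query` is a hypothetical helper, and the "contains a dot" check is only a rough stand-in for real domain validation.

```php
<?php
// Hypothetical helper: builds the encoded q= value for a site: search.
// Rejects bare names like "yourdomain" with no TLD, per the tip above.
function site_query(string $site, string $keywords = ''): string
{
    if (strpos($site, '.') === false) {
        throw new InvalidArgumentException("Include the TLD, e.g. $site.com");
    }
    $q = 'site:' . $site;
    if ($keywords !== '') {
        $q .= ' ' . $keywords; // optional keywords after the operator
    }
    return urlencode($q); // ":" becomes %3A, spaces become +
}

print site_query('yourdomain.com') . "\n";              // site%3Ayourdomain.com
print site_query('yourdomain.com', 'red shoes') . "\n"; // site%3Ayourdomain.com+red+shoes
```

The result can be dropped straight into `"...?q=" . $query . "&num=..."` without calling `urlencode()` a second time.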
     
  13. c0pyuk

    c0pyuk Newbie

    Joined:
    Sep 28, 2009
    Messages:
    30
    Likes Received:
    203
    if your server doesn't allow "file_get_contents" then you could try "readfile" instead
     
  14. youngguy

    youngguy Senior Member

    Joined:
    Apr 11, 2009
    Messages:
    1,053
    Likes Received:
    1,560
    Location:
    Hell
    add &gl=[country code]
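The gl parameter can be added to any of the URLs in this thread without retyping the rest of the query string. A sketch; `with_country` is a hypothetical helper, and it does not check whether Google accepts the code you pass (it only lowercases it).

```php
<?php
// Hypothetical helper: adds (or replaces) the gl= country parameter on an existing results URL.
function with_country(string $url, string $countryCode): string
{
    $parts = parse_url($url);
    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params); // decode the existing parameters
    }
    $params['gl'] = strtolower($countryCode);
    return $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/')
        . '?' . http_build_query($params);
}

print with_country('http://www.google.com/search?q=wgc+seo&num=100&start=0', 'US') . "\n";
// http://www.google.com/search?q=wgc+seo&num=100&start=0&gl=us
```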
     
  15. ronmac321

    ronmac321 Registered Member

    Joined:
    Nov 4, 2008
    Messages:
    59
    Likes Received:
    11
    Thanks, nice script.
     
  16. mikeyy_

    mikeyy_ Registered Member

    Joined:
    Oct 17, 2009
    Messages:
    59
    Likes Received:
    50
    Occupation:
    Self-employed, entreprenuer.
    Location:
    Underground
    Here is what I was able to code up.. :)

    PHP:
    <?php
    // Interactive CLI: asks for a domain, optional keywords, start offset and result count,
    // then prints every link found on the results page
    print "Enter domain:\n";
    $domain = getInput();
    print "\nEnter query (Press enter if none):\n";
    $query = getInput();
    print "\nEnter number to start from:\n";
    $start = getInput();
    print "\nEnter how many results to output:\n";
    $amount = getInput();
    $content = googQuery($domain, $query, $start, $amount);
    print "=======================\n";
    print "Results.....\n";
    print "=======================\n";
    preg_match_all('/<a title=".*?" href=(.*?)>/', $content, $matches);
    foreach ($matches[1] as $match) {
        print $match . "\n";
    }

    // Sends the request over a raw socket and returns the whole response (headers + body)
    function googQuery($domain, $query, $start, $amount)
    {
        $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
        if (!$fp) {
            die("Could not connect: $errstr ($errno)\n");
        }
        // HTTP/1.0 keeps the server from replying with chunked transfer encoding
        fputs($fp, "GET /ie?q=site%3A" . urlencode($domain) . "%20" . urlencode($query) . "&num=" . (int) $amount . "&start=" . (int) $start . "&filter=0 HTTP/1.0\r\n");
        fputs($fp, "Host: www.google.com\r\n");
        fputs($fp, "User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10\r\n");
        fputs($fp, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n");
        fputs($fp, "Accept-Language: en-us,en;q=0.5\r\n");
        fputs($fp, "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        $buf = ""; // initialise before appending
        while (!feof($fp)) {
            $buf .= fgets($fp, 128);
        }
        fclose($fp);
        return $buf;
    }

    // Reads one line from standard input, without the trailing newline
    function getInput($length = 255)
    {
        $fr = fopen("php://stdin", "r");
        $input = fgets($fr, $length);
        $input = rtrim($input);
        fclose($fr);
        return $input;
    }
    ?>
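Because googQuery() reads straight off the socket, the string it returns starts with the HTTP status line and headers, not the page itself. Splitting them makes it possible to notice a redirect or error status, which the regex alone would just report as zero results. A sketch; `split_http_response` is a hypothetical helper, and it assumes the body is not chunk-encoded, which a server will only use when the request line says HTTP/1.1.

```php
<?php
// Hypothetical helper: separates the status code, header block and body of a raw
// HTTP/1.x response string, such as the one googQuery() returns.
function split_http_response(string $raw): array
{
    // The header block ends at the first blank line (CRLF CRLF).
    $pos  = strpos($raw, "\r\n\r\n");
    $head = $pos === false ? '' : substr($raw, 0, $pos);
    $body = $pos === false ? $raw : substr($raw, $pos + 4);
    // Status line looks like "HTTP/1.0 200 OK"; 0 means no status line was found.
    $status = preg_match('#^HTTP/\d\.\d (\d{3})#', $head, $m) ? (int) $m[1] : 0;
    return ['status' => $status, 'head' => $head, 'body' => $body];
}

$raw = "HTTP/1.0 302 Found\r\nLocation: http://example.com/\r\n\r\n<p>moved</p>";
$r = split_http_response($raw);
print $r['status'] . "\n"; // 302
```

Only `$r['body']` would then be passed to `preg_match_all()`, and anything other than a 200 is worth logging.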
     
    Last edited: Oct 19, 2009
  17. showboytridin

    showboytridin Regular Member

    Joined:
    Sep 5, 2009
    Messages:
    348
    Likes Received:
    714
    Location:
    127.0.0.1
    If your server does not allow the file_get_contents function to open URLs (because it can be dangerous), you can use cURL instead.
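A cURL version of the fetch, as suggested above. The cURL extension does not go through PHP's URL stream wrappers, so it is not affected by `allow_url_fopen`; `readfile()`, by contrast, uses the same wrappers as `file_get_contents()` and is blocked by the same setting. A minimal sketch, assuming the curl extension is installed; `fetch_with_curl` is a hypothetical name.

```php
<?php
// Hypothetical helper: minimal cURL replacement for file_get_contents($url).
// Returns the response body, or null if the request fails.
function fetch_with_curl(string $url, int $timeout = 30): ?string
{
    if (!extension_loaded('curl')) {
        throw new RuntimeException('The cURL extension is not installed');
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for the connection
        CURLOPT_TIMEOUT        => $timeout, // seconds for the whole request
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Drop-in use with the original regex:
// preg_match_all('/<a title=".*?" href=(.*?)>/', (string) fetch_with_curl($url), $matches);
```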

     
  18. skweekykleen

    skweekykleen Newbie

    Joined:
    Jul 8, 2008
    Messages:
    45
    Likes Received:
    68
    wow... thanks everyone for the many responses... to answer some of them:

    @Crooker: The main reason I used the /ie search page is that it's an unformatted, 'bare minimum' results page, which loads quicker and can therefore be parsed more quickly, especially if we're talking multiple threads or a slower connection. Also, the start=1/0 thing... that has to start at 1 for the numbers to work out correctly, because it tells Google which result to start from when showing the results. For instance, if you are looking at 100 results per page, on the first loop it would be 1, then on the second loop it would be 101 (1+100), then 201 (101+100), etc.

    @Affiliate.Solutions: Regarding the 1000 results issue... you would just rewrite the regexp portion and place it into a loop, up to whatever number of results you want. Actually, though, Google has a policy not to show more than 1000 results for any one keyword, so you would have to vary the keyword a little to get more than that.
     
  19. doroftei

    doroftei Junior Member

    Joined:
    Feb 20, 2009
    Messages:
    103
    Likes Received:
    24
  20. skweekykleen

    skweekykleen Newbie

    Joined:
    Jul 8, 2008
    Messages:
    45
    Likes Received:
    68
    :p I R DUMB

    You know, I tried that 100 times using a simple 1-word query like 'bananas' or 'cars' and I thought for sure that I was seeing it right...

    You're absolutely right about the 1 and 0, I stand corrected, and it's good to know..
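Looping over pages, as discussed above, comes down to generating the start= offsets. Since start= counts from 0, 100 results per page means 0, 100, 200, and so on. A sketch; `serp_offsets` is a hypothetical helper, and the default cap of 1000 reflects the limit reported in this thread, not a documented guarantee.

```php
<?php
// Hypothetical helper: the zero-based start= offsets needed to page through $total
// results, $perPage at a time, stopping at $cap (the ~1000-result ceiling seen above).
function serp_offsets(int $total, int $perPage = 100, int $cap = 1000): array
{
    $step  = max(1, $perPage); // guard against a zero/negative page size looping forever
    $limit = min($total, $cap);
    $offsets = [];
    for ($start = 0; $start < $limit; $start += $step) {
        $offsets[] = $start;
    }
    return $offsets;
}

print implode(', ', serp_offsets(300)) . "\n"; // 0, 100, 200
```

Each offset would go into `&num=100&start=$start`. Adding a `sleep()` between requests is polite, but per the thread it does not lift the 1000-result cap.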