
PHP Script to Detect Googlebot

Discussion in 'Cloaking and Content Generators' started by Nimble75, Jun 9, 2009.

  1. Nimble75

    Nimble75 Newbie

    Hi, does anyone have (or know where I can get) a PHP script that can detect whether the agent loading the page is Googlebot or another major search engine?

    Would greatly appreciate the help.

    Nimble
     
  2. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    Code:
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    // stripos() does a case-insensitive substring match on the User-Agent header
    if (stripos($user_agent, 'google') !== false) {
        echo "Hi GoogleBot";
    }
    
     
  3. Nimble75

    Nimble75 Newbie

    Oops, sorry, I should have mentioned in my first post that I need it to filter by IP address range rather than by User-Agent.

    I read that Google sometimes crawls with browser-like User-Agent strings such as "Mozilla" and "Safari" (without loading images, executing JavaScript, etc.) to pose as a regular browser and detect cloaking.
     
  4. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    This is almost impossible, considering they probably have thousands of IP addresses that change daily. Why do you want to block it anyway? So it doesn't get indexed?
     
  5. Nimble75

    Nimble75 Newbie

    I just want to display a backlink if it's a bot indexing the site.

     
  6. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    If you know the IP range you want to show the backlink to, add a rule to your .htaccess file that redirects that range (127.0.0.x or whatever it is) to a second URL with the backlink. Alternatively, use PHP on your landing page:

    Code:
    $ip      = ip2long($_SERVER['REMOTE_ADDR']); // false for invalid or IPv6 addresses
    $ip_from = ip2long("192.168.0.0");
    $ip_to   = ip2long("192.168.255.255");
    
    // ip2long() turns each address into a single integer (64-bit PHP),
    // so the range check is a plain numeric comparison.
    if ($ip !== false && $ip >= $ip_from && $ip <= $ip_to) {
        echo "<a href=\"http://link.com\">Anchor</a>";
    }
     
  7. blackcat.private

    blackcat.private Newbie

    Maybe in your index.php you could add some AJAX code that sends a POST to itself (index.php) to set a variable to true (e.g. javascript = true). If the variable is still false when the meta refresh fires, JavaScript isn't running, so the visitor is probably a bot (or your AM ;)).

    To refresh the page after 5 seconds, use a meta refresh. I know it's a totally lame technique, and there are many others ;)

    HTML:
    <html>
    <head>
    <meta http-equiv="refresh" content="5">
    </head>
    <body>
    </body>
    </html>
    
     
  8. splurgoth

    splurgoth Registered Member

    What's the point? Why is it bad to show the link to non-bots? If Google catches you cloaking, it probably won't be good for you.
     
  9. 1EightT

    1EightT Registered Member

    User-Agent cloaking doesn't work; the header is too easy to spoof.
     
  10. studiofaca

    studiofaca Newbie

    Hello.

    Is it possible to detect Googlebot by IP address?
     
  11. XoC--

    XoC-- Jr. VIP Jr. VIP Premium Member

    Yes, it is possible to detect it by IP address, which is what a lot of people do to create "doorway" pages.
    There's a guy called Fantasm? I can't remember if that's his name, but he sells a weekly updated database of search engine IPs for about $600 a year (could be more).
     
  12. corematter

    corematter Newbie

    Why not use in_array($_SERVER['REMOTE_ADDR'], $googleips)?
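    `in_array()` only finds exact addresses, so a list used that way has to enumerate every single crawler IP. A minimal sketch of matching against IPv4 ranges in CIDR notation instead, assuming 64-bit PHP and that you already have a range list you trust: `ip_in_cidr()` and `ip_in_any_range()` are hypothetical helper names, and the ranges below are RFC 5737 documentation placeholders, not real crawler addresses.

```php
<?php
// Hypothetical helper: true if dotted IPv4 $ip falls inside $cidr ("a.b.c.d/n").
// Assumes 64-bit PHP, where ip2long() returns values in 0..4294967295.
function ip_in_cidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2 || !preg_match('/^\d{1,2}$/', $parts[1])) {
        return false; // not in "address/prefix" form
    }
    $bits = (int) $parts[1];
    $ipLong = ip2long($ip);        // false for malformed or IPv6 input
    $netLong = ip2long($parts[0]);
    if ($ipLong === false || $netLong === false || $bits > 32) {
        return false;
    }
    // Keep the top $bits bits; a /0 prefix has an all-zero mask and matches everything.
    $mask = $bits === 0 ? 0 : (-1 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// True if $ip falls inside any of the listed ranges.
function ip_in_any_range(string $ip, array $cidrs): bool
{
    foreach ($cidrs as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Placeholder ranges (RFC 5737 documentation blocks), NOT real crawler ranges.
$crawler_ranges = ['192.0.2.0/24', '198.51.100.0/25'];

var_dump(ip_in_any_range('192.0.2.77', $crawler_ranges));     // bool(true)
var_dump(ip_in_any_range('198.51.100.200', $crawler_ranges)); // bool(false)
```

    Swapping `in_array($_SERVER['REMOTE_ADDR'], $googleips)` for `ip_in_any_range($_SERVER['REMOTE_ADDR'], $ranges)` keeps the list short, but the result is still only as good as the list itself.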
     
  13. cookie48

    cookie48 Jr. VIP Jr. VIP Premium Member

    Because you don't have a reliable $googleips list.
    You don't build a good cloak using IP or user agent alone. You should use a hybrid of server-side and client-side code.
     
  14. NIXMY

    NIXMY Regular Member Premium Member

    The most reliable way to detect any search engine is reverse DNS (rDNS) lookup. No one can spoof a crawler's rDNS. This method is 100% accurate. The only downside is that you need to query the reverse DNS of every visitor; the performance impact is minor compared to the benefit.

    I wrote that code a long time ago, but because I never get any help back when I need it, I won't post it.

    However, the hint above should give you enough guidance to code it on your own.
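    A PTR record is published by whoever controls an IP block's reverse zone, so a reverse lookup on its own can be made to return any name, including one ending in googlebot.com. Google's documentation on verifying Googlebot therefore pairs it with a forward lookup: resolve the returned hostname and check that the original IP is among the results. A minimal sketch of that forward-confirmed reverse DNS check, IPv4 only; `is_verified_crawler()` is a hypothetical name, the default suffix list is an assumption adapted from the regular expressions in NIXMY's code further down the thread, and the resolvers are injectable callables so the logic can be tested without network access.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS (FCrDNS) check, IPv4 only.
// $reverse maps an IP to a hostname (default: gethostbyaddr);
// $forward maps a hostname to a list of IPv4 addresses (default: gethostbynamel).
// The default suffix list is an assumption; extend it for other crawlers.
function is_verified_crawler(
    string $ip,
    array $suffixes = ['.googlebot.com', '.crawl.yahoo.net', '.search.msn.com'],
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP on
    // failure and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end in one of the trusted suffixes.
    $matched = false;
    foreach ($suffixes as $suffix) {
        if ($suffix !== '' && substr($host, -strlen($suffix)) === $suffix) {
            $matched = true;
            break;
        }
    }
    if (!$matched) {
        return false;
    }

    // Step 3: forward lookup. The original IP must be among the results;
    // otherwise the PTR record may simply be forged.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

    On a live page you would call `is_verified_crawler($_SERVER['REMOTE_ADDR'])`; both lookups can block, so the timeout caveat NIXMY raises about gethostbyaddr() applies here twice.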
     
  15. needlinks

    needlinks Regular Member

    WPCloaker has a reverse DNS function.
     
  16. NIXMY

    NIXMY Regular Member Premium Member

    Hopefully you are all aware of Google's Webmaster Guidelines. My site was recently penalized for cloaking. Depending on how severe the abuse is, you may even be removed from Google's index.

    Use this code at your own risk! You'll get penalized by Google for this sooner or later. You've been warned.

    Do not remove the copyright notice, as this is my own code! If I see even a bit of this code posted on any other forum with the copyright notice removed, that's it: no more support from me for any example.

    Code:
    $remoteip = $_SERVER['REMOTE_ADDR'];
    
    if (search_engine($remoteip)) {
        // Cloaked content
    } else {
        // Visitors content
    }
    
    function search_engine($remoteip) {
    
    // Copyright (c) 2011 Tapio Niemela at myproxylists.com
    
        // Reverse DNS lookup: returns the unmodified IP if there is no PTR
        // record, false on malformed input, and offers no timeout option.
        $remotehost = gethostbyaddr($remoteip);
    
        // Google, Yahoo and MSN crawler hostnames
        return (bool) preg_match(
            "/(\.googlebot\.com|\.crawl\.yahoo\.(net|com)|search\.msn\.com)$/",
            (string) $remotehost
        );
    }
    
    It's pretty simple after all; you just need to think like a hacker :)

    Notes: If a visitor's IP address has no rDNS record, that visitor will experience a delay while browsing your site, because gethostbyaddr() currently provides no timeout option. The default timeout is 8 seconds.

    You definitely don't want to use an exec function to call Linux's "host" command to work around the timeout, because that would have a big performance impact on high-traffic sites.
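    One way to keep that lookup delay away from ordinary visitors is to run the DNS check only when the User-Agent claims to be a crawler, which matches how Google frames verification: confirming that a visitor claiming to be Googlebot really is one. A minimal sketch, where `should_treat_as_crawler()` is a hypothetical name, the token list is an assumption to adjust, and `$verify` stands for any IP check you choose (for example an rDNS function like the one above). The trade-off is the one Nimble75 raised at the start of the thread: a crawler using a browser-like User-Agent is never checked at all.

```php
<?php
// Hypothetical helper: only pay for DNS lookups when the User-Agent claims
// to be a known crawler. $verify is any callable taking an IP and returning bool.
// The default token list is an assumption; adjust it to the crawlers you care about.
function should_treat_as_crawler(
    string $ip,
    string $userAgent,
    callable $verify,
    array $claimTokens = ['googlebot', 'slurp', 'msnbot', 'bingbot']
): bool {
    $ua = strtolower($userAgent);
    foreach ($claimTokens as $token) {
        if (strpos($ua, $token) !== false) {
            // The UA claims to be a crawler: confirm it by IP (slow path).
            return (bool) $verify($ip);
        }
    }
    // No crawler claim: skip DNS entirely (fast path for normal visitors).
    return false;
}
```

    Regular browsers never reach `$verify`, so the 8-second worst case only hits requests that claim to be bots.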
     
    Last edited: Mar 16, 2011
  17. Autumn

    Autumn Elite Member

    By far the best IP list to use is Fantomaster's: http://fantomaster.com/fasvsspy01.html

    These guys have been at it for years, and their detection system and database are second to none. If you want to do IP delivery, this is the only solution you need to worry about. No affiliation, but I have been a happy customer of these guys for 6+ years now.

    rDNS isn't a foolproof solution, because not all bot hits come from the owner you would expect in the ARIN listings. In fact, there's no such thing as a foolproof solution. It's always risk vs. reward when cloaking.
     
  18. moonseo

    moonseo Newbie

    These are good but dangerous steps; do this work very carefully.
    Regards, Mamoon
    My site: onlinelearnholyquran
     
  19. NIXMY

    NIXMY Regular Member Premium Member

    Why am I not surprised that someone is trying to make money from this? :)

    The top 3 search engines are included in my function; who needs the other crappy search engines? For Chinese users, it's very easy to add Baiduspider, etc.

    I doubt anyone who uses my function will pay for fasys when they can enjoy a very good free solution ;)

    Any search engine except those top 3 is quite useless. I barely see traffic from any search engine except Google.

    In fact, I think that page is big bullshit. What are those "800" bots? Should I also code my own 800 bots, start taking money from stupid users, and provide a similar service? No thanks.
     
    Last edited: Mar 16, 2011
  20. Autumn

    Autumn Elite Member

    Like I said, I have absolutely no affiliation with Fantomaster and team apart from being a long standing customer. I wish I could work with them, but they are waaaay above my level in SEO terms.

    You are actually doing BHW readers a real disservice by suggesting that they use rDNS. Not only is it slow as shit, but GOOGLEBOT DOESN'T ALWAYS RESOLVE TO A DOMAIN ASSOCIATED WITH GOOGLE IN THE ARIN LISTINGS. You will not catch every instance of Googlebot using rDNS alone.

    It has been well known for a while amongst the cloaking community that Google uses bots running out of satellite companies to bust cloakers.

    If you want to be successful with cloaking, you have to be willing to invest the relatively small amount of money it takes to run with the best solution.