I just caught 2 Google IPs sniffing around my sites

anty

Newbie
Dec 8, 2008
Hi,

I just caught two IPs calling random pages on my cloaked domains (always a random page on one domain, then on to another domain).

Both IPs came from the Google IP range (74.125.0.0 - 74.125.255.255) but have no reverse-DNS entry and are therefore, afaik, not crawlers. (Specifically, both IPs came from 74.125.75.*.)

Their referrers always followed the same schema: "http://www.google.com/search?hl=en&q=example.com", where example.com matched the requested URL, and the UserAgent on both IPs was: "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"

I banned both IPs, because I think those are manual reviewers checking out my sites.

My question now is: Have you ever experienced this, and if so, what did you do? I'm thinking about denying the whole Google IP range, except for the crawlers. In my opinion, a timeout for human reviewers is better than a redirect to an affiliate offer.

Another idea would be to deny all requests whose referrer matches the schema above. That would be effective, but not future-proof.
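
Roughly, both filters could look like this, as a minimal PHP sketch. The IP range and referrer schema are the ones from my logs; the function names and the 403 response are just illustration, not a tested setup:

Code:
<?php
// Hedged sketch of the two filters discussed above, assuming Apache/PHP.

function ip_in_google_range($ip) {
    $long = ip2long($ip); // false if $ip is malformed
    return $long !== false
        && $long >= ip2long('74.125.0.0')
        && $long <= ip2long('74.125.255.255');
}

function referrer_matches_schema($referrer, $host) {
    // "http://www.google.com/search?hl=en&q=example.com", where
    // example.com equals the requested host
    return $referrer === 'http://www.google.com/search?hl=en&q=' . $host;
}

$ip       = $_SERVER['REMOTE_ADDR'];
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (ip_in_google_range($ip) || referrer_matches_schema($referrer, $_SERVER['HTTP_HOST'])) {
    // the "timeout" variant would sleep() here instead of answering
    header('HTTP/1.1 403 Forbidden');
    exit;
}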

Ideally I would show them a WH site, but that's not gonna happen until I find a way to create WH sites on the fly :)

What would you do?
 
*LOL*


http://www.google.com/search?hl=en&q=example.com

Isn't that just a search term? I guess you wrote example.com so as not to show either your domain name or your keyword.

Don't worry about that.
 
I am sure Google has bots that don't "play by the rules", i.e. they impersonate normal web browsers, don't resolve as bots, and give referrers that you would "expect to see" from a person hitting that page. They're not dumb; they know people display different things to their bots and try to get around that. I highly doubt people at Google are manually reviewing your links. (Unless maybe you have a HUGE blackhat operation going on.)

Personally I would bounce all addresses in this IP range to the same page the crawlers go to, as they're most likely crawlers. Perhaps it means the crawlers found your pages "potentially suspicious" and so checked them out with unlisted crawlers to determine whether they were shown substantially different content (and hence whether you were filtering bots). In that case, showing them both the bot pages would be ideal.
 
I just posted the same in another forum here.
I've also had Googlers on my site, and wondered why, as I didn't use black-hat SEO on it.

Maybe they build link lists from sources like BHW and similar that are regularly reviewed?
I guess they can catch a lot of people using forbidden techniques if they look at the sites that are listed in BH SEO forums.
 

drkenneth has it right, this is definitely what's going on... Google has its smarts... my cloaked pages are always delisted when this type of thing occurs... definitely something going on, manual or automatic... I'm working on a solution that doesn't hinder the effectiveness of the cloaks.
 
Personally I would bounce all addresses in this IP range to the same page the crawlers go to, as they're most likely crawlers.

I don't think showing all Google IPs the real pages is the right decision. This would reveal my content and links to competitors since they could use the translation tool to check my content.
But a manually created list could do the trick. I would just have to make sure to include only the bots.

Thanks for the response drkenneth, it seems obvious now, but I hadn't thought about disguised bots.
 
There is a way to deny Google entry to your website. I will let you know how if I remember.
 
The problem is you still want to get indexed as relevant by Google WHILE directing real people to affiliate programs. Denying them the ability to look at his site entirely means he'd get deindexed completely, which is definitely not what he wants. So "cloaking" and showing search engines different things than people is what he needs to do, not denying Google entry. (Which could be done with robots.txt.)
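
For illustration, a minimal PHP sketch of that kind of split; botlist.txt, seo_page.html and the offer URL are placeholder names, not a known-good setup:

Code:
<?php
// Hedged sketch: known crawler IPs (and anything calling itself
// Googlebot) get the optimized page, everyone else gets the redirect.

$bot_ips = file('botlist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$ua      = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$is_bot = in_array($_SERVER['REMOTE_ADDR'], $bot_ips)
       || stripos($ua, 'googlebot') !== false;

if ($is_bot) {
    include 'seo_page.html';                                // version for the index
} else {
    header('Location: http://affiliate.example.com/offer'); // version for humans
}
exit;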
 
Just keep in mind: there is no way to guarantee you can keep something like Google out.
They surely have subnets all over the world that you've never heard of.
 
Don't bother. If you want traffic, let it in no matter what. Googlebot or human, who cares?

If you are doing anything too shady, you will most likely get deindexed anyway, and not letting Google in will give the same result.

Getting banned isn't the end of the world. There are tons of ways to drive traffic even if you're not indexed in the Google SERPs, and a lot of sites aren't indexed by Google and are still profitable. Also remember that a lot of sites aren't ranked in the top 10, which is pretty much the same as not being indexed at all, and many of them still do well too.
 
The range is registered to Google:

Code:
http://ws.arin.net/whois/?queryinput=74.125.75.123

I've experienced the same thing: no Googlebot useragent, but the IP registered to Google and the incoming referrer following the same schema, google.com/search?hl=en&q=mypageurl.com

From what I've gathered, yes, it's a cloaking-detection bot, but it's automated, not manual. Firstly, the IP is registered to Google, and they won't "brand" manual reviewers with a Google IP. Also, I have a lot of evidence to suggest reviewers are now using Chrome, no doubt adapted with special plugins/tools.

Many people cloak right off the SERPs (SSEC, anyone?) and cloak by useragent, so with the bot coming from a search referral and using a Firefox useragent, it can nail a good percentage of cloaked pages.

How did your setup handle the requests? Did it deliver the human or spider version?
 
Just woke up and saw another two IPs sniffing around. This time without a referrer, but one IP resolved to 123-123-123-123.google.com, and one had no reverse-DNS entry, like the ones from yesterday.

@SweetFunny: I deliver my bot content to these IPs now. You are right, it doesn't make sense to block them or show them the human version (= redirect). If they are human, they will ban me anyway.

The IP range you mentioned isn't the only one. They have at least one other, namely:
Code:
http://ws.arin.net/whois/?queryinput=216.239.32.0
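
For whitelisting only the genuine crawlers, something like a reverse-plus-forward DNS check might work, since the real ones resolve to *.google.com names. A minimal PHP sketch; gethostbyaddr()/gethostbyname() are standard builtins, but the wrapper and the accepted suffixes are my own guess:

Code:
<?php
// Hedged sketch of a double DNS check for Google crawler IPs.

function is_verified_google_ip($ip) {
    $host = gethostbyaddr($ip);          // reverse lookup; returns $ip on failure
    if ($host === false || $host === $ip) {
        return false;                    // no reverse-DNS entry, like yesterday's IPs
    }
    if (!preg_match('/\.google(bot)?\.com$/i', $host)) {
        return false;                    // not a *.google.com / *.googlebot.com name
    }
    return gethostbyname($host) === $ip; // forward lookup must point back at the IP
}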
 
The problem is you still want to get indexed as relevant by Google WHILE directing real people to affiliate programs. Denying them the ability to look at his site entirely means he'd get deindexed completely, which is definitely not what he wants. So "cloaking" and showing search engines different things than people is what he needs to do, not denying Google entry. (Which could be done with robots.txt.)

Ah, I thought he didn't want to get indexed. Hmm, dunno then.
 
If you are cloaking stuff, I would get used to seeing Google snooping around in there. You can block IPs all you want, but I am sure they are sending in other crawlers that you don't know about (there is no rule that says they have to identify themselves as Googlebot or stay in a specific IP range). And once they are in your site snooping around, the damage is already done.

Just keep your bot list up to date... that's the best you can do. Cloaked sites don't last forever, so make all you can and move on to the next one.
 
Here are a few suggestions:

1. Encrypt JavaScript: if bots want to figure out what you're up to, give 'em hell. Most places don't have the time or resources to be decrypting JavaScript.
2. Using PHP and .htaccess rewrites, you can prevent direct viewing of .css, .js, etc. files. This is good because it makes it much harder for humans and bots to disassemble your website and clone/analyze it.
3. Use robots.txt to ban search engines from crawling styling files and images (see the snippet after this list). That way they can't run checks to see whether you're overlaying keywords and such, for example white-on-white or whatever color schemes you have going.
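
For point 3, a robots.txt along these lines might do it; the directory names are placeholders and assume the styling files sit in their own folders:

Code:
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /images/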
 
When you are cloaking, you gotta expect that Google and the other big ones aren't following robots.txt and other things. As one of the other members here mentioned in another cloaking thread, if you are using redirection in your cloaking, there is a theory that Google can detect the quick change in PR values in their toolbar, raising a flag. So as hard as you are working to fool them, they are working to fight you.

The only thing you can do is keep humans and bots separated as best you can... simple as that. If you catch a bot viewing pages intended for human eyes only, get its IP into the bot file ASAP.
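
That can be as simple as appending the caught IP to the flat file the cloak script reads; botlist.txt here is just a placeholder name matching the earlier sketch:

Code:
<?php
// Hedged one-liner: add a newly spotted bot IP to the flat-file bot list.
file_put_contents('botlist.txt', $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND | LOCK_EX);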
 
Just to let you guys know: I experienced an IP ban today. So these IPs definitely were some sort of cloaking/spam-detection bots.
 
No, these are actual IPs of manual reviewers.

And if you showed them what you show the spiders, your site would most probably be banned if you have anything that looks bad to humans.

I get this kind of checkup a couple of times a day, and they click through different pages of my cloaked sites, and still, after 2 months, they have not banned a single page of mine that redirects them to the human visitors' destination.


Kandor

 
Uhm... an IP ban?
You mean your site got deleted from the Google index, or what?
BTW, I must say something important for this topic.
I read somewhere on DP that some random company was looking for people to do an online job. It turns out the guy who signed up had to review random sites, given by some algorithm, and determine whether they were good, bad, or spam... and he was doing it for... guess what company?
So I don't think banning Google IPs would help, and I think people are still reviewing sites manually, since Google hasn't yet developed something that will keep the spam out.
 