[HowTo] Hiding your PBN Like The Pros – And Why Doing It Wrong = Deindexing

accelerator_dd

Elite Member
Joined
May 14, 2010
Messages
2,447
Reaction score
1,019


After working with PBNs for a while and reading all kinds of advice about hiding your domains from Ahrefs/Majestic/Moz, I decided to do some research of my own and find out, conclusively, the proper way to hide your PBN domains without leaving a footprint.

Currently, there are two methods everyone is aware of for hiding your backlinks so they don't appear in link checkers like Majestic, Ahrefs and the rest:


• Robots.txt file
• .htaccess file

The Robots.txt file is meant to tell crawlers how to behave on a given site - what to index, what not to index and how often to do it.
To block the Ahrefs crawler, for example, you add:
User-agent: AhrefsBot
Disallow: /

If the Ahrefs crawler abides by the robots.txt rules (which, in my experience, it does), your PBN site won't be included in the Ahrefs link index.
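For completeness, the robots.txt version that covers all three major link indexes (MJ12bot is Majestic's crawler, rogerbot is Moz's) would look like this:

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: rogerbot
Disallow: /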

However, there is one big problem with this: some link checkers MIGHT NOT respect those rules, and more importantly, EVERYONE, including Googlebot, can read this file and find out that you don't want your site indexed by link crawlers. I have analyzed a lot of legit sites and only a few had this rule in their robots.txt file, which makes using a robots.txt file A HUGE FOOTPRINT. If you don't believe me and you are using this method, take one of your money sites that is backlinked by a PBN among other links and go through each linking domain, checking its robots.txt file - more often than not, the only domains linking to you that have those rules in robots.txt are your (or someone else's) PBN.

The .htaccess file is completely different. It is a configuration file that tells your web server what to do in certain situations. In our case, it is used to tell the server: "If AhrefsBot or MJ12bot tries to view this website, redirect it to Wikipedia.org". When the Ahrefs crawler tries to index your PBN domain and the .htaccess file is set up properly, it receives a 301 redirect to Wikipedia (or any site you put in there) - it never finds out that you have any other content in there.

The biggest advantage of the .htaccess file is that you are the only one who can see it. When Google visits the website, it won't notice ANY difference at all. When a person visits your site, they won't notice anything different either.
In short -

Robots.txt - footprint - anyone can see it!
.htaccess - only you can see it, no footprint!



How to set up the .htaccess file to stop link checkers from indexing your PBN domains:
This can be done fairly easily from cPanel, and I'll walk you through it:


1. Log in to cPanel

2. In cPanel, click File Manager.

3. Make sure the "Show Hidden Files" checkbox is checked.


4. In the list of files, select the file called ".htaccess" and click "Edit" in the top toolbar.

5. Once the editor opens up, add the following right after the line "RewriteEngine On":

#First we make sure each request carries a user agent (legit browsers always send one, and so do search engine crawlers)
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F,L]
#Next we block the link-checker bots
RewriteCond %{HTTP_USER_AGENT} ^.*AhrefsBot.*$ [NC]
RewriteRule .* http://www.bing.com/ [R=301,L]
RewriteCond %{HTTP_USER_AGENT} ^.*MJ12bot.*$ [NC]
RewriteRule .* http://www.bing.com/ [R=301,L]
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC]
RewriteRule .* http://www.bing.com/ [R=301,L]

You can block other user agents by adding:
RewriteCond %{HTTP_USER_AGENT} ^.*<useragent_name>.*$ [NC]
RewriteRule .* http://www.bing.com/ [R=301,L]
and you can also change the redirect domain.
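For example, Semrush's crawler announces itself as SemrushBot, so the filled-in version of that template would be:

RewriteCond %{HTTP_USER_AGENT} ^.*SemrushBot.*$ [NC]
RewriteRule .* http://www.bing.com/ [R=301,L]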

When everything is in place, the full file for a regular WordPress installation should look like the one in this pastebin:



Pastebin: http://pastebin.com/GwB4b3pT
6. Press Save

To test whether it works, you can use Screaming Frog SEO Spider: go to Configuration -> User Agent and change the user agent to one of the blocked robots. If a guide is needed on how to check this, let me know and I'll post one.
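If you don't want to fire up Screaming Frog, a quick check from a terminal does the same job (this assumes you have curl installed, and yourpbnsite.com stands in for your real domain - the -A flag spoofs the user agent, -I fetches headers only):

curl -I -A "AhrefsBot" http://yourpbnsite.com/
(expect HTTP/1.1 301 Moved Permanently with Location: http://www.bing.com/)

curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" http://yourpbnsite.com/
(expect HTTP/1.1 200 OK - normal visitors are untouched)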

If you have any questions, post them below and I'll do my best to answer them.
 

rogerke

Regular Member
Joined
Oct 5, 2014
Messages
271
Reaction score
146
Great that you're trying to help people, but there is a serious flaw in your code and it IS going to result in crawled backlinks from Majestic and Ahrefs. Not going to spill the beans because I regard it as proprietary, but anyone who understands the code should notice it in an instant. You're not the first one making this mistake either.

Caveat emptor.
 

ElseAndrew

Newbie
Joined
Jun 15, 2012
Messages
19
Reaction score
8
You don't need to create the rewrite rule each time like the OP has shown; you can create your rules like this:

# [OR] chains the conditions so a single RewriteRule covers them all - note the last condition omits it
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*exabot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Xenu.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*dotbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*gigabot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BlekkoBot.*$ [NC]
RewriteRule .* <URL to redirect to> [L]
 

ElseAndrew

Newbie
Joined
Jun 15, 2012
Messages
19
Reaction score
8
Using CCarter's code ;)

Still going to result in crawled backlinks by those bots unfortunately.

Yeah, I think I found the code on here a while back. I was just correcting OP's use of .htaccess - no need for it to be that long by adding a RewriteRule for each user agent.
 

accelerator_dd

Elite Member
Joined
May 14, 2010
Messages
2,447
Reaction score
1,019
Great that you're trying to help people, but there is a serious flaw in your code and it IS going to result in crawled backlinks from Majestic and Ahrefs. Not going to spill the beans because I regard it as proprietary, but anyone who understands the code should notice it in an instant. You're not the first one making this mistake either.

Caveat emptor.

I assume you were referring to the case where there is no user agent? I threw in something to take care of requests like that - thanks for pointing it out.
 

rogerke

Regular Member
Joined
Oct 5, 2014
Messages
271
Reaction score
146
Yeah, I think I found the code on here a while back. I was just correcting OP's use of .htaccess - no need for it to be that long by adding a RewriteRule for each user agent.

That's definitely true. But although CCarter is a straight genius, his code doesn't take a crucial aspect into consideration either.

I assume you were referring to the case where there is no user agent? I threw in something to take care of requests like that - thanks for pointing it out.

Nope, I wasn't referring to that. The code is fundamentally flawed in another way. Ask yourself what the code blocks and what it doesn't (not referring to user agents). Sorry I can't be more specific than that, but I really don't want to state this in the open because virtually everyone is currently making the same mistake. I will say that anyone using this code on a considerable amount of his network must have noticed backlinks popping up in Ahrefs & Majestic, and by analyzing those backlinks it's painfully obvious what the code isn't blocking.
 

ORJay

Regular Member
Joined
Aug 4, 2013
Messages
304
Reaction score
124
That's definitely true. But although CCarter is a straight genius, his code doesn't take a crucial aspect into consideration either.



Nope, I wasn't referring to that. The code is fundamentally flawed in another way. Ask yourself what the code blocks and what it doesn't (not referring to user agents). Sorry I can't be more specific than that, but I really don't want to state this in the open because virtually everyone is currently making the same mistake. I will say that anyone using this code on a considerable amount of his network must have noticed backlinks popping up in Ahrefs & Majestic, and by analyzing those backlinks it's painfully obvious what the code isn't blocking.

Why don't you let us know?
 

ankame

Regular Member
Joined
Feb 28, 2012
Messages
239
Reaction score
137
Why not use a simple plugin like Link Privacy (SEO Revolution)? I think it's still free.
 

bokeboke

Junior Member
Joined
Mar 21, 2010
Messages
136
Reaction score
32
That's definitely true. But although CCarter is a straight genius, his code doesn't take a crucial aspect into consideration either.



Nope, I wasn't referring to that. The code is fundamentally flawed in another way. Ask yourself what the code blocks and what it doesn't (not referring to user agents). Sorry I can't be more specific than that, but I really don't want to state this in the open because virtually everyone is currently making the same mistake. I will say that anyone using this code on a considerable amount of his network must have noticed backlinks popping up in Ahrefs & Majestic, and by analyzing those backlinks it's painfully obvious what the code isn't blocking.


Either share or shut up. What's the point of posting that? Unless you're just stroking your own e-dick.
 

Dj Co2

Jr. VIP
Jr. VIP
Joined
Aug 21, 2009
Messages
1,823
Reaction score
2,479
Age
30
Sometimes the rewrite rules don't work, especially if you're using a WordPress plugin that also modifies .htaccess, like the WP Super Cache plugin.

So, in that case, you can use the following code:

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
SetEnvIfNoCase User-Agent .*BlekkoBot.* bad_bot
SetEnvIfNoCase User-Agent .*Semrush.* bad_bot
SetEnvIfNoCase User-Agent .*BLEXBot.* bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
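
One note on this: Order/Allow/Deny is Apache 2.2 syntax. If your host runs Apache 2.4 without the mod_access_compat compatibility module (an assumption about your setup you'd have to check), keep the SetEnvIfNoCase lines as they are but write the deny block like this instead:

<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>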
 

accelerator_dd

Elite Member
Joined
May 14, 2010
Messages
2,447
Reaction score
1,019
Thank you OP! Is the format correct though?

Yes, I believe it is. I tested it with my own sites and it worked great when I tried crawling them with Screaming Frog SEO Spider.

Sometimes the rewrite rules don't work, especially if you're using a WordPress plugin that also modifies .htaccess, like the WP Super Cache plugin.

So, in that case, you can use the following code:

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
SetEnvIfNoCase User-Agent .*BlekkoBot.* bad_bot
SetEnvIfNoCase User-Agent .*Semrush.* bad_bot
SetEnvIfNoCase User-Agent .*BLEXBot.* bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

Thanks, I'll give it a shot and see how it works.

Another good approach would be to whitelist all the browsers/phones/crawlers you want to access the sites, and send everything else to a 403.
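
A minimal sketch of that whitelist idea (a rough starting point, not a drop-in solution): be careful, because nearly every link crawler announces itself as "Mozilla/5.0 (compatible; ...)", so whitelisting on "Mozilla" alone would let AhrefsBot and MJ12bot straight through. Matching deeper browser engine tokens plus the search engine bots you want works better:

# Only real browsers (Gecko/Chrome/Safari engines) and the big search engines get in
RewriteCond %{HTTP_USER_AGENT} !(Gecko|Chrome|Safari|Googlebot|bingbot) [NC]
# Everything else gets a 403
RewriteRule .* - [F,L]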
 

rogerke

Regular Member
Joined
Oct 5, 2014
Messages
271
Reaction score
146
Either share or shut up. What's the point of posting that? Unless you're just stroking your own e-dick.

Lol whatever dude. Like I said, I'm not going to publicly state what's wrong with the code because virtually everyone is currently making the same mistake, which works to my advantage.

It's not that hard anyway. Sorry that you need to be spoon-fed, but if you put your own research into it, it'll be obvious within minutes. There is more than enough information in that post to figure it out.
 

Aatrox

Supreme Member
Joined
Feb 27, 2014
Messages
1,431
Reaction score
1,076
Well, I'll definitely use this on a few of my PBN domains to cover up some of the tracks. Thanks OP!
 

mainhoondon

Regular Member
Joined
Jun 15, 2009
Messages
396
Reaction score
183
It just blocks the spiders of Ahrefs and Majestic from your site. I believe they can still crawl xyz.com and see that it links to yoursite.com.
Correct me if I am wrong
 

accelerator_dd

Elite Member
Joined
May 14, 2010
Messages
2,447
Reaction score
1,019
It just blocks the spiders of Ahrefs and Majestic from your site. I believe they can still crawl xyz.com and see that it links to yoursite.com.
Correct me if I am wrong

You would add this to your PBN sites so when they are crawled, the link from your PBN to your money site is not indexed.

I'm using http://www.blackhatworld.com/blackhat-seo/black-hat-seo/674231-free-wordpress-plugin-hide-your-private-blog-network-competitors.html and it is working perfectly.

I haven't tried the plugin so I can't really comment, but generally any plugin that also adds those rules in robots.txt can be a risk.

Of course, there are other crawlers out there that need to be blocked aside from the ones in my example, but that is the proper syntax to make it happen.
 