accelerator_dd
Elite Member
- May 14, 2010
- 2,447
- 1,022
Hiding your PBN Like The Pros - And Why Doing It Wrong = Deindexing
Pastebin of the finished .htaccess for a regular WordPress installation: http://pastebin.com/GwB4b3pT
After working with PBNs for a while and reading all kinds of things about hiding your domains from Ahrefs/Majestic/Moz, I decided to do some research of my own and find out conclusively what the proper way is to hide your PBN domains without leaving a footprint.
Currently, there are two widely known ways of hiding your backlinks so they don't appear in link checkers like Majestic, Ahrefs and the rest:
- Robots.txt file
- .htaccess file
The robots.txt file tells crawlers how to behave on a site: which paths they may crawl, which they may not and, for crawlers that honor the non-standard Crawl-delay directive, how often they may request pages.
To block the Ahrefs crawler, add this rule:
User-agent: AhrefsBot
Disallow: /
If the Ahrefs crawler abides by the robots.txt rules (which, in my experience, it does), your PBN site won't be included in the Ahrefs link index.
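For anyone who wants to see how a rule-abiding crawler reads that rule, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URL, the `allowed` helper and the comparison with Googlebot are assumptions added for illustration; real crawlers use their own parsers and may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-abiding crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not one from the post.
print(allowed("AhrefsBot", "http://example.com/some-post/"))
print(allowed("Googlebot", "http://example.com/some-post/"))
```

The first call reports that AhrefsBot is shut out, the second that Googlebot is not affected, since the rule names only one user agent.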
However, there is one big problem with this. Some link checkers MIGHT NOT respect those rules and, more importantly, EVERYONE, including Googlebot, can read this file and find out that you don't want your site crawled by link checkers. I have analyzed a lot of legit sites and only a few had this rule in their robots.txt file, which makes blocking link crawlers in robots.txt A HUGE FOOTPRINT. If you don't believe me and you are using this method, take one of your money sites that has PBN links among its other backlinks, go through each linking domain and check its robots.txt file. More often than not, the only linking domains with those rules in robots.txt are your (or someone else's) PBN.
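The manual check described above can be roughly automated. The sketch below assumes you have already downloaded each linking domain's robots.txt into a dictionary; the function names and the `GenericBot` probe agent are hypothetical, and the crawler list is simply the three bots named in the .htaccess rules in this post. A domain is flagged only when it bans a link crawler while still allowing an ordinary agent, so a site that blocks everyone is not counted.

```python
from urllib.robotparser import RobotFileParser

# Link-checker user agents named in this post's .htaccess rules.
LINK_CRAWLERS = ("AhrefsBot", "MJ12bot", "rogerbot")


def blocked_link_crawlers(robots_txt: str) -> list[str]:
    """List link crawlers banned from the site root while a generic agent is allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "GenericBot" is a made-up agent used to detect "block everyone" files.
    if not parser.can_fetch("GenericBot", "/"):
        return []
    return [bot for bot in LINK_CRAWLERS if not parser.can_fetch(bot, "/")]


def find_footprints(robots_by_domain: dict[str, str]) -> dict[str, list[str]]:
    """Map each domain to the link crawlers it singles out; skip clean domains."""
    report = {}
    for domain, robots_txt in robots_by_domain.items():
        blocked = blocked_link_crawlers(robots_txt)
        if blocked:
            report[domain] = blocked
    return report
```

Feeding it the robots.txt files of every domain linking to a money site gives a quick list of the domains that carry this footprint.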
The .htaccess file is completely different. It is a per-directory configuration file that tells the web server (typically Apache on cPanel hosting) what to do in certain situations. In our case, it tells the server: "If AhrefsBot or MJ12bot tries to view this website, redirect it to another site". When the Ahrefs crawler requests a page on your PBN domain and the .htaccess file is set up properly, the crawler receives a 301 redirect to that site (Bing in the rules in this post, but any site will do) and never sees the content you host.
The biggest advantage of the .htaccess file is that you are the only one who can see it, because a default Apache setup refuses to serve it to visitors. When Google visits the website, it won't notice ANY difference at all, and neither will a human visitor.
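To make the server-side decision concrete, here is a small Python model of it. This is only a sketch of the logic, not how Apache evaluates rewrite rules internally: the `respond` function is hypothetical, the bot names are the ones used in this post's rules, the redirect target is just a default you can change, and the 200 status stands in for "the site is served normally".

```python
import re
from typing import Optional

# Bot names taken from this post's rules; matching is case-insensitive, like [NC].
BLOCKED_BOTS = ("AhrefsBot", "MJ12bot", "rogerbot")
BOT_PATTERN = re.compile("|".join(re.escape(bot) for bot in BLOCKED_BOTS), re.IGNORECASE)


def respond(user_agent: str, redirect_to: str = "http://www.bing.com/") -> tuple[int, Optional[str]]:
    """Return (status, Location) the way the rules would answer this user agent."""
    if not user_agent:
        return 403, None          # empty User-Agent: forbidden, like the [F] flag
    if BOT_PATTERN.search(user_agent):
        return 301, redirect_to   # listed bot: permanent redirect, like [R=301]
    return 200, None              # everyone else: the page is served as usual
```

Because the check happens on the server, nothing in the pages or in any public file reveals which user agents are being redirected.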
In short -
Robots.txt - footprint - anyone can see it!
.htaccess - only you can see it, no footprint!
How to set up the .htaccess file to stop link checkers from indexing your PBN domains:
This can be done fairly easily from cPanel, and I'll walk you through it:
1. Log in to cPanel
2. In cPanel, click File Manager
3. Make sure the "Show Hidden Files" checkbox is checked
4. In the list of files, select the file called ".htaccess" and click "Edit" in the top toolbar
5. Once the editor opens, add the following right after the line "RewriteEngine On":
# First, reject requests that send no user-agent (legit browsers and search engine crawlers always send one)
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^ - [F]
# Next, redirect the link checker bots (NC = case-insensitive match anywhere in the user-agent)
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
RewriteRule ^ http://www.bing.com/ [R=301,L]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule ^ http://www.bing.com/ [R=301,L]
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC]
RewriteRule ^ http://www.bing.com/ [R=301,L]
You can block other user-agents in there by adding:
RewriteCond %{HTTP_USER_AGENT} <useragent_name> [NC]
RewriteRule ^ http://www.bing.com/ [R=301,L]
You can also change the redirect domain.
Once all is set and done, the file for a regular WordPress installation should look like the Pastebin example linked in this post.
6. Press Save
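If you manage many domains and prefer to edit a downloaded copy of each file instead of clicking through cPanel, step 5 can be scripted. The helper below is a hypothetical sketch that works on the file's text only: it assumes the file already contains a "RewriteEngine On" line, as a regular WordPress .htaccess does, and it leaves uploading the result to you.

```python
import re


def insert_after_rewrite_engine(htaccess: str, rules: str) -> str:
    """Return htaccess with rules inserted right after the first 'RewriteEngine On' line."""
    lines = htaccess.splitlines()
    for index, line in enumerate(lines):
        if re.fullmatch(r"\s*RewriteEngine\s+On\s*", line, re.IGNORECASE):
            # Splice the new rule lines in directly below the matched line.
            lines[index + 1:index + 1] = rules.strip("\n").splitlines()
            return "\n".join(lines) + "\n"
    raise ValueError("no 'RewriteEngine On' line found")
```

Keep a backup of the original file before replacing it, since a syntax error in .htaccess usually takes the whole site down with a 500 error.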
To test that it works, you can use the Screaming Frog SEO Spider: go to Configuration -> User Agent and change the user agent to one of the blocked robots. The crawl should then be redirected instead of reaching your pages. If a guide on how to check this is needed, let me know and I'll post one.
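If you don't have Screaming Frog at hand, the same check can be sketched in a few lines of Python with the standard `http.client` module, which does not follow redirects on its own. The `fetch_status` helper is hypothetical, and the user-agent strings you pass in are up to you; this is a rough test aid, not a replacement for a full crawl.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, Optional[str]]:
    """Request url once with the given User-Agent; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling it on one of your domains with a user agent containing "AhrefsBot" should return a 301 and the redirect URL if the rules are active, while a normal browser user agent should get the regular response.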
If you have any questions, post them below and I'll do my best to answer them.