Blocking Crawlers

InsanelySane

Power Member
Joined: Nov 23, 2013
Messages: 574
Reaction score: 116
How many people are doing this for their sites? I'm only interested in blocking via robots.txt. However, I've heard Ahrefs and Majestic tend not to honor your directives.
 
This is from Ahrefs FAQ page:

Once Ahrefs bot is blocked in robots.txt, we stop crawling the site and drop blocked pages from the index altogether, but we will still continue collecting and showing the links pointing to this site from other domains.


The quick way to prevent AhrefsBot from visiting your site is
to put these two lines into the /robots.txt file on your server:


User-agent: AhrefsBot
Disallow: /
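Since the thread also asks about Majestic: its crawler identifies itself as MJ12bot, so the equivalent robots.txt entry would look like this (a sketch, assuming you want to block both crawlers site-wide):

```
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /
```

Note that robots.txt directives are case-insensitive, but this is the conventional casing, and each crawler needs its own User-agent group.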

The problem with Ahrefs is that it doesn't need to crawl your site to get the data: it only needs to crawl the sites that link to you, and from those it can construct your inbound link profile. By blocking the bot you are only stopping it from crawling the links pointing from your site to others.
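Worth adding that robots.txt is purely advisory; a crawler that ignores it can still fetch your pages. If you want to actually refuse requests from these bots, you can block them at the server level by user-agent string. A minimal sketch for Apache with mod_rewrite enabled (assuming the bots send their documented user-agent strings, which a badly behaved crawler may not):

```
# .htaccess sketch: return 403 Forbidden to requests whose
# User-Agent contains AhrefsBot or MJ12bot (case-insensitive).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot) [NC]
RewriteRule .* - [F,L]
```

Even this only stops them crawling your pages; it does nothing about the inbound link data they collect from other people's sites.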
 