Majestic is indexing my PBNs, even when I disallow the bot via robots.txt

99lives

Regular Member
Joined
Dec 27, 2009
Messages
435
Reaction score
160
I use Ahrefs, not Majestic. But today I checked Majestic and saw ALL my PBN links, even though I disallow MJ12bot in robots.txt.

Ahrefs respects it, Semrush respects it, but Majestic does not.
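For anyone who wants to rule out a typo in the file itself, here is a minimal Python sketch of what a rule-abiding crawler would conclude from a robots.txt like the one described. The file contents, the `is_allowed` helper and the example.com URL are assumptions for illustration, not the poster's actual setup. It relies on the standard library's `urllib.robotparser`, and it only shows that the rule is written correctly; it cannot make a crawler obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block MJ12bot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this name may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Use the short crawler token, not the full User-Agent header.
    for agent in ("MJ12bot", "AhrefsBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/"))
```

If this reports that MJ12bot is disallowed and the links still show up, the file is not the problem and a server-side block is the next thing to try.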
 

Starblazer

Jr. VIP
Joined
Feb 28, 2019
Messages
3,196
Reaction score
4,142
Some bots go rogue. We can't do anything to stop them.
 

alexono1

Jr. VIP
Joined
Sep 14, 2012
Messages
229
Reaction score
136
robots.txt is unreliable. Some bots honor it, some do not.

Block it via .htaccess.
If you're a really paranoid SEO, you could also see robots.txt as a footprint:
if all your PBN sites block Ahrefs there, Google can see it.
Google can't see the contents of your .htaccess file!
 

Nut-Nights

Jr. VIP
Joined
Jun 20, 2013
Messages
10,250
Reaction score
7,195
Website
shoppy.gg
Most of these trackers don't give a fuck about robots.txt. Better to find some plugin to do the job.
 

tiiberius

Jr. VIP
Joined
Sep 8, 2015
Messages
2,797
Reaction score
2,111
Website
t-ranks.com
alexono1 said: "If you're a real paranoid SEO ... you could also see robots.txt as a footprint... if all your pbn sites have ahrefs blocked... google can see it... Google can't see the contents of your htaccess file!"
That's true.
 

99lives

Nut-Nights said: "Most of these trackers dont give a fuck about robots txt better find some plugin to do the job."

I've never had any issues with Ahrefs or Semrush. Majestic is the one being a cu*t.
 

itz_styx

Jr. VIP
Joined
May 8, 2012
Messages
2,260
Reaction score
1,619
Website
argo-content.com

alexono1

itz_styx said: "Normally, bots that don't obey robots.txt rules get labeled 'bad bots'. I wonder why Majestic is getting away with it."
Maybe they try to obey the rules set for Googlebot, just to make specific backlink reports more accurate?
 

itz_styx

alexono1 said: "Maybe they try obey the rules for Googlebot, just to make specific Backlinks reports more accurate?"
Doesn't matter. There are companies monitoring bot behaviour that are not associated with Google, and they bitch about any bot that doesn't obey robots.txt rules.
 

ayz1k

Newbie
Joined
Oct 4, 2021
Messages
15
Reaction score
4
alexono1 said: "robots.txt is unreliable. Some honor it some do not. Block it via .htaccess"

Hello, I've been searching the forum for such a list to include in my .htaccess files, but I didn't find one. Should I open a new thread, or can you please point me in the right direction? Thanks!
 

Yzetien

Junior Member
Joined
Jul 23, 2018
Messages
185
Reaction score
102
99lives said: "I use ahrefs, not majestic. But today I was checking it and I saw ALL my PBNs links, even tho I disallow MJ12bot in the robots.txt. Ahrefs respects it, Semrush respects it, but Majestic does not."
Append the following to the .htaccess file in the website's root directory:

Apache config:
# Flag any request whose User-Agent contains one of these crawler names (case-insensitive).
SetEnvIfNoCase User-Agent "rogerbot" bad_bot
SetEnvIfNoCase User-Agent "exabot" bad_bot
SetEnvIfNoCase User-Agent "mj12bot" bad_bot
SetEnvIfNoCase User-Agent "dotbot" bad_bot
SetEnvIfNoCase User-Agent "gigabot" bad_bot
SetEnvIfNoCase User-Agent "ahrefsbot" bad_bot
SetEnvIfNoCase User-Agent "sitebot" bad_bot
# Deny flagged requests. This is Apache 2.2 syntax; on Apache 2.4 it needs mod_access_compat.
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

Credits: Quora
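
To sanity-check which User-Agent strings the SetEnvIfNoCase lines above would flag, the matching can be approximated in Python. This is only a sketch of the Apache behaviour (an unanchored, case-insensitive match against the header); the `is_bad_bot` helper and the sample User-Agent strings are illustrative assumptions, not something from the thread.

```python
import re

# Crawler names taken from the .htaccess rules in the post above.
BAD_BOT_NAMES = [
    "rogerbot", "exabot", "mj12bot", "dotbot",
    "gigabot", "ahrefsbot", "sitebot",
]

# One unanchored, case-insensitive pattern covering every name in the list.
BAD_BOT_RE = re.compile(
    "|".join(re.escape(name) for name in BAD_BOT_NAMES),
    re.IGNORECASE,
)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains any listed crawler name."""
    return BAD_BOT_RE.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative User-Agent strings, not captured from real traffic.
    samples = [
        "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        print(is_bad_bot(ua), ua)
```

Keep in mind the block only works as long as the crawler sends an honest User-Agent; a bot that spoofs a browser string gets through both this check and the .htaccess rules.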
 

tiiberius

ayz1k said: "Hello, I've been searching on the forum for such a list to include into my .htaccess files but I didn't find. Should I open a new thread or can you please point me in the right direction? Thanks!"

Here you go: the .htaccess rules Yzetien posted just above (credits: Quora).
 