Anybody have the code to block Majestic, Ahrefs and all the others from crawling a site?

Joined Jan 18, 2013 (48 messages, 6 reaction score)
So if we block those bots, how will the backlink crawlers determine the domain metrics for the sites where we block them, like Moz DA/PA or Majestic CF/TF? Those metrics depend on the backlinks pointing to the PBN or any other site...
 

irdeto (Regular Member, joined Mar 18, 2010, 391 messages, 164 reaction score)
Is there any reason to not simply block every bot and just allow ones you want through?

i.e. something like this (adding the other search engines & services you use):

User-agent: *
Disallow: /


User-agent: Googlebot
Allow: /

Yes, this works in principle, but not all of these bots honour robots.txt; Ahrefs is notorious for this. .htaccess is the better way to do it, and including IP addresses is probably the most accurate approach.
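The allowlist idea above can be sanity-checked with Python's standard `urllib.robotparser`. A minimal sketch, assuming the two-group robots.txt shown in the post: Googlebot matches its own group and is allowed, while any other user agent falls into the catch-all deny group. (As noted, this only describes robots.txt semantics; misbehaving bots simply ignore the file.)

```python
import urllib.robotparser

# The allowlist robots.txt from the post above, inlined for the check.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group and is allowed everywhere.
print(parser.can_fetch("Googlebot", "/some/page"))  # True

# Any other bot falls through to the catch-all group and is denied.
print(parser.can_fetch("MJ12bot", "/some/page"))    # False
```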
 

leobar (Regular Member, joined Mar 9, 2013, 267 messages, 133 reaction score, age 38)
So if we block those bots, how will the backlink crawlers determine the domain metrics for the sites where we block them, like Moz DA/PA or Majestic CF/TF? Those metrics depend on the backlinks pointing to the PBN or any other site...
Why do you need domain metrics? If the PBN is yours, you should already know them.
If the PBN is not yours, then you simply can't add those robots rules anyway.
@ OP, if the PBN is yours, reconsider adding a robots file: it will stop only lazy competitors, and if you ask me, putting up a robots file like this is like yelling "hey, I am manually doing SEO on my site and trying to hide it". Also, don't forget that Google does not use only Googlebot, so while blocking other robots you may block Google too and sound the alarm, since you'd be blocking G in some crawlers while allowing it on Gbot.

Best wishes.
Leobar!
 

irdeto (Regular Member, joined Mar 18, 2010, 391 messages, 164 reaction score)
So if we block those bots, how will the backlink crawlers determine the domain metrics for the sites where we block them, like Moz DA/PA or Majestic CF/TF? Those metrics depend on the backlinks pointing to the PBN or any other site...

none of that matters if this is a PBN.
 
Joined Nov 12, 2013 (26 messages, 5 reaction score)
Do you think it's likely that Google, who can read the robots.txt file like anyone else, uses the blocking as an alert to check the site(s) further?

i.e.

if it finds the common Majestic/Ahrefs bots blocked, it sets the site aside for manual review.

Possible? Likely?
 

shizzledizzleeee (Regular Member, joined Jan 3, 2013, 285 messages, 34 reaction score)
No, the goal is to block the crawlers so that backlinks you don't want in your money site's link profile never show up in Ahrefs, Majestic and the other tools; that keeps competitors from noticing any bad links and reporting you to G.

And how do I block these backlinks?
 

fc-dh (Elite Member, joined Oct 20, 2012, 3,076 messages, 2,754 reaction score)
Why not use a 301 (sub)domain to point your links at, and block the bots/crawlers on the 301?
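As a sketch of that idea (not anyone's actual setup): the 301 domain's only job is to forward visitors and search engines to the money site while refusing the checker bots. The routing decision could be expressed like this, with the bot-name fragments and target URL as placeholder assumptions:

```python
import re

# Placeholder assumptions: illustrative bot-name fragments and a
# made-up money-site URL, not taken from the thread.
BAD_BOT_PATTERN = re.compile(r"ahrefsbot|mj12bot|rogerbot|dotbot", re.IGNORECASE)
MONEY_SITE = "https://money-site.example"

def route(user_agent, path):
    """Decide what the 301 (sub)domain does with a request:
    refuse known checker bots, forward everyone else."""
    if BAD_BOT_PATTERN.search(user_agent or ""):
        return 403, None                   # blocked on the 301 domain
    return 301, MONEY_SITE + path          # passed through to the money site

print(route("Mozilla/5.0 (compatible; AhrefsBot/7.0)", "/page"))  # (403, None)
print(route("Mozilla/5.0 (Windows NT 10.0)", "/page"))
```

Because the checker bot never gets the redirect, it never records a link pointing at the money site from that domain.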
 

splishsplash (Jr. Executive VIP / Jr. VIP, joined Oct 9, 2013, 2,309 messages, 8,439 reaction score, website: wolfofblogstreet.com)
Ahrefs/Majestic/OSE are all backlink checkers. They crawl the site in question and map the inbound links pointing to that site. They don't (currently) add to that data by adding in additional links that they know are outbound links on other sites they have crawled. Would be an interesting feature but AFAIK they don't do it yet.

As a result, the crawlers' results are limited to the site crawl for that domain. If the bots are blocked from the domain from the outset, they can't report links.

The (&%$&* are you talking about?

What you're saying is ahrefs crawls site.com and then "maps" the inbound links pointing to that site?

If you could "map" inbound links what would be the need for moz/majestic/ahrefs in the first place?

The only way to "map" inbound links is to fire up moz, ahrefs or majestic. Think about what you're saying.

Google, and any other service that maps the interconnectivity of sites on the Internet, SPIDERS web sites. This is why they're called spiders. They crawl the web.

They visit one site and they parse the HTML returned by that site to extract outbound links. They then follow those links and repeat.

This is why it takes Google a few days to a few weeks to pick up your new backlinks. You have to wait until they spider. This is why we backlink our new backlinks on sites that are spidered often.

moz/ahrefs/majestic just do what Google do except on a smaller scale. They purchase massive server farms, run their efficient spiders and store the link relationships in a database. Most likely a graph database is the best option.
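The crawl-parse-follow loop described above can be sketched minimally. The example below shows only the parse step, using Python's standard `html.parser` to pull outbound links from one fetched page (the page content is inlined here instead of being fetched over HTTP); a real spider pushes the extracted links onto a queue and repeats:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags: the parse step a spider
    repeats on every page before following the links it found."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# One "fetched" page, inlined instead of a real HTTP response.
page = '<p>Post body</p><a href="http://example.com/a">a</a> <a href="http://example.com/b">b</a>'

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['http://example.com/a', 'http://example.com/b']
```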

There's NO way to stop spiders from picking up your backlinks unless you own the backlinks, as in the case of a PBN. Fact.
 

ecmweb (Regular Member, joined Jul 9, 2011, 290 messages, 62 reaction score)
I'm blocking spiders on my PBN using the .htaccess code above, but Majestic and Open Site Explorer are still showing them all on my money sites. What would be the cause of this? Thanks
 

cloakndagger2 (Regular Member, joined Oct 30, 2012, 296 messages, 93 reaction score)
This method is flawed and overrated. If someone wants to find your links, they will, regardless of whether you block backlink checkers or not. A better idea is to make your network look real, so anyone looking at the site won't be able to tell it's a network site.
If your network site is full of articles on 101 niches, I can't blame you for not wanting it seen, but don't kid yourself into thinking your network is untraceable just because you block the likes of Majestic. It may fool a lazy SEO, not someone more serious.
 

Aatrox (Supreme Member, joined Feb 27, 2014, 1,438 messages, 1,081 reaction score)
Sorry for bringing this thread up, but I want to know: is it too late to use this, as I've already posted links on some domains and they can be seen in Majestic/Ahrefs and others?
 

seoforaliving (Newbie, joined May 16, 2015, 9 messages, 0 reaction score)
I think blocking your money site is enough to tell crawlers not to index or report any backlinks about the sites in question.
 

BTSTU (Newbie, joined May 17, 2015, 14 messages, 1 reaction score)
Sorry for bringing this thread up, but I want to know: is it too late to use this, as I've already posted links on some domains and they can be seen in Majestic/Ahrefs and others?

The links will drop off eventually. The crawler will think the link disappeared.
 

shizzledizzleeee (Regular Member, joined Jan 3, 2013, 285 messages, 34 reaction score)
Here you go:

Robots.txt:
Code:
User-agent: Rogerbot 
User-agent: Exabot 
User-agent: MJ12bot 
User-agent: Dotbot 
User-agent: Gigabot 
User-agent: AhrefsBot 
User-agent: BlackWidow 
User-agent: Bot\ mailto:[email protected] 
User-agent: ChinaClaw 
User-agent: Custo 
User-agent: DISCo 
User-agent: Download\ Demon 
User-agent: eCatch 
User-agent: EirGrabber 
User-agent: EmailSiphon 
User-agent: EmailWolf 
User-agent: Express\ WebPictures 
User-agent: ExtractorPro 
User-agent: EyeNetIE 
User-agent: FlashGet 
User-agent: GetRight 
User-agent: GetWeb! 
User-agent: Go!Zilla 
User-agent: Go-Ahead-Got-It 
User-agent: GrabNet 
User-agent: Grafula 
User-agent: HMView 
User-agent: HTTrack 
User-agent: Image\ Stripper 
User-agent: Image\ Sucker 
User-agent: Indy\ Library
User-agent: InterGET 
User-agent: Internet\ Ninja 
User-agent: JetCar 
User-agent: JOC\ Web\ Spider 
User-agent: larbin 
User-agent: LeechFTP 
User-agent: Mass\ Downloader 
User-agent: MIDown\ tool 
User-agent: Mister\ PiX 
User-agent: Navroad 
User-agent: NearSite 
User-agent: NetAnts 
User-agent: NetSpider 
User-agent: Net\ Vampire 
User-agent: NetZIP 
User-agent: Octopus 
User-agent: Offline\ Explorer 
User-agent: Offline\ Navigator 
User-agent: PageGrabber 
User-agent: Papa\ Foto 
User-agent: pavuk 
User-agent: pcBrowser 
User-agent: RealDownload 
User-agent: ReGet 
User-agent: SiteSnagger 
User-agent: SmartDownload 
User-agent: SuperBot 
User-agent: SuperHTTP 
User-agent: Surfbot 
User-agent: tAkeOut 
User-agent: Teleport\ Pro 
User-agent: VoidEYE 
User-agent: Web\ Image\ Collector 
User-agent: Web\ Sucker 
User-agent: WebAuto 
User-agent: WebCopier 
User-agent: WebFetch 
User-agent: WebGo\ IS 
User-agent: WebLeacher 
User-agent: WebReaper 
User-agent: WebSauger 
User-agent: Website\ eXtractor 
User-agent: Website\ Quester 
User-agent: WebStripper 
User-agent: WebWhacker 
User-agent: WebZIP 
User-agent: Wget 
User-agent: Widow 
User-agent: WWWOFFLE 
User-agent: Xaldon\ WebSpider 
User-agent: Zeus
Disallow: /

.htaccess:
Code:
# Flag requests whose User-Agent matches a known checker/crawler bot
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
# Deny flagged requests (Apache 2.2 syntax; on Apache 2.4 this needs
# mod_access_compat or the equivalent Require directives)
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
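For reference, SetEnvIfNoCase performs a case-insensitive regex match against the header value, so the `.htaccess` patterns above translate directly into Python if you want to test a user-agent string offline:

```python
import re

# The same patterns the .htaccess rules use; SetEnvIfNoCase matches
# them case-insensitively against the User-Agent header.
PATTERNS = [r".*rogerbot.*", r".*mj12bot.*", r".*ahrefsbot.*", r".*dotbot.*"]

def is_bad_bot(user_agent):
    return any(re.match(p, user_agent, re.IGNORECASE) for p in PATTERNS)

print(is_bad_bot("Mozilla/5.0 (compatible; AhrefsBot/7.0)"))   # True
print(is_bad_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # False
```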

Now I'm also going to block all crawling bots.
 

alwaystoday (Junior Member, joined Jan 23, 2016, 183 messages, 66 reaction score)
Hi all, I am just about to build a PBN. One of the first things I am now looking into, after reading post #13 (http://www.blackhatworld.com/blackh...manage-rank-pbns-post7132576.html#post7132576) in this Q&A on PBNs, is how to block sites like Ahrefs and Majestic from crawling the PBN.

I noticed the original code here was provided 3 years ago. Could someone with relevant experience let me know if this code is still good, or provide a better version updated for 2016?

Also, could you advise whether there are any other precautions we should take when building a PBN to limit crawling/snooping by unwanted sites?

Thanks
 

shadow2015 (Regular Member, joined Jan 17, 2015, 301 messages, 74 reaction score)
Hi all, I am just about to build a PBN. One of the first things I am now looking into, after reading post #13 (http://www.blackhatworld.com/blackh...manage-rank-pbns-post7132576.html#post7132576) in this Q&A on PBNs, is how to block sites like Ahrefs and Majestic from crawling the PBN.

I noticed the original code here was provided 3 years ago. Could someone with relevant experience let me know if this code is still good, or provide a better version updated for 2016?

Also, could you advise whether there are any other precautions we should take when building a PBN to limit crawling/snooping by unwanted sites?

Thanks

Agreed. I am using this code and it looks like Majestic SEO still knows about my backlinks!! BTW, I have only blocked via .htaccess.
 