Anybody with the code to block Majestic, Ahrefs and all others from crawling a site

aivaras

Newbie
Joined
Oct 14, 2012
Messages
41
Reaction score
12
Looking for some help if anybody has up-to-date .htaccess code for blocking all the major site crawlers like Ahrefs and Majestic. This would obviously be helpful for keeping competitors from digging into any pages you don't want to appear in your link profile.
 

gorang

Elite Member
Joined
Dec 6, 2008
Messages
1,894
Reaction score
1,676
You would need to own all of the websites which link to you.

If you block Ahrefs/Majestic from accessing your own website it will not make a difference; they don't need to crawl your website to see the backlinks pointing to it.
 

aivaras

Newbie
Joined
Oct 14, 2012
Messages
41
Reaction score
12
No, the goal is to keep backlinks you don't want in your money site's link profile from showing up in Ahrefs, Majestic and the other tools. That would keep competitors from noticing any bad links and reporting you to G.
 

Smeems

Regular Member
Joined
Apr 29, 2012
Messages
426
Reaction score
423
Here you go:

Robots.txt:
Code:
User-agent: Rogerbot 
User-agent: Exabot 
User-agent: MJ12bot 
User-agent: Dotbot 
User-agent: Gigabot 
User-agent: AhrefsBot 
User-agent: BlackWidow 
User-agent: Bot\ mailto:[email protected] 
User-agent: ChinaClaw 
User-agent: Custo 
User-agent: DISCo 
User-agent: Download\ Demon 
User-agent: eCatch 
User-agent: EirGrabber 
User-agent: EmailSiphon 
User-agent: EmailWolf 
User-agent: Express\ WebPictures 
User-agent: ExtractorPro 
User-agent: EyeNetIE 
User-agent: FlashGet 
User-agent: GetRight 
User-agent: GetWeb! 
User-agent: Go!Zilla 
User-agent: Go-Ahead-Got-It 
User-agent: GrabNet 
User-agent: Grafula 
User-agent: HMView 
User-agent: HTTrack 
User-agent: Image\ Stripper 
User-agent: Image\ Sucker 
User-agent: Indy\ Library
User-agent: InterGET 
User-agent: Internet\ Ninja 
User-agent: JetCar 
User-agent: JOC\ Web\ Spider 
User-agent: larbin 
User-agent: LeechFTP 
User-agent: Mass\ Downloader 
User-agent: MIDown\ tool 
User-agent: Mister\ PiX 
User-agent: Navroad 
User-agent: NearSite 
User-agent: NetAnts 
User-agent: NetSpider 
User-agent: Net\ Vampire 
User-agent: NetZIP 
User-agent: Octopus 
User-agent: Offline\ Explorer 
User-agent: Offline\ Navigator 
User-agent: PageGrabber 
User-agent: Papa\ Foto 
User-agent: pavuk 
User-agent: pcBrowser 
User-agent: RealDownload 
User-agent: ReGet 
User-agent: SiteSnagger 
User-agent: SmartDownload 
User-agent: SuperBot 
User-agent: SuperHTTP 
User-agent: Surfbot 
User-agent: tAkeOut 
User-agent: Teleport\ Pro 
User-agent: VoidEYE 
User-agent: Web\ Image\ Collector 
User-agent: Web\ Sucker 
User-agent: WebAuto 
User-agent: WebCopier 
User-agent: WebFetch 
User-agent: WebGo\ IS 
User-agent: WebLeacher 
User-agent: WebReaper 
User-agent: WebSauger 
User-agent: Website\ eXtractor 
User-agent: Website\ Quester 
User-agent: WebStripper 
User-agent: WebWhacker 
User-agent: WebZIP 
User-agent: Wget 
User-agent: Widow 
User-agent: WWWOFFLE 
User-agent: Xaldon\ WebSpider 
User-agent: Zeus
Disallow: /

.htaccess:
Code:
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
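The `Order`/`Allow`/`Deny` directives above are Apache 2.2 syntax; on Apache 2.4 they only work if `mod_access_compat` is loaded. A rough 2.4-native equivalent, reusing the same `bad_bot` environment variable set above, might look like this (a sketch, not tested on every host):

```apache
# Apache 2.4: deny any request whose User-Agent matched a bad_bot rule
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```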
 

gorang

Elite Member
Joined
Dec 6, 2008
Messages
1,894
Reaction score
1,676
No, the goal is to keep backlinks you don't want in your money site's link profile from showing up in Ahrefs, Majestic and the other tools. That would keep competitors from noticing any bad links and reporting you to G.

But that would require you to have access to the sites linking to you.

Ahrefs/Majestic don't need to crawl your website in order to see the backlinks pointing to it as I understand it.
 

Smeems

Regular Member
Joined
Apr 29, 2012
Messages
426
Reaction score
423
But that would require you to have access to the sites linking to you.

Ahrefs/Majestic don't need to crawl your website in order to see the backlinks pointing to it as I understand it.

Ahrefs/Majestic/OSE are all backlink checkers. They crawl the site in question and map the inbound links pointing to that site. They don't (currently) add to that data by adding in additional links that they know are outbound links on other sites they have crawled. Would be an interesting feature but AFAIK they don't do it yet.

As a result, the crawlers' results are limited to the site crawl for that domain. If the bots are blocked from the domain from the outset, they can't report links.
 

gorang

Elite Member
Joined
Dec 6, 2008
Messages
1,894
Reaction score
1,676
Ahrefs/Majestic/OSE are all backlink checkers. They crawl the site in question and map the inbound links pointing to that site. They don't (currently) add to that data by adding in additional links that they know are outbound links on other sites they have crawled. Would be an interesting feature but AFAIK they don't do it yet.

As a result, the crawlers' results are limited to the site crawl for that domain. If the bots are blocked from the domain from the outset, they can't report links.

Hey Smeems

This doesn't seem logical to me. Maybe I'm just being dumb, but the way I imagine it, they crawl a website and log the outbound links. I don't see how they could crawl a site and map its inbound links, as there's no way to see inbound links without crawling other websites and logging their outbound links.

So even if they can't crawl a website because they're completely blocked, they can still crawl all the other websites linking to it and report those backlinks.
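The mental model gorang describes here (crawlers only ever log outbound links, and a site's "backlink profile" is just a reverse lookup over everyone else's crawls) can be sketched in a few lines of simplified, hypothetical Python:

```python
# Simplified sketch of a backlink index: the crawler records outbound
# links per crawled page, and inbound links are a reverse lookup.
from collections import defaultdict

def build_backlink_index(crawled_pages):
    """crawled_pages: {source_url: [outbound_urls]} from sites the bot CAN reach."""
    inbound = defaultdict(set)
    for source, outbound_links in crawled_pages.items():
        for target in outbound_links:
            inbound[target].add(source)
    return inbound

# moneysite.com blocks the bot, so it never appears as a crawled *source*...
crawl_log = {
    "web20blog.com/post": ["moneysite.com"],
    "directory.com/page": ["moneysite.com", "other.com"],
}
index = build_backlink_index(crawl_log)
# ...but its backlinks are still fully visible:
print(sorted(index["moneysite.com"]))
# -> ['directory.com/page', 'web20blog.com/post']
```

The block on the money site removes nothing from the index, which is exactly the point being argued.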
 

Psychobilly1

Junior Member
Joined
Dec 7, 2006
Messages
178
Reaction score
52
I use this to keep my private network a secret, sites I control. I don't care if my competitors see my other links, nothing you can do about that.
 

gorang

Elite Member
Joined
Dec 6, 2008
Messages
1,894
Reaction score
1,676
Hey guys

I spoke to someone at Majestic to confirm whether it is possible to block them from reporting on your backlinks by using .htaccess rules. This was the reply:

majesticstaff said:
You are correct, you are at liberty to block our crawlers from visiting your website; we do respect the robots.txt standards. But this will only mean that we cannot collect information from your website, such as links to external sites, page titles, etc. Any information published about your website on other websites which we are permitted to crawl will be read and indexed.

Kind Regards
Chris

So unless you have control over the sites you build backlinks on, you are not able to block those backlinks from showing up in their tool. I assume this is the same for Ahrefs.
 

jupiter1984

Newbie
Joined
May 11, 2012
Messages
2
Reaction score
0
Thanks for this! Have been looking for something like this for a while!

For a WordPress installation, do you simply copy the .htaccess part to the bottom of your .htaccess file?
 

Zso

BANNED
Joined
Nov 18, 2007
Messages
497
Reaction score
912
gorang, are you going to spend your whole day explaining the obvious?

Not everyone is a fan of web 2.0 blogs and social bookmarking. This is for those who want to block Ahrefs/Majestic and other bots from their private network sites.

If you have a few web 2.0 blogs pointing to your site, those will show in Ahrefs. If you have 10-15 high-PR backlinks from a private network, you'll have the competition wondering how you're ranking.

Everyone figured it out...
 

gorang

Elite Member
Joined
Dec 6, 2008
Messages
1,894
Reaction score
1,676
gorang, are you going to spend your whole day explaining the obvious?

Not everyone is a fan of web 2.0 blogs and social bookmarking. This is for those who want to block Ahrefs/Majestic and other bots from their private network sites.

If you have a few web 2.0 blogs pointing to your site, those will show in Ahrefs. If you have 10-15 high-PR backlinks from a private network, you'll have the competition wondering how you're ranking.

Everyone figured it out...

It didn't seem like everyone had figured it out, but yes, I do already understand why people might want to block Majestic and other crawlers from their networks. I already do this myself.
 

cloakndagger2

Regular Member
Joined
Oct 30, 2012
Messages
296
Reaction score
93
It's not too hard to find networks. Yes, blocking bots on your network sites will deter a lazy SEO, but it won't deter someone who knows what they're doing, and it isn't exactly hard work either.
 

JustUs

Power Member
Joined
May 6, 2012
Messages
626
Reaction score
597
It's not too hard to find networks. Yes, blocking bots on your network sites will deter a lazy SEO, but it won't deter someone who knows what they're doing, and it isn't exactly hard work either.

While true, it sure cuts down on the bandwidth. I blocked Ahrefs by IP, not because I cared whether they crawled or not, but because AhrefsBot consumed more bandwidth than my users and the search engines combined (excluding the Chinese crawlers, which are also banned). I also emailed Ahrefs informing them that they are unwelcome on the site. Ahrefs responded with a line of malarkey about how blocking them would harm my site. As a result of that email, I personally consider Ahrefs the crawling equivalent of a virus.
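For what it's worth, an IP-level block like the one described above can be sketched in .htaccess using the same Apache 2.2 syntax as earlier in the thread. The CIDR ranges below are documentation placeholders, not Ahrefs' actual addresses; pull the real ones from your access logs:

```apache
# Block specific crawler IP ranges (placeholder ranges only)
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
```

Unlike User-Agent rules, an IP block can't be dodged by a bot that simply changes its identification string.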
 

Laubster

Senior Member
Joined
May 21, 2013
Messages
1,008
Reaction score
383
It didn't seem like everyone had figured it out, but yes, I do already understand why people might want to block Majestic and other crawlers from their networks. I already do this myself.

Then why are you continuously asking for an explanation if you already do it yourself...
 

Corydoras007

Regular Member
Joined
Sep 17, 2012
Messages
495
Reaction score
91
I didn't see Open Site Explorer in the blocklist... will it block OSE?

But definitely a useful old post.
 

SaulGoodman

Registered Member
Joined
Oct 8, 2013
Messages
72
Reaction score
43
Irony at its best: a spam comment right under a post on how to block crawlers and spammers... http://a7host.wordpress.com/2012/04/26/how-to-block-bad-bots-crawlers-scrapers-and-malwares/


Another thing to consider is installing a honeypot script to block even unknown or new bots from stealing your data/bandwidth.
If you are interested in such things, check out: http://jetfar.com/trap-content-scraper-spam-harvester-bots-using-honeypot-wordpress-htaccess/

Most important bots to block if you are trying to hide your PBN:

Majestic SEO -> User-agent: MJ12bot
MOZ Open Site Explorer -> User-agent: rogerbot
Ahrefs -> User-agent: AhrefsBot
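If hiding a PBN is the only goal, a minimal robots.txt covering just those three bots (using `AhrefsBot`, the token Ahrefs actually sends) would be:

```
User-agent: MJ12bot
User-agent: rogerbot
User-agent: AhrefsBot
Disallow: /
```

Keep in mind robots.txt is only honored voluntarily, which is why the .htaccess rules earlier in the thread exist as a backstop.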

Here's another great list with lots of bad bots (including Smeems' list):

Code:
User-agent: libwww-perl
User-agent: libwwwperl
User-agent: attach
User-agent: ASPSeek
User-agent: appie
User-agent: AbachoBOT
User-agent: autoemailspider
User-agent: anarchie
User-agent: antibot
User-agent: asterias
User-agent: B2w
User-agent: BackWeb
User-agent: BackDoorBot
User-agent: Bandit
User-agent: BatchFTP
User-agent: Black\ Hole
User-agent: Baidu
User-agent: BlowFish
User-agent: BuiltBotTough
User-agent: Bot\ mailto
User-agent: BotALot
User-agent: Buddy
User-agent: Bullseye
User-agent: bumblebee
User-agent: BunnySlippers
User-agent: ClariaBot
User-agent: curl
User-agent: clsHTTP
User-agent: ChinaClaw
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: Crescent
User-agent: CherryPickerSE
User-agent: CherryPickerElite
User-agent: Collector
User-agent: COAST\ WebMaster
User-agent: cosmos
User-agent: CopyRightCheck
User-agent: ColdFusion
User-agent: Copier
User-agent: Crescent
User-agent: DA
User-agent: DTS\ Agent
User-agent: DISCo\ Pump
User-agent: DittoSpyder
User-agent: Diamond
User-agent: Download\ Demon
User-agent: Download\ Wonder
User-agent: Downloader
User-agent: dloader
User-agent: Drip
User-agent: eCatch
User-agent: EirGrabber
User-agent: Express\ WebPictures
User-agent: Extreme\ Picture\ Finder
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: EasyDL
User-agent: EirGrabber
User-agent: EroCrawler
User-agent: ExtractorPro
User-agent: EyeNetIE
User-agent: FAST\ WebCrawler
User-agent: FileHound
User-agent: Fetch\ API\ Request
User-agent: FlashGet
User-agent: FlickBot
User-agent: FrontPage
User-agent: FreeFind.com
User-agent: GetRight
User-agent: GetSmart
User-agent: Generic
User-agent: Go!Zilla
User-agent: Go-Ahead-Got-It
User-agent: gotit
User-agent: Grabber
User-agent: GrabNet
User-agent: Grafula
User-agent: Gulliver
User-agent: Harvest
User-agent: HMView
User-agent: Heretrix
User-agent: HitboxDoctor
User-agent: HTTPapp
User-agent: HTTrack
User-agent: HTTPTrack
User-agent: HTTPviewer
User-agent: httplib
User-agent: httpfetcher
User-agent: httpscraper
User-agent: hloader
User-agent: humanlinks
User-agent: ia_archiver
User-agent: InterGET
User-agent: Internet\ Ninja
User-agent: InfoNaviRobot
User-agent: InternetSeer.com
User-agent: Iria
User-agent: IRLbot
User-agent: JetCar
User-agent: JOC
User-agent: JOC\ Web\ Spider
User-agent: JoBo
User-agent: Java
User-agent: JustView
User-agent: Jonzilla
User-agent: JennyBot
User-agent: Kenjin\ Spider
User-agent: Keyword\ Density
User-agent: larbin
User-agent: LeechFTP
User-agent: Lachesis
User-agent: LexiBot
User-agent: libWeb
User-agent: Libby_
User-agent: LinkScan
User-agent: LinkWalker
User-agent: LinkextractorPro
User-agent: lftp
User-agent: likse
User-agent: Link
User-agent: lwp-trivial
User-agent: lwp\ request
User-agent: Magnet
User-agent: Mag-Net
User-agent: Mass\ Downloader
User-agent: MIIxpc
User-agent: Microsoft\ URL\ Control
User-agent: MSFrontPage
User-agent: MSIECrawler
User-agent: MicrosoftURL
User-agent: Missigua
User-agent: Mewsoft\ Search\ Engine
User-agent: moget
User-agent: Mata\ Hari
User-agent: Memo
User-agent: Metacarta
User-agent: Mercator
User-agent: MIDown\ tool
User-agent: MFC_Tear_Sample
User-agent: Mirror
User-agent: MIIxpc
User-agent: Mister\ PiX
User-agent: NationalDirectory\ WebSpider
User-agent: NICErsPRO
User-agent: Nikto
User-agent: Navroad
User-agent: NearSite
User-agent: NetAnts
User-agent: NetSpider
User-agent: NICErsPRO
User-agent: NetResearchServer
User-agent: NetMechanic
User-agent: Net\ Vampire
User-agent: Net\ Probe
User-agent: NetZip
User-agent: nexuscache
User-agent: Ninja
User-agent: NPBot
User-agent: our\ agent
User-agent: onestop
User-agent: oBot
User-agent: Octopus
User-agent: Offline\ Explorer
User-agent: Openfind
User-agent: Openfind\ data\ gatherer
User-agent: OrangeBot
User-agent: PageGrabber
User-agent: Papa\ Foto
User-agent: PHP\ version
User-agent: PHP
User-agent: PHPot
User-agent: Perl
User-agent: pcBrowser
User-agent: pavuk
User-agent: Pockey
User-agent: Ping
User-agent: PingALink\ Monitoring\ Services
User-agent: ProWebWalker
User-agent: ProPowerBot
User-agent: Pump
User-agent: Pompos
User-agent: psbot
User-agent: Python\ urllib
User-agent: Python-urllib
User-agent: QueryN
User-agent: RealDownload
User-agent: Reaper
User-agent: Recorder
User-agent: RepoMonkey
User-agent: psycheclone
User-agent: RMA
User-agent: Rico
User-agent: Robozilla
User-agent: ReGet
User-agent: Siphon
User-agent: SiteSnagger
User-agent: sitecheck.internetseer.com
User-agent: SmartDownload
User-agent: Snake
User-agent: spanner
User-agent: Stealer
User-agent: SpaceBison
User-agent: SpankBot
User-agent: Spinne
User-agent: Stripper
User-agent: slysearch
User-agent: Sucker
User-agent: Snoopy
User-agent: ScoutAbout
User-agent: Scooter
User-agent: SuperBot
User-agent: SuperHTTP
User-agent: Snapbot
User-agent: Surfbot
User-agent: suzuran
User-agent: Szukacz
User-agent: Sqworm
User-agent: tAkeOut
User-agent: Teleport\ Pro
User-agent: Telesoft
User-agent: TurnitinBot
User-agent: turingos
User-agent: toCrawl
User-agent: TightTwatBot
User-agent: True_Robot
User-agent: The\ Intraformant
User-agent: TheNomad
User-agent: Titan
User-agent: UrlDispatcher
User-agent: URLy\ Warning
User-agent: Vayala
User-agent: Vagabondo
User-agent: Vintage
User-agent: Vacuum
User-agent: VCI
User-agent: VoidEYE
User-agent: W3C_Validator
User-agent: Webdownloader
User-agent: Web\ Downloader
User-agent: Webhook
User-agent: Webmole
User-agent: Webminer
User-agent: Webmirror
User-agent: Websucker
User-agent: Websites
User-agent: Web\ Image\ Collector
User-agent: Web\ Sucker
User-agent: WebAuto
User-agent: WebCopier
User-agent: WebFetch
User-agent: WebReaper
User-agent: WebSauger
User-agent: Website
User-agent: Webster
User-agent: WebStripper
User-agent: WebCopier
User-agent: WebViewer
User-agent: WebWhacker
User-agent: WebEnhancer
User-agent: Wells
User-agent: WebZIP
User-agent: Wget
User-agent: Whacker
User-agent: Widow
User-agent: Xaldon
User-agent: Wildsoft\ Surfer
User-agent: WinHttpRequest
User-agent: WinHttp
User-agent: Webster\ Pro
User-agent: Web\ Image\ Collector
User-agent: WebZip
User-agent: WebAuto
User-agent: Website\ Quester
User-agent: WWWOFFLE
User-agent: WWW-Collector-E
User-agent: Xaldon\ WebSpider
User-agent: Xenu
User-agent: Xara
User-agent: Y!TunnelPro
User-agent: YahooYSMcm
User-agent: Zade
User-agent: ZBot
User-agent: Zeus
User-agent: Rogerbot 
User-agent: Exabot 
User-agent: MJ12bot 
User-agent: Dotbot 
User-agent: Gigabot 
User-agent: AhrefsBot 
User-agent: BlackWidow 
User-agent: Bot\ mailto:[email protected] 
User-agent: ChinaClaw 
User-agent: Custo 
User-agent: DISCo 
User-agent: Download\ Demon 
User-agent: eCatch 
User-agent: EirGrabber 
User-agent: EmailSiphon 
User-agent: EmailWolf 
User-agent: Express\ WebPictures 
User-agent: ExtractorPro 
User-agent: EyeNetIE 
User-agent: FlashGet 
User-agent: GetRight 
User-agent: GetWeb! 
User-agent: Go!Zilla 
User-agent: Go-Ahead-Got-It 
User-agent: GrabNet 
User-agent: Grafula 
User-agent: HMView 
User-agent: HTTrack 
User-agent: Image\ Stripper 
User-agent: Image\ Sucker 
User-agent: Indy\ Library
User-agent: InterGET 
User-agent: Internet\ Ninja 
User-agent: JetCar 
User-agent: JOC\ Web\ Spider 
User-agent: larbin 
User-agent: LeechFTP 
User-agent: Mass\ Downloader 
User-agent: MIDown\ tool 
User-agent: Mister\ PiX 
User-agent: Navroad 
User-agent: NearSite 
User-agent: NetAnts 
User-agent: NetSpider 
User-agent: Net\ Vampire 
User-agent: NetZIP 
User-agent: Octopus 
User-agent: Offline\ Explorer 
User-agent: Offline\ Navigator 
User-agent: PageGrabber 
User-agent: Papa\ Foto 
User-agent: pavuk 
User-agent: pcBrowser 
User-agent: RealDownload 
User-agent: ReGet 
User-agent: SiteSnagger 
User-agent: SmartDownload 
User-agent: SuperBot 
User-agent: SuperHTTP 
User-agent: Surfbot 
User-agent: tAkeOut 
User-agent: Teleport\ Pro 
User-agent: VoidEYE 
User-agent: Web\ Image\ Collector 
User-agent: Web\ Sucker 
User-agent: WebAuto 
User-agent: WebCopier 
User-agent: WebFetch 
User-agent: WebGo\ IS 
User-agent: WebLeacher 
User-agent: WebReaper 
User-agent: WebSauger 
User-agent: Website\ eXtractor 
User-agent: Website\ Quester 
User-agent: WebStripper 
User-agent: WebWhacker 
User-agent: WebZIP 
User-agent: Wget 
User-agent: Widow 
User-agent: WWWOFFLE 
User-agent: Xaldon\ WebSpider
 
Last edited:

csmbh

Newbie
Joined
Feb 7, 2014
Messages
2
Reaction score
0
Is there any reason not to simply block every bot and just allow the ones you want through?

i.e. something like (adding the other search engines & services you use):

User-agent: *
Disallow: /


User-agent: Googlebot
Allow: /
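The same whitelist idea can also be enforced at the server level, without relying on bots honoring robots.txt. A hypothetical .htaccess sketch (Apache 2.2 syntax, matching the earlier rules; the User-Agent patterns are illustrative and would need tuning against real traffic):

```apache
# Deny anything identifying as a crawler unless explicitly whitelisted.
# With Order Deny,Allow, an Allow match overrides a Deny match.
SetEnvIfNoCase User-Agent "bot|crawl|spider|slurp" crawler
SetEnvIfNoCase User-Agent "googlebot|bingbot" good_crawler
Order Deny,Allow
Deny from env=crawler
Allow from env=good_crawler
```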
 