
Anybody with the code to block Majestic, Ahrefs and all others from crawling a site

Discussion in 'Black Hat SEO' started by aivaras, Jul 10, 2013.

  1. aivaras

    aivaras Newbie

    Joined:
    Oct 14, 2012
    Messages:
    41
    Likes Received:
    12
    Looking for some help if anybody has up-to-date .htaccess code for blocking all the major site crawlers like Ahrefs and Majestic. This would obviously be helpful for keeping competitors from digging into any pages you don't want to appear in your link profile.
     
  2. gorang

    gorang Elite Member

    Joined:
    Dec 6, 2008
    Messages:
    1,891
    Likes Received:
    1,650
    Occupation:
    SEO Consultant - Marketing Strategy
    Location:
    UK
    You would need to own all of the websites which link to you.

    If you block Ahrefs/Majestic from accessing your own website it will not make a difference; they don't need to crawl your website to see the backlinks pointing to it.
     
    • Thanks Thanks x 2
  3. aivaras

    aivaras Newbie

    Joined:
    Oct 14, 2012
    Messages:
    41
    Likes Received:
    12
    No, the goal is to keep backlinks you don't want in your money site's link profile from showing up in Ahrefs, Majestic and the other tools. That would stop competitors from noticing any bad links and reporting you to G.
     
  4. Smeems

    Smeems Regular Member

    Joined:
    Apr 29, 2012
    Messages:
    425
    Likes Received:
    417
    Here you go:

    Robots.txt:
    Code:
    User-agent: Rogerbot 
    User-agent: Exabot 
    User-agent: MJ12bot 
    User-agent: Dotbot 
    User-agent: Gigabot 
    User-agent: AhrefsBot 
    User-agent: BlackWidow 
    User-agent: Bot\ mailto:craftbot@yahoo.com 
    User-agent: ChinaClaw 
    User-agent: Custo 
    User-agent: DISCo 
    User-agent: Download\ Demon 
    User-agent: eCatch 
    User-agent: EirGrabber 
    User-agent: EmailSiphon 
    User-agent: EmailWolf 
    User-agent: Express\ WebPictures 
    User-agent: ExtractorPro 
    User-agent: EyeNetIE 
    User-agent: FlashGet 
    User-agent: GetRight 
    User-agent: GetWeb! 
    User-agent: Go!Zilla 
    User-agent: Go-Ahead-Got-It 
    User-agent: GrabNet 
    User-agent: Grafula 
    User-agent: HMView 
    User-agent: HTTrack 
    User-agent: Image\ Stripper 
    User-agent: Image\ Sucker 
    User-agent: Indy\ Library
    User-agent: InterGET 
    User-agent: Internet\ Ninja 
    User-agent: JetCar 
    User-agent: JOC\ Web\ Spider 
    User-agent: larbin 
    User-agent: LeechFTP 
    User-agent: Mass\ Downloader 
    User-agent: MIDown\ tool 
    User-agent: Mister\ PiX 
    User-agent: Navroad 
    User-agent: NearSite 
    User-agent: NetAnts 
    User-agent: NetSpider 
    User-agent: Net\ Vampire 
    User-agent: NetZIP 
    User-agent: Octopus 
    User-agent: Offline\ Explorer 
    User-agent: Offline\ Navigator 
    User-agent: PageGrabber 
    User-agent: Papa\ Foto 
    User-agent: pavuk 
    User-agent: pcBrowser 
    User-agent: RealDownload 
    User-agent: ReGet 
    User-agent: SiteSnagger 
    User-agent: SmartDownload 
    User-agent: SuperBot 
    User-agent: SuperHTTP 
    User-agent: Surfbot 
    User-agent: tAkeOut 
    User-agent: Teleport\ Pro 
    User-agent: VoidEYE 
    User-agent: Web\ Image\ Collector 
    User-agent: Web\ Sucker 
    User-agent: WebAuto 
    User-agent: WebCopier 
    User-agent: WebFetch 
    User-agent: WebGo\ IS 
    User-agent: WebLeacher 
    User-agent: WebReaper 
    User-agent: WebSauger 
    User-agent: Website\ eXtractor 
    User-agent: Website\ Quester 
    User-agent: WebStripper 
    User-agent: WebWhacker 
    User-agent: WebZIP 
    User-agent: Wget 
    User-agent: Widow 
    User-agent: WWWOFFLE 
    User-agent: Xaldon\ WebSpider 
    User-agent: Zeus
    Disallow: /
    .htaccess:
    Code:
    SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
    SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
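
    (Note: the <Limit GET POST HEAD> wrapper only applies the rules to those three methods, and Order/Allow/Deny is the old Apache 2.2 syntax. On Apache 2.4+ a rough equivalent of the same block - a sketch only, assuming mod_setenvif and mod_authz_core are loaded - would be:)
    Code:
    # Apache 2.4+ sketch of the same idea: tag unwanted crawlers, then deny them.
    SetEnvIfNoCase User-Agent "rogerbot|exabot|mj12bot|dotbot|gigabot|ahrefsbot|sitebot" bad_bot
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>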
    
     
    • Thanks Thanks x 37
  5. aivaras

    aivaras Newbie

    Joined:
    Oct 14, 2012
    Messages:
    41
    Likes Received:
    12
    Thanks a lot, man! Huge help.
     
  6. gorang

    gorang Elite Member

    Joined:
    Dec 6, 2008
    Messages:
    1,891
    Likes Received:
    1,650
    Occupation:
    SEO Consultant - Marketing Strategy
    Location:
    UK
    But that would require you to have access to the sites linking to you.

    Ahrefs/Majestic don't need to crawl your website in order to see the backlinks pointing to it, as I understand it.
     
  7. Smeems

    Smeems Regular Member

    Joined:
    Apr 29, 2012
    Messages:
    425
    Likes Received:
    417
    Ahrefs/Majestic/OSE are all backlink checkers. They crawl the site in question and map the inbound links pointing to that site. They don't (currently) add to that data by adding in additional links that they know are outbound links on other sites they have crawled. Would be an interesting feature but AFAIK they don't do it yet.

    As a result, the crawlers' results are limited to the site crawl for that domain. If the bots are blocked from the domain from the outset, they can't report links.
     
    • Thanks Thanks x 1
  8. Free Man

    Free Man Junior Member

    Joined:
    Mar 31, 2013
    Messages:
    165
    Likes Received:
    115
    This seems logical.
     
  9. gorang

    gorang Elite Member

    Joined:
    Dec 6, 2008
    Messages:
    1,891
    Likes Received:
    1,650
    Occupation:
    SEO Consultant - Marketing Strategy
    Location:
    UK
    Hey Smeems

    This doesn't seem logical to me. Maybe I'm just being dumb, but the way I imagine it, they crawl a website and log the outbound links. I don't see how they crawl a site and map its inbound links, as there's no way to see inbound links without crawling other websites and logging their outbound links.

    So even if they can't crawl a website and they are completely blocked, they can still crawl all the other websites pointing to it and report the backlinks that point to it.
     
    • Thanks Thanks x 1
  10. Psychobilly1

    Psychobilly1 Junior Member

    Joined:
    Dec 7, 2006
    Messages:
    176
    Likes Received:
    52
    I use this to keep my private network (sites I control) a secret. I don't care if my competitors see my other links; nothing you can do about that.
     
  11. gorang

    gorang Elite Member

    Joined:
    Dec 6, 2008
    Messages:
    1,891
    Likes Received:
    1,650
    Occupation:
    SEO Consultant - Marketing Strategy
    Location:
    UK
    Hey guys

    I spoke to someone at Majestic to confirm whether it is possible to block them from reporting on your backlinks using .htaccess rules. This is the reply.

    So unless you have control over the sites you build backlinks on, you are not able to block your backlinks from showing up in their tool. I assume this is the same for Ahrefs.
     
    • Thanks Thanks x 3
  12. jupiter1984

    jupiter1984 Newbie

    Joined:
    May 11, 2012
    Messages:
    2
    Likes Received:
    0
    Thanks for this! Have been looking for something like this for a while!

    For a WordPress installation, do you simply copy the .htaccess part to the bottom of your .htaccess file?
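
    (A sketch of what a WordPress .htaccess might look like with the rules added - the # BEGIN/END WordPress block is the stock one WordPress generates, and custom rules are usually kept outside those markers so WordPress doesn't overwrite them:)
    Code:
    # Custom bot-blocking rules - kept outside the WordPress markers so
    # WordPress doesn't overwrite them when it regenerates its own block.
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress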
     
  13. Zso

    Zso BANNED

    Joined:
    Nov 18, 2007
    Messages:
    497
    Likes Received:
    880
    gorang, will you spend your whole day explaining the obvious?

    Not everyone is a fan of web 2.0 blogs and social bookmarking. This is for those who want to block Ahrefs/Majestic and other bots from their private network sites.

    If you have a few web 2.0 blogs pointing to your site, those will show up in Ahrefs. If you have 10-15 high-PR backlinks from a network that is private, you'll have the competition wondering how you're ranking.

    Everyone figured it out...
     
  14. gorang

    gorang Elite Member

    Joined:
    Dec 6, 2008
    Messages:
    1,891
    Likes Received:
    1,650
    Occupation:
    SEO Consultant - Marketing Strategy
    Location:
    UK
    It didn't seem like everyone figured it out, but yes, I do already understand why people might want to block Majestic and other crawlers from their networks. I already employ this myself.
     
  15. cloakndagger2

    cloakndagger2 Regular Member

    Joined:
    Oct 30, 2012
    Messages:
    294
    Likes Received:
    88
    It's not too hard to find networks. Yes, blocking bots on your network sites will deter a lazy SEO, but it won't deter someone who knows what they're doing, and it isn't exactly hard work either.
     
  16. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    609
    Likes Received:
    452
    While true, it sure cuts down on the bandwidth. I blocked Ahrefs by IP, not because I cared whether they crawled or not, but because AhrefsBot consumed more bandwidth than my users and the search engines combined - excluding the Chinese crawlers, which are also banned. I also sent an email to Ahrefs informing them that they are unwelcome on the site. Ahrefs responded with a line of malarkey about how it would harm my site. As a result of that email, I personally consider Ahrefs to be the crawling equivalent of a virus.
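
    (An IP-level block like that would look roughly like this in .htaccess - a sketch only; the two CIDR ranges below are documentation placeholders, not Ahrefs's real ranges, so substitute the crawler's published ranges:)
    Code:
    # Block a crawler by IP range instead of User-Agent.
    # 203.0.113.0/24 and 198.51.100.0/24 are placeholder ranges, NOT Ahrefs's.
    Order Allow,Deny
    Allow from all
    Deny from 203.0.113.0/24
    Deny from 198.51.100.0/24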
     
  17. Laubster

    Laubster Senior Member Premium Member

    Joined:
    May 21, 2013
    Messages:
    1,008
    Likes Received:
    377
    Occupation:
    Self employed
    Location:
    I Travel A Lot
    Then why are you continuously asking for an explanation if you already do it yourself...
     
  18. Corydoras007

    Corydoras007 Regular Member

    Joined:
    Sep 17, 2012
    Messages:
    303
    Likes Received:
    53
    I didn't see Open Site Explorer in the blocking list... will it block OSE?

    But definitely a useful old post.
     
  19. SaulGoodman

    SaulGoodman Registered Member

    Joined:
    Oct 8, 2013
    Messages:
    64
    Likes Received:
    38
    Location:
    BHW
    Irony at its best: a spam comment right under a post on how to block crawlers and spammers... http://a7host.wordpress.com/2012/04/26/how-to-block-bad-bots-crawlers-scrapers-and-malwares/


    Another thing to consider is installing a honeypot script, to block even unknown or new bots from stealing your data/bandwidth.
    If you are interested in such things, check out: http://jetfar.com/trap-content-scraper-spam-harvester-bots-using-honeypot-wordpress-htaccess/
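
    (The general idea behind such a trap, as a rough sketch - the /bot-trap/ path and the hidden link are example names, not taken from that article:)
    Code:
    # robots.txt - well-behaved bots will never request the trap URL
    User-agent: *
    Disallow: /bot-trap/

    # A link to /bot-trap/ is then hidden somewhere in the page templates.
    # Anything that requests it is ignoring robots.txt, so the script living
    # at /bot-trap/ records that IP for your Deny list.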

    Most important bots to block if you are trying to hide your PBN:

    Majestic SEO -> User-agent: MJ12bot
    MOZ Open Site Explorer -> User-agent: rogerbot
    Ahrefs -> User-agent: AhrefsBot
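
    A minimal robots.txt covering just those three would be something like this (a sketch; it only deters crawlers that honour robots.txt):
    Code:
    User-agent: MJ12bot
    User-agent: rogerbot
    User-agent: AhrefsBot
    Disallow: /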

    Here's another great list with lots of bad bots (including Smeems' list):

    Code:
    User-agent: libwww-perl
    User-agent: libwwwperl
    User-agent: attach
    User-agent: ASPSeek
    User-agent: appie
    User-agent: AbachoBOT
    User-agent: autoemailspider
    User-agent: anarchie
    User-agent: antibot
    User-agent: asterias
    User-agent: B2w
    User-agent: BackWeb
    User-agent: BackDoorBot
    User-agent: Bandit
    User-agent: BatchFTP
    User-agent: Black\ Hole
    User-agent: Baidu
    User-agent: BlowFish
    User-agent: BuiltBotTough
    User-agent: Bot\ mailto
    User-agent: BotALot
    User-agent: Buddy
    User-agent: Bullseye
    User-agent: bumblebee
    User-agent: BunnySlippers
    User-agent: ClariaBot
    User-agent: curl
    User-agent: clsHTTP
    User-agent: ChinaClaw
    User-agent: CheeseBot
    User-agent: CherryPicker
    User-agent: Crescent
    User-agent: CherryPickerSE
    User-agent: CherryPickerElite
    User-agent: Collector
    User-agent: COAST\ WebMaster
    User-agent: cosmos
    User-agent: CopyRightCheck
    User-agent: ColdFusion
    User-agent: Copier
    User-agent: Crescent
    User-agent: DA
    User-agent: DTS\ Agent
    User-agent: DISCo\ Pump
    User-agent: DittoSpyder
    User-agent: Diamond
    User-agent: Download\ Demon
    User-agent: Download\ Wonder
    User-agent: Downloader
    User-agent: dloader
    User-agent: Drip
    User-agent: eCatch
    User-agent: EirGrabber
    User-agent: Express\ WebPictures
    User-agent: Extreme\ Picture\ Finder
    User-agent: EmailCollector
    User-agent: EmailSiphon
    User-agent: EmailWolf
    User-agent: EasyDL
    User-agent: EirGrabber
    User-agent: EroCrawler
    User-agent: ExtractorPro
    User-agent: EyeNetIE
    User-agent: FAST\ WebCrawler
    User-agent: FileHound
    User-agent: Fetch\ API\ Request
    User-agent: FlashGet
    User-agent: FlickBot
    User-agent: FrontPage
    User-agent: FreeFind.com
    User-agent: GetRight
    User-agent: GetSmart
    User-agent: Generic
    User-agent: Go!Zilla
    User-agent: Go-Ahead-Got-It
    User-agent: gotit
    User-agent: Grabber
    User-agent: GrabNet
    User-agent: Grafula
    User-agent: Gulliver
    User-agent: Harvest
    User-agent: HMView
    User-agent: Heretrix
    User-agent: HitboxDoctor
    User-agent: HTTPapp
    User-agent: HTTrack
    User-agent: HTTPTrack
    User-agent: HTTPviewer
    User-agent: httplib
    User-agent: httpfetcher
    User-agent: httpscraper
    User-agent: hloader
    User-agent: humanlinks
    User-agent: ia_archiver
    User-agent: InterGET
    User-agent: Internet\ Ninja
    User-agent: InfoNaviRobot
    User-agent: InternetSeer.com
    User-agent: Iria
    User-agent: IRLbot
    User-agent: JetCar
    User-agent: JOC
    User-agent: JOC\ Web\ Spider
    User-agent: JoBo
    User-agent: Java
    User-agent: JustView
    User-agent: Jonzilla
    User-agent: JennyBot
    User-agent: Kenjin\ Spider
    User-agent: Keyword\ Density
    User-agent: larbin
    User-agent: LeechFTP
    User-agent: Lachesis
    User-agent: LexiBot
    User-agent: libWeb
    User-agent: Libby_
    User-agent: LinkScan
    User-agent: LinkWalker
    User-agent: LinkextractorPro
    User-agent: lftp
    User-agent: likse
    User-agent: Link
    User-agent: lwp-trivial
    User-agent: lwp\ request
    User-agent: Magnet
    User-agent: Mag-Net
    User-agent: Mass\ Downloader
    User-agent: MIIxpc
    User-agent: Microsoft\ URL\ Control
    User-agent: MSFrontPage
    User-agent: MSIECrawler
    User-agent: MicrosoftURL
    User-agent: Missigua
    User-agent: Mewsoft\ Search\ Engine
    User-agent: moget
    User-agent: Mata\ Hari
    User-agent: Memo
    User-agent: Metacarta
    User-agent: Mercator
    User-agent: MIDown\ tool
    User-agent: MFC_Tear_Sample
    User-agent: Mirror
    User-agent: MIIxpc
    User-agent: Mister\ PiX
    User-agent: NationalDirectory\ WebSpider
    User-agent: NICErsPRO
    User-agent: Nikto
    User-agent: Navroad
    User-agent: NearSite
    User-agent: NetAnts
    User-agent: NetSpider
    User-agent: NICErsPRO
    User-agent: NetResearchServer
    User-agent: NetMechanic
    User-agent: Net\ Vampire
    User-agent: Net\ Probe
    User-agent: NetZip
    User-agent: nexuscache
    User-agent: Ninja
    User-agent: NPBot
    User-agent: our\ agent
    User-agent: onestop
    User-agent: oBot
    User-agent: Octopus
    User-agent: Offline\ Explorer
    User-agent: Openfind
    User-agent: Openfind\ data\ gatherer
    User-agent: OrangeBot
    User-agent: PageGrabber
    User-agent: Papa\ Foto
    User-agent: PHP\ version
    User-agent: PHP
    User-agent: PHPot
    User-agent: Perl
    User-agent: pcBrowser
    User-agent: pavuk
    User-agent: Pockey
    User-agent: Ping
    User-agent: PingALink\ Monitoring\ Services
    User-agent: ProWebWalker
    User-agent: ProPowerBot
    User-agent: Pump
    User-agent: Pompos
    User-agent: psbot
    User-agent: Python\ urllib
    User-agent: Python-urllib
    User-agent: QueryN
    User-agent: RealDownload
    User-agent: Reaper
    User-agent: Recorder
    User-agent: RepoMonkey
    User-agent: psycheclone
    User-agent: RMA
    User-agent: Rico
    User-agent: Robozilla
    User-agent: ReGet
    User-agent: Siphon
    User-agent: SiteSnagger
    User-agent: sitecheck.internetseer.com
    User-agent: SmartDownload
    User-agent: Snake
    User-agent: spanner
    User-agent: Stealer
    User-agent: SpaceBison
    User-agent: SpankBot
    User-agent: Spinne
    User-agent: Stripper
    User-agent: slysearch
    User-agent: Sucker
    User-agent: Snoopy
    User-agent: ScoutAbout
    User-agent: Scooter
    User-agent: SuperBot
    User-agent: SuperHTTP
    User-agent: Snapbot
    User-agent: Surfbot
    User-agent: suzuran
    User-agent: Szukacz
    User-agent: Sqworm
    User-agent: tAkeOut
    User-agent: Teleport\ Pro
    User-agent: Telesoft
    User-agent: TurnitinBot
    User-agent: turingos
    User-agent: toCrawl
    User-agent: TightTwatBot
    User-agent: True_Robot
    User-agent: The\ Intraformant
    User-agent: TheNomad
    User-agent: Titan
    User-agent: UrlDispatcher
    User-agent: URLy\ Warning
    User-agent: Vayala
    User-agent: Vagabondo
    User-agent: Vintage
    User-agent: Vacuum
    User-agent: VCI
    User-agent: VoidEYE
    User-agent: W3C_Validator
    User-agent: Webdownloader
    User-agent: Web\ Downloader
    User-agent: Webhook
    User-agent: Webmole
    User-agent: Webminer
    User-agent: Webmirror
    User-agent: Websucker
    User-agent: Websites
    User-agent: Web\ Image\ Collector
    User-agent: Web\ Sucker
    User-agent: WebAuto
    User-agent: WebCopier
    User-agent: WebFetch
    User-agent: WebReaper
    User-agent: WebSauger
    User-agent: Website
    User-agent: Webster
    User-agent: WebStripper
    User-agent: WebCopier
    User-agent: WebViewer
    User-agent: WebWhacker
    User-agent: WebEnhancer
    User-agent: Wells
    User-agent: WebZIP
    User-agent: Wget
    User-agent: Whacker
    User-agent: Widow
    User-agent: Xaldon
    User-agent: Wildsoft\ Surfer
    User-agent: WinHttpRequest
    User-agent: WinHttp
    User-agent: Webster\ Pro
    User-agent: Web\ Image\ Collector
    User-agent: WebZip
    User-agent: WebAuto
    User-agent: Website\ Quester
    User-agent: WWWOFFLE
    User-agent: WWW-Collector-E
    User-agent: Xaldon\ WebSpider
    User-agent: Xenu
    User-agent: Xara
    User-agent: Y!TunnelPro
    User-agent: YahooYSMcm
    User-agent: Zade
    User-agent: ZBot
    User-agent: Zeus
    User-agent: Rogerbot 
    User-agent: Exabot 
    User-agent: MJ12bot 
    User-agent: Dotbot 
    User-agent: Gigabot 
    User-agent: AhrefsBot 
    User-agent: BlackWidow 
    User-agent: Bot\ mailto:craftbot@yahoo.com 
    User-agent: ChinaClaw 
    User-agent: Custo 
    User-agent: DISCo 
    User-agent: Download\ Demon 
    User-agent: eCatch 
    User-agent: EirGrabber 
    User-agent: EmailSiphon 
    User-agent: EmailWolf 
    User-agent: Express\ WebPictures 
    User-agent: ExtractorPro 
    User-agent: EyeNetIE 
    User-agent: FlashGet 
    User-agent: GetRight 
    User-agent: GetWeb! 
    User-agent: Go!Zilla 
    User-agent: Go-Ahead-Got-It 
    User-agent: GrabNet 
    User-agent: Grafula 
    User-agent: HMView 
    User-agent: HTTrack 
    User-agent: Image\ Stripper 
    User-agent: Image\ Sucker 
    User-agent: Indy\ Library
    User-agent: InterGET 
    User-agent: Internet\ Ninja 
    User-agent: JetCar 
    User-agent: JOC\ Web\ Spider 
    User-agent: larbin 
    User-agent: LeechFTP 
    User-agent: Mass\ Downloader 
    User-agent: MIDown\ tool 
    User-agent: Mister\ PiX 
    User-agent: Navroad 
    User-agent: NearSite 
    User-agent: NetAnts 
    User-agent: NetSpider 
    User-agent: Net\ Vampire 
    User-agent: NetZIP 
    User-agent: Octopus 
    User-agent: Offline\ Explorer 
    User-agent: Offline\ Navigator 
    User-agent: PageGrabber 
    User-agent: Papa\ Foto 
    User-agent: pavuk 
    User-agent: pcBrowser 
    User-agent: RealDownload 
    User-agent: ReGet 
    User-agent: SiteSnagger 
    User-agent: SmartDownload 
    User-agent: SuperBot 
    User-agent: SuperHTTP 
    User-agent: Surfbot 
    User-agent: tAkeOut 
    User-agent: Teleport\ Pro 
    User-agent: VoidEYE 
    User-agent: Web\ Image\ Collector 
    User-agent: Web\ Sucker 
    User-agent: WebAuto 
    User-agent: WebCopier 
    User-agent: WebFetch 
    User-agent: WebGo\ IS 
    User-agent: WebLeacher 
    User-agent: WebReaper 
    User-agent: WebSauger 
    User-agent: Website\ eXtractor 
    User-agent: Website\ Quester 
    User-agent: WebStripper 
    User-agent: WebWhacker 
    User-agent: WebZIP 
    User-agent: Wget 
    User-agent: Widow 
    User-agent: WWWOFFLE 
    User-agent: Xaldon\ WebSpider
    Disallow: /
     
    • Thanks Thanks x 7
    Last edited: Feb 10, 2014
  20. csmbh

    csmbh Newbie

    Joined:
    Feb 7, 2014
    Messages:
    2
    Likes Received:
    0
    Is there any reason not to simply block every bot and just allow the ones you want through?

    i.e. something like this (adding the other SEs & services you use):

    User-agent: *
    Disallow: /


    User-agent: Googlebot
    Allow: /
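
    (One caveat with a robots.txt whitelist like that: it only restrains bots that actually obey robots.txt. A rough .htaccess counterpart - a sketch only, with Googlebot/bingbot standing in for whatever crawlers you want to let through - would be:)
    Code:
    # Flag anything that self-identifies as a bot/crawler/spider...
    SetEnvIfNoCase User-Agent "bot|crawl|spider" generic_bot
    # ...then un-flag the crawlers you actually want (Googlebot also matches
    # "bot", which is why it has to be removed again here).
    SetEnvIfNoCase User-Agent "Googlebot|bingbot" !generic_bot
    Order Allow,Deny
    Allow from all
    Deny from env=generic_bot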