How does PBN site block the access and crawling of SEO tools such as ahrefs, moz, semrush, etc

robin222

Power Member
Joined
Jan 14, 2018
Messages
699
Reaction score
130
As shown in the title.
Do I need to modify the robot.txt file for the PBN website developed based on wordpress?
If so, how to modify the robot.txt file?
thanks
 

hideath

Jr. VIP
Jr. VIP
Joined
Jun 16, 2019
Messages
3,635
Reaction score
2,607
Website
bit.ly
You can either block or cloak these crawlers,
blocking is the easier option, while cloaking does help to hide robots.txt similarity as a footprint,
anyway, if you want to block them, just add the code below to the website robots.txt
Code:
User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
User-agent: adbeat_bot
User-agent: Alexibot
User-agent: AppEngine
User-agent: Aqua_Products
User-agent: archive.org_bot
User-agent: archive
User-agent: asterias
User-agent: b2w/0.1
User-agent: BackDoorBot/1.0
User-agent: BecomeBot
User-agent: BlekkoBot
User-agent: Blexbot
User-agent: BlowFish/1.0
User-agent: Bookmark search tool
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: CCBot
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: chroot
User-agent: Copernic
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DittoSpyder
User-agent: dotbot
User-agent: dumbot
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Enterprise_Search
User-agent: Enterprise_Search/1.0
User-agent: EroCrawler
User-agent: es
User-agent: exabot
User-agent: ExtractorPro
User-agent: FairAd Client
User-agent: Flaming AttackBot
User-agent: Foobot
User-agent: Gaisbot
User-agent: GetRight/4.2
User-agent: gigabot
User-agent: grub
User-agent: grub-client
User-agent: Go-http-client
User-agent: Harvest/1.5
User-agent: Hatena Antenna
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: ia_archiver
User-agent: ia_archiver/1.6
User-agent: InfoNaviRobot
User-agent: Iron33/1.0.2
User-agent: JamesBOT
User-agent: JennyBot
User-agent: Jetbot
User-agent: Jetbot/1.0
User-agent: Jorgee
User-agent: Kenjin Spider
User-agent: Keyword Density/0.9
User-agent: larbin
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkpadBot
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: LNSpiderguy
User-agent: looksmart
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Megalodon
User-agent: Microsoft URL Control
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: MJ12bot
User-agent: moget
User-agent: moget/2.1
User-agent: mozilla
User-agent: Mozilla
User-agent: mozilla/3
User-agent: mozilla/4
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
User-agent: mozilla/5
User-agent: MSIECrawler
User-agent: naver
User-agent: NerdyBot
User-agent: NetAnts
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Nutch
User-agent: Offline Explorer
User-agent: Openbot
User-agent: Openfind
User-agent: Openfind data gathere
User-agent: Oracle Ultra Search
User-agent: PerMan
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: Python-urllib
User-agent: QueryN Metasearch
User-agent: Radiation Retriever 1.1
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: Typhoeus
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: WebEnhancer
User-agent: WebmasterWorld Extractor
User-agent: WebmasterWorldForumBot
User-agent: WebSauger
User-agent: Website Quester
User-agent: Webster Pro
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Zeus Link Scout
User-agent: WebStripper
User-agent: WebVac
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: RMA
User-agent: rogerbot
User-agent: scooter
User-agent: Screaming Frog SEO Spider
User-agent: searchpreview
User-agent: SEMrushBot
User-agent: SemrushBot
User-agent: SemrushBot-SA
User-agent: SEOkicks-Robot
User-agent: SiteSnagger
User-agent: sootle
User-agent: SpankBot
User-agent: spanner
User-agent: spbot
User-agent: Stanford
User-agent: Stanford Comp Sci
User-agent: Stanford CompClub
User-agent: Stanford CompSciClub
User-agent: Stanford Spiderboys
User-agent: SurveyBot
User-agent: SurveyBot_IgnoreIP
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: TeleportPro
User-agent: Telesoft
User-agent: Teoma
User-agent: The Intraformant
User-agent: TheNomad
User-agent: Wget/1.6
User-agent: WWW-Collector-E
Disallow: /
 

Nervosa

Newbie
Joined
Aug 4, 2021
Messages
15
Reaction score
0
You should use a cloaker to feed the bots false information about your page.
 

janist

Jr. VIP
Jr. VIP
Joined
Jan 10, 2011
Messages
1,406
Reaction score
787
Website
seolutions.biz
If you want to do it right, you need to block via IPs in your .htaccess. But it's basically an impossible task.

We run a relatively high scale PBN and have never blocked any crawlers on purpose. There's really no point. Google knows about the links anyway. And there's always a way to find your competitors links (via BING for example).

Focus on minimizing footprints and you should be fine. Especially for smaller PBNs <50 sites.
 

robin222

Power Member
Joined
Jan 14, 2018
Messages
699
Reaction score
130
You can either block or cloak these crawlers,
blocking is the easier option, while cloaking does help to hide robots.txt similarity as a footprint,
anyway, if you want to block them, just add the code below to the website robots.txt
Code:
User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
User-agent: adbeat_bot
User-agent: Alexibot
User-agent: AppEngine
User-agent: Aqua_Products
User-agent: archive.org_bot
User-agent: archive
User-agent: asterias
User-agent: b2w/0.1
User-agent: BackDoorBot/1.0
User-agent: BecomeBot
User-agent: BlekkoBot
User-agent: Blexbot
User-agent: BlowFish/1.0
User-agent: Bookmark search tool
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: CCBot
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: chroot
User-agent: Copernic
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DittoSpyder
User-agent: dotbot
User-agent: dumbot
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Enterprise_Search
User-agent: Enterprise_Search/1.0
User-agent: EroCrawler
User-agent: es
User-agent: exabot
User-agent: ExtractorPro
User-agent: FairAd Client
User-agent: Flaming AttackBot
User-agent: Foobot
User-agent: Gaisbot
User-agent: GetRight/4.2
User-agent: gigabot
User-agent: grub
User-agent: grub-client
User-agent: Go-http-client
User-agent: Harvest/1.5
User-agent: Hatena Antenna
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: ia_archiver
User-agent: ia_archiver/1.6
User-agent: InfoNaviRobot
User-agent: Iron33/1.0.2
User-agent: JamesBOT
User-agent: JennyBot
User-agent: Jetbot
User-agent: Jetbot/1.0
User-agent: Jorgee
User-agent: Kenjin Spider
User-agent: Keyword Density/0.9
User-agent: larbin
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkpadBot
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: LNSpiderguy
User-agent: looksmart
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Megalodon
User-agent: Microsoft URL Control
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: MJ12bot
User-agent: moget
User-agent: moget/2.1
User-agent: mozilla
User-agent: Mozilla
User-agent: mozilla/3
User-agent: mozilla/4
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
User-agent: mozilla/5
User-agent: MSIECrawler
User-agent: naver
User-agent: NerdyBot
User-agent: NetAnts
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Nutch
User-agent: Offline Explorer
User-agent: Openbot
User-agent: Openfind
User-agent: Openfind data gathere
User-agent: Oracle Ultra Search
User-agent: PerMan
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: Python-urllib
User-agent: QueryN Metasearch
User-agent: Radiation Retriever 1.1
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: Typhoeus
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: WebEnhancer
User-agent: WebmasterWorld Extractor
User-agent: WebmasterWorldForumBot
User-agent: WebSauger
User-agent: Website Quester
User-agent: Webster Pro
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Zeus Link Scout
User-agent: WebStripper
User-agent: WebVac
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: RMA
User-agent: rogerbot
User-agent: scooter
User-agent: Screaming Frog SEO Spider
User-agent: searchpreview
User-agent: SEMrushBot
User-agent: SemrushBot
User-agent: SemrushBot-SA
User-agent: SEOkicks-Robot
User-agent: SiteSnagger
User-agent: sootle
User-agent: SpankBot
User-agent: spanner
User-agent: spbot
User-agent: Stanford
User-agent: Stanford Comp Sci
User-agent: Stanford CompClub
User-agent: Stanford CompSciClub
User-agent: Stanford Spiderboys
User-agent: SurveyBot
User-agent: SurveyBot_IgnoreIP
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: TeleportPro
User-agent: Telesoft
User-agent: Teoma
User-agent: The Intraformant
User-agent: TheNomad
User-agent: Wget/1.6
User-agent: WWW-Collector-E
Disallow: /
thanks. buddy :)
 

AlbertoMR3

Jr. VIP
Jr. VIP
Joined
Sep 10, 2013
Messages
153
Reaction score
110
Website
www.ctrify.com
As shown in the title.
Do I need to modify the robot.txt file for the PBN website developed based on wordpress?
If so, how to modify the robot.txt file?
thanks
To block robots.txt should be ok... but what i do also is to block them by the user agent as they crawl just in case they decide not to honor the robots directives
 
Top