
Blocking bots/crawlers list for robots.txt

Discussion in 'White Hat SEO' started by zxyyz, Apr 21, 2017.

  zxyyz (Newbie) | Joined: Jan 23, 2017 | Messages: 20 | Likes Received: 10 | Gender: Male
    I came across some old threads with lists of bots and crawlers you can block from your site. I'm focused on the robots.txt side of things ... sorry, I don't have anything for those using .htaccess. Anyway, the lists I found had lots of duplicates, so I cleaned them up, and the result is shorter and better organized: the combined list had 448 entries, and after deduping it is down to 298. A shorter file should also be quicker for crawlers to parse.
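The post doesn't say how the deduping was done. As a rough sketch of one way to do it (my assumption, not the method used for this list), the Python below collects the `User-agent:` values and drops repeats case-insensitively. RFC 9309 says crawlers match group names case-insensitively, so `WebZIP` and `WebZip` land in the same bucket. It also flags names containing anything other than letters, underscores and hyphens, which is all RFC 9309 allows in a product token. That check catches the `\ ` escapes in the list, which look like a leftover from .htaccess regex lists and may not match as intended. The function names are made up for this example.

```python
import re

# RFC 9309 product tokens: letters, underscores and hyphens only.
TOKEN_RE = re.compile(r"[A-Za-z_-]+")


def dedupe_user_agents(lines):
    """Return User-agent values in first-seen order, skipping repeats.

    Repeats are detected case-insensitively, so "WebZip" is treated as
    a duplicate of an earlier "WebZIP".
    """
    seen = set()
    agents = []
    for line in lines:
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue  # ignore Disallow lines, comments and blank lines
        value = value.strip()
        key = value.lower()
        if value and key not in seen:
            seen.add(key)
            agents.append(value)
    return agents


def non_rfc_tokens(agents):
    """Return the names that are not plain RFC 9309 product tokens."""
    return [a for a in agents if not TOKEN_RE.fullmatch(a)]


def build_robots_txt(agents):
    """Emit a single group: every agent shares one Disallow rule."""
    lines = [f"User-agent: {a}" for a in sorted(agents, key=str.lower)]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


raw = [
    "User-agent: WebZIP",
    "User-agent: AhrefsBot",
    "User-agent: WebZip",
    "User-agent: Black\\ Hole",
    "User-agent: AhrefsBot",
    "Disallow: /",
]
agents = dedupe_user_agents(raw)
print(agents)                  # ['WebZIP', 'AhrefsBot', 'Black\\ Hole']
print(non_rfc_tokens(agents))  # ['Black\\ Hole']
print(build_robots_txt(agents), end="")
```

Sorting with `key=str.lower` is only a presentation choice. The list in this post sorts uppercase names before lowercase ones, and the order of `User-agent:` lines within a group doesn't affect which bots the rule applies to.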

    If you see something missing from the list, send me a message and I'll post an update.

    Code:
    User-agent: ASPSeek
    User-agent: AbachoBOT
    User-agent: AhrefsBot
    User-agent: B2w
    User-agent: BackDoorBot
    User-agent: BackWeb
    User-agent: Bandit
    User-agent: BatchFTP
    User-agent: BlackWidow
    User-agent: Black\ Hole
    User-agent: BlowFish
    User-agent: BotALot
    User-agent: Bot\ mailto
    User-agent: Bot\ mailto:[email protected]
    User-agent: Buddy
    User-agent: BuiltBotTough
    User-agent: Bullseye
    User-agent: BunnySlippers
    User-agent: COAST\ WebMaster
    User-agent: CheeseBot
    User-agent: CherryPicker
    User-agent: CherryPickerElite
    User-agent: CherryPickerSE
    User-agent: ChinaClaw
    User-agent: ClariaBot
    User-agent: ColdFusion
    User-agent: Collector
    User-agent: Copier
    User-agent: CopyRightCheck
    User-agent: Crescent
    User-agent: Custo
    User-agent: DA
    User-agent: DISCo
    User-agent: DISCo\ Pump
    User-agent: DTS\ Agent
    User-agent: Diamond
    User-agent: DittoSpyder
    User-agent: Dotbot
    User-agent: Download\ Demon
    User-agent: Download\ Wonder
    User-agent: Downloader
    User-agent: Drip
    User-agent: EasyDL
    User-agent: EirGrabber
    User-agent: EmailCollector
    User-agent: EmailSiphon
    User-agent: EmailWolf
    User-agent: EroCrawler
    User-agent: Exabot
    User-agent: Express\ WebPictures
    User-agent: ExtractorPro
    User-agent: Extreme\ Picture\ Finder
    User-agent: EyeNetIE
    User-agent: FAST\ WebCrawler
    User-agent: Fetch\ API\ Request
    User-agent: FileHound
    User-agent: FlashGet
    User-agent: FlickBot
    User-agent: FreeFind.com
    User-agent: FrontPage
    User-agent: Generic
    User-agent: GetRight
    User-agent: GetSmart
    User-agent: GetWeb!
    User-agent: Gigabot
    User-agent: Go!Zilla
    User-agent: Go-Ahead-Got-It
    User-agent: GrabNet
    User-agent: Grabber
    User-agent: Grafula
    User-agent: Gulliver
    User-agent: HMView
    User-agent: HTTPTrack
    User-agent: HTTPapp
    User-agent: HTTPviewer
    User-agent: HTTrack
    User-agent: Harvest
    User-agent: Heritrix
    User-agent: HitboxDoctor
    User-agent: IRLbot
    User-agent: Image\ Stripper
    User-agent: Image\ Sucker
    User-agent: Indy\ Library
    User-agent: InfoNaviRobot
    User-agent: InterGET
    User-agent: InternetSeer.com
    User-agent: Internet\ Ninja
    User-agent: Iria
    User-agent: JOC
    User-agent: JOC\ Web\ Spider
    User-agent: Java
    User-agent: JennyBot
    User-agent: JetCar
    User-agent: JoBo
    User-agent: Jonzilla
    User-agent: JustView
    User-agent: Kenjin\ Spider
    User-agent: Keyword\ Density
    User-agent: Lachesis
    User-agent: LeechFTP
    User-agent: LexiBot
    User-agent: Libby_
    User-agent: Link
    User-agent: LinkScan
    User-agent: LinkWalker
    User-agent: LinkextractorPro
    User-agent: MFC_Tear_Sample
    User-agent: MIDown\ tool
    User-agent: MIIxpc
    User-agent: MJ12bot
    User-agent: MSFrontPage
    User-agent: MSIECrawler
    User-agent: Mag-Net
    User-agent: Magnet
    User-agent: Mass\ Downloader
    User-agent: Mata\ Hari
    User-agent: Memo
    User-agent: Mercator
    User-agent: Metacarta
    User-agent: Mewsoft\ Search\ Engine
    User-agent: MicrosoftURL
    User-agent: Microsoft\ URL\ Control
    User-agent: Mirror
    User-agent: Missigua
    User-agent: Mister\ PiX
    User-agent: NICErsPRO
    User-agent: NPBot
    User-agent: NationalDirectory\ WebSpider
    User-agent: Navroad
    User-agent: NearSite
    User-agent: NetAnts
    User-agent: NetMechanic
    User-agent: NetResearchServer
    User-agent: NetSpider
    User-agent: NetZIP
    User-agent: NetZip
    User-agent: Net\ Probe
    User-agent: Net\ Vampire
    User-agent: Nikto
    User-agent: Ninja
    User-agent: Octopus
    User-agent: Offline\ Explorer
    User-agent: Offline\ Navigator
    User-agent: Openfind
    User-agent: Openfind\ data\ gatherer
    User-agent: OrangeBot
    User-agent: PHP
    User-agent: PHP\ version
    User-agent: PHPot
    User-agent: PageGrabber
    User-agent: Papa\ Foto
    User-agent: Perl
    User-agent: Ping
    User-agent: PingALink\ Monitoring\ Services
    User-agent: Pockey
    User-agent: Pompos
    User-agent: ProPowerBot
    User-agent: ProWebWalker
    User-agent: Pump
    User-agent: Python-urllib
    User-agent: Python\ urllib
    User-agent: QueryN
    User-agent: RMA
    User-agent: ReGet
    User-agent: RealDownload
    User-agent: Reaper
    User-agent: Recorder
    User-agent: RepoMonkey
    User-agent: Rico
    User-agent: Robozilla
    User-agent: Rogerbot
    User-agent: Scooter
    User-agent: ScoutAbout
    User-agent: Siphon
    User-agent: SiteSnagger
    User-agent: SmartDownload
    User-agent: Snake
    User-agent: Snapbot
    User-agent: Snoopy
    User-agent: SpaceBison
    User-agent: SpankBot
    User-agent: Spinne
    User-agent: Sqworm
    User-agent: Stealer
    User-agent: Stripper
    User-agent: Sucker
    User-agent: SuperBot
    User-agent: SuperHTTP
    User-agent: Surfbot
    User-agent: Szukacz
    User-agent: Teleport\ Pro
    User-agent: Telesoft
    User-agent: TheNomad
    User-agent: The\ Intraformant
    User-agent: TightTwatBot
    User-agent: Titan
    User-agent: True_Robot
    User-agent: TurnitinBot
    User-agent: URLy\ Warning
    User-agent: UrlDispatcher
    User-agent: VCI
    User-agent: Vacuum
    User-agent: Vagabondo
    User-agent: Vayala
    User-agent: Vintage
    User-agent: VoidEYE
    User-agent: W3C_Validator
    User-agent: WWW-Collector-E
    User-agent: WWWOFFLE
    User-agent: WebAuto
    User-agent: WebCopier
    User-agent: WebEnhancer
    User-agent: WebFetch
    User-agent: WebGo\ IS
    User-agent: WebLeacher
    User-agent: WebReaper
    User-agent: WebSauger
    User-agent: WebStripper
    User-agent: WebViewer
    User-agent: WebWhacker
    User-agent: WebZIP
    User-agent: WebZip
    User-agent: Web\ Downloader
    User-agent: Web\ Image\ Collector
    User-agent: Web\ Sucker
    User-agent: Webdownloader
    User-agent: Webhook
    User-agent: Webminer
    User-agent: Webmirror
    User-agent: Webmole
    User-agent: Website
    User-agent: Website\ Quester
    User-agent: Website\ eXtractor
    User-agent: Websites
    User-agent: Webster
    User-agent: Webster\ Pro
    User-agent: Websucker
    User-agent: Wells
    User-agent: Wget
    User-agent: Whacker
    User-agent: Widow
    User-agent: Wildsoft\ Surfer
    User-agent: WinHttp
    User-agent: WinHttpRequest
    User-agent: Xaldon
    User-agent: Xaldon\ WebSpider
    User-agent: Xara
    User-agent: Xenu
    User-agent: Y!TunnelPro
    User-agent: YahooYSMcm
    User-agent: ZBot
    User-agent: Zade
    User-agent: Zeus
    User-agent: anarchie
    User-agent: antibot
    User-agent: appie
    User-agent: asterias
    User-agent: attach
    User-agent: autoemailspider
    User-agent: bumblebee
    User-agent: clsHTTP
    User-agent: cosmos
    User-agent: curl
    User-agent: dloader
    User-agent: eCatch
    User-agent: gotit
    User-agent: hloader
    User-agent: httpfetcher
    User-agent: httplib
    User-agent: httpscraper
    User-agent: humanlinks
    User-agent: ia_archiver
    User-agent: larbin
    User-agent: lftp
    User-agent: libWeb
    User-agent: libwww-perl
    User-agent: libwwwperl
    User-agent: likse
    User-agent: lwp-trivial
    User-agent: lwp\ request
    User-agent: moget
    User-agent: nexuscache
    User-agent: oBot
    User-agent: onestop
    User-agent: our\ agent
    User-agent: pavuk
    User-agent: pcBrowser
    User-agent: psbot
    User-agent: psycheclone
    User-agent: sitecheck.internetseer.com
    User-agent: slysearch
    User-agent: spanner
    User-agent: suzuran
    User-agent: tAkeOut
    User-agent: toCrawl
    User-agent: turingos
    Disallow: /
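All of these `User-agent:` lines sit directly on top of a single `Disallow: /`, so they form one group and every bot named in it is told to stay off the whole site. A quick way to sanity-check that on a short excerpt is Python's standard-library `urllib.robotparser`. The excerpt and the `example.com` URL are only for illustration. CPython's parser matches names as case-insensitive substrings, which is looser than RFC 9309, so treat this as a rough check rather than a model of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the list: consecutive User-agent lines form one
# group, and the single Disallow rule applies to every name in it.
ROBOTS_TXT = """\
User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: Dotbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the file as read

url = "https://example.com/some/page"
for agent in ("AhrefsBot", "MJ12bot", "Dotbot", "Googlebot"):
    print(f"{agent:10} allowed={parser.can_fetch(agent, url)}")
```

Googlebot comes back allowed because it isn't named and the file has no `User-agent: *` group. In CPython's parser, a blank line between `User-agent:` lines (or before the `Disallow:` line) ends the group early and the names above it are dropped, so keep the list contiguous.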