WTF, Google's robots.txt?

Discussion in 'Black Hat SEO' started by A N K E S H, Jul 22, 2013.

  1. A N K E S H

    A N K E S H Junior Member

    Joined:
    Sep 11, 2012
    Messages:
    110
    Likes Received:
    207
    Occupation:
    Masters
    Location:
    Arthmatic Login unit
    I was looking for a good sitemap and thought I'd check what Google is using. After checking its robots.txt, I was shocked:

    Code:
    User-agent: *
    Disallow: /search
    Disallow: /sdch
    Disallow: /groups
    Disallow: /images
    Disallow: /catalogs
    Allow: /catalogs/about
    Allow: /catalogs/p?
    Disallow: /catalogues
    Disallow: /news
    Allow: /news/directory
    Disallow: /nwshp
    Disallow: /setnewsprefs?
    Disallow: /index.html?
    Disallow: /?
    Allow: /?hl=
    Disallow: /?hl=*&
    Disallow: /addurl/image?
    Disallow: /pagead/
    Disallow: /relpage/
    Disallow: /relcontent
    Disallow: /imgres
    Disallow: /imglanding
    Disallow: /sbd
    Disallow: /keyword/
    Disallow: /u/
    Disallow: /univ/
    Disallow: /cobrand
    Disallow: /custom
    Disallow: /advanced_group_search
    Disallow: /googlesite
    Disallow: /preferences
    Disallow: /setprefs
    Disallow: /swr
    Disallow: /url
    Disallow: /default
    Disallow: /m?
    Disallow: /m/
    Disallow: /wml?
    Disallow: /wml/?
    Disallow: /wml/search?
    Disallow: /xhtml?
    Disallow: /xhtml/?
    Disallow: /xhtml/search?
    Disallow: /xml?
    Disallow: /imode?
    Disallow: /imode/?
    Disallow: /imode/search?
    Disallow: /jsky?
    Disallow: /jsky/?
    Disallow: /jsky/search?
    Disallow: /pda?
    Disallow: /pda/?
    Disallow: /pda/search?
    Disallow: /sprint_xhtml
    Disallow: /sprint_wml
    Disallow: /pqa
    Disallow: /palm
    Disallow: /gwt/
    Disallow: /purchases
    Disallow: /hws
    Disallow: /bsd?
    Disallow: /linux?
    Disallow: /mac?
    Disallow: /microsoft?
    Disallow: /unclesam?
    Disallow: /answers/search?q=
    Disallow: /local?
    Disallow: /local_url
    Disallow: /shihui?
    Disallow: /shihui/
    Disallow: /froogle?
    Disallow: /products?
    Disallow: /products/
    Disallow: /froogle_
    Disallow: /product_
    Disallow: /products_
    Disallow: /products;
    Disallow: /print
    Disallow: /books/
    Disallow: /bkshp?*q=*
    Disallow: /books?*q=*
    Disallow: /books?*output=*
    Disallow: /books?*pg=*
    Disallow: /books?*jtp=*
    Disallow: /books?*jscmd=*
    Disallow: /books?*buy=*
    Disallow: /books?*zoom=*
    Allow: /books?*q=related:*
    Allow: /books?*q=editions:*
    Allow: /books?*q=subject:*
    Allow: /books/about
    Allow: /booksrightsholders
    Allow: /books?*zoom=1*
    Allow: /books?*zoom=5*
    Disallow: /ebooks/
    Disallow: /ebooks?*q=*
    Disallow: /ebooks?*output=*
    Disallow: /ebooks?*pg=*
    Disallow: /ebooks?*jscmd=*
    Disallow: /ebooks?*buy=*
    Disallow: /ebooks?*zoom=*
    Allow: /ebooks?*q=related:*
    Allow: /ebooks?*q=editions:*
    Allow: /ebooks?*q=subject:*
    Allow: /ebooks?*zoom=1*
    Allow: /ebooks?*zoom=5*
    Disallow: /patents?
    Disallow: /patents/related/
    Allow: /patents?id=
    Allow: /patents?vid=
    Disallow: /scholar
    Disallow: /citations?
    Allow: /citations?user=
    Allow: /citations?view_op=new_profile
    Allow: /citations?view_op=top_venues
    Disallow: /complete
    Disallow: /s?
    Disallow: /sponsoredlinks
    Disallow: /videosearch?
    Disallow: /videopreview?
    Disallow: /videoprograminfo?
    Allow: /maps?hq=http://maps.google.com/help/maps/directions/biking/mapleft.kml&ie=UTF8&ll=37.687624,-122.319717&spn=0.346132,0.727158&z=11&lci=bike&dirflg=b&f=d
    Allow: /maps/api/js?
    Disallow: /maps?
    Disallow: /mapstt?
    Disallow: /mapslt?
    Disallow: /maps/stk/
    Disallow: /maps/br?
    Disallow: /mapabcpoi?
    Disallow: /maphp?
    Disallow: /mapprint?
    Disallow: /maps/api/js/
    Disallow: /maps/api/staticmap?
    Disallow: /mld?
    Disallow: /staticmap?
    Disallow: /places/
    Allow: /places/$
    Disallow: /maps/preview
    Disallow: /maps/place
    Disallow: /help/maps/streetview/partners/welcome/
    Disallow: /help/maps/indoormaps/partners/
    Disallow: /lochp?
    Disallow: /center
    Disallow: /ie?
    Disallow: /sms/demo?
    Disallow: /katrina?
    Disallow: /blogsearch?
    Disallow: /blogsearch/
    Disallow: /blogsearch_feeds
    Disallow: /advanced_blog_search
    Disallow: /reader/
    Allow: /reader/play
    Disallow: /uds/
    Disallow: /chart?
    Disallow: /transit?
    Disallow: /mbd?
    Disallow: /extern_js/
    Disallow: /xjs/
    Disallow: /calendar/feeds/
    Disallow: /calendar/ical/
    Disallow: /cl2/feeds/
    Disallow: /cl2/ical/
    Disallow: /coop/directory
    Disallow: /coop/manage
    Disallow: /trends?
    Disallow: /trends/music?
    Disallow: /trends/hottrends?
    Disallow: /trends/viz?
    Disallow: /notebook/search?
    Disallow: /musica
    Disallow: /musicad
    Disallow: /musicas
    Disallow: /musicl
    Disallow: /musics
    Disallow: /musicsearch
    Disallow: /musicsp
    Disallow: /musiclp
    Disallow: /browsersync
    Disallow: /call
    Disallow: /archivesearch?
    Disallow: /archivesearch/url
    Disallow: /archivesearch/advanced_search
    Disallow: /base/reportbadoffer
    Disallow: /urchin_test/
    Disallow: /movies?
    Disallow: /codesearch?
    Disallow: /codesearch/feeds/search?
    Disallow: /wapsearch?
    Disallow: /safebrowsing
    Allow: /safebrowsing/diagnostic
    Allow: /safebrowsing/report_badware/
    Allow: /safebrowsing/report_error/
    Allow: /safebrowsing/report_phish/
    Disallow: /reviews/search?
    Disallow: /orkut/albums
    Allow: /jsapi
    Disallow: /views?
    Disallow: /c/
    Disallow: /cbk
    Allow: /cbk?output=tile&cb_client=maps_sv
    Disallow: /recharge/dashboard/car
    Disallow: /recharge/dashboard/static/
    Disallow: /translate_a/
    Disallow: /translate_c
    Disallow: /translate_f
    Disallow: /translate_static/
    Disallow: /translate_suggestion
    Disallow: /profiles/me
    Allow: /profiles
    Disallow: /s2/profiles/me
    Allow: /s2/profiles
    Allow: /s2/photos
    Allow: /s2/static
    Disallow: /s2
    Allow: /s2/search/social
    Disallow: /transconsole/portal/
    Disallow: /gcc/
    Disallow: /aclk
    Disallow: /cse?
    Disallow: /cse/home
    Disallow: /cse/panel
    Disallow: /cse/manage
    Disallow: /tbproxy/
    Disallow: /imesync/
    Disallow: /shenghuo/search?
    Disallow: /support/forum/search?
    Disallow: /reviews/polls/
    Disallow: /hosted/images/
    Disallow: /ppob/?
    Disallow: /ppob?
    Disallow: /ig/add?
    Disallow: /adwordsresellers
    Disallow: /accounts/o8
    Allow: /accounts/o8/id
    Disallow: /topicsearch?q=
    Disallow: /xfx7/
    Disallow: /squared/api
    Disallow: /squared/search
    Disallow: /squared/table
    Disallow: /toolkit/
    Allow: /toolkit/*.html
    Disallow: /globalmarketfinder/
    Allow: /globalmarketfinder/*.html
    Disallow: /qnasearch?
    Disallow: /app/updates
    Disallow: /sidewiki/entry/
    Disallow: /quality_form?
    Disallow: /labs/popgadget/search
    Disallow: /buzz/post
    Disallow: /compressiontest/
    Disallow: /analytics/reporting/
    Disallow: /analytics/admin/
    Disallow: /analytics/web/
    Disallow: /analytics/feeds/
    Disallow: /analytics/settings/
    Disallow: /alerts/
    Disallow: /ads/search
    Disallow: /phone/compare/?
    Allow: /alerts/manage
    Allow: /alerts/remove
    Disallow: /travel/clk
    Disallow: /hotelfinder/rpc
    Disallow: /hotels/rpc
    Disallow: /flights/rpc
    Disallow: /commercesearch/services/
    Disallow: /evaluation/
    Disallow: /chrome/browser/mobile/tour
    Disallow: /compare/*/apply*
    Disallow: /forms/perks/
    Disallow: /baraza/*/search
    Disallow: /baraza/*/report
    Disallow: /shopping/suppliers/search
    Disallow: /ct/
    Disallow: /edu/cs4hs/
    Sitemap: http://www.gstatic.com/culturalinstitute/sitemaps/www_google_com_culturalinstitute/sitemap-index.xml
    Sitemap: http://www.google.com/hostednews/sitemap_index.xml
    Sitemap: http://www.google.com/sitemap_hreflang.xml
    Sitemap: http://www.google.com/sitemaps_webmasters.xml
    Sitemap: http://www.google.com/ventures/sitemap_ventures.xml
    Sitemap: http://www.gstatic.com/dictionary/static/sitemaps/sitemap_index.xml
    Sitemap: http://www.gstatic.com/earth/gallery/sitemaps/sitemap.xml
    Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
    Sitemap: http://www.gstatic.com/trends/websites/sitemaps/sitemapindex.xml
    Source

    Code:
    http://www.google.com/robots.txt

    What a giant list of Allow and Disallow rules!

    What do you think, guys?
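
    For anyone wondering how the Allow lines interact with the Disallow lines above, here is a minimal sketch. It assumes the longest-match precedence described in RFC 9309 and in Google's robots.txt documentation: the matching rule with the longest pattern wins, and Allow wins a tie. The helper names (`pattern_to_regex`, `is_allowed`) and the shortened rule list are made up for illustration; this is not how any particular crawler is implemented.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (allow, pattern) tuples. The longest matching
    pattern wins; on a tie, Allow beats Disallow. No match means allowed.
    """
    best_len, best_allow = -1, True
    for allow, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow

# A few rules copied from the listing above (False = Disallow, True = Allow).
RULES = [
    (False, "/catalogs"),
    (True, "/catalogs/about"),
    (False, "/places/"),
    (True, "/places/$"),
    (False, "/toolkit/"),
    (True, "/toolkit/*.html"),
]

for path in ["/catalogs/about", "/catalogs/other", "/places/", "/places/foo"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

    Under these assumptions, `/catalogs/about` stays crawlable even though `/catalogs` is disallowed, because the Allow pattern is the longer match.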
     
  2. ttrox

    ttrox Regular Member

    Joined:
    Jun 28, 2013
    Messages:
    217
    Likes Received:
    75
    Well, it kind of depends on what kind of site you're running. Google is so big and so optimized that it's natural to see that many rules. On a typical WordPress site you would hardly need more than 10, IMO.
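
    To put a number on how many rules a given file has, a rough sketch like this one tallies the directives by field name. The function name `count_directives` and the short sample text are made up for illustration, and it is only a line counter, not a full robots.txt parser.

```python
from collections import Counter

def count_directives(robots_txt: str) -> Counter:
    """Count robots.txt lines by field name (lowercased), ignoring comments."""
    counts = Counter()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field = line.split(":", 1)[0].strip().lower()
        counts[field] += 1
    return counts

# Shortened sample; paste the full file contents here to count the real thing.
sample = """\
User-agent: *
Disallow: /search
Disallow: /catalogs
Allow: /catalogs/about
Sitemap: http://www.google.com/sitemaps_webmasters.xml
"""

counts = count_directives(sample)
for field, n in counts.most_common():
    print(f"{field}: {n}")
```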
     
  3. Endire

    Endire Elite Member Premium Member

    Joined:
    Mar 27, 2012
    Messages:
    1,756
    Likes Received:
    1,061
    Gender:
    Male
    ANKESH,

    I agree. For the most part, the robots.txt file should be configured according to the goals of the site. For example, most of Facebook's site is configured to exclude robots from crawling it. In general, I think the more detail you see in a robots.txt file, the better the website owner knows how to use it effectively.

    Best,

    Shawn
     
  4. youtalkmedia

    youtalkmedia Senior Member

    Joined:
    Dec 5, 2011
    Messages:
    832
    Likes Received:
    376
    Occupation:
    Web Developer
    Location:
    Toronto
    I think it is configured that way because if a robot started crawling through Google, it would never stop...
     