robots.txt question

Discussion in 'Blogging' started by crysis1, Apr 26, 2009.

  1. crysis1

    crysis1 Junior Member

    Joined:
    Apr 3, 2009
    Messages:
    100
    Likes Received:
    44
    How does this look for a robots.txt?


    Sitemap: /sitemap.xml

    User-agent: *
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-login.php


    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /z/j/
    Disallow: /z/c/
    Disallow: /stats/
    Disallow: /dh_
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /contact/
    Disallow: /tag/
    Disallow: /wp-content/b
    Disallow: /wp-content/p
    Disallow: /wp-content/themes/askapache/4
    Disallow: /wp-content/themes/askapache/c
    Disallow: /wp-content/themes/askapache/d
    Disallow: /wp-content/themes/askapache/f
    Disallow: /wp-content/themes/askapache/h
    Disallow: /wp-content/themes/askapache/in
    Disallow: /wp-content/themes/askapache/p
    Disallow: /wp-content/themes/askapache/s
    Disallow: /trackback/
    Disallow: /*?*
    Disallow: */trackback/

    User-agent: Googlebot
    # disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.cgi$
    Disallow: /*.wmv$
    Disallow: /*.png$
    Disallow: /*.gif$
    Disallow: /*.jpg$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/

    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Allow: /*

    # allow adsense bot on entire site
    User-agent: Mediapartners-Google*
    Disallow: /*?*
    Allow: /z/
    Allow: /about/
    Allow: /contact/
    Allow: /wp-content/
    Allow: /tag/
    Allow: /manual/*
    Allow: /docs/*
    Allow: /*.php$
    Allow: /*.js$
    Allow: /*.inc$
    Allow: /*.css$
    Allow: /*.gz$
    Allow: /*.cgi$
    Allow: /*.wmv$
    Allow: /*.cgi$
    Allow: /*.xhtml$
    Allow: /*.php*
    Allow: /*.gif$
    Allow: /*.jpg$
    Allow: /*.png$

    # disallow archiving site
    User-agent: ia_archiver
    Disallow: /

    # disable duggmirror
    User-agent: duggmirror
    Disallow: /
     
  2. crysis1

    crysis1 Junior Member

    Joined:
    Apr 3, 2009
    Messages:
    100
    Likes Received:
    44
    Bump. Anybody?
     
  3. neo

    neo Power Member

    Joined:
    May 5, 2007
    Messages:
    550
    Likes Received:
    397
    Disallow rules match by path prefix, so if you put Disallow: /wp-content/themes/ it will stop the spiders from crawling the entire themes directory. That means you don't have to enter
    Disallow: /wp-content/themes/askapache/4 etc. again.
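
    A quick way to check this is Python's standard urllib.robotparser module. This is only a sketch: the two-line rule set, the is_allowed helper, and the uploads test path are made up for illustration, and this parser does simple prefix matching, which is all the themes rule needs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-line robots.txt containing only the broad themes rule.
RULES = """\
User-agent: *
Disallow: /wp-content/themes/
"""


def is_allowed(path, rules=RULES, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in (
        "/wp-content/themes/askapache/4",   # listed separately in the original file
        "/wp-content/themes/askapache/in",  # listed separately in the original file
        "/wp-content/uploads/photo.jpg",    # hypothetical path outside themes/
    ):
        print(path, is_allowed(path))
```

    Both askapache paths come back blocked and the uploads path comes back allowed, so the per-subdirectory lines add nothing within that group.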
     
  4. keinehabe

    keinehabe Supreme Member

    Joined:
    Nov 4, 2008
    Messages:
    1,207
    Likes Received:
    472
    Gender:
    Male
    Occupation:
    -= CEO =-
    Location:
    Heaven
    Since when do robots crawl CSS and JS files? And what's the point of disallowing pages by extension, when the WordPress .htaccess file is so nicely made and produces clean URLs for the spiders?
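
    To see what the extension rules would actually catch, here is a rough sketch of the * and $ wildcard matching that Google documents for robots.txt paths. The rule_to_regex and rule_matches helpers are hypothetical names, and the regex translation is an approximation for illustration, not how any crawler is actually implemented.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using * and $ into a compiled regex.

    Assumes '*' matches any run of characters and a trailing '$' anchors
    the rule to the end of the path; everything else is literal.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with ".*" where '*' appeared.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if the rule would apply to the given URL path."""
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    for rule in ("/*.php$", "/*?*"):
        for path in ("/about/", "/contact/", "/wp-login.php", "/?p=12"):
            print(rule, path, rule_matches(rule, path))
```

    Under these assumptions a clean permalink such as /about/ is not touched by /*.php$ or /*?*, while /wp-login.php and query-string URLs are.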