Robots.txt help

Discussion in 'Blogging' started by Knoxgates, Dec 12, 2010.

  1. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    919
    Hi,

    I have created a WordPress blog and I want to make sure that the robots.txt file is correct.

    Code:
    Sitemap: http://www.yourblog.com/sitemap.xml

    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/

    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
     
  2. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    919
    A small bump. Any experts out there?
     
  3. penguin

    penguin Junior Member

    Joined:
    Nov 20, 2009
    Messages:
    149
    Likes Received:
    60
    Occupation:
    SEO Manager
    I think you have worked out in depth which files and folders to block from Googlebot, which is better than the usual approach of only blocking the wp folders from bots.
    I think this is good to go.
     
  4. speedy5044

    speedy5044 Regular Member

    Joined:
    Jul 29, 2008
    Messages:
    456
    Likes Received:
    994
    Occupation:
    IM
    After some research, I'm using this one:
    Code:
    # This group applies to all user-agents that do not have their own group below
    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /trackback/
    Disallow: /cgi-bin/
     
    # Disallow all monthly archive pages
    Disallow: /2005/0
    Disallow: /2005/1
    Disallow: /2006/0
    Disallow: /2006/1
    Disallow: /2007/0
    Disallow: /2007/1
     
    # Googlebot is Google's main search crawler
    User-agent: Googlebot
    # Disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.tar$
    Disallow: /*.tgz$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
     
    # Disallow Google from parsing individual post feeds and trackbacks
    Disallow: */feed/
    Disallow: */trackback/
     
    # Disallow all URLs that contain a ?
    Disallow: /*?*
    Disallow: /*?
     
    # Disallow all archived monthlies
    Disallow: /2006/0*
    Disallow: /2007/0*
    Disallow: /2006/1*
    Disallow: /2007/1*
    
    # Googlebot-Image is Google's image crawler
    User-agent: Googlebot-Image
    # Allow Everything
    Allow: /*
     
    # Mediapartners-Google is Google's AdSense crawler
    User-agent: Mediapartners-Google*
    # Allow Everything
    Allow: /*
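
    One thing worth keeping in mind with a file laid out like this: a crawler that finds a group naming it specifically generally follows only that group and ignores the "User-agent: *" rules. A minimal sketch with Python's standard urllib.robotparser illustrates it. The trimmed rule set and the "OtherBot" name are made up for the example, and this parser only does plain prefix matching (no * or $ wildcards), so treat it as an approximation of crawler behaviour rather than a copy of Google's logic.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down, hypothetical version of the file above (prefix rules only,
# because urllib.robotparser does not understand * or $ wildcards).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/

User-agent: Googlebot
Disallow: /2006/0
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot without its own group falls back to the * group.
other_admin = parser.can_fetch("OtherBot", "/wp-admin/options.php")

# Googlebot has its own group, so the * rules are not applied to it.
google_admin = parser.can_fetch("Googlebot", "/wp-admin/options.php")
google_archive = parser.can_fetch("Googlebot", "/2006/05/some-post/")

print(other_admin, google_admin, google_archive)
```

    In this sketch Googlebot is still allowed into /wp-admin/, so any rule from the * group that should also bind Googlebot has to be repeated inside the Googlebot group.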
     
  5. bigfred

    bigfred Power Member

    Joined:
    Mar 15, 2009
    Messages:
    745
    Likes Received:
    144
    Very interesting. Why do you guys block Googlebot from so many items? What is the benefit of doing this?
     
  6. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    919
    @speedy5044: Hey, I think you don't use a sitemap. Am I correct?
     
  7. Bross

    Bross Senior Member

    Joined:
    Feb 6, 2010
    Messages:
    859
    Likes Received:
    355
    Google won't index these pages anyway (if you install with Fantastico and use All in One SEO).

    No hassle..
     
  8. radi2k

    radi2k Junior Member

    Joined:
    Nov 29, 2009
    Messages:
    117
    Likes Received:
    34
    Location:
    Germany
    I'm not quite sure the robots.txt standard supports regular-expression-style terms like $ at the end. Where did you get that information from?

    In my opinion you could also keep the bots from indexing your content by using proper meta tags. Do you know noindex, nofollow, nocache, noydir, ... in the HTML head? I use those to direct the bots through my sites, and it works well!

    Edit: If you use WordPress, try the "All in One SEO" plugin. I'm sure it fits all your needs :)
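
    For reference on the $ question: the original robots.txt convention only describes plain prefix matching, but Google documents two extensions, * (any run of characters) and a trailing $ (end of the URL). Below is a rough Python sketch of that matching under those assumed semantics. The helper names rule_to_regex and rule_matches are made up for illustration, and this is not Google's actual code.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is a literal prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """Return True if the path (including any query string) matches the rule."""
    # re.match anchors at the start of the path, like a prefix match.
    return rule_to_regex(rule).match(path) is not None


checks = [
    ("/*.php$", "/index.php"),       # URL ends in .php
    ("/*.php$", "/index.php?p=12"),  # a query string follows, so $ fails
    ("/*?", "/index.php?p=12"),      # any URL containing a ?
    ("/wp-", "/wp-admin/"),          # plain prefix, no wildcard needed
]
for rule, path in checks:
    print(rule, path, rule_matches(rule, path))
```

    Crawlers that only implement the original prefix matching may treat * and $ as literal characters, so rules like these are only safe inside a group aimed at a crawler that documents them.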
     
  9. Largo

    Largo Newbie

    Joined:
    Aug 6, 2010
    Messages:
    34
    Likes Received:
    29
    Location:
    Under the Radar
    Check this out:
    Facebook's robots.txt. It is a good example of an anti-robots policy (pro-Bing):
    Code:
    hxxp://facebook.com/robots.txt
    Also interesting:
    Code:
    hxxp://w3.exorbyte.de/robots.txt
    A small how-to tutorial:
    Code:
    hxxp://w3.irkawebpromotions.com/robots-txt-tutorials/
    And the main FAQ on the subject:
    Code:
    hxxp://w3.robotstxt.org/faq.html
     
    Last edited: Dec 13, 2010