
Robots.txt help

Discussion in 'Blogging' started by Knoxgates, Dec 12, 2010.

  1. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    918
    Hi,

    I have created a WordPress blog and I want to make sure that the robots.txt file is correct.

    Code:
    Sitemap: http://www.yourblog.com/sitemap.xml
    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/
    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
     
  2. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    918
    A small bump. Any experts out there?
     
  3. penguin

    penguin Junior Member

    Joined:
    Nov 20, 2009
    Messages:
    149
    Likes Received:
    60
    Occupation:
    SEO Manager
    I think you have worked out in depth which files and folders to block from Googlebot. That is smarter than the usual approach, which only blocks the wp folders from bots.
    I think this is good to go.
     
  4. speedy5044

    speedy5044 Regular Member

    Joined:
    Jul 29, 2008
    Messages:
    456
    Likes Received:
    993
    Occupation:
    IM
    After some research, I'm using this one:
    Code:
    # This group applies to every user-agent that has no more specific group below
    User-agent:  *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /trackback/
    Disallow: /cgi-bin/
     
    # Disallow all monthly archive pages
    Disallow: /2005/0
    Disallow: /2005/1
    Disallow: /2006/0
    Disallow: /2006/1
    Disallow: /2007/0
    Disallow: /2007/1
     
    # Googlebot is Google's main search crawler
    User-agent: Googlebot
     
    # Disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.tar$
    Disallow: /*.tgz$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
     
    # Disallow Google from parsing individual post feeds and trackbacks
    Disallow: */feed/
    Disallow: */trackback/
     
    # Disallow all files with ? in url
    Disallow: /*?*
    Disallow: /*?
     
    # Disallow all archived monthlies
    Disallow: /2006/0*
    Disallow: /2007/0*
    Disallow: /2006/1*
    Disallow: /2007/1*
    
    # Googlebot-Image is Google's image crawler
    User-agent: Googlebot-Image
     
    # Allow Everything
    Allow: /*
     
    # Mediapartners-Google is Google's ad crawler
    User-agent: Mediapartners-Google*
     
    # Allow Everything
    Allow: /*
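
A note on how the groups in files like the ones above are read: as far as Google's documentation describes it, a crawler obeys only the single most specific `User-agent` group that matches its name, so the rules under `User-agent: *` are not added on top of a `User-agent: Googlebot` group. The Python sketch below illustrates that selection on a shortened sample. The function names `parse_groups` and `select_group` and the sample text are made up for illustration, and the substring matching is a simplification, not a copy of any real crawler's parser.

```python
def parse_groups(text):
    """Split robots.txt text into {user-agent token: [(directive, path), ...]}."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; otherwise start a new one.
            if not last_was_agent:
                current_agents = []
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def select_group(groups, crawler_name):
    """Return the key of the single group a crawler would follow (simplified)."""
    name = crawler_name.lower()
    # Assumption: a group matches when its token is a substring of the crawler name.
    candidates = [agent for agent in groups if agent != "*" and agent in name]
    if candidates:
        return max(candidates, key=len)  # the longest token is the most specific
    return "*" if "*" in groups else None


# Hypothetical, shortened sample in the style of the files in this thread.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot
Disallow: /*.php$
"""

groups = parse_groups(SAMPLE)
googlebot_rules = groups[select_group(groups, "Googlebot")]
other_rules = groups[select_group(groups, "SomeOtherBot")]
print(googlebot_rules)
print(other_rules)
```

In this sample the Googlebot group does not contain the `/wp-admin/` rule, so any rule meant for Googlebot as well has to be repeated inside its own group.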
     
  5. bigfred

    bigfred Power Member

    Joined:
    Mar 15, 2009
    Messages:
    743
    Likes Received:
    143
    Very interesting. Why do you guys block Googlebot from so many items? What is the benefit of doing this?
     
  6. Knoxgates

    Knoxgates Supreme Member

    Joined:
    Aug 9, 2008
    Messages:
    1,266
    Likes Received:
    918
    @speedy5044: Hey, I think you don't use a sitemap. Am I correct?
     
  7. Bross

    Bross Senior Member

    Joined:
    Feb 6, 2010
    Messages:
    859
    Likes Received:
    355
    Google won't index these pages anyway (in case you install with Fantastico and use All in One SEO).

    No hassle..
     
  8. radi2k

    radi2k Junior Member

    Joined:
    Nov 29, 2009
    Messages:
    117
    Likes Received:
    34
    Location:
    Germany
    I'm not quite sure whether the robots.txt standard supports regular-expression-style terms like $ at the end. Where did you get that information from?

    In my opinion you could also keep the bots from indexing your content by using proper meta tags. Do you know noindex, nofollow, nocache, noydir, ... in the HTML head? I use those to direct the bots through my sites, and it works well!

    Edit: if you use WordPress, try the "All in One SEO" plugin. I'm sure it fits all your needs :)
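
On the `$` question: the original robots.txt convention only describes plain path prefixes, but `*` (any run of characters) and `$` (end of URL) are documented extensions that Google and other major crawlers honour, and they were later written into RFC 9309. They are not full regular expressions. The Python sketch below shows one way to think about the matching by translating a rule into a regex. The helper names `rule_to_regex` and `rule_matches` are invented for this example, and real crawlers may differ in details such as percent-encoding.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule with * and $ into a compiled regex."""
    anchored = rule.endswith("$")  # a trailing $ means "the URL must end here"
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each * stand for any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def rule_matches(rule, path):
    """True if the rule applies to the path (rules always match from the start)."""
    return rule_to_regex(rule).match(path) is not None


# Rules taken from the files posted in this thread; the paths are made-up examples.
checks = [
    ("/*.php$", "/index.php"),
    ("/*.php$", "/index.php?x=1"),
    ("/wp-", "/wp-content/themes/a.css"),
    ("/*?*", "/page?id=2"),
    ("/*?*", "/page"),
]
for rule, path in checks:
    print(rule, path, rule_matches(rule, path))
```

Without a trailing `$` a rule behaves as a prefix, which is why `/wp-` already covers `/wp-content/`, `/wp-admin/` and `/wp-includes/`.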
     
  9. Largo

    Largo Newbie

    Joined:
    Aug 6, 2010
    Messages:
    34
    Likes Received:
    29
    Location:
    Under the Radar
    Check this out:
    Facebook's robots.txt // It is a good example of an anti-robot policy (pro-Bing)
    Code:
    hxxp://facebook.com/robots.txt
    This one is also interesting:
    Code:
    hxxp://w3.exorbyte.de/robots.txt
    A small how-to tutorial:
    Code:
    hxxp://w3.irkawebpromotions.com/robots-txt-tutorials/
    And the main FAQ about this:
    Code:
    hxxp://w3.robotstxt.org/faq.html
     
    Last edited: Dec 13, 2010