
Tips when creating a robots.txt for your WordPress blog

Discussion in 'White Hat SEO' started by flipflop101, Aug 27, 2010.

  1. flipflop101

    flipflop101 Junior Member

    Joined:
    Dec 2, 2008
    Messages:
    146
    Likes Received:
    17
    Hi guys, I recently needed to create a robots.txt for my WordPress blog and thought I'd contribute something back to BHW for those who need help doing the same. I'm by no means an expert; this is just information I've researched for myself.

    Creating a decent robots.txt for your WordPress blog is important because, without one, even your unique, self-written articles can appear to be duplicate content: WordPress serves the same article under several URLs (category archives, comment pages, trackbacks and so on). There are also areas of the WordPress installation where Googlebot need not look, e.g. /wp-admin.

    By organising robots.txt ourselves we keep the crawler away from pages that don't need indexing, so it works through the site more efficiently, and that helps SEO.

    I know there are plug-ins for robots.txt; however, the ones I looked at were either messy or no real help, and you still needed to sort it out yourself. Creating a simple, tidy robots.txt from scratch seems to be the best way.

    It's also quick and easy.

    So what should be included / excluded?

    User-agent: *
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */comments
    Allow: /wp-content/uploads

    Here we are saying that the rules below apply to all bots (User-agent: *). We have then disallowed the folders that make up the WordPress installation, excluding the uploads folder, along with trackback, comment and nested category URLs. We allow uploads as it contains uploaded files, e.g. images and videos.
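To get a feel for what these rules actually catch, here is a rough Python sketch. It assumes Google-style matching (rules are prefix matches, * is a wildcard, the longest matching rule wins and Allow wins a tie); other crawlers may behave differently. The helper names and sample paths are made up for illustration.

```python
import re

def matches(pattern, path):
    """Google-style match: '*' is a wildcard, a trailing '$' anchors the end."""
    regex = ".*".join(re.escape(part) for part in pattern.rstrip("$").split("*"))
    if pattern.endswith("$"):
        regex += r"\Z"
    return re.match(regex, path) is not None

# (kind, pattern) pairs copied from the "User-agent: *" group above.
RULES = [
    ("disallow", "/wp-admin"),
    ("disallow", "/wp-includes"),
    ("disallow", "/wp-content/plugins"),
    ("disallow", "/wp-content/cache"),
    ("disallow", "/wp-content/themes"),
    ("disallow", "/trackback"),
    ("disallow", "/comments"),
    ("disallow", "/category/*/*"),
    ("disallow", "*/trackback"),
    ("disallow", "*/comments"),
    ("allow", "/wp-content/uploads"),
]

def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not matches(pattern, path):
            continue
        is_allow = kind == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed

# Hypothetical sample paths, not taken from a real site.
for path in [
    "/wp-admin/options.php",
    "/wp-content/uploads/2010/08/photo.jpg",
    "/my-post/trackback/",
    "/my-post/",
    "/category/news/",
    "/category/news",
]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions /category/*/* also matches /category/news/ (the second * can match nothing), so category archives with a trailing slash are blocked too, not only nested ones. Worth checking that this is what you want.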

    Disallow: /*?*
    Disallow: /*?

    This part disallows all URLs with a ? in them. Rules are prefix matches, so the two lines catch the same URLs and either one on its own would do.

    Be careful with this one: it is only safe if you have changed your permalink structure in WordPress, which is best to do anyway as it also helps with SEO. If you have left your permalinks on the default setting, the URLs generated for your articles, categories etc. contain a ? (e.g. /?p=123), so you should not use this part.
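A quick way to see the risk is to run a few URLs through the two rules. This is only a sketch assuming Google-style wildcard matching; the helper names and sample paths are hypothetical.

```python
import re

def matches(pattern, path):
    """Google-style match: '*' is a wildcard, a trailing '$' anchors the end."""
    regex = ".*".join(re.escape(part) for part in pattern.rstrip("$").split("*"))
    if pattern.endswith("$"):
        regex += r"\Z"
    return re.match(regex, path) is not None

# The two query-string rules from the post.
QUERY_RULES = ["/*?*", "/*?"]

def blocked_by_query_rules(path):
    """True if either query-string rule matches the path."""
    return any(matches(rule, path) for rule in QUERY_RULES)

# Illustrative paths: default-permalink URLs versus custom permalinks.
SAMPLES = [
    "/?p=123",                 # default permalink for a post
    "/?cat=4",                 # default permalink for a category
    "/my-category/my-post/",   # custom permalink structure
    "/my-post/?replytocom=5",  # comment-reply link on a custom permalink
]

for path in SAMPLES:
    print(path, "->", "blocked" if blocked_by_query_rules(path) else "allowed")
```

With default permalinks every post and category URL lands in the blocked set, which is why the permalink structure has to be changed before these rules go in.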

    I prefer to set up a custom structure in WordPress:

    /%category%/%postname%/


    Back to the robots.txt!

    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$

    Here we disallow files with extensions such as .php and .js; the trailing $ means the URL has to end with that extension. You can alter this list to suit yourself, but the above is a good starting point.
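The effect of the trailing $ is easiest to see with a few test URLs. Again a sketch only, assuming Google-style matching, with hypothetical helper names and sample paths.

```python
import re

def matches(pattern, path):
    """Google-style match: '*' is a wildcard, a trailing '$' anchors the end."""
    regex = ".*".join(re.escape(part) for part in pattern.rstrip("$").split("*"))
    if pattern.endswith("$"):
        regex += r"\Z"
    return re.match(regex, path) is not None

# Extension rules copied from the post.
EXTENSION_RULES = [
    "/*.php$", "/*.js$", "/*.inc$", "/*.css$",
    "/*.gz$", "/*.wmv$", "/*.cgi$", "/*.xhtml$",
]

def matching_rules(path):
    """Return every extension rule that matches the path."""
    return [rule for rule in EXTENSION_RULES if matches(rule, path)]

# Illustrative paths, not taken from a real site.
for path in [
    "/wp-login.php",
    "/index.php?p=12",   # does not END in .php, so /*.php$ misses it
    "/files/data.json",  # .json is not .js
    "/wp-content/themes/mytheme/style.css",
    "/sitemap.xml.gz",
]:
    print(path, "->", matching_rules(path) or "no extension rule matches")
```

One thing the sketch turns up: /*.gz$ also matches /sitemap.xml.gz, the same file the Sitemap line at the end of the post points to. Whether a particular crawler still fetches a sitemap that is disallowed is worth checking before relying on both.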

    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # allow Google adsense bot on entire site
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    Above we have allowed the Google image bot and the Google AdSense bot to access everything: an empty Disallow: blocks nothing.
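How a bot decides which block of rules to follow can be sketched like this. The assumption (based on how Google describes it) is that a crawler uses only the single most specific User-agent group that matches its name, ignores a trailing * on the name, and falls back to the * group otherwise. The function name, the shortened rule lists and the bot names are illustrative.

```python
# Group name -> Disallow patterns; the "*" list is shortened for the example.
GROUPS = {
    "*": ["/wp-admin", "/wp-includes", "/*?", "/*.php$"],
    "Googlebot-Image": [],
    "Mediapartners-Google*": [],
}

def select_group(crawler, groups):
    """Pick the group with the longest name contained in the crawler name."""
    crawler = crawler.lower()
    best_name, best_len = "*", 0
    for name in groups:
        # Assumption: a trailing '*' on a User-agent value is ignored.
        token = name.rstrip("*").lower()
        if token and token in crawler and len(token) > best_len:
            best_name, best_len = name, len(token)
    return best_name

for bot in ["Googlebot", "Googlebot-Image", "Mediapartners-Google", "SomeOtherBot"]:
    name = select_group(bot, GROUPS)
    print(bot, "-> group", repr(name), "disallows", GROUPS[name])
```

Under that assumption the image and AdSense bots never see the rules in the * group at all, so for them nothing is blocked, /wp-admin included.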

    Additionally, if you use the XML-Sitemap plugin for WordPress, you can add this at the end:

    # BEGIN XML-SITEMAP-PLUGIN
    Sitemap: http://yourdomain.com/sitemap.xml.gz
    # END XML-SITEMAP-PLUGIN

    This tells the robots where your sitemap is.
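If you want to confirm that the Sitemap line can be read back out of the file, Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later) will do it. A small sketch; www.example.com is a placeholder domain and the robots.txt text is a shortened stand-in for the full file.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the full robots.txt; the domain is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.example.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is read independently of the User-agent groups, so it does not matter which block it sits under.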

    I hope this was helpful; any input is welcome. As I said, I'm no expert; this is just me trying various plug-ins and researching online.
     
  2. Billy Blue

    Billy Blue Newbie

    Joined:
    Aug 7, 2010
    Messages:
    9
    Likes Received:
    0
    Thanks for that post. I just made one too...

    I reckon you have to be really careful about what you exclude. Don't wanna shoot yourself in the foot, argh!
     
  3. flipflop101

    flipflop101 Junior Member

    Joined:
    Dec 2, 2008
    Messages:
    146
    Likes Received:
    17
    Ah apologies Billy, I actually did a quick search on this forum to see if there was any useful information before I started looking online. Didn't notice anything at the time, hope I didn't step on your toes mate :)

    Yes, you definitely need to be careful what you exclude, but as far as I can tell this is working great. My sitemap got picked up just a few hours ago via the line in the robots.txt and all my articles have been picked up through it.
     
  4. MikaLuke

    MikaLuke Newbie

    Joined:
    Jul 6, 2010
    Messages:
    35
    Likes Received:
    6
    I want to ask one question: if I don't use a robots file, can the search engines' bots index everything?
     
  5. nufaman

    nufaman Elite Member

    Joined:
    May 29, 2009
    Messages:
    1,697
    Likes Received:
    1,185
    wow... what a huge waste of time...

    I've never touched a robots.txt file and have top rankings everywhere. You should spend your time working on your site and not on things as useless as a robots file. You could even delete it and nothing would happen.
     
  6. flipflop101

    flipflop101 Junior Member

    Joined:
    Dec 2, 2008
    Messages:
    146
    Likes Received:
    17
    Last edited: Aug 30, 2010
  7. keekn

    keekn Newbie

    Joined:
    Sep 25, 2009
    Messages:
    42
    Likes Received:
    1
    hi,

    I heard that /%category%/%postname%/ is not recommended... it's a pain in the ass if you change your category...

    just >> /%postname%.html << should be fine.
     
  8. flipflop101

    flipflop101 Junior Member

    Joined:
    Dec 2, 2008
    Messages:
    146
    Likes Received:
    17
    Keekn, agreed. I've always had the category in there for additional SEO advantage; however, after weighing up the pros and cons of this versus just %postname%, I've since changed to using just postname. Keeps things simpler :)

    Thanks for your input - I would edit my post but I can't unfortunately.