Tips for creating a robots.txt for your WordPress blog

flipflop101

Junior Member
Dec 2, 2008
Hi guys, I recently needed to create a robots.txt for my WordPress blog and thought I'd contribute something back to BHW for those who need help doing the same. I'm by no means an expert; this is just information I've researched for myself.

Creating a decent robots.txt for your WordPress blog is important because without one, even your unique, self-written articles can appear to be duplicate content: WordPress serves the same post under several URLs (category pages, comment feeds, trackbacks). There are also areas of the WordPress installation that Googlebot has no need to look at, e.g. /wp-admin.

By writing an organised robots.txt ourselves, we can make the crawler's pass over our site more efficient, and that helps SEO.

I know there are plug-ins for robots.txt, but the ones I looked at were either messy or no real help; you still needed to sort it out yourself. So creating a simple, tidy robots.txt from scratch seems to be the best way.

It's also quick and easy.

So what should be included / excluded?

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */comments
Allow: /wp-content/uploads

Here "User-agent: *" says that the rules which follow apply to all bots (user-agents). We have then disallowed the folders that make up the WordPress installation, except the uploads folder. We allow that one because it contains uploaded files, e.g. images and videos.
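A quick way to sanity-check the prefix rules above is Python's standard urllib.robotparser. This is only a rough sketch: the sample paths are made up, and the stdlib parser does plain prefix matching where the first matching line wins, so the wildcard lines (/category/*/* and */trackback, */comments) are left out. Real crawlers may resolve Allow/Disallow conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Prefix-only subset of the rules from the post (wildcard lines omitted,
# because urllib.robotparser does not interpret "*" inside a path).
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Allow: /wp-content/uploads
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under RULES."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, chosen only for illustration.
for path in [
    "/wp-admin/options.php",
    "/wp-content/themes/default/style.css",
    "/wp-content/uploads/2008/12/photo.jpg",
    "/my-first-post/",
]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

With these rules the two installation paths come back blocked, while the upload and the post URL stay crawlable.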

Disallow: /*?*
Disallow: /*?

This part disallows every URL that contains a ? (a query string).

Be careful with this one: it is only safe if you have changed your permalink structure in WordPress, which is best to do anyway as it also helps with SEO. If you have left your permalinks on the default setting, the generated URLs contain a ? when you click on your articles, categories etc., so you should not use this part.

I prefer to set up a custom structure in WordPress:
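To see which URLs the ? rules catch, here is a small sketch of wildcard matching as Google-style crawlers describe it: * matches any run of characters, a trailing $ anchors the end, and everything else is a literal prefix match. The helper names and sample URLs are my own, and this is an approximation, not any crawler's actual code.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex (approximation)."""
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    # Escape literal parts and turn each "*" into ".*".
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, url_path):
    """True if the rule matches url_path from its first character."""
    return rule_to_regex(rule).match(url_path) is not None


# Hypothetical URLs: default permalinks, a custom permalink, a comment reply link.
for url in ["/?p=123", "/2008/12/my-post/", "/my-post/?replytocom=5"]:
    print(url, rule_matches("/*?", url), rule_matches("/*?*", url))
```

Under this reading the two lines match exactly the same URLs, and a blog still on default permalinks (/?p=123) would have every post blocked.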

/%category%/%postname%/
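As a rough illustration of what that structure produces, the sketch below fills in the two tags by hand. The function name and the sample category and post slug are made up; WordPress does this substitution itself.

```python
def expand_permalink(structure, category, postname):
    """Fill in the %category% and %postname% tags of a permalink structure."""
    return (structure.replace("%category%", category)
                     .replace("%postname%", postname))


# Hypothetical category and post slug.
print(expand_permalink("/%category%/%postname%/", "seo", "robots-txt-tips"))
# /seo/robots-txt-tips/
```

The resulting URL has no ? in it, which is why the query-string rules become safe to use once a custom structure is in place.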


Back to the robots.txt!

Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

Here we disallow URLs that end in extensions such as .php and .js. You can alter this list to suit yourself, but the above is a good starting point.
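The trailing $ matters here: it means the URL has to end with the extension. A minimal sketch, assuming the usual wildcard reading (* is any run of characters, $ is end of URL); the helper names and test paths are hypothetical.

```python
import re

EXTENSION_RULES = [
    "/*.php$", "/*.js$", "/*.inc$", "/*.css$",
    "/*.gz$", "/*.wmv$", "/*.cgi$", "/*.xhtml$",
]


def to_regex(rule):
    """Approximate a robots.txt rule as a regex: '*' -> '.*', trailing '$' -> end."""
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def first_blocking_rule(path):
    """Return the first extension rule that matches path, or None."""
    for rule in EXTENSION_RULES:
        if to_regex(rule).match(path):
            return rule
    return None


# Hypothetical paths for illustration.
for path in ["/wp-login.php", "/index.php?p=1", "/my-post/", "/sitemap.xml.gz"]:
    print(path, first_blocking_rule(path))
```

Note that /index.php?p=1 is not caught by /*.php$ because it does not end in .php (the ? rules cover it instead), and that /*.gz$ also matches a gzipped sitemap such as /sitemap.xml.gz, so it is worth checking that list against files you do want fetched.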

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow Google adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

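Above we have allowed the Google image bot and the Google AdSense bot to access everything. A bot that finds a group with its own name follows that group instead of the general * rules.

Additionally, if you use the XML-Sitemap plugin for WordPress, you can add this at the end: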
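Per-bot groups can be checked with Python's urllib.robotparser as well. This sketch assumes a cut-down file with one general rule plus the Googlebot-Image group; "SomeOtherBot" and the image path are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Cut-down robots.txt: one general rule and the image-bot group from the post.
RULES = """\
User-agent: *
Disallow: /wp-content/themes

User-agent: Googlebot-Image
Disallow:
Allow: /*
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

path = "/wp-content/themes/default/logo.png"  # hypothetical file

# The named group applies to Googlebot-Image; other bots fall back to "*".
image_bot_ok = parser.can_fetch("Googlebot-Image", path)
other_bot_ok = parser.can_fetch("SomeOtherBot", path)
print("Googlebot-Image:", image_bot_ok)
print("SomeOtherBot:", other_bot_ok)
```

The image bot is allowed through while any other bot is still held to the general Disallow.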

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://yourdomain.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN

This tells the robots where your sitemap is. The Sitemap line takes the full URL of the file, not just a path.

I hope this was helpful; any input is welcome. As I said, I'm no expert; this is just me trying various plug-ins and researching online.
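If you want to confirm that the Sitemap line is readable by a parser, Python 3.8+ exposes it through RobotFileParser.site_maps(). A minimal sketch; example.com is a placeholder domain, not the one from the post.

```python
from urllib.robotparser import RobotFileParser

# example.com is a placeholder; use your own blog's full sitemap URL.
RULES = """\
User-agent: *
Disallow: /wp-admin

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://example.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The comment lines around the plugin block are ignored by the parser, so only the URL itself comes back.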
 
Thanks for that post. I just made one too...

I reckon you have to be really careful about what you exclude. Don't wanna shoot yourself in the foot, argh!
 
Ah apologies Billy, I actually did a quick search on this forum to see if there was any useful information before I started looking online. Didn't notice anything at the time, hope I didn't step on your toes mate :)

Yes, you definitely need to be careful what you exclude, but as far as I can tell this is working great. My sitemap got picked up just a few hours ago via the line in the robots.txt and all my articles have been picked up through it.
 
wow... what a huge waste of time...

I've never touched a robots.txt file and have top rankings everywhere. You should spend your time working on your site and not on things as useless as a robots file. You could even delete it and nothing would happen.
 
hi,

I heard that /%category%/%postname%/ is not recommended... it's a pain in the ass if you change your category...

Just >> /%postname%.html << should be fine.
 
Keekn, agreed. I've always had the category in there for the additional SEO advantage, but having weighed up the pros and cons of this versus just %postname%, I've since changed to using just postname. Keeps things simpler :)

Thanks for your input - I would edit my post but I can't unfortunately.
 