flipflop101
Junior Member
- Dec 2, 2008
- 149
- 17
Hi guys, I recently needed to create a robots.txt for my WordPress blog and thought I'd contribute something back to BHW for those who need help doing the same. I'm by no means an expert; this is just information I've researched for myself.
Creating a decent robots.txt for your WordPress blog is important because, without one, even your unique self-written articles can appear to be duplicate content: WordPress makes the same content reachable under several URLs (category archives, trackback URLs, comment feeds and so on). There are also areas of the WordPress installation where Googlebot need not look, e.g. /wp-admin
By making an organised robots.txt ourselves we can improve the efficiency of the crawler across our site and that helps improve SEO.
I know there are plug-ins for robots.txt, but the ones I looked at were either messy or no real help: you still needed to sort it out yourself. Creating a simple, tidy robots.txt from scratch seems to be the best way.
It's also quick and easy.
So what should be included / excluded?
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */comments
Allow: /wp-content/uploads
The User-agent: * line means the rules that follow apply to all bots (user-agents). We then disallow the folders that make up the WordPress installation, plus trackback, comment and nested category URLs. The uploads folder is explicitly allowed because it contains uploaded files, e.g. images and videos.
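If you want to sanity-check which URLs these lines catch, here is a minimal Python sketch. It assumes Google-style matching, where a rule matches from the start of the path and * stands for any run of characters. The helper names and sample URLs are made up for illustration, and it ignores the Allow line and any precedence between rules, so treat it as a rough check rather than a real crawler.

```python
import re

# Disallow lines from the block above.
DISALLOW = [
    "/wp-admin", "/wp-includes", "/wp-content/plugins", "/wp-content/cache",
    "/wp-content/themes", "/trackback", "/comments", "/category/*/*",
    "*/trackback", "*/comments",
]

def to_regex(rule):
    # Escape the literal pieces and let each '*' match any run of characters.
    return re.compile(".*".join(re.escape(piece) for piece in rule.split("*")))

def first_blocking_rule(path, rules=DISALLOW):
    # Hypothetical helper: return the first Disallow rule that matches from
    # the start of the path, or None if no rule matches.
    for rule in rules:
        if to_regex(rule).match(path):
            return rule
    return None

if __name__ == "__main__":
    samples = [
        "/wp-admin/options.php",
        "/wp-content/uploads/2009/01/photo.jpg",
        "/news/my-post/",
        "/news/my-post/trackback/",
        "/category/news/page/2/",
    ]
    for path in samples:
        print(path, "->", first_blocking_rule(path))
```

Running it shows, for instance, that /news/my-post/trackback/ is caught by */trackback, while the uploads path and a normal post URL match nothing.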
Disallow: /*?*
Disallow: /*?
This part disallows all URLs with a ? in them. (Both lines match the same URLs, so either one on its own would do.)
Be careful with this one: you need to have changed your permalink structure in WordPress first. It's best to do this anyway, as it also helps with SEO. If you have left your permalinks on the default setting, the generated URLs will contain a ? when you click on your articles, categories etc., so you should not use this part.
I prefer to set up a custom structure in WordPress:
/%category%/%postname%/
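To see why the permalink setting matters for the ? rules, here is a small Python sketch. The post ID, category and post name are invented examples, and the check simply looks for a ? anywhere in the URL, which is how I understand Disallow: /*? to behave.

```python
def default_permalink(post_id):
    # WordPress default permalinks put the post ID in the query string.
    return "/?p={}".format(post_id)

def custom_permalink(structure, category, postname):
    # Fill in the two tags used by the custom structure above.
    return structure.replace("%category%", category).replace("%postname%", postname)

def blocked_by_query_rule(url):
    # Assumption: "Disallow: /*?" matches any URL that contains a '?'.
    return "?" in url

if __name__ == "__main__":
    plain = default_permalink(123)
    pretty = custom_permalink("/%category%/%postname%/", "news", "my-first-post")
    print(plain, "blocked:", blocked_by_query_rule(plain))
    print(pretty, "blocked:", blocked_by_query_rule(pretty))
```

With default permalinks every post URL would be blocked, while the custom structure produces URLs the rule leaves alone.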
Back to the robots.txt!
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
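Here we disallow files with extensions such as .php and .js. The $ at the end means the URL has to end with that extension. You can alter this list to suit yourself, but the above is a good starting point. One caution: Google asks that its crawler be able to fetch the CSS and JavaScript a page needs, so you may want to leave out the .css and .js lines.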
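Here is a rough Python sketch of how I understand the trailing $ to work, again assuming Google-style wildcards. The function name and the sample URLs are my own, not part of any real tool.

```python
import re

def matches(rule, path):
    # A trailing '$' means the rule must match the whole path;
    # without it the rule only has to match the start of the path.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = re.compile(".*".join(re.escape(piece) for piece in body.split("*")))
    found = regex.fullmatch(path) if anchored else regex.match(path)
    return found is not None

if __name__ == "__main__":
    checks = [
        ("/*.php$", "/wp-login.php"),
        ("/*.php$", "/wp-login.php?action=register"),
        ("/*.php", "/wp-login.php?action=register"),
        ("/*.css$", "/wp-content/themes/default/style.css"),
        ("/*.gz$", "/sitemap.xml.gz"),
    ]
    for rule, path in checks:
        print(rule, path, matches(rule, path))
```

Note that /*.php$ does not catch a .php URL with a query string on the end (the ? rules would be needed for that), and that /*.gz$ also matches a gzipped sitemap such as /sitemap.xml.gz, which may be worth double-checking if you list one in your robots.txt.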
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allow Google adsense bot on entire site
User-agent: Mediapartners-Google
Disallow:
Allow: /*
Above we have now allowed the Google image bot and the Google AdSense bot to access everything. An empty Disallow: line blocks nothing, so the Allow: /* line is just extra reassurance.
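As far as I understand it, a bot that finds a User-agent group with its own name follows only that group and ignores the * rules; every other bot falls back to *. A simplified Python sketch of that idea is below. It only does exact name matching and plain prefix rules, the rule list is shortened, and the function names are hypothetical.

```python
# Disallow prefixes per User-agent group (shortened for the example).
GROUPS = {
    "*": ["/wp-admin", "/wp-includes"],
    "Googlebot-Image": [],        # empty Disallow: nothing is blocked
    "Mediapartners-Google": [],   # empty Disallow: nothing is blocked
}

def rules_for(crawler, groups=GROUPS):
    # Use the crawler's own group if one exists, otherwise fall back to '*'.
    for name, disallow in groups.items():
        if name != "*" and name.lower() == crawler.lower():
            return disallow
    return groups.get("*", [])

def may_fetch(crawler, path, groups=GROUPS):
    # A path is allowed unless it starts with one of the group's prefixes.
    return not any(path.startswith(prefix) for prefix in rules_for(crawler, groups))

if __name__ == "__main__":
    for bot in ["Googlebot-Image", "Mediapartners-Google", "Googlebot"]:
        print(bot, may_fetch(bot, "/wp-admin/images/logo.png"))
```

Here the image and AdSense bots may fetch the /wp-admin path, while plain Googlebot has no group of its own and so gets the * rules.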
Additionally if you use the XML-Sitemap plugin for Wordpress you can add this at the end:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: your domain com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
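This tells the robots where your sitemap is. Replace "your domain com" with your blog's full address, since the Sitemap line needs a complete URL starting with http://.
I hope this was helpful; any input is welcome. As I said, I'm no expert, this is just me trying various plug-ins and researching online.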