Hi, I have created a WordPress blog and I want to make sure that my robots.txt file is correct.

Code:
Sitemap: http://www.yourblog.com/sitemap.xml

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
You've gone in depth on the files and folders to block from Googlebot, which is more thorough than the usual approach of only blocking the wp- folders. I think this is good to proceed with.
After some research, I'm using this one:

Code:
# This rule applies to all user-agents
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /trackback/
Disallow: /cgi-bin/

# Disallow all monthly archive pages
Disallow: /2005/0
Disallow: /2005/1
Disallow: /2006/0
Disallow: /2006/1
Disallow: /2007/0
Disallow: /2007/1

# Googlebot is the main search bot for Google
User-agent: Googlebot

# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# Disallow Google from parsing individual post feeds and trackbacks
Disallow: */feed/
Disallow: */trackback/

# Disallow all files with ? in the URL
Disallow: /*?*
Disallow: /*?

# Disallow all archived monthlies
Disallow: /2006/0*
Disallow: /2007/0*
Disallow: /2006/1*
Disallow: /2007/1*

# Googlebot-Image is the image bot for Google
User-agent: Googlebot-Image
# Allow everything
Allow: /*

# Mediapartners-Google is the AdSense bot
User-agent: Mediapartners-Google*
# Allow everything
Allow: /*
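As an aside on how those patterns behave: * and $ are pattern-matching extensions that Google supports, not part of the original robots.txt standard, so other crawlers may treat them literally. A minimal sketch of what the two common forms match (the example URLs are hypothetical):

Code:
# $ anchors the match at the end of the URL:
Disallow: /*.php$    # blocks /index.php, but NOT /index.php?p=1 (that URL doesn't end in .php)
# * matches any sequence of characters:
Disallow: /*?*       # blocks any URL containing a query string, e.g. /page/?replytocom=5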
Very interesting. Why do you guys block Googlebot from so many items? What is the benefit of doing this?
Google won't index these pages anyway (at least if you installed via Fantastico and use All in One SEO). No hassle.
I'm not quite sure the robots.txt standard supports regular-expression-like terms such as $ at the end of a pattern. Where did you get that information from? In my opinion, you could also prevent bots from indexing your content by using proper meta headers. Do you know noindex, nofollow, nocache, noydir, ... in the HTML headers? I'm using those to direct bots through my sites, and it works well.

Edit: if you use WordPress, try the "All in One SEO" plugin; I'm sure it covers all your needs.
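For anyone unfamiliar with those meta headers, a minimal sketch of what the standard ones look like in a page's <head> (noindex and nofollow are the widely supported directives; the others mentioned above are engine-specific extensions):

Code:
<head>
  <!-- tells compliant crawlers: don't index this page, and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>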
Check out the Facebook robots.txt; it's a good example of an anti-robots policy (pro-Bing):

Code:
hxxp://facebook.com/robots.txt

Also interesting:

Code:
hxxp://w3.exorbyte.de/robots.txt

A small how-to tutorial:

Code:
hxxp://w3.irkawebpromotions.com/robots-txt-tutorials/

And the main FAQ about this:

Code:
hxxp://w3.robotstxt.org/faq.html