index.php? URL block in robots.txt

nbseo

Junior Member
Joined
Nov 18, 2010
Messages
127
Reaction score
11
Hi all

I have an e-commerce website that sells maternity wear. I have done all the on-page work on it and submitted it to GWT (Google Webmaster Tools). GWT reports 1018 link errors, all of them 404 Not Found. I already worked on the issue with one of my developers, and he told me those links would no longer be generated. But they are still being generated from somewhere on the site, and all of those URLs share one common string: /index.php?...

I want your help with this: can I block these kinds of URLs in robots.txt with a rule like "Disallow: /index.php?"

One of my friends said that doing this might stop Google from crawling my whole site! I am very upset about this issue :( Please, can anyone help me, as I have not gotten any rankings for this project. Please give me detailed information on how I can fix this.

Thank you
 
Can you give us an example of one of those URLs?

Have you crawled your website (with Xenu, LinkExaminer...)?
 
Step one ... fire every one of your developers. Seriously.

I'm pretty good with ecommerce sites, if you want to PM me your URL I'll take a look at it and give you some free recommendations.

edit: to answer your question, disallowing index.php can be tantamount to disallowing the entire site, depending on your setup. It sounds like you might have a canonical issue (both domain.com and domain.com/index.php are getting indexed), but instead of using a canonical tag to fix it, they want to disallow index.php. If your pages are all served through index.php, that rule blocks them all, especially if index.php is the source of your Google errors on thousands of pages.
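
To make the scope of that rule concrete, here is a minimal Python sketch of plain prefix matching, which is roughly how crawlers compare a Disallow value against the path and query string of a URL. It is a simplification (no wildcards, no Allow lines, no user-agent groups), and the sample paths other than /index.php are hypothetical, not taken from the site in question.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path (path plus query string) starts with any rule.

    Simplified model: plain prefix matching only. Real crawlers also
    handle '*' and '$' wildcards, Allow lines, and user-agent groups.
    """
    return any(url_path.startswith(rule) for rule in disallow_rules if rule)


# The rule proposed in this thread.
rules = ["/index.php?"]

# Sample paths; all but "/index.php" are hypothetical.
samples = [
    "/",                                    # homepage
    "/index.php",                           # index.php without a query string
    "/index.php?main_page=index&cPath=14",  # hypothetical parameterized URL
    "/maternity-dresses.html",              # hypothetical rewritten URL
]

results = {path: is_disallowed(path, rules) for path in samples}

for path, blocked in results.items():
    print(("BLOCKED " if blocked else "allowed ") + path)
```

Under that simplified model the rule only catches URLs that begin with /index.php?, so how much of the site it blocks depends on how many real pages can only be reached through such URLs.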
 
PM sent with my homepage. Thank you, buddy. I have checked and there is no canonical issue here. The problem is something different that I have tried hard to figure out. Please let me know the best solution you have. Thanks, much appreciated.
 
"index.php?main_page=FILENAME_PRODUCT_FILTER&options_4=27&options_3=9&options_2=5&categories_id=14&zenid=d8r2tuujcamjqhiskpnh62krj5"

URLs like this one. And yes, I did crawl the site with Xenu.
 
In that case I would obfuscate those links with JS, but first you have to be sure spiders can crawl your entire site without them.

If you don't have obfuscating skills you can nofollow them, but it's not the best method because you'll lose internal PageRank (see the Bruce Clay article on his blog).

Rewrite the URLs (no more index.php) of the pages that bots use to discover your products/articles. You'll then be able to safely exclude index.php pages in your robots.txt.

If you can't rewrite them, exclude parameters in Google Webmaster Tools.
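
As a rough illustration of what excluding a parameter achieves, the Python sketch below drops a parameter from the example URL posted earlier in the thread. Treating zenid as a session identifier that does not change the page content is an assumption here, the second zenid value is made up, and strip_params is a hypothetical helper, not part of any shop software or Google tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, drop=("zenid",)):
    """Return url with the query parameters named in `drop` removed.

    Assumption: the dropped parameters (here zenid) do not affect
    which page content is served.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Example URL from this thread.
url_a = (
    "index.php?main_page=FILENAME_PRODUCT_FILTER&options_4=27&options_3=9"
    "&options_2=5&categories_id=14&zenid=d8r2tuujcamjqhiskpnh62krj5"
)
# Same URL with a made-up zenid value, as another visitor might see it.
url_b = url_a.replace("d8r2tuujcamjqhiskpnh62krj5", "hypotheticalsession0001")

cleaned_a = strip_params(url_a)
cleaned_b = strip_params(url_b)

print(cleaned_a)
print(cleaned_a == cleaned_b)
```

Both variants collapse to the same URL once zenid is ignored, which is the effect the parameter setting is meant to have on how the crawler counts pages.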
 