
index.php? URL block in robots.txt

Discussion in 'White Hat SEO' started by nbseo, Nov 28, 2011.

  1. nbseo

    nbseo Junior Member

    Joined:
    Nov 18, 2010
    Messages:
    127
    Likes Received:
    11
    Occupation:
    SEO... my Food of life.. ;)
    Hi all

    I have an e-commerce website that sells maternity wear. I have done all the on-page work on it and also submitted it in GWT (Google Webmaster Tools). GWT reports 1,018 link errors, all 404 Not Found. I already fixed the issue with one of my developers, and he told me these kinds of links will no longer be generated. But they are still being generated from somewhere on the site, and all of those URLs share one common string: /index.php?...

    I'd like your help with this: can I block these URLs by disallowing them in robots.txt, e.g. "Disallow: /index.php?"

    One of my friends said that doing this may stop Google from crawling my whole site! I'm very upset about this issue :( Could anyone please help? I haven't gotten any rankings for this project. Please give me detailed information on how I can fix this.

    Thank you
     
    Last edited: Nov 28, 2011
  2. karnabal

    karnabal Newbie

    Joined:
    Nov 28, 2011
    Messages:
    19
    Likes Received:
    0
    Can you give us an example of one of those URLs?

    Have you crawled your website (with Xenu, LinkExaminer...)?
     
  3. phpbuilt

    phpbuilt Jr. VIP Jr. VIP

    Joined:
    May 16, 2011
    Messages:
    1,650
    Likes Received:
    5,208
    Occupation:
    $ from websites I own.
    Location:
    putting monkeys in paypal
    Step one ... fire every one of your developers. Seriously.

    I'm pretty good with ecommerce sites, if you want to PM me your URL I'll take a look at it and give you some free recommendations.

    edit: To answer your question, disallowing index.php is tantamount to disallowing the entire site. It sounds like you might have a canonical issue (both domain.com and domain.com/index.php are getting indexed). Instead of using a canonical tag to fix it, they want to disallow index.php, which, depending on your setup, can amount to blocking your entire site, especially if index.php is the source of the Google errors on thousands of pages.
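Why "depending on your setup" matters can be checked with a small matcher. This is a minimal Python sketch assuming Google-style robots.txt semantics: a rule is a prefix match against the path plus query string, "*" matches any run of characters, a trailing "$" anchors the end, the longest matching rule wins, and Allow wins a tie. The helpers is_allowed and _pattern_to_regex and the (directive, pattern) tuple format are hypothetical, not any crawler's real code, and product_url is an assumed URL in the same index.php?main_page=... style as the example URL posted later in this thread.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path pattern)


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern into a regex matched from the URL start.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is literal, so a rule is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: List[Rule], path_and_query: str) -> bool:
    """Return True if a crawler following these rules may fetch the URL.

    The longest matching pattern decides; on a tie Allow wins;
    if nothing matches, the URL is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" line blocks nothing
        if _pattern_to_regex(pattern).match(path_and_query):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len = len(pattern)
                allowed = directive == "allow"
    return allowed


filter_url = ("/index.php?main_page=FILENAME_PRODUCT_FILTER&options_4=27&options_3=9"
              "&options_2=5&categories_id=14&zenid=d8r2tuujcamjqhiskpnh62krj5")
# Hypothetical product URL in the same index.php?main_page=... style.
product_url = "/index.php?main_page=product_info&products_id=12"

broad = [("disallow", "/index.php?")]
print(is_allowed(broad, "/"))           # True
print(is_allowed(broad, "/index.php"))  # True
print(is_allowed(broad, product_url))   # False: the catalogue is blocked too
print(is_allowed(broad, filter_url))    # False

# A narrower, hypothetical rule aimed only at URLs carrying the session id.
narrow = [("disallow", "/*zenid=")]
print(is_allowed(narrow, product_url))  # True
print(is_allowed(narrow, filter_url))   # False
```

Under these assumptions "Disallow: /index.php?" leaves the homepage and a bare /index.php crawlable but blocks every index.php URL with a query string, so on a store whose product and category pages all live behind index.php?main_page=... it effectively blocks the catalogue. Testing the real URLs from the GWT error list is the only way to know which case applies.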
     
    Last edited: Nov 29, 2011
  4. nbseo

    nbseo Junior Member

    Joined:
    Nov 18, 2010
    Messages:
    127
    Likes Received:
    11
    Occupation:
    SEO... my Food of life.. ;)
    PM sent with my homepage. Thank you, buddy. I have checked, and there is no canonical issue here; the problem is something different that I have tried hard to figure out. Please let me know the best solution you have. Thanks, I appreciate it.
     
  5. nbseo

    nbseo Junior Member

    Joined:
    Nov 18, 2010
    Messages:
    127
    Likes Received:
    11
    Occupation:
    SEO... my Food of life.. ;)
    "index.php?main_page=FILENAME_PRODUCT_FILTER&options_4=27&options_3=9&options_2=5&categories_id=14&zenid=d8r2tuujcamjqhiskpnh62krj5"

    These kinds of URLs. And yes, I did crawl the site with Xenu.
     
  6. karnabal

    karnabal Newbie

    Joined:
    Nov 28, 2011
    Messages:
    19
    Likes Received:
    0
    In that case I would obfuscate those links with JS, but first you have to be sure spiders can crawl your entire site without them.

    If you don't have obfuscation skills, you can nofollow them, but it's not the best method because you'll lose internal PageRank (see Bruce Clay's article on his blog).

    Rewrite the URLs (no more index.php) of the pages bots use to discover your products/articles; you'll then be able to safely exclude index.php pages in your robots.txt.

    If you can't rewrite them, exclude parameters in Google Webmaster Tools.
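Excluding a parameter in Google Webmaster Tools (or pointing duplicates at one canonical URL, as phpbuilt suggested) amounts to saying that some parameters don't change the page. A minimal Python sketch of that idea, assuming zenid (the session id in the example URL above) is the only such parameter and that parameter order doesn't matter; normalize and IGNORED_PARAMS are hypothetical names, and example.com stands in for the real domain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the page content. zenid is the
# session id from the thread's example URL; add others only after checking.
IGNORED_PARAMS = {"zenid"}


def normalize(url: str) -> str:
    """Return one canonical form for URLs that differ only in ignored
    parameters or in parameter order (order assumed irrelevant)."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # The fragment is dropped: browsers never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


a = ("http://example.com/index.php?main_page=FILENAME_PRODUCT_FILTER"
     "&options_4=27&categories_id=14&zenid=d8r2tuujcamjqhiskpnh62krj5")
b = ("http://example.com/index.php?categories_id=14&zenid=0000000000"
     "&main_page=FILENAME_PRODUCT_FILTER&options_4=27")

print(normalize(a) == normalize(b))  # True
print(normalize(a))
# http://example.com/index.php?categories_id=14&main_page=FILENAME_PRODUCT_FILTER&options_4=27
```

Counting the distinct normalize() results over the URLs in the GWT 404 report gives a rough idea of how many distinct pages actually sit behind the errors.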