How to block URLs like this in robots.txt from Google crawl

Discussion in 'Blogging' started by sviedinys, Dec 19, 2012.

  1. sviedinys

    sviedinys Jr. VIP Jr. VIP

    Joined:
    Apr 18, 2010
    Messages:
    503
    Likes Received:
    69
    I was using the Transposh plugin for WordPress, but after an update I got really bad results for my site, so I removed the plugin. Google still crawls all the old language URLs: ?lang=sv ?lang=ru ?lang=jp ...

    Example
    www.site.com/page/?lang=jp

    How should I block these correctly in robots.txt to stop the bot from crawling them?

    I tried:
    Disallow: /?lang=af/
    Disallow: /?lang=sq/
    Disallow: /?lang=ar/
    Disallow: /?lang=hy/

    And

    Disallow: /af/
    Disallow: /sq/
    Disallow: /ar/
    Disallow: /hy/

    Nothing helped. When I checked, Google was still crawling these URLs.
     
  2. Endire

    Endire Elite Member Premium Member

    Joined:
    Mar 27, 2012
    Messages:
    1,756
    Likes Received:
    1,061
    Gender:
    Male
  3. sviedinys

    The problem is that the plugin doesn't create any "folder" or .htm file like in the example you gave.

    I need to block crawling only for URLs that end with the language parameter:
    site.com/page/?lang=jp
    ?lang=jp

    and leave
    site.com/page/
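
For what it's worth, here is a rough sketch of why the rules tried above never match and what a wildcard rule would do. It assumes Google-style matching: the pattern is compared against the path plus query string from the first character, `*` stands for any run of characters, and a trailing `$` anchors the end. The helper `rule_matches` and the candidate rule `/*?lang=` are my own illustration, not something from this thread, and the code only models pattern matching; it does not read a real robots.txt file.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Hypothetical helper: does a Disallow pattern match a path+query string?

    Models prefix matching with '*' as a wildcard and an optional
    trailing '$' as an end anchor.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only succeeds if the pattern matches from the start of the string.
    return re.match(regex, url_path) is not None


# The rules that were tried: neither is a prefix of /page/?lang=af
tried_query_rule = rule_matches("/?lang=af/", "/page/?lang=af")
tried_folder_rule = rule_matches("/af/", "/page/?lang=af")

# A wildcard rule (assumed candidate): matches any URL containing ?lang=
blocks_translated = rule_matches("/*?lang=", "/page/?lang=jp")
blocks_home_translated = rule_matches("/*?lang=", "/?lang=ru")
blocks_plain_page = rule_matches("/*?lang=", "/page/")

print(tried_query_rule, tried_folder_rule)
print(blocks_translated, blocks_home_translated, blocks_plain_page)
```

Under these assumptions, a single line such as `Disallow: /*?lang=` would cover every language code at once while leaving site.com/page/ crawlable. It may still take a while before the change shows up in what Google crawls.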