Having some problems with my robots.txt file

Discussion in 'White Hat SEO' started by thedon23, Jan 15, 2012.

  1. thedon23

    thedon23 Elite Member

    Joined:
    Dec 21, 2009
    Messages:
    1,759
    Likes Received:
    1,268
    Last night I noticed that everything on my site was being indexed (even the wp-admin folder and all that).

    I went into the root folder of my site: public_html/www.site.com

    I noticed there was no robots.txt, so I created one, and added the following to it (I noticed one of the mods here had recommended this):

    When I go into Google Webmaster Tools, it still shows my old robots.txt settings, which is weird because I never created one for the site. It shows the following:

    Then I noticed that GWT is showing my robots.txt file at http://site.com/robots.txt,
    whereas my robots.txt file is really at http://www.site.com

    Anybody know how to fix this?!
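
A minimal sketch of why the www and non-www addresses matter here: crawlers look for robots.txt at the root of each host separately, so http://site.com and http://www.site.com are checked at two different URLs. The helper name `robots_url` is hypothetical and only illustrates the idea.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to the host of page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host exactly as given; only the path changes.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://site.com/wp-admin/"))      # http://site.com/robots.txt
print(robots_url("http://www.site.com/wp-admin/"))  # http://www.site.com/robots.txt
```

If both hosts serve the same site, the two URLs should end up returning the same file, for example by redirecting one host to the other.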
     
  2. tortuga

    tortuga Newbie

    Joined:
    Nov 22, 2011
    Messages:
    14
    Likes Received:
    4
    301 redirect...
     
  3. dan777

    dan777 Junior Member

    Joined:
    Mar 3, 2011
    Messages:
    106
    Likes Received:
    15
    Location:
    Europe
    What is the meaning of these robots.txt rules:
    Disallow: /category/*/*
    and
    Disallow: /*?*
    Disallow: /*?

    And what is the difference between:
    Disallow: /category/*/*
    and
    Disallow: /category/
    though, if you end with a "/" then it will specify that as the match.
    That means this;
    Disallow: /wp-includes/
    will block these;
    Disallow: /wp-includes/this.html
    Disallow: /wp-includes/that.php
    Disallow: /wp-includes/thistoo.jpg
    Disallow: /wp-includes/here/here2/anythinginhere.aswell

    I would be very thankful if anybody could clarify these statements.
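
The trailing-slash behaviour quoted above can be checked locally with Python's standard `urllib.robotparser`. This is a rough sketch: the `User-agent: *` line and the extra `/wp-content/` and `/wp-includes` test paths are assumptions added for illustration, and the standard-library parser only does plain prefix matching (it does not understand `*` wildcards).

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: one rule for all crawlers.
rules = """\
User-agent: *
Disallow: /wp-includes/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from text, no network access

paths = [
    "/wp-includes/this.html",
    "/wp-includes/that.php",
    "/wp-includes/thistoo.jpg",
    "/wp-includes/here/here2/anythinginhere.aswell",
    "/wp-includes",                   # no trailing slash, so not a prefix match
    "/wp-content/uploads/photo.jpg",  # different folder
]

# True means the crawler may fetch the path, False means it is blocked.
results = {p: parser.can_fetch("*", "http://www.site.com" + p) for p in paths}

for path, allowed in results.items():
    print("allowed" if allowed else "blocked", path)
```

The first four paths come out blocked and the last two allowed, because the rule matches any path that starts with `/wp-includes/`.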
     
  4. thedon23

    thedon23 Elite Member

    Joined:
    Dec 21, 2009
    Messages:
    1,759
    Likes Received:
    1,268
    To be honest, I have no idea what it means haha. I just use it because it seems like that's what most people are using.

    Okay, so it looks like Google Webmaster Tools recognized my new robots.txt file. Now, like I said, all of my wp-admin folder is indexed in Google. Now that I've updated the robots.txt file to keep those pages from being indexed, will they soon be removed from Google's index? Or do I need to request a URL removal?
     
  5. thedon23

    thedon23 Elite Member

    Joined:
    Dec 21, 2009
    Messages:
    1,759
    Likes Received:
    1,268
    Bump. Anybody? I just checked, and all those useless pages are still indexed in Google.
     
  6. BlackhatUser

    BlackhatUser Registered Member

    Joined:
    Jan 28, 2009
    Messages:
    97
    Likes Received:
    43
    I believe Google will remove those links soon. Give it some time :)
     
    Last edited: Jan 17, 2012
  7. thedon23

    thedon23 Elite Member

    Joined:
    Dec 21, 2009
    Messages:
    1,759
    Likes Received:
    1,268
    Yeah, I did that a couple days ago. Thanks for pointing that out though. So Google sees my new robots.txt file, but all those pages are still indexed. Do I have to wait a couple of weeks or something?
     
  8. mark27

    mark27 Regular Member

    Joined:
    Dec 19, 2011
    Messages:
    224
    Likes Received:
    105
    To break it down, the root of your domain is / and the paths are based on that. So if you have a file at .com/images/uploads/my-picture.jpg and you don't want Google to see it, you'd type

    Disallow: /images/uploads/my-picture.jpg

    Now let's say you want to block Google from indexing the entire uploads folder. You'd type

    Disallow: /images/uploads/*

    The star is a wildcard: every URL that starts with that path is blocked, so /images/uploads/your-picture.jpg is also blocked with the * example.

    Disallow: /images/uploads/*?*

    I'm pretty sure this would block any URL under that path that contains a ?, whatever comes before or after it. So it would block /images/uploads/1234?12345.html but wouldn't block /images/uploads/12345-12345.html

    Disallow: /images/uploads/*/*

    That would block the contents of any folder in the uploads directory but would allow indexing of the files in the uploads directory.

    Pretty sure that's right, but you should google if you really want the correct answer.
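
The patterns above can be tried out with a small matcher. This is only a sketch, assuming the wildcard semantics Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match); other crawlers may not honour wildcards at all. The function names and the `/images/uploads/2012/pic.jpg` test path are made up for illustration.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each * into "any run of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """True if a Disallow rule with this pattern would match the path."""
    # re.match anchors at the start, which gives the prefix behaviour.
    return robots_pattern_to_regex(pattern).match(path) is not None


checks = [
    ("/images/uploads/*", "/images/uploads/your-picture.jpg"),
    ("/images/uploads/", "/images/uploads/your-picture.jpg"),
    ("/images/uploads/*?*", "/images/uploads/1234?12345.html"),
    ("/images/uploads/*?*", "/images/uploads/12345-12345.html"),
    ("/images/uploads/*/*", "/images/uploads/2012/pic.jpg"),
    ("/images/uploads/*/*", "/images/uploads/pic.jpg"),
]

for pattern, path in checks:
    print(pattern, path, "blocked" if is_blocked(pattern, path) else "allowed")
```

Under these assumptions the trailing `*` in `/images/uploads/*` is redundant: `/images/uploads/` alone already matches everything in the folder.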
     
  9. SEOWhizz

    SEOWhizz Power Member

    Joined:
    Oct 22, 2011
    Messages:
    606
    Likes Received:
    432
    Location:
    Lat: 38N 43' 11.298" Long: 27W 12' 7.733"
    1. You've already taken the first step by blocking the required folders in robots.txt.
    2. Now, you need to request a directory (or URL) removal in Google Webmaster Tools:

    More info:
    Code:
    http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663427
    The pages should drop out of the index as Google recrawls.
     
    Last edited: Jan 17, 2012