blocked googlebot in robots.txt, but pages still not removed?

Discussion in 'White Hat SEO' started by ruger999, Nov 4, 2013.

  1. ruger999

    ruger999 Newbie

    Joined:
    Sep 18, 2011
    Messages:
    45
    Likes Received:
    1
    Hi,
    I blocked Googlebot from a certain folder on my website after I found out that it had indexed a few pages I didn't want indexed.
    I put the block in place about 11 days ago, but the pages kept showing in results. The only change I saw in the last 2-3 days was that the title of some of them changed to "Untitled", and that a newly indexed page now shows "A description for this result is not available because of this site's robots.txt" instead.

    But that's not good enough. I want these pages removed completely, as if they were never indexed. How can I do this right?

    thanks
     
  2. Cogitasoft

    Cogitasoft BANNED

    Joined:
    Sep 25, 2013
    Messages:
    125
    Likes Received:
    33
    Solution 1 : Password protection


    Protecting a site with an htaccess password is the best way to block anyone else from accessing it. But that is not always possible, for example when you need a demo audience to test the site.
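
    For reference, this is roughly what the htaccess password setup looks like (a minimal sketch; the .htpasswd path and realm name are placeholders):

    # in the .htaccess of the folder you want to protect
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    The .htpasswd file itself is created with Apache's htpasswd command-line tool, e.g. htpasswd -c /path/to/.htpasswd username.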


    Solution 2 : Robots.txt


    Another solution Google provides is to use a robots.txt file to tell bots not to crawl pages or list them in results. But that's not always enough. Google's Matt Cutts has confirmed that Google may still include pages from such sites if it thinks they are relevant.


    User-agent: *
    Disallow: /
    Solution 3 : Using .htaccess RewriteCond


    So the solution is to block Google and other similar bots from accessing your site. For that, put the following code in your .htaccess.


    RewriteEngine on
    # match common search engine bots by user agent (case-insensitive)
    RewriteCond %{HTTP_USER_AGENT} AltaVista [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} msnbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
    # send any matching bot to the main site with a permanent redirect
    RewriteRule ^.*$ http://htmlremix.com/ [R=301,L]
    Change the URL in the last line to your main site so that your site gets the SEO benefit if someone links to your blocked site.

    from http://www.htmlremix.com/seo/block-google-and-bots-using-htaccess-and-robots-txt
     
  3. GreyKnight

    GreyKnight Regular Member

    Joined:
    Mar 19, 2013
    Messages:
    399
    Likes Received:
    200
    I think the no-index feature means that you tell Google not to index your site, but that doesn't mean it tells Google to de-index what's already there.
    Google will not visit that part from now on, but it will still remember what is there.

    You can ask Google to remove the pages by submitting a removal request to Google.
    Or, if you'd rather the content not be reviewed by the Google team, just delete the content for now, then create another folder, and don't forget to no-index it first.
    Then tell Google to index the first folder again, which by now should contain nothing.

    I remember when one of my writers accidentally put his password up as his article's title. To this day the article (although the title has been changed) can still be found by searching for that password. Google really remembers a lot of things.
     
  4. ruger999

    ruger999 Newbie

    Joined:
    Sep 18, 2011
    Messages:
    45
    Likes Received:
    1
    I am using the folder to test landing pages, which is why I don't want Google to index it,
    so I can't put a password on it. I have already used robots.txt to block Google like this:

    User-Agent: Googlebot
    Disallow: /subfolder/

    I didn't understand the third solution, because:
    a) I want to block only a certain folder, not the whole site
    b) how would I get SEO ranking if the site isn't supposed to be indexed, according to that code?
     
  5. ruger999

    ruger999 Newbie

    Joined:
    Sep 18, 2011
    Messages:
    45
    Likes Received:
    1
    I want to keep some other webpages in that folder that haven't been indexed by Google yet.
    If I unblock the folder, I'm afraid Google would index those files too.
    Is there a way to block the whole folder except these files (which I would re-upload with no actual content)?
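    For instance, would something like this work, since Googlebot supports Allow rules and the more specific rule wins (the filenames here are just placeholders)?

    User-Agent: Googlebot
    Disallow: /subfolder/
    Allow: /subfolder/already-indexed-page-1.html
    Allow: /subfolder/already-indexed-page-2.html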
     
  6. TZ2011

    TZ2011 Senior Member

    Joined:
    Jun 26, 2011
    Messages:
    832
    Likes Received:
    863
    Occupation:
    Cleaning servers
    Some time ago I made a .php script that can protect a site or individual files separately, depending on what you need (it can be applied in the header, or appended via .htaccess). Basically, you put one line of code in your landing page and it will redirect all the bots/IPs/hosts that you want redirected. Aprotect
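
    The idea is roughly this (a simplified sketch, not the actual script; the bot list and redirect target are placeholders):

    <?php
    // Simplified sketch: check the visitor's user agent against a list
    // of known bots and redirect them away from the landing page.
    $bots = array('Googlebot', 'bingbot', 'msnbot', 'Slurp');
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    foreach ($bots as $bot) {
        if (stripos($ua, $bot) !== false) {
            // placeholder target; send bots somewhere harmless
            header('Location: http://example.com/', true, 301);
            exit;
        }
    }
    // normal visitors fall through to the landing page
    ?>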
     
  7. Techxan

    Techxan Elite Member

    Joined:
    Dec 7, 2011
    Messages:
    3,093
    Likes Received:
    3,585
    Occupation:
    Local SEOist
    Location:
    TEXAS (you have to yell, it's the law.)
    Just use a "noindex, nofollow" meta tag on each page you want to exclude.
     
  8. lonhot2000

    lonhot2000 Newbie

    Joined:
    Sep 23, 2013
    Messages:
    19
    Likes Received:
    3
    I agree; this will remove the pages from the index and keep them out.

    <meta name="robots" content="noindex, nofollow" />

    To speed up the removal, once the noindex is in place you can remove the pages using the "Remove URLs" feature in Google Webmaster Tools; it usually takes only a few hours.
     
  9. ruger999

    ruger999 Newbie

    Joined:
    Sep 18, 2011
    Messages:
    45
    Likes Received:
    1
    thanks, but can you explain why this would work better than the robots.txt code I currently have?
    also, is the Google removal automatic once I have the tag in place, or does it go through a manual review?
     
  10. lonhot2000

    lonhot2000 Newbie

    Joined:
    Sep 23, 2013
    Messages:
    19
    Likes Received:
    3
    Having URLs in robots.txt will not remove them from the index; it just prevents them from being crawled. So if your goal is to remove them from the index, robots.txt is not good enough. In fact, you'll need to lift the robots.txt block on those pages, because Googlebot can only see the noindex tag if it's allowed to crawl them.

    The URL removal in Google is automatic; you can remove individual pages or entire directories. It typically takes a few hours.
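
    If you don't want to add the meta tag to every page in that folder, another option (just a sketch, assuming your server is Apache with mod_headers enabled) is to send the noindex as an HTTP header from an .htaccess file inside the folder:

    # .htaccess inside /subfolder/ -- requires mod_headers
    <IfModule mod_headers.c>
        Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>

    Google treats the X-Robots-Tag header the same as the meta tag, and it also works for non-HTML files like PDFs.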