Blocked Googlebot in robots.txt, but pages still not removed?

ruger999

Newbie
Joined
Sep 18, 2011
Messages
46
Reaction score
1
Hi,
I blocked Googlebot from a certain folder on my website after I found out that it had indexed a few pages I didn't want indexed.
I made the block about 11 days ago, but the pages kept showing up. The only change I've seen in the last 2-3 days is that the title on some of them changed to "Untitled", and a newly indexed page now says "A description for this result is not available because of this site's robots.txt".

But that's not good enough; I want these pages removed completely, as if they were never indexed. How can I do this right?

thanks
 

Cogitasoft

BANNED
Joined
Sep 25, 2013
Messages
125
Reaction score
33
Solution 1 : Password protection


Protecting the site with an .htaccess password is the best way to block anyone else from accessing it. But that isn't always possible, for example when you need a demo audience to test the site.
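For reference, a minimal Basic-auth setup looks something like this; the password file path is a placeholder, and you'd create the file first with `htpasswd -c /path/to/.htpasswd demo_user`:

```apache
# .htaccess in the folder you want to protect (needs mod_auth_basic).
# The AuthUserFile path is a placeholder; keep it outside the web root.
AuthType Basic
AuthName "Restricted area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```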


Solution 2 : Robots.txt


Another solution Google provides is a robots.txt file that tells bots not to crawl pages or list them in results. But that's not always enough. Google's Matt Cutts has confirmed that Google may still include pages from such sites if it thinks they are relevant.


User-agent: *
Disallow: /
Solution 3 : Using .htaccess RewriteCond


So the solution is to block Google and other similar bots from reaching your site at all. To do that, put the following code in your .htaccess file.


RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AltaVista [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
RewriteRule ^.*$ http://htmlremix.com/ [R=301,L]
Change the URL in the last line to your main site, so that any backlinks pointing at the blocked site still pass their SEO value to you.

from http://www.htmlremix.com/seo/block-google-and-bots-using-htaccess-and-robots-txt
 

GreyKnight

Regular Member
Joined
Mar 19, 2013
Messages
398
Reaction score
201
I think the no-index feature means you tell Google not to index your site, but that doesn't mean it tells Google to de-index it.
Google will not visit that part from now on, but Google will still remember what is there.

You can ask Google to remove pages by reporting them to Google.
Or, if you'd rather the content not be reviewed by the Google team, delete the content for now, then create another folder, and don't forget to no-index it first.
Then tell Google to index the first folder again, which should be empty by then.

I remember when one of my writers accidentally put his password up as his article's title; to this day the article (although the title has been changed) can still be found by searching for that password. Google really remembers a lot of things.
 

ruger999

Newbie
Joined
Sep 18, 2011
Messages
46
Reaction score
1

I am using the folder to test landing pages; that's why I don't want Google to index it,
and it's also why I can't put a password on it. I already used robots.txt to block Google, like this:

User-Agent: Googlebot
Disallow: /subfolder/

I didn't get the third solution, because:
a) I want to block only a certain folder, not the whole site
b) how would I get SEO ranking if the site isn't supposed to be indexed according to that code?
 

ruger999

Newbie
Joined
Sep 18, 2011
Messages
46
Reaction score
1

I want to keep some other webpages in that folder that still haven't been indexed by Google.
If I unblock the folder, I'm afraid Google would index those files.
Is there a way to block the whole folder except for those files (which I would re-upload with no actual content)?
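Googlebot does honor Allow directives in robots.txt, and the most specific (longest) matching rule wins, so in principle you can carve out exceptions for individual files. A sketch, with a placeholder filename:

```
# robots.txt sketch: block the folder but let one file through.
# "keep-this-page.html" is a placeholder name.
User-agent: Googlebot
Allow: /subfolder/keep-this-page.html
Disallow: /subfolder/
```

Keep in mind that Disallow only stops crawling; it does not by itself de-index pages that are already in the results.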
 

TZ2011

Senior Member
Joined
Jun 26, 2011
Messages
846
Reaction score
878
Some time ago I made a .php script that can protect a site or individual files separately, depending on what you need (it can be applied in the header or appended via .htaccess). Basically, you put one line of code in your landing page and it will redirect all the bots/IPs/hosts you want redirected. Aprotect
 

Techxan

Elite Member
Joined
Dec 7, 2011
Messages
3,092
Reaction score
3,614
Age
66
Just use a "noindex, nofollow" meta tag on each page you want to exclude.
 

lonhot2000

Newbie
Joined
Sep 23, 2013
Messages
19
Reaction score
3
Just use a "noindex, nofollow" meta tag on each page you want to exclude.

I agree, this will remove the pages from the index, and will keep them out.

<meta name="robots" content="noindex, nofollow" />

To speed up the removal process, once the noindex is in place, you can remove the pages using the "Remove URLs" feature in Google Webmaster Tools; it usually takes only a few hours.
 

ruger999

Newbie
Joined
Sep 18, 2011
Messages
46
Reaction score
1

Thanks, but can you explain why this would work better than the robots.txt rule I currently have?
Also, is the Google removal feature automatic once I have the tag in place, or does it go through a manual review?
 

lonhot2000

Newbie
Joined
Sep 23, 2013
Messages
19
Reaction score
3
Having URLs in robots.txt will not remove them from the index; it only prevents them from being crawled. So if your goal is to remove them from the index, robots.txt is not good enough.

The URL removal in Google is automatic; you can remove individual pages or entire directories. It typically takes a few hours.
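One way to get a noindex onto every page in a folder without editing each page is the X-Robots-Tag response header; a sketch, assuming Apache with mod_headers enabled. Note that the folder must not be blocked in robots.txt, or Googlebot will never fetch the pages and see the header:

```apache
# .htaccess placed inside the folder to de-index (requires mod_headers).
# Sends "noindex, nofollow" on every response from this folder.
Header set X-Robots-Tag "noindex, nofollow"
```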
 