
Google ignores Robots.txt

Discussion in 'Black Hat SEO' started by mindmaster, Jun 27, 2014.

  1. mindmaster

    mindmaster Jr. VIP Premium Member

    Joined:
    Sep 16, 2010
    Messages:
    2,501
    Likes Received:
    1,135
    Location:
    at my new office
    I just noticed that two domains using robots.txt (Disallow: /) have been reindexed by Google.


    A week ago Google deindexed these two domains after I implemented robots.txt. Today they are indexed again.


    Is there any way to keep the domains out of Google's index?
     
  2. Scripteen

    Scripteen Elite Member

    Joined:
    Sep 19, 2009
    Messages:
    1,811
    Likes Received:
    1,918
    Block Googlebot through .htaccess:

    Code:
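    # Return 403 Forbidden to the listed crawlers (case-insensitive match)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    # [F] is the idiomatic flag for a 403 response and implies [L]
    RewriteRule .* - [F]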
    
    Sources

    http://stackoverflow.com/questions/10735766/block-all-bots-crawlers-spiders-for-a-special-directory-with-htaccess

    https://www.google.com.eg/search?num=100&espv=2&q=htaccess+block+googlebot&oq=htaccess+block+googlebot
     
  3. Tanckom

    Tanckom Power Member

    Joined:
    May 4, 2014
    Messages:
    570
    Likes Received:
    172
    Location:
    ☯ Karma ☯
    Are you on WordPress? If so, disable robots.txt with an SEO plugin.
     
  4. thesam

    thesam Jr. VIP Premium Member

    Joined:
    Aug 13, 2013
    Messages:
    458
    Likes Received:
    61
    Maybe the domain is getting some strong backlinks. In that case Google still lists the domain in its search results, with a description saying the page is blocked by robots.txt.

    You can also send a removal request via Webmaster Tools.
     
  5. Tealover

    Tealover Junior Member

    Joined:
    Aug 5, 2013
    Messages:
    181
    Likes Received:
    65
    You may also try robots meta tags,
    something like <meta name="robots" content="noindex" /> or <meta name="googlebot" content="noindex"> in the template.
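
To confirm that a template really emits one of these tags, the rendered HTML can be scanned for it. The following is only a minimal sketch using Python's standard library; the class name `RobotsMetaParser` and the function `has_noindex` are hypothetical names chosen for illustration, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content may hold several comma-separated directives
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Calling `has_noindex(page_source)` on the HTML of a served page returns `True` when either tag from the post above is present. It only inspects the markup; it does not tell you whether a crawler is allowed to fetch the page in the first place.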
     
  6. netgeek

    netgeek Registered Member

    Joined:
    Jan 5, 2011
    Messages:
    52
    Likes Received:
    15
    Location:
    With Central Banks !!
    Strange, I wonder why it is still being indexed. However, if you are using WordPress, you can go to Dashboard > Settings > Reading, check the box labeled "Discourage search engines from indexing this site", and save the changes.

    Hope this helps.
     
  7. roodie

    roodie Newbie

    Joined:
    Jun 8, 2012
    Messages:
    3
    Likes Received:
    1
    Occupation:
    SEO Analyst
    Location:
    mumbai, maharashtra, india
    You can use <meta name="robots" content="noindex,nofollow">. This is the best way to de-index a site from Google.

    Robots.txt only prevents bots from crawling the pages; the URLs can still be indexed in Google's database.
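
One consequence worth spelling out: a crawler that is blocked by robots.txt never fetches the page, so it cannot see a noindex meta tag on it. The sketch below, assuming Python's standard `urllib.robotparser` module, checks whether a given robots.txt would stop a crawler from reaching a URL. The helper name `can_crawl` and the example.com URL are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule from the original post: every crawler is blocked from every path
blocked = "User-agent: *\nDisallow: /\n"

# An empty Disallow value permits everything
open_rules = "User-agent: *\nDisallow:\n"

if not can_crawl(blocked, "Googlebot", "http://example.com/page.html"):
    print("Crawling is blocked, so a noindex tag on this page would go unread.")
```

If the goal is de-indexing through a meta tag, the page would need to pass this check (as it does with `open_rules`) so that the tag can actually be read.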
     