In Google's index despite NOINDEX directive and robots.txt

Discussion in 'BlackHat Lounge' started by Unreliable Witness, Dec 20, 2016.

  1. Unreliable Witness

    Can anyone solve this mystery?

    I work on a site that has lots of good content, and also a directory. The directory pages could be classed as thin content, so they have both a noindex meta tag on the page and a Noindex rule in robots.txt. Both have been in place for 18 months.

    The html used on the pages is:

    <!DOCTYPE html>
    <html lang="en-GB">
    <head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta name="robots" content="noindex, follow" />
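
    As a rough way to confirm what a crawler would read from a head section like this, here is a minimal Python sketch using only the standard library. The names RobotsMetaParser and robots_meta are made up for illustration, and it only looks at meta tags in the HTML, not at any X-Robots-Tag HTTP header the server might send.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "follow"}}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            tokens = {
                token.strip().lower()
                for token in attr.get("content", "").split(",")
                if token.strip()
            }
            self.directives.setdefault(name, set()).update(tokens)


def robots_meta(html):
    """Return a dict mapping the meta name to the set of directives found."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = (
        '<!DOCTYPE html>\n<html lang="en-GB">\n<head>\n'
        '<meta charset="UTF-8">\n'
        '<meta name="robots" content="noindex, follow" />\n'
    )
    print(robots_meta(page))
```

    If "noindex" is in the set for "robots", the on-page directive is at least being served as intended.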

    The entry in robots.txt is:

    User-agent: *
    Noindex: /subdirectory-name/

    where subdirectory-name is the name of the subdirectory where all these directory pages can be found.
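
    Noindex is not one of the standard robots.txt fields (User-agent, Allow, Disallow), so it is worth knowing exactly which lines in the file depend on non-standard support. The sketch below is a hypothetical helper, not any official parser: scan_robots_txt and STANDARD_FIELDS are names invented here, and Sitemap is included in the set as an assumption because it is so widely recognised.

```python
# Fields treated as "standard" for this sketch (an assumption, see the note above).
STANDARD_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def scan_robots_txt(text):
    """Return (line_number, field, value) for each rule with a non-standard field."""
    unusual = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() not in STANDARD_FIELDS:
            unusual.append((number, field.strip(), value.strip()))
    return unusual


if __name__ == "__main__":
    sample = "User-agent: *\nNoindex: /subdirectory-name/\n"
    for number, field, value in scan_robots_txt(sample):
        print(f"line {number}: non-standard field {field!r} with value {value!r}")
```

    Any line it reports is one whose effect depends on the crawler choosing to honour it.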

    Three months ago, Index Status in Search Console (Webmaster Tools) showed only the correct number of pages, without these directory pages, and they weren't returned in search results.

    So the noindex directive seemed to work.

    On 6th November, the Total indexed count in Search Console rocketed. It now shows the number of pages that should be indexed, plus all those that shouldn't.

    If I do an "inurl:sitename.com site:sitename.com" search, some (but not all) of the pages that shouldn't be indexed are returned. Google evidently doesn't rate them highly, because they only appear if you opt to look at the excluded results.

    Robots.txt hasn't changed. The on-page directives haven't changed.

    So why is Google suddenly ignoring the noindex directive?

    Has anyone else had similar issues?
     
  2. Sherbert Hoover

  3. tb303

    Have you tried removing the URLs/directory using Webmaster Tools?
    https://support.google.com/webmasters/answer/1663419?hl=en

    I use this to block Google indexing and it's always worked for me.
    <META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
    <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
    So yours should be fine.

    Noindex in robots.txt is a new one on me, so I can't help you with that. I've only used robots.txt for Disallow.