
[Urgent Help] Google showing my sitemaps in SERP. How do I stop it?

Discussion in 'White Hat SEO' started by puneetas3, Oct 21, 2012.

  1. puneetas3

    puneetas3 Senior Member

    Joined:
    Jan 8, 2012
    Messages:
    876
    Likes Received:
    384
    I have a WP site with the Yoast SEO plugin, which produces a sitemap at domain.com/sitemap_index.xml (with 3 inner sitemaps: post-sitemap.xml, page-sitemap.xml and post_tag-sitemap.xml).

    The problem is Google has indexed all these sitemaps and is showing them in the SERPs. They also show up under my site's expanded listing (sitelinks, I think; I don't know the exact term for it). I tried to find a solution and learned that I should add a noindex X-Robots-Tag to the header of these XML sitemaps, which I did through .htaccess. But I am also using the W3 Total Cache plugin, and the X-Robots-Tag doesn't appear in the header of the cached XML sitemaps (checked through Firebug). When I visit the site logged in as admin (caching is disabled for logged-in users in W3 Total Cache), the header does appear for the sitemaps.

    I can't deactivate HTML/XML page caching in W3 Total Cache, as this site is pretty heavy and needs pages served pre-built from the cache. What is a reliable solution for this? I read about the following somewhere and added it to my robots.txt file (I don't even know if it is valid):
    Code:
    user-agent: googlebot
    noindex: /sitemap_index.xml
    noindex: /post-sitemap.xml
    noindex: /page-sitemap.xml
    noindex: /post_tag-sitemap.xml
    I need urgent help on this, as it is affecting the CTR of my site. Thanks in advance.

    Edit: Problem solved. Mods, please close or delete this thread.
    Solved

    I solved the conflict between W3 Total Cache and Yoast SEO. Don't add any directives like the above to robots.txt. Instead, add an X-Robots-Tag noindex header for the sitemaps in your root .htaccess file. Then, in wp-admin, go to W3 Total Cache > Page cache settings, add all your sitemaps to the 'Never cache the following pages' textarea and save. This prevents your sitemaps from being cached, so they keep the required headers. Do check the headers in Firebug or Chrome. Then wait 1-2 weeks for Google to update your sitelinks.
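    For reference, roughly what I mean by the .htaccess part is something like this (just a sketch, not tested everywhere; it assumes Apache with mod_rewrite and mod_headers enabled, and uses my Yoast sitemap names, so adjust them to yours):
    Code:
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Flag requests for the Yoast sitemap URLs
        RewriteRule ^(sitemap_index|post-sitemap|page-sitemap|post_tag-sitemap)\.xml$ - [E=NOINDEX_SITEMAP:1]
    </IfModule>
    <IfModule mod_headers.c>
        # Send noindex only on the flagged requests
        # (the REDIRECT_ variant covers WordPress's internal rewrite to index.php)
        Header set X-Robots-Tag "noindex" env=NOINDEX_SITEMAP
        Header set X-Robots-Tag "noindex" env=REDIRECT_NOINDEX_SITEMAP
    </IfModule>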

    The other thing is, if you want to keep page caching enabled for the sitemaps, you will need to add the X-Robots-Tag in the .htaccess file inside W3 Total Cache's page cache directory instead of your main .htaccess. But I believe that .htaccess is created and deleted dynamically when pages are cached and purged, so your edit may not survive there permanently (I am not sure about it; someone with more knowledge can throw some light).
     
    Last edited: Oct 21, 2012
  2. soothsayerpg

    soothsayerpg Power Member

    Joined:
    Feb 23, 2011
    Messages:
    584
    Likes Received:
    225
    You can just try "Disallow: /sitemap_index.xml"; there's no need for noindex, etc. You might also have this problem with other search engines.

    Just use something like this in your robots.txt (use "*" for all bots, in case it hasn't been indexed by other SEs yet):
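    A rough sketch, using the sitemap names from your post:
    Code:
    User-agent: *
    Disallow: /sitemap_index.xml
    Disallow: /post-sitemap.xml
    Disallow: /page-sitemap.xml
    Disallow: /post_tag-sitemap.xml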

    Since it has already been indexed, you will also have to submit a removal request to Google to get it out of the index.
     
    • Thanks x 1
  3. puneetas3

    puneetas3 Senior Member

    Joined:
    Jan 8, 2012
    Messages:
    876
    Likes Received:
    384
    Thanks, but doesn't Disallow mean not to crawl that page at all (like a protected page)? We still want the sitemap to be public and crawlable. Am I missing something basic about it?
     
  4. ritesh

    ritesh Senior Member

    Joined:
    Oct 26, 2009
    Messages:
    1,046
    Likes Received:
    443
    .htaccess can also do the trick.