
XML Sitemap for 60K+ pages

Discussion in 'White Hat SEO' started by sirron, Apr 15, 2010.

  1. sirron

    sirron Newbie

    Joined:
    Apr 5, 2010
    Messages:
    2
    Likes Received:
    0
    I have a site with 60K+ pages and am looking to exclude URLs from my sitemap, or prevent Google from indexing them, if the URLs contain certain keywords. Using Disallow in robots.txt doesn't work because the URLs I'm looking to exclude are not in a particular directory. Any thoughts much appreciated.
     
  2. littleg2008

    littleg2008 Senior Member

    Joined:
    Dec 3, 2009
    Messages:
    861
    Likes Received:
    421
    Location:
    Cambridgeshire, UK
    Your robots.txt file should be enough to limit any URL within your domain.

    You don't have to use just Disallow either. Have a look in some code forums for help and advice on it.
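
    For keyword-based patterns specifically, Googlebot does support wildcards in Disallow rules. A minimal sketch, where "badword" is just a placeholder for whatever term the URLs contain:

    User-agent: Googlebot
    Disallow: /*badword

    Bear in mind that blocks crawling rather than indexing, so URLs that are already indexed or linked elsewhere can still show up in results.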
     
  3. radi2k

    radi2k Junior Member

    Joined:
    Nov 29, 2009
    Messages:
    117
    Likes Received:
    34
    Location:
    Germany
    Use an .htaccess file and add basic authentication to the pages you want to protect. That's simple and works for every visitor (bots and normal users). You can also use mod_rewrite rules for this. Setup is simple and straightforward.
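
    A minimal .htaccess sketch of that basic-auth approach (the .htpasswd path and realm name are just placeholders):

    AuthType Basic
    AuthName "Restricted"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    Keep in mind this challenges human visitors too, not just bots.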
     
    Last edited: Apr 15, 2010
  4. sirron

    sirron Newbie

    Joined:
    Apr 5, 2010
    Messages:
    2
    Likes Received:
    0
    Thanks for the help here. I'm likely going to take another route, which is to dynamically send a noindex directive via the X-Robots-Tag header. Going to use something like this (in PHP) on pages that match the criteria for being excluded:

    // $regex holds the pattern for URLs that should stay out of the index.
    if (preg_match($regex, $_SERVER['REQUEST_URI'])) {
        // Must be called before any output is sent.
        header('X-Robots-Tag: noindex, nofollow, noarchive');
    }
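
    The same check can also keep those URLs out of the sitemap file itself. A rough sketch, where $urls and $regex are just placeholders for my own list of page URLs and the exclusion pattern:

    <?php
    // Hypothetical: $urls is an array of every page URL, $regex the exclusion pattern.
    $keep = array_filter($urls, function ($url) use ($regex) {
        return !preg_match($regex, $url);
    });

    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($keep as $url) {
        echo '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
    }
    echo '</urlset>' . "\n";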