What to put in robots.txt in WordPress

Discussion in 'White Hat SEO' started by Blueprint, Jan 6, 2010.

  1. Blueprint

    Blueprint BANNED Jr. VIP Premium Member

    Joined: Nov 10, 2009 | Messages: 286 | Likes Received: 117
    This is a bit of a noob question, and I think I know the answer. Maybe the Christmas break mangled my mind, lol.

    I've suddenly thought of a potential duplicate content issue on one of my money pages. I have a site indexed in Google, and it's a WordPress blog.

    The home page (index) is a WordPress "page", and in the settings it is set as the front page, so it is the one that displays at the root:

    http://www.example.com

    Should I add /home/ to robots.txt to stop http://www.example.com/home/ from being indexed as a duplicate of the root?
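
A robots.txt rule takes a path, not a full URL, and a Disallow value is matched as a prefix of the requested path. A minimal sketch, assuming Python 3 and the standard-library `urllib.robotparser` (a simple prefix parser, not Google's own), that checks a hypothetical one-rule file against the root and the /home/ URL:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the /home/ copy of the front page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /home/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)
root_ok = rp.can_fetch("Googlebot", "http://www.example.com/")
home_ok = rp.can_fetch("Googlebot", "http://www.example.com/home/")
# The rule ends in a slash, so the bare /home path is not covered by it.
bare_ok = rp.can_fetch("Googlebot", "http://www.example.com/home")

print(root_ok, home_ok, bare_ok)  # True False True
```

Keep in mind that Disallow only stops compliant crawlers from fetching the URL; a blocked URL that is linked from elsewhere can still appear in Google's results without a snippet.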
     
  2. Blueprint

    Anyone? I know it's not black hat, but it IS SEO, and it's in the White Hat section.
     
  3. Blueprint

    Still no one on this? Fascinating...
     
  4. gregstereo

    gregstereo Elite Member

    Joined: Oct 5, 2009 | Messages: 1,834 | Likes Received: 1,028
    Occupation: I'm known to locate certain things from time to ti
    Location: Moose Factory, ON
    Disallow: /wp-content/plugins/
     
  5. bobandcolh

    bobandcolh Newbie

    Joined: Nov 16, 2008 | Messages: 26 | Likes Received: 1
    Yeah, maybe. You just type the file or folder path.
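
Typing a file or a folder works because each Disallow value is compared as a prefix of the URL path: a folder rule covers everything beneath it, and a file rule also covers that file with a query string attached. A sketch using Python's `urllib.robotparser`; the two rules use WordPress paths mentioned in this thread, while the sample URLs (the plugin script, the photo, the `?action=register` query) are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A folder rule and a file rule; robots.txt paths are matched as prefixes.
RULES = """\
User-agent: *
Disallow: /wp-content/plugins/
Disallow: /wp-login.php
"""

BASE = "http://www.example.com"  # placeholder domain used earlier in the thread

rp = RobotFileParser()
rp.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """True if a crawler may fetch BASE + path under RULES."""
    return rp.can_fetch("Googlebot", BASE + path)


for path in (
    "/wp-content/plugins/some-plugin/script.js",  # inside the blocked folder
    "/wp-content/uploads/photo.jpg",              # sibling folder, not covered
    "/wp-login.php",                              # the blocked file itself
    "/wp-login.php?action=register",              # same prefix plus a query string
):
    print(path, allowed(path))
```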
     
  6. Blueprint

    I'm not sure what you guys mean. So you're saying I need to block the plugins folder? Okay, what else? I'm asking about /home/, which shows the same page as index.php at the root...
     
  7. gregstereo

    I don't know where to start.

    There have been threads about this before; try searching for robots.txt.

    Good luck.
     
  8. porkchop

    porkchop Newbie

    Joined: Dec 31, 2009 | Messages: 3 | Likes Received: 2
    Occupation: I make money online
    Location: World Traveler
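    User-agent: *
    Disallow: /secret-pages/ # placeholder for pages you want kept private
    Disallow: /out/ # hop links to affiliate sites
    Disallow: /wp-login.php
    Disallow: /wp-register.php
    Disallow: /wp-admin/
    # note: /wp-content/ also holds theme CSS/JS and uploaded images (/wp-content/uploads/)
    Disallow: /wp-content/
    # year archives; only safe if post permalinks do not start with the year
    Disallow: /2009/
    Disallow: /2010/
    Disallow: /2011/
    Disallow: /2012/
    Disallow: /2013/
    Disallow: /2014/
    Disallow: /?s= # search result URLs
    Disallow: /privacy-policy/
    Disallow: /tag/
    Disallow: /downloads/
    # some parsers apply the first matching rule, so keep Allow: / after the Disallow lines
    Allow: /
    # Noindex is not part of the robots.txt standard and most crawlers ignore it;
    # a robots meta tag on the page itself is the usual way to keep a page out of the index
    Noindex: /secret-pages/
    Noindex: /out/
    Noindex: /wp-login.php
    Noindex: /wp-register.php
    Noindex: /wp-admin/
    Noindex: /wp-content/
    Noindex: /2009/
    Noindex: /2010/
    Noindex: /2011/
    Noindex: /2012/
    Noindex: /2013/
    Noindex: /2014/
    Noindex: /?s=
    Noindex: /privacy-policy/
    Noindex: /tag/
    Noindex: /downloads/

    Sitemap: http://www.domain-name.com/sitemap.xml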

    Some people also think you should block images, but I can go either way.
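
Where `Allow: /` sits in a file matters for some crawlers. Google documents a longest-match rule (the most specific path wins), while simpler parsers such as Python's `urllib.robotparser` stop at the first rule that matches, so an `Allow: /` placed before the Disallow lines lets them fetch everything. A minimal sketch comparing the two on a hypothetical cut-down file; `parse_rules` and `longest_match_allowed` are illustrative helpers, not a library API, and they ignore wildcards and multiple User-agent groups:

```python
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical cut-down file with Allow: / listed before the Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /tag/
"""


def parse_rules(text: str) -> List[Tuple[bool, str]]:
    """Collect (is_allow, path) pairs; assumes the file has a single group."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field = field.strip().lower()
        if field in ("allow", "disallow"):
            rules.append((field == "allow", value.strip()))
    return rules


def longest_match_allowed(rules: List[Tuple[bool, str]], path: str) -> bool:
    """Longest-match evaluation for plain prefix rules (no * or $ wildcards).

    The matching rule with the longest path wins, Allow wins a tie, and a
    path that no rule matches is allowed.
    """
    best_len, best_allow = -1, True
    for allow, rule in rules:
        if rule and path.startswith(rule):
            if len(rule) > best_len or (len(rule) == best_len and allow):
                best_len, best_allow = len(rule), allow
    return best_allow


first_match = RobotFileParser()  # urllib.robotparser applies the first rule that matches
first_match.parse(ROBOTS_TXT.splitlines())
rules = parse_rules(ROBOTS_TXT)

path = "/wp-admin/options.php"
print("first match:", first_match.can_fetch("Googlebot", "http://www.example.com" + path))  # True
print("longest match:", longest_match_allowed(rules, path))  # False
```

Putting `Allow: /` after the Disallow lines, or leaving it out because anything not disallowed is allowed by default, gives the same answer under both matching rules.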
     
  9. qxygene

    qxygene Newbie

    Joined: Mar 27, 2008 | Messages: 18 | Likes Received: 4
    # PARTIAL access (Googlebot)
    User-agent: Googlebot
    Disallow: /*?
    Disallow: /*.php$
    Disallow: /wp-admin/
    # WordPress paged comments live at /post-name/comment-page-2/, not under /comment-page/
    Disallow: /*/comment-page-
    Disallow: /*/trackback/$
    Disallow: /?p=
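
Googlebot's `*` and `$` extensions are easier to reason about as regular expressions. A minimal sketch in Python, assuming Google's documented meaning (`*` matches any run of characters, a `$` at the end anchors the end of the URL, and matching starts at the beginning of the path, query string included); `google_pattern_to_regex` and `blocks` are hypothetical helpers, and the sample URLs assume WordPress's `/post-name/comment-page-2/` form for paged comments:

```python
import re


def google_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using Google's * and $ extensions.

    '*' matches any run of characters, a '$' at the very end anchors the end
    of the URL, and every other character is literal (including a '$' that
    is not at the end). Matching starts at the beginning of the path, and
    the query string counts as part of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocks(pattern: str, path: str) -> bool:
    """True if a Disallow line with this pattern would match the path."""
    return google_pattern_to_regex(pattern).match(path) is not None


# Sample paths are hypothetical; /hello-world/ stands in for any post slug.
for pattern, path in [
    ("/comment-page/*", "/hello-world/comment-page-2/"),
    ("/*/comment-page-", "/hello-world/comment-page-2/"),
    ("/?p=$*", "/?p=123"),
    ("/?p=", "/?p=123"),
    ("/*.php$", "/wp-login.php"),
    ("/*.php$", "/wp-login.php?action=register"),
]:
    print(f"{pattern:20} {path:32} {blocks(pattern, path)}")
```

Note the last pair: `/*.php$` only matches URLs that end in `.php`, so the same script with a query string attached is not blocked by that line.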
     
  10. stock

    stock Registered Member

    Joined: Sep 4, 2009 | Messages: 87 | Likes Received: 10
    Occupation: trader
    Location: NY
    # the Sitemap line needs a full URL, not a relative path
    Sitemap: http://www.example.com/sitemap.xml
    User-Agent: *
    # disallow all files in these directories
    Disallow: /dh_
    Disallow: /cgi-bin/
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/
    Disallow: /blog/wp-content/plugins/
    Disallow: /blog/about
    Disallow: /blog/contact
    Disallow: /blog/tag/
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: */feed$
    Disallow: */trackback$
    Disallow: /*comment-*
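    User-Agent: Googlebot
    # Googlebot obeys only its own group and skips the * group above,
    # so the shared rules are repeated here
    Disallow: /dh_
    Disallow: /cgi-bin/
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/
    Disallow: /blog/wp-content/plugins/
    Disallow: /blog/about
    Disallow: /blog/contact
    Disallow: /blog/tag/
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: */feed$
    Disallow: */trackback$
    Disallow: /*comment-*
    # disallow all files ending with these extensions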
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    # disallow all files with ? in url
    Disallow: /*?*
    # disable duggmirror
    User-Agent: duggmirror
    Disallow: /
    # allow google image bot to search all images
    User-Agent: Googlebot-Image
    Disallow:
    Allow: /*
    # allow adsense bot on entire site
    User-Agent: Mediapartners-Google*
    Disallow:
    Allow: /*
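
Crawlers pick a single group: Googlebot follows the most specific User-agent group that names it and ignores the `*` group entirely, so rules listed only under `User-Agent: *` stop applying to it once a Googlebot group exists. A sketch with Python's `urllib.robotparser`, which selects groups the same way in this simple case, on a hypothetical two-group file trimmed from the one above:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-group file shaped like the one above, trimmed to plain prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/wp-admin/

User-agent: Googlebot
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/blog/wp-admin/"
# Googlebot matches its own group, which has no /blog/wp-admin/ rule.
print("Googlebot:", rp.can_fetch("Googlebot", url))  # True
# Any other crawler falls back to the * group and is blocked.
print("other bot:", rp.can_fetch("SomeOtherBot", url))  # False
```

Any rule meant for every crawler has to be repeated inside each named group.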