
Help with robots.txt and sitemap.xml

Discussion in 'White Hat SEO' started by Mydragonsfly, Feb 9, 2014.

  1. Mydragonsfly

    Mydragonsfly Newbie

    Joined:
    Feb 7, 2013
    Messages:
    9
    Likes Received:
    1
    Occupation:
    High school
    Hi guys.

    For a while now I've been messing with Google Webmaster Tools, and it always shows that my robots.txt is blocking my sitemap. I've tried editing it, but it keeps reverting to this default after a few hours.
    HTML:
     Sitemap: websitedotcom/sitemapdotxml
    User-agent: *
    Disallow: /
    I've also deleted the robots file from my directory and re-created it, and the same result happens.

    Also, with my sitemap.xml file, Webmaster Tools has been saying that there is an XML tag missing: the parent tag is urlset and the missing tag is url. This is just a small snippet of my XML file.
    HTML:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset
    <url>
      <loc>hyperlinkwebsitedotcom/indexdothtml</loc>
      <lastmod>2014-02-02T07:00:01+00:00</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.80</priority>
    </url>
    </urlset>
    The file also comes with a header, which I have also tested by deleting a few times. So far nothing has changed.

    Any help is much appreciated.
     
  2. kalasnikof1

    kalasnikof1 Registered Member

    Joined:
    Jul 5, 2013
    Messages:
    50
    Likes Received:
    15
    1.
    A plugin is messing it up.
    2.
    WordPress admin panel >> Settings >> Reading.
    Just before the Save Changes button is a checkbox with the label Search Engine Visibility ("Discourage search engines from indexing this site") - make sure it is not ticked.
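
    If that box is ticked, WordPress serves an auto-generated robots.txt that (roughly, depending on version) looks just like the one you posted, which is why unticking it should clear the block:

    Code:
    User-agent: *
    Disallow: /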
     
  3. Mydragonsfly

    Mydragonsfly Newbie

    Joined:
    Feb 7, 2013
    Messages:
    9
    Likes Received:
    1
    Occupation:
    High school
    By "admin panel" I'm assuming you mean google analytics?

    If so I see just "account", "property" and "view" settings. Nothing that is just "settings" and nothing can be found to do with "Reading".

    Im not finding an "admin panel" in google webmasters, the only "settings" I see are "site settings" which dont seem to contain anything to do with "reading". Could you please be further exact?
     
  4. kvchosting

    kvchosting Jr. VIP Jr. VIP

    Joined:
    Aug 23, 2012
    Messages:
    296
    Likes Received:
    75
    Location:
    Oklahoma
    Remove it completely from GWT, then create your own robots.txt. That would be better, so you won't get confused between what GWT shows and your actual robots.txt file.
     
  5. zee007

    zee007 Senior Member

    Joined:
    Jun 25, 2012
    Messages:
    806
    Likes Received:
    95
    Location:
    Texas
    Is there a reason you are creating a robots.txt file? Unless you have pages you want to actually block, there is no need to create a robots.txt file in the first place! An XML sitemap on the other hand is very valuable. You should create an XML sitemap and upload it to the root folder of your website. Then submit it via WMT.
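
    If you do keep a robots.txt, a minimal one that blocks nothing and just points crawlers at the sitemap could look something like this (yoursite.com is a placeholder for your own domain):

    Code:
    User-agent: *
    Disallow:

    Sitemap: http://yoursite.com/sitemap.xml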
     
  6. evilclown

    evilclown Senior Member

    Joined:
    Jul 31, 2012
    Messages:
    805
    Likes Received:
    575
    Occupation:
    Party Clown
    Location:
    Clownville
    Your sitemap is incomplete; here, use this:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>stuff goes here</url>
    
    </urlset>
    
    If you are not blocking pages, delete your robots.txt file.
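
    Filling in the <url> part with the same fields from your own snippet, a complete minimal sitemap should look roughly like this (the loc is a placeholder - use your real page URL):

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://yoursite.com/index.html</loc>
        <lastmod>2014-02-02T07:00:01+00:00</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.80</priority>
      </url>
    </urlset>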
     
    • Thanks Thanks x 1
  7. GloCk99

    GloCk99 Regular Member

    Joined:
    Mar 12, 2009
    Messages:
    368
    Likes Received:
    224
    Location:
    The BigSmoke
    The solution to this is actually fairly simple. There is a lag between what Google Webmaster Tools shows as your robots.txt and what you actually have set up. It can take up to two days for the correct one to show in Webmaster Tools.

    Put your robots.txt in your root directory (sitename.com/robots.txt). Once you've confirmed it's correct by visiting that URL, just wait! Stop changing the robots.txt frequently or you'll keep hitting this lag. Leave it for a couple of days, then go back into WMT and you'll see the correct one being displayed.

    Good luck.

    @kvchosting if you don't know what you're talking about, then don't try and give advice.
     
    • Thanks Thanks x 1
  8. CredibleZephyre

    CredibleZephyre Registered Member

    Joined:
    Jun 10, 2013
    Messages:
    95
    Likes Received:
    27
    I'm not sure he did this on purpose, but in his first example he's disallowing his entire site. If that's what he is trying to do, then yes, he would need a robots.txt file.
     
  9. GloCk99

    GloCk99 Regular Member

    Joined:
    Mar 12, 2009
    Messages:
    368
    Likes Received:
    224
    Location:
    The BigSmoke

    That's not what he's using on his site. The example he's given is the WMT default. There are numerous reasons why someone would want to optimise a robots.txt file.
     
  10. Mydragonsfly

    Mydragonsfly Newbie

    Joined:
    Feb 7, 2013
    Messages:
    9
    Likes Received:
    1
    Occupation:
    High school
    Thanks evilclown, that got my site map up for me.

    When I originally purchased my domain, that's what the robots file looked like the first time I opened it after publishing. I'm trying to modify it so that two subdomains I'm creating for extra pages get blocked (the site is hosted by HostGator and built with the Weebly sitebuilder; the max page limit is 6, but with unlimited subdomains I can easily work around that). In the footer section I have my site name, which links back to my original domain, and I want those links blocked because I don't want search engines to see them as unnecessary links.

    Below is just the one subdomain I'm trying to disallow, plus my sitemap file that I'm trying to allow (though I don't think that's necessary anymore, since my sitemap now shows up in GWT) - I can remove it if you guys think it's irrelevant. I've also included a list of bad robots that either seem malicious or have already been disallowed by many other websites.

    Code:
    User-agent: *
    Disallow: hyperlinksubdomainDOTsiteDOTcom/
    
    User-agent: *
    Allow: hyperlinkdomainDOTcom/sitemapDOTxml
    
    
    ###
    #Unsafe robots to keep away
    ###
    User-agent: Aqua_Products
    Disallow: /
    User-agent: asterias
    Disallow: /
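    ###
    # Rough sketch of the first two rules above in standard syntax (domain
    # names are placeholders). Disallow only takes paths relative to the
    # host serving the file, not full URLs, and a robots.txt only applies
    # to the (sub)domain it lives on, so the subdomain needs its own file:
    #
    #   on subdomain.site.com/robots.txt:
    #     User-agent: *
    #     Disallow: /
    #
    #   on site.com/robots.txt, the sitemap is declared with a Sitemap:
    #   line rather than Allow:
    #     Sitemap: http://site.com/sitemap.xml
    ###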