
Robots.txt help

Discussion in 'White Hat SEO' started by pctank, Oct 22, 2013.

  1. pctank

    pctank Newbie

    Joined:
    Jun 1, 2011
    Messages:
    10
    Likes Received:
    0
    Hi

    I need help with this. My robots.txt file is currently set up in the order below, because I don't want my website to be crawled in any directories other than the important ones.

    User-agent: *
    Disallow:
    Disallow: /cgi-bin/
    Disallow: /Admin/
    Disallow: /images/
    Disallow: /Account/
    Disallow: /bin/
    Disallow: /ckeditor/
    Disallow: /Controls/
    Disallow: /Css/
    Disallow: /Emailing/
    Disallow: /feeds/
    Disallow: /js/
    Disallow: /obj/
    Disallow: /Parent/
    Disallow: /slider/
    Disallow: /*.PDF$
    Disallow: /*.jpeg$
    Disallow: /*.exe$
    Disallow: /asp/

    My website tree structure looks like this:
    [Attached screenshot: ID.jpg — website directory tree]

    Can you help?
     
  2. bluehatface

    bluehatface Regular Member

    Joined:
    Oct 19, 2013
    Messages:
    232
    Likes Received:
    98
    Location:
    Here
    I don't understand the question. That's the way a robots file works. Do you want to know if you've missed anything?
     
  3. anil190185

    anil190185 Junior Member

    Joined:
    Jan 9, 2011
    Messages:
    111
    Likes Received:
    44
    Occupation:
    SEO
    Location:
    India
    You should remove the first empty "Disallow:" line. On its own, that directive tells crawlers they may access all directories and files; sitting above your other rules it's redundant at best and can confuse some parsers.
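    For example, the start of the posted file could simply read like this (a trimmed sketch with that empty line dropped; the remaining Disallow lines stay exactly as they are):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /Admin/
    Disallow: /images/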
     
  4. dazk2002

    dazk2002 Power Member

    Joined:
    Oct 23, 2012
    Messages:
    706
    Likes Received:
    220
    Location:
    Here and There
    I always leave my images allowed, because search engines will index them, and if you've used your image alt tags correctly with your target keywords, you can gain quite a bit of extra traffic from image search.
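    If you do want search engines to fetch what's in /images/, one option (just a sketch based on the file posted above) is to drop the "Disallow: /images/" line, or to allow that folder explicitly. Note that "Allow" is understood by Google and Bing, but not by every crawler:

    User-agent: *
    Allow: /images/
    Disallow: /cgi-bin/
    Disallow: /Admin/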
     
    • Thanks Thanks x 1
  5. pctank

    pctank Newbie

    Joined:
    Jun 1, 2011
    Messages:
    10
    Likes Received:
    0
    I don't want the bots to crawl the parts of my content that are unnecessary or useless! As you can see, the image I uploaded is a screenshot of my website's FTP. Obviously there is a default.asp ......! I want to know which of the directories in that FTP screenshot I should block from the Google, Yahoo and Bing crawlers while still having my website indexed correctly?

    Thanks for the comments so far!!!
     
  6. bradsteves

    bradsteves Registered Member

    Joined:
    Nov 23, 2009
    Messages:
    79
    Likes Received:
    12
    Why do you want to block those pages from the crawlers? It seems more likely that you'll harm yourself than help here.
     
  7. WebMeUp

    WebMeUp Regular Member

    Joined:
    Aug 8, 2012
    Messages:
    204
    Likes Received:
    51
    Home Page:
    From this image it's impossible to say which areas of your site should be restricted from indexing by search engines.

    The point is that there must be some reason why you need to restrict certain pages of your site from search engine bots. These reasons can be:

    - privacy: you may want to hide pages that have your clients' info, info about your company's alpha products, etc.;
    - duplicate content: pages whose content can be found somewhere else should be hidden from indexation;
    - little or no value to site visitors: pages that are not useful to your site visitors ('thank you' pages, pages under development, etc.) should not be accessible to search engine crawlers.

    So, if any of the reasons listed above apply, you need to hide the corresponding pages of your site from search engine bots.

    If you don't have any reason to hide your site pages, let search engines index them. To improve the process of indexation:

    - generate an XML sitemap and reference it from robots.txt (see the sketch after this list): it will help search engines better crawl your site and its pages;
    - check if your site pages are properly interlinked.
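    A minimal sketch of how the sitemap can then be referenced from robots.txt, assuming it sits at the site root (the URL below is only a placeholder, not your real domain):

    User-agent: *
    Disallow: /cgi-bin/

    Sitemap: http://www.example.com/sitemap.xml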

    Hope I have answered your question.
     
    • Thanks Thanks x 1
  8. pctank

    pctank Newbie

    Joined:
    Jun 1, 2011
    Messages:
    10
    Likes Received:
    0
    OK Thanks for that guys!!
    This is my Google issue; these are the errors Google has reported:
    [Attached screenshots: Google NOT FOLLOWED ERRORS.jpg, Google NOT FOLLOWED ERRORS2.jpg]
    Bing:
    [Attached screenshot: Bing HTTP 302 code ERRORS.jpg]

    Very frustrating!!!
     
    Last edited: Oct 24, 2013
  9. CredibleZephyre

    CredibleZephyre Registered Member

    Joined:
    Jun 10, 2013
    Messages:
    95
    Likes Received:
    27
    Those are 302 errors in the images, which can't be fixed by your robots.txt. You're going to want the .htaccess file for those, to change them to 301 redirects (don't just change the 2 to a 1; look up how to write them, it's easy).
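    A rough sketch of what a 301 rule can look like in an Apache .htaccess file (the old and new paths here are made-up placeholders, not taken from your screenshots):

    # send one old URL permanently to its new location
    Redirect 301 /old-page.asp http://www.example.com/new-page.asp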

    Are those errors the original problem of this thread? I'm not quite sure what problem you're trying to fix here: the GWMT errors or your robots.txt?
     
    • Thanks Thanks x 1
  10. pctank

    pctank Newbie

    Joined:
    Jun 1, 2011
    Messages:
    10
    Likes Received:
    0
    "Are those errors the original problem of this thread? I'm not quite sure what problem you're trying to fix here. the GWMT errors or your robots.txt?"

    Hi, thank you!!! Well, both were my issue, but the 302 errors are more important! I will look into those first, as they're the bigger issue!
     
  11. ContentLockPro

    ContentLockPro Power Member Premium Member

    Joined:
    Nov 7, 2012
    Messages:
    723
    Likes Received:
    129
    You'd better use .htaccess for permissions/restrictions. It has way more power than robots.txt.
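    For example, a minimal .htaccess sketch that blocks all web access to a folder (the folder is just an example; this is Apache 2.2 syntax, on Apache 2.4 you'd use "Require all denied" instead):

    # placed inside the directory you want to keep private, e.g. /Admin/
    Order deny,allow
    Deny from all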
     
    • Thanks Thanks x 1
  12. pctank

    pctank Newbie

    Joined:
    Jun 1, 2011
    Messages:
    10
    Likes Received:
    0
    Thank you ContentLockPro! The advice is highly appreciated!!!