Can I copy content to my site that robots.txt disallows?

Discussion in 'Black Hat SEO' started by SEEmarket, Aug 8, 2013.

  1. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    I found that some sites use robots.txt to block Google search, and an idea came up.

    I could just copy their content to my site, and mine would count as the original.

    But when I search the content in Google, the original site still gets #1.

    So, does robots.txt not work?
     
    Last edited: Aug 9, 2013
  2. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    Anyone there?
     
  3. HerpDerpSlerp

    HerpDerpSlerp Power Member

    Joined:
    Mar 19, 2013
    Messages:
    778
    Likes Received:
    623
    So you're asking: if you find a site whose robots.txt blocks search engines (Google, for example) on a specific page or the entire site, and you take that site's content, place it on your own site, and allow Google to crawl it, will your site get indexed without being hit with a "duplicate content" penalty?

    Yes, you can do this.
     
  4. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    I am testing this now,
    BUT I find that Google still ranks content/posts from sites that block Google.
     
  5. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Perhaps they updated the robots.txt to disallow after the page was already crawled?

    If you're stating that you did this, and that's not the case, then correct me.
     
  6. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    Nope.
    Google indexed all the latest posts after the robots.txt was set.
     
  7. E=MC²

    E=MC² Junior Member

    Joined:
    Apr 12, 2013
    Messages:
    176
    Likes Received:
    183
    Well, I think you don't understand how robots.txt works.
    It's used to block search-engine bots from crawling the specific pages you list, not necessarily the entire site. And blocking crawling is not the same as blocking indexing: a page Google can't crawl can still show up in the index if other sites link to it.
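As a side note, Python's standard-library `urllib.robotparser` makes it easy to check what a given robots.txt actually blocks. A minimal sketch (the domain and paths here are made up for illustration; a real check would load the live file with `set_url()` and `read()`):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example is self-contained.
rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() answers whether a crawler MAY CRAWL a URL;
# it says nothing about whether the URL is already indexed.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
```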
     
  8. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Ok, so you're saying you did this, and they crawled it anyway.

    Interesting. It's not surprising that a bot didn't obey a robots.txt file, but I've never experienced it myself.
     
  9. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    8,883
    Likes Received:
    7,481
    Occupation:
    ZLinky2Buy SEO Services
    Location:
    ⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩
    Home Page:
    Google sometimes bypasses robots.txt because it's taking "thumbnails" of pages or crawling for AdSense data, so lots of robots.txt-blocked pages end up in its index. You may be successful to some extent, but eventually you'll run into lots of duplicate-content collisions.
     
  10. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    e.g.
    vogueknitting.com/robots.txt
     
  11. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    Has anyone ever researched this?
     
  12. vishnudath

    vishnudath Newbie

    Joined:
    Jun 13, 2013
    Messages:
    19
    Likes Received:
    0

    If the original site's content is in Google, then you should not copy content from it.

    You can deindex that domain, and then after a week you can copy the same content to your new website.
     
  13. SEEmarket

    SEEmarket Junior Member

    Joined:
    Jul 18, 2013
    Messages:
    105
    Likes Received:
    4
    I don't understand what you said.
    The content is always in Google's database.
     
  14. vishnudath

    vishnudath Newbie

    Joined:
    Jun 13, 2013
    Messages:
    19
    Likes Received:
    0
    Actually NO. Once you deindex the domain, everything will be removed from Google's cache database. I have tried it personally :) Is your original site hosted on WordPress?

    If yes, then first take a backup of all your posts, then delete every article from your site. After a week, add this meta tag to your head section: <meta name="robots" content="noindex,nofollow" />

    and add this to your robots.txt:

    HTML:
    User-agent: *
    Disallow: /
    and wait one more week. Then you can copy all those articles to your new blog.


    First week: take a backup and delete all posts.
    Second week: add the above meta tag and update robots.txt.
    Third week: start copying those articles to any blog you want :)
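For the meta-tag step above, one way to confirm the tag is actually being served is to parse the page's HTML. A minimal sketch using Python's standard library (the inlined HTML snippet stands in for a page you would normally download with `urllib.request`):

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page carrying the noindex tag from the steps above.
html = ('<html><head>'
        '<meta name="robots" content="noindex,nofollow" />'
        '</head><body></body></html>')

class MetaRobotsFinder(HTMLParser):
    """Records the content attribute of a <meta name="robots"> tag, if any."""
    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also route through here by default.
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.content = a.get("content", "")

finder = MetaRobotsFinder()
finder.feed(html)
print(finder.content)  # noindex,nofollow
```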
     
  15. vishnudath

    vishnudath Newbie

    Joined:
    Jun 13, 2013
    Messages:
    19
    Likes Received:
    0
    The only problem you may face is that there may be a few copycats who already copied content from your original blog.

    If you deindex the original blog, then Google MAY treat those copied sites as the original.

    Then, once you copy that content to your new blog, it will count as DUPLICATE CONTENT.

    Hope you got it.