
Getting scraped, how to stop it?

Discussion in 'Black Hat SEO' started by bonao, Jan 17, 2011.

  1. bonao

    bonao Newbie

    Joined: Nov 19, 2010 | Messages: 36 | Likes Received: 9

    Hi guys,
    I have a website that keeps getting scraped by a competitor, and I can't seem to stop it. Here is why: all of the visitors appear legitimate (I know they are not). Some show a referrer, some don't. Many come through different proxies (I know they are proxies) on Comcast, Verizon, Qwest, Windstream, etc. The user agent is spoofed and looks normal. They fetch a few pages and then come back in through another IP. There is no single identifying factor, such as a common user agent or IP. How can I prevent others from scraping my hard work? My site runs on PHP with .htaccess.
     
  2. BHopkins

    BHopkins Moderator Staff Member Moderator Jr. VIP

    Joined: Dec 31, 2010 | Messages: 2,311 | Likes Received: 1,387 | Gender: Male | Occupation: ORM and SEO company owner | Location: California

    Start dropping embedded links to your deep pages.
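As a rough illustration (not BHopkins' own code), assuming article bodies are stored as plain text: the hypothetical `embed_deep_links` helper turns the first mention of each phrase in a made-up `DEEP_LINKS` map into an absolute link to a deep page (`example.com` is a placeholder). The links need to be absolute, because a relative href would resolve against the scraper's domain once the HTML is copied.

```python
import html
import re

# Hypothetical mapping of anchor phrases to deep pages on your own site.
# Absolute URLs: a relative href would point at the scraper's domain once copied.
DEEP_LINKS = {
    "keyword research": "https://example.com/guides/keyword-research",
    "link building": "https://example.com/guides/link-building",
}


def embed_deep_links(text: str, links: dict) -> str:
    """Escape plain text as HTML and link the first occurrence of each phrase."""
    out = html.escape(text)
    for phrase, url in links.items():
        pattern = re.compile(r"\b" + re.escape(html.escape(phrase)) + r"\b", re.IGNORECASE)
        # count=1: link only the first mention so the article doesn't read as spam.
        out = pattern.sub(
            lambda m: f'<a href="{html.escape(url)}">{m.group(0)}</a>', out, count=1
        )
    return out


if __name__ == "__main__":
    print(embed_deep_links("Good keyword research comes before any link building campaign.", DEEP_LINKS))
```

This only helps against careless copying: as a later reply in this thread points out, a scraper can strip such links with a few lines of code.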
     
  3. deface

    deface Registered Member

    Joined: Jul 23, 2010 | Messages: 83 | Likes Received: 362

    IMHO, you can't stop it.
     
  4. dummydecoy

    dummydecoy Junior Member

    Joined: Jul 4, 2010 | Messages: 154 | Likes Received: 39

    Change your HTML structure, class names, etc.
    Make it really hard for them by forcing them to keep changing their scraper engine.
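One way to sketch this, assuming a small set of hand-written wrapper templates (the `TEMPLATES` list, the secret, and the once-a-day schedule are all hypothetical choices): a keyed hash of the date picks both the wrapper and the class name. Every page served on a given day matches, so the stylesheet only needs regenerating for the new class name when the day rolls over, while a scraper written against yesterday's DOM path stops finding the content.

```python
import datetime
import hashlib

# Hypothetical wrappers: each can be styled to look the same, but the tag names
# and nesting differ, so a selector written against one misses the others.
TEMPLATES = [
    '<div class="{cls}"><div>{content}</div></div>',
    '<section class="{cls}"><article>{content}</article></section>',
    '<main class="{cls}"><div><div>{content}</div></div></main>',
]


def daily_layout(content: str, secret: str, day: datetime.date) -> tuple:
    """Return (html, class_name) for the layout in force on `day`.

    Keying the hash with a secret keeps the choice stable for the whole day
    (so cached pages and CSS agree) but unpredictable to outsiders.
    """
    digest = hashlib.sha256(f"{secret}:{day.isoformat()}".encode()).hexdigest()
    template = TEMPLATES[int(digest[8:16], 16) % len(TEMPLATES)]
    cls = "c" + digest[:8]  # CSS class names can't start with a digit
    return template.format(cls=cls, content=content), cls


if __name__ == "__main__":
    page, cls = daily_layout("<p>Article text</p>", "change-me", datetime.date.today())
    print(page)
```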
     
  5. ivictus

    ivictus Regular Member

    Joined: Jan 26, 2010 | Messages: 223 | Likes Received: 31

    File a DMCA complaint with their hosting company. Changing the site structure, such as your HTML tags, as mentioned above is a good idea too. If you have ever tried to write a scraper, you know it usually relies on div tags, etc. to find the meat.

    You could also try setting up a honeypot to catch them.
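A minimal honeypot sketch, assuming you can run a check at the start of every request (shown in Python for brevity). The trap path, ban length, and in-memory ban list are hypothetical; a real site would keep bans somewhere all requests can see. Visitors never see the hidden link, and `robots.txt` tells compliant crawlers such as Googlebot to stay out, so whatever requests the trap is very likely a scraper. It only catches scrapers that follow every link they parse, and with rotating proxies each IP is banned only once it trips the trap itself.

```python
import time
from typing import Dict, Optional

TRAP_PATH = "/archive-full/"  # hypothetical URL that no visible link points to

# Compliant crawlers (e.g. Googlebot) read this and stay out of the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Embed in every page: invisible to humans, but a scraper that follows
# every href it parses will request it.
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow">full archive</a>'


class Honeypot:
    def __init__(self, ban_seconds: float = 24 * 3600):
        self.ban_seconds = ban_seconds
        self.banned: Dict[str, float] = {}  # IP -> time the ban expires

    def allow(self, ip: str, path: str, now: Optional[float] = None) -> bool:
        """Return False if this request trips the trap or comes from a banned IP."""
        now = time.time() if now is None else now
        if path.startswith(TRAP_PATH):
            self.banned[ip] = now + self.ban_seconds
            return False
        expiry = self.banned.get(ip)
        return expiry is None or now >= expiry
```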
     
  6. mataff

    mataff Junior Member

    Joined: Sep 21, 2008 | Messages: 139 | Likes Received: 54

    Very difficult.

    Even rearranging the HTML only goes so far, and it only works if you're using a custom-built website. In the article scrapers I've written, I remove all of the junk code surrounding the text and work with what's left. However the code itself is changed, the language structure can't be (e.g., sentences can only begin and end in a limited number of ways: ! . " '). A second check against a dictionary and an unwanted-words list helps (but isn't perfect).

    In the grand scheme of things, it's not worth your time and effort. They could easily hire someone overseas on the cheap to watch your page for any changes. Just look at what they're paying for CAPTCHA cracking.
     
  7. masterwaldo

    masterwaldo Registered Member

    Joined: Jul 3, 2008 | Messages: 95 | Likes Received: 49

    I build my own scrapers. Usually I key on common items such as the HTML tags. You can change these, but once he finds his scraper is broken, he just modifies it accordingly.

    There is no way around that unless you are willing to change your HTML tags frequently and wear him out updating his scraper.

    Dropping embedded links is also not effective, since it takes just a few lines of code to remove them from the scraped content.

    You could check your server's access log and block anyone who requests pages too fast. But then again, the scraper can also slow down his requests. :)
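The speed check above can be sketched as a per-client sliding-window limiter. The limits, and keying on IP at all, are assumptions to tune against real traffic; given the rotating proxies the OP describes, a per-IP limit mostly caps how many pages each proxy gets before it must rotate, and a scraper that slows down stays under it.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client key within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        # client key (e.g. an IP address) -> timestamps of its recent allowed requests;
        # a long-running server would also evict keys that have gone idle
        self.hits = defaultdict(deque)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.hits[client]
        # Forget requests that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too fast: answer with e.g. HTTP 429 or a CAPTCHA page
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=30, window_seconds=60)  # illustrative limits
    print(limiter.allow("203.0.113.7"))
```

A sliding window is used rather than fixed per-minute buckets so a client can't double its rate by bunching requests around a bucket boundary.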
     
  8. bonao

    bonao Newbie

    Joined: Nov 19, 2010 | Messages: 36 | Likes Received: 9

    Thanks, guys. How is filing a DMCA complaint effective?
     
  9. zachtan

    zachtan Newbie

    Joined: Oct 11, 2010 | Messages: 17 | Likes Received: 4

    Make your site's HTML tags dynamic. Add random numbers to your table names, classes, etc.
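That suggestion can be sketched like this, assuming you mark class names in your templates with a hypothetical `@name@` placeholder. `randomize_classes` gives each name a fresh random value on every request and applies the same mapping to the CSS and the HTML, so the page still renders while any selector a scraper saved goes stale at once.

```python
import re
import secrets

# Our own placeholder convention (hypothetical): @name@ marks a class name to randomize.
PLACEHOLDER = re.compile(r"@([A-Za-z][\w-]*)@")


def randomize_classes(css: str, markup: str) -> tuple:
    """Swap every @name@ for a fresh random class name, consistently in CSS and markup.

    The CSS has to be sent inline (or per session), because a shared, cached
    stylesheet cannot follow class names that change on every request.
    """
    mapping = {}

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in mapping:
            # Letter prefix: CSS class selectors can't start with a digit.
            mapping[name] = "c" + secrets.token_hex(4)
        return mapping[name]

    return PLACEHOLDER.sub(repl, css), PLACEHOLDER.sub(repl, markup)


if __name__ == "__main__":
    css, markup = randomize_classes(
        ".@post@ { line-height: 1.5 }",
        '<div class="@post@"><p>Article text</p></div>',
    )
    print(f"<style>{css}</style>\n{markup}")
```

The per-request approach costs CSS caching, and, as noted earlier in the thread, a scraper that ignores the markup and works from the text itself is not affected.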
     
  10. Bross

    Bross Senior Member

    Joined: Feb 6, 2010 | Messages: 859 | Likes Received: 355

    It's a war you can't win.
    That's internet life. I had a whole website copied. Shit happens, and it's a bummer, but there is nothing you can do about it.
     