
How do you reduce the number of duplicate URLs when harvesting with ScrapeBox

Discussion in 'Black Hat SEO' started by nik-0, Feb 26, 2012.

  1. nik-0

    nik-0 BANNED

    Joined:
    Jan 19, 2012
    Messages:
    510
    Likes Received:
    96
    I always get tons of duplicate URLs with ScrapeBox.

    I use product-related keywords and it goes like this (I use HMA VPN for scraping):

    1st run 45k URLs - after removing duplicates 15k uniques
    2nd run 45k URLs - after removing duplicates 22k uniques in total, +7k
    3rd run 45k URLs - after removing duplicates 27k uniques in total, +5k

    And the gain gets smaller with each new run. Of course I use different keywords each run. Any tips on how I can get more unique domains?

    My list contains about 800 one- and two-word keywords, so I can do about 8 runs of 100 keywords each before I burn through my IP.
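    The run-by-run dedup described above can be sketched in Python. This is a minimal illustration only — ScrapeBox handles this internally, and `merge_run` / `by_domain` are invented names; keying on the hostname instead of the full URL is one way to get the unique *domains* the poster asks for.

    ```python
    # Fold each harvesting run into a master set and report how many
    # uniques the run actually added. With by_domain=True the key is
    # the hostname, so duplicate pages on one site count once.
    from urllib.parse import urlparse

    def merge_run(master, run_urls, by_domain=False):
        """Add one run's URLs to the master set; return how many were new."""
        before = len(master)
        for url in run_urls:
            key = urlparse(url).netloc.lower() if by_domain else url
            master.add(key)
        return len(master) - before

    master = set()
    run1 = ["http://a.com/p1", "http://a.com/p2", "http://b.com/x"]
    run2 = ["http://a.com/p1", "http://c.com/y"]
    print(merge_run(master, run1))  # all three are new
    print(merge_run(master, run2))  # only the unseen URL counts
    ```

    Run it after every harvest and you get exactly the "+7k, +5k" diminishing-returns numbers from the post.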
     
    Last edited: Feb 26, 2012
  2. download

    download Jr. VIP Premium Member

    Joined:
    May 4, 2010
    Messages:
    1,271
    Likes Received:
    712
    Location:
    USA
    Use more diverse keywords and footprints - if you're searching for similar keywords and looking down the first 50 pages of results for each, it's inevitable that you're going to get tons of duplicates.
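    The keyword-plus-footprint advice can be sketched like this. The footprints and the `build_queries` helper are invented for illustration, not a recommended list — the point is just that crossing N keywords with M footprints yields N×M distinct queries, each hitting different result sets:

    ```python
    # Cross every footprint with every keyword to generate a more
    # diverse query list than keywords alone.
    from itertools import product

    def build_queries(keywords, footprints):
        """Pair each footprint with each keyword into one search query."""
        return [f'{fp} "{kw}"' for fp, kw in product(footprints, keywords)]

    keywords = ["garden tools", "coffee maker"]
    footprints = ['"powered by wordpress"', 'inurl:blog', 'intitle:review']
    queries = build_queries(keywords, footprints)
    print(len(queries))  # 3 footprints x 2 keywords = 6 queries
    ```

    Feeding a list like this into the harvester spreads the same keyword budget over many more distinct searches.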
     
  3. nik-0

    nik-0 BANNED

    Joined:
    Jan 19, 2012
    Messages:
    510
    Likes Received:
    96
    Do you think it would help to reduce the number of pages I look through, or won't that make much of a difference? I go deep on purpose because most of the huge sites fill a lot of the first 5 pages of results.

    The product keywords aren't very closely related though; you could call them the 800 main product categories. I scraped them from a national marketplace for second-hand and new goods, similar to eBay (manually scraped, btw :)). It was part of keyword research back then, to get fresh new ideas.
     
    Last edited: Feb 26, 2012