How do you reduce the number of duplicate URLs when harvesting with Scrapebox

Discussion in 'Black Hat SEO' started by nik-0, Feb 26, 2012.

  1. nik-0

    nik-0 BANNED

    Joined:
    Jan 19, 2012
    Messages:
    510
    Likes Received:
    96
    I always get tons of duplicate URLs with Scrapebox.

    I use product-related keywords and it goes like this (I use HMA VPN for scraping):

    1st run 45k urls - after removing duplicates 15k uniques
    2nd run 45k urls - after removing duplicates 22k uniques in total, +7k
    3rd run 45k urls - after removing duplicates 27k uniques in total, +5k

    And the yield gets smaller with each new run. Of course I use different keywords each run. Any tips on how I can get more unique domains?

    My list contains about 800 one- and two-word keywords, so I can do about 8 runs of 100 keywords before I burn through my IP.
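
    For anyone doing this outside Scrapebox's built-in "Remove Duplicate Domains" button, the dedupe step is simple to script. A minimal sketch in Python (the URLs are made-up examples, and stripping "www." is an assumption about how you want domains compared):

    ```python
    # Sketch: collapse a harvested URL list to unique domains,
    # similar in spirit to Scrapebox's "Remove Duplicate Domains".
    from urllib.parse import urlparse

    def unique_domains(urls):
        """Keep the first URL seen for each domain (leading 'www.' stripped)."""
        seen = set()
        kept = []
        for url in urls:
            domain = urlparse(url).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]
            if domain and domain not in seen:
                seen.add(domain)
                kept.append(url)
        return kept

    harvested = [
        "http://www.example.com/page1",
        "http://example.com/page2",      # same domain as above, dropped
        "http://shop.example.org/item",
    ]
    print(unique_domains(harvested))
    ```

    Running unique counts like this after every harvest (rather than only at the end) is what produces the shrinking +7k / +5k numbers above: each new run re-finds domains already in the master list.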
     
    Last edited: Feb 26, 2012
  2. download

    download Supreme Member

    Joined:
    May 4, 2010
    Messages:
    1,268
    Likes Received:
    712
    Location:
    USA
    Use more diverse keywords and footprints - if you're searching for similar keywords and looking down the first 50 pages for each, it's inevitable that you're going to get tons of duplicates.
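
    To illustrate the idea: Scrapebox merges your keyword list with footprints, so crossing even a few footprints against the keywords multiplies the distinct queries per run. A quick sketch (the footprints here are just generic examples, not a recommendation):

    ```python
    # Sketch: cross a keyword list with several footprints so each
    # harvest run issues different queries instead of near-duplicates.
    from itertools import product

    keywords = ["garden furniture", "bike parts"]              # example keywords
    footprints = ['"powered by wordpress"', "inurl:blog", "intitle:forum"]

    queries = [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]
    for q in queries:
        print(q)
    ```

    With 800 keywords and 3 footprints you'd get 2,400 distinct queries, each pulling from a different slice of the index, instead of re-scraping the same result pages.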
     
  3. nik-0

    nik-0 BANNED

    Joined:
    Jan 19, 2012
    Messages:
    510
    Likes Received:
    96
    Do you think it would help to reduce the number of pages that I look through, or won't that make much of a difference? I go deep on purpose because most of the huge sites fill a lot of the first 5 pages of results.

    The product keywords aren't very closely related, though - you could call them the 800 main product categories. I scraped them from a national marketplace for second-hand and new stuff, similar to eBay. (Manually scraped, btw. :) It was part of keyword research back then, to get fresh new ideas.
     
    Last edited: Feb 26, 2012