What is the secret to getting a low number of duplicate results from Scrapebox harvests? When I scrape for URLs I'm getting 60%+ dupes (often more), which means wasted time and burned proxies.

I've tried using bigger keyword lists and cutting the pages-to-scrape setting from 1000 down to 100, but I'm still getting a high percentage of duplicate URLs. I've also written an awk script that chops up my keyword list so that every single word and phrase in the file is unique (a rough sketch of that filter is at the end of this post). Sadly, that drastically shrinks the list and doesn't really seem to reduce the duplicate URLs much either. On top of that, I made a script that keeps only long-tail keywords (3+ words), on the theory that less generic phrases would stop the same old popular, authoritative sites from turning up in every harvest and creating so many dupes. Alas, even that doesn't cut the numbers down.

I don't know... perhaps I've just been unlucky and need to try again. Or is this an inherent problem with scraping search engines that can't really be remedied to any noticeable extent? If that's the case, I'd rather just concentrate on building massive keyword lists.

One method I've developed for building massive lists is to create a seed file and run a Google Suggest scrape from it. I then use an awk script to find the seed keywords in the resulting file and replace them with blanks, so all that's left over is the unique suggest words that were generated (again, a sketch is at the end of this post). This gives me a fairly unique list every time, but I wonder whether such fragmented, unnatural phrases will just produce spammier results in the harvests (I'm looking for quality, high-PR domains). Should I just stick to natural-language phrases, i.e. whole phrases people are actually searching for?

Thanks for reading this long and confusing post.
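For reference, the word-deduping filter I mentioned works roughly like this (a simplified sketch, not my exact script; it assumes one keyword phrase per line, whitespace-separated, and drops any line containing a word that has already appeared earlier in the file):

```awk
# keep a line only if every word on it is new to the file
{
    for (i = 1; i <= NF; i++)
        if (tolower($i) in seen) next   # a repeated word -> drop the whole line
    for (i = 1; i <= NF; i++)
        seen[tolower($i)] = 1           # remember the words we just accepted
    print
}
```

Run it as something like `awk -f dedupe_words.awk keywords.txt > unique_keywords.txt` (filenames are just placeholders).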
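The long-tail filter is trivial by comparison. Assuming the same one-phrase-per-line format, the entire awk program is just:

```awk
# keep only long-tail phrases: lines with three or more words
NF >= 3
```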
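And the seed-stripping step for the Google Suggest output is roughly this (again a sketch rather than my exact script; it assumes seeds.txt holds one seed phrase per line and suggests.txt holds the raw suggest scrape, both names hypothetical, run as `awk -f strip_seeds.awk seeds.txt suggests.txt`). Note that gsub treats each seed as a regex, so seeds containing special characters would need escaping:

```awk
# pass 1: load the seed phrases into an array
NR == FNR { seeds[tolower($0)] = 1; next }

# pass 2: blank the seeds out of each suggest line, keep whatever is left
{
    line = tolower($0)
    for (s in seeds)
        gsub(s, "", line)               # seed text is treated as a regex here
    gsub(/  +/, " ", line)              # collapse doubled spaces
    gsub(/^ +| +$/, "", line)           # trim leading/trailing spaces
    if (line != "") print line
}
```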