Scrapebox 90% Duplicate Urls / Domains

Discussion in 'Black Hat SEO' started by baedorf, Jan 6, 2013.

  1. baedorf

    baedorf Registered Member

    Joined:
    Mar 6, 2011
    Messages:
    78
    Likes Received:
    2
    Hi,
    today I harvested for 6 hours.
    Result: 2 million URLs.
    But after removing duplicate URLs and domains I had only 30K results left.
    How can I keep around 50% when I harvest 2 million URLs?
    Everyone says keywords are key,
    but I don't know what I should do differently.
    I used a dictionary word list of unique words, but the results were not unique.
    Thanks for all help! Keep Scrapebox Junkie!
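
    For anyone unsure what the two removal steps mean, here is a rough Python sketch. It is only an illustration and not how Scrapebox works internally; the function names and the sample URLs are made up for the example.

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Drop exact duplicate URLs, keeping first-seen order."""
    return list(dict.fromkeys(urls))


def dedupe_domains(urls):
    """Keep only the first URL seen for each host name."""
    first_by_host = {}
    for url in urls:
        host = urlsplit(url).netloc.lower()
        first_by_host.setdefault(host, url)
    return list(first_by_host.values())


# Made-up sample harvest: one exact duplicate, two URLs on the same host.
harvested = [
    "http://example.com/a",
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/x",
]
unique_urls = dedupe_urls(harvested)
unique_domains = dedupe_domains(unique_urls)
print(len(harvested), len(unique_urls), len(unique_domains))  # 4 3 2
```

    Removing duplicate domains always cuts deeper than removing duplicate URLs, because every extra page from a site that is already in the list gets dropped.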
     
  2. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    How deep into Google are you scraping? If you're going to the max, e.g. 9999, it will show lots of dupes, so try fewer if that's the case.
     
  3. baedorf

    baedorf Registered Member

    Joined:
    Mar 6, 2011
    Messages:
    78
    Likes Received:
    2
    Hi,
    where can I change this setting?
    I have results set to 1000.
    I thought you could only scrape the first 100 results from Google and Yahoo?
     
  4. themidiman

    themidiman Power Member

    Joined:
    Feb 25, 2011
    Messages:
    701
    Likes Received:
    1,536
    Location:
    root@pts/0
    Also, you need to use more specific keywords. General keywords that are similar are going to turn up the same results.
     
  5. frazgta

    frazgta Power Member

    Joined:
    Jan 24, 2011
    Messages:
    574
    Likes Received:
    381
    I'd say that's normal. You can only scrape the first 1000 results. Out of that many, expect that many duplicates.
     
  6. the_demon

    the_demon Jr. Executive VIP

    Joined:
    Nov 23, 2008
    Messages:
    3,177
    Likes Received:
    1,563
    Occupation:
    Search Engine Marketing
    Location:
    The Internet
    You need to use bigger word and phrase dictionaries in your scraping. Depending on your server speed, it can often take days or even weeks to build massive lists of 100K - 1MM unique URLs.
     
  7. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    Personally I'd change it to top 100.

     
  8. SEO_Alchemy

    SEO_Alchemy Senior Member

    Joined:
    Sep 8, 2012
    Messages:
    1,134
    Likes Received:
    1,213
    Location:
    USA
    Exactly... welcome to the wonderful world of scraping. Use many, many specific keywords, mash them up with other keywords... wait for days, then watch 75% go up in smoke as dupes.
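
    A minimal sketch of what "mashing up" keywords could look like, assuming it just means pairing every base keyword with every modifier. The `mash_up` helper and the sample words are hypothetical, not a Scrapebox feature.

```python
from itertools import product


def mash_up(base_keywords, modifiers):
    """Pair every base keyword with every modifier, dropping repeated queries."""
    queries = (f"{base} {mod}" for base, mod in product(base_keywords, modifiers))
    return list(dict.fromkeys(queries))


# Made-up word lists; "forum" is repeated on purpose to show the dedup.
queries = mash_up(["dog training", "cat food"], ["forum", "blog", "forum"])
for query in queries:
    print(query)
```

    Two base keywords and two distinct modifiers give four queries here. Bigger lists grow the query count fast, but they do not stop different queries from returning the same sites.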
     
  9. dog-tag

    dog-tag Senior Member

    Joined:
    Oct 19, 2010
    Messages:
    811
    Likes Received:
    912
    Occupation:
    Full-Time Internet Marketer + Business Consultant
    Location:
    Thailand
    A wide range of keywords, both long and short, with a big footprint.
    You're always gonna have a lot of dupes, part of the game dude.
     
  10. Cindy

    Cindy Power Member

    Joined:
    Apr 7, 2008
    Messages:
    632
    Likes Received:
    88
    This is even more true now that Google shows pages of the same sites over and over again if you manually search a term and click on pages 2, 3, 4, etc.

    I'm no SB expert, but I'm good, and yet I've had a great comment on a high PR edu page removed because SB found that same site for a completely unrelated search on a different project. SB seems to be almost useless for this now, at least for scraping Google, but the other engines don't allow as many of the specific parameters that you used to be able to use in Google.