Scrapebox removing like 90% of my scraped URL's?

Discussion in 'Black Hat SEO Tools' started by marcdk, May 23, 2011.

  1. marcdk

    marcdk BANNED BANNED

    Joined:
    Jan 8, 2009
    Messages:
    205
    Likes Received:
    57
    Yeah, so if I scrape for URLs using my own keyword list, I can sometimes get a list of around 200k.

    Then when it finishes it says something like: 87% of the URLs have been removed, maybe you used keywords that are too similar

    Wtf?

    I hope someone can help me out here :/ It's annoying when I want to blast 200-500k websites.

    Thanks goes to anyone who can help :)
     
  2. hybridmoment1904

    hybridmoment1904 Newbie

    Joined:
    May 13, 2011
    Messages:
    1
    Likes Received:
    1
    Perhaps change your blacklist?
     
    • Thanks Thanks x 1
  3. Cnotey

    Cnotey Power Member

    Joined:
    Jun 25, 2010
    Messages:
    707
    Likes Received:
    912
    Location:
    Seattle
    Home Page:
    It's because your keywords are too similar.
     
    • Thanks Thanks x 1
  4. xpleet

    xpleet Regular Member

    Joined:
    Jan 18, 2010
    Messages:
    377
    Likes Received:
    327
    Location:
    Morocco
    Don't use similar keywords. Try to get a general word list, or keyword lists from different niches, and you'll have fewer duplicates removed.
     
    • Thanks Thanks x 1
  5. marcdk

    marcdk BANNED BANNED

    Joined:
    Jan 8, 2009
    Messages:
    205
    Likes Received:
    57
    So if my keywords are too similar, it scrapes the same website URLs over and over again, making duplicates? I don't get it. I tried with these:

    Seo
    Google
    Revenue
    money
    testimonial
    cash

    Why are they too similar? And about the blacklist, where do I find that? I got a new update today regarding that blacklist. What exactly does it do? I mean, obviously it blocks some URLs from the program, but why?

    I know these questions are probably dumb as hell, but I'm asking anyway.
     
  6. marcdk

    marcdk BANNED BANNED

    Joined:
    Jan 8, 2009
    Messages:
    205
    Likes Received:
    57
    Bumping the thread again. Sorry, but I need to know this asap!
     
  7. Ramsweb

    Ramsweb Senior Member

    Joined:
    Mar 31, 2010
    Messages:
    1,121
    Likes Received:
    658
    Occupation:
    Internet Marketer - Self Employed
    Location:
    In front of my PC
    You are scraping the same URLs again and again because there is a very good chance that all 6 of the keywords you are using appear on the same pages. You have to diversify your keywords and include a wide range to get better results.

    Try to search for long tail keywords like
    Google ranking
    Google SERP
    Google dance
    organic traffic
    free google traffic
    etc. etc.

    Also, you are demanding way too many answers to questions that you will have to research on your own. This forum won't spoon-feed you. Things like the blacklist are very easy to research and understand on your own. Just spend some time figuring things out. Don't be in a rush to get results. Do the scraping after some research and thinking, and you will come up with good results.
     
    • Thanks Thanks x 1
  8. darshan1994

    darshan1994 BANNED BANNED

    Joined:
    Oct 9, 2009
    Messages:
    654
    Likes Received:
    318
    By any chance, do you have automatic duplicate domain removal turned on? If so, the harvester will remove duplicate domains as soon as it finishes harvesting.
     
    • Thanks Thanks x 2
  9. Maruk

    Maruk Power Member

    Joined:
    Jun 15, 2009
    Messages:
    562
    Likes Received:
    898
    Home Page:
    Forget about the blacklist, it is not the problem.
    The problem is that you are using too many similar keywords.
    Also, if you use just 6 keywords you will get only a small number of results, which increases the chance of dupe URLs.

    Your seo, money, revenue, testimonial, cash are all keywords within the same niche, and because it is only 5 keywords, you'll get maybe 5k results max.

    Also, I suspect you are removing duplicate DOMAINS rather than duplicate URLs, no?
    There will be even more duplicate domains with so few keywords.

    Bottom line is, you need more keywords, like thousands of them. Search for my keyword thread.
     
    • Thanks Thanks x 1
  10. nethead01

    nethead01 Regular Member

    Joined:
    Sep 21, 2009
    Messages:
    424
    Likes Received:
    229
    Seo
    Google
    Revenue
    money
    testimonial
    cash

    If someone is blogging about "seo", they are probably also talking about Google, money... you see where I'm headed here.

    If my keyword list were cars, trucks, nissan, ford, gas mileage, I'd get quite a few dupes. It happens: you can only scrape so many blogs talking about these keywords. Like another member said, get a better list; if this is your list, then that's your problem.

    Search around and you can find general keyword lists. I have been scraping for the past few days with a 200k keyword list, and out of about 1mil scraped I'll get 150k uniques. It's just how it works. Hope that helps.

    So get a big list, let it scrape until you get about 1mil, then remove dupes. You should end up with 100k minimum.
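
To put those numbers next to the 87% from the first post, here is a rough Python sketch of the arithmetic. The function name `percent_removed` and the tiny per-keyword result lists are made up for illustration; this is not how Scrapebox itself reports the figure.

```python
def percent_removed(total_scraped, unique_kept):
    """Share of a harvested list that disappears when duplicates are removed."""
    return 100.0 * (total_scraped - unique_kept) / total_scraped


# Figures quoted in the post: about 1mil scraped, about 150k uniques.
print(percent_removed(1_000_000, 150_000))

# Hypothetical results for three closely related keywords. The same pages
# show up under each keyword, so the merged list is mostly duplicates.
results_by_keyword = {
    "seo": ["http://a.example/1", "http://b.example/1", "http://c.example/1"],
    "google": ["http://a.example/1", "http://b.example/1", "http://d.example/1"],
    "money": ["http://a.example/1", "http://c.example/1", "http://d.example/1"],
}

merged = [url for urls in results_by_keyword.values() for url in urls]
unique = list(dict.fromkeys(merged))  # keeps first occurrence, preserves order
print(len(merged), len(unique))
```

So losing roughly 85% of a big harvest to duplicate removal is in the same range as what the first post describes.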
     
    • Thanks Thanks x 1
  11. cyberzilla

    cyberzilla Elite Member Premium Member

    Joined:
    Nov 15, 2009
    Messages:
    2,204
    Likes Received:
    3,364
    Location:
    zeta reticuli
    This thread is fully loaded with keywords probably in millions! Use this ;) Regarding blacklist, read this thread

    Search bar is there for some reason ;)
     
    • Thanks Thanks x 1
  12. marcdk

    marcdk BANNED BANNED

    Joined:
    Jan 8, 2009
    Messages:
    205
    Likes Received:
    57
    Thanks a lot! That really helped me understand Scrapebox a whole lot better :)
     
  13. muchacho

    muchacho Supreme Member

    Joined:
    May 14, 2009
    Messages:
    1,293
    Likes Received:
    187
    Location:
    Lancashire, England.
    This happens if you select 'Remove Duplicate Domains'.

    It's not because you have the same URLs; it's because so many of the URLs in your list come from the same domain.
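
The difference between removing duplicate URLs and removing duplicate domains can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the example URLs are invented, and it is not Scrapebox's actual implementation.

```python
from urllib.parse import urlsplit


def remove_duplicate_urls(urls):
    """Keep the first occurrence of each exact URL, preserving order."""
    return list(dict.fromkeys(urls))


def remove_duplicate_domains(urls):
    """Keep only the first URL seen for each host name."""
    seen_hosts = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


# Hypothetical harvest: four URLs from one blog, one from another.
harvested = [
    "http://blog-a.example/post-1",
    "http://blog-a.example/post-2",
    "http://blog-a.example/post-1",
    "http://blog-b.example/seo-tips",
    "http://blog-a.example/post-3",
]

by_url = remove_duplicate_urls(harvested)
by_domain = remove_duplicate_domains(harvested)
print(len(harvested), len(by_url), len(by_domain))
```

Here only one of the five entries is an exact duplicate URL, but three are dropped once every domain is limited to a single entry, which is why the domain option shrinks a list so much more.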