
OVER 15,000,000 URLS with Scrapebox, How to deal with them?

Discussion in 'Black Hat SEO Tools' started by youssef93, Feb 17, 2011.

  1. youssef93

    youssef93 Senior Member

    Joined:
    Sep 14, 2008
    Messages:
    828
    Likes Received:
    1,148
    Occupation:
    Student, Part-time Online Marketer
    Location:
    Egypt
    Hello there everyone,

    Yesterday I started a harvesting operation on my VPS running ScrapeBox. It's the first time I've harvested over 1 million URLs. As of this moment the harvest is still running, and it has now crossed 16 million. How did I do that? I harvested over 200,000 competitor URLs and then used the custom "link:" operator on them. Now to the questions:
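
    (For anyone curious, generating those "link:" queries can also be scripted outside ScrapeBox. A minimal sketch, assuming the competitor URLs sit in a plain text file; competitors.txt and link_queries.txt are hypothetical names:)

    Code:
    from urllib.parse import urlparse

    # Turn a list of harvested competitor URLs into one "link:" query per unique domain.
    # competitors.txt / link_queries.txt are hypothetical file names.
    domains = set()
    with open("competitors.txt", encoding="utf-8", errors="ignore") as f:
        for line in f:
            netloc = urlparse(line.strip()).netloc.lower()
            if netloc:
                domains.add(netloc)

    with open("link_queries.txt", "w", encoding="utf-8") as out:
        for domain in sorted(domains):
            out.write("link:" + domain + "\n")  # paste these into the harvester as keywords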

    1. I was using around 200 public proxies with latency <800. The harvesting process has now gone extremely slow, probably because the proxies are burnt. What can I do about it? As far as I know I can't pause, load new proxies, then resume. What do you suggest?

    2. I noticed that just 500,000 URLs harvested from Google used 7 GB of bandwidth, while with Yahoo I've crossed the 16 million mark using only 15.5 GB so far. Why such a major difference between the number of URLs and the bandwidth used?

    3. When ScrapeBox finishes, what will happen? I heard it can't harvest more than 1 million URLs per session, so how come I've crossed 16 million? How do you manage such large lists?

    Thanks a lot!
     
  2. aldragon

    aldragon Power Member

    Joined:
    Aug 5, 2010
    Messages:
    688
    Likes Received:
    192
    Location:
    ^^
    Nice, but ScrapeBox doesn't always support anything over a million.
     
  3. vickygarg

    vickygarg Power Member

    Joined:
    Jan 25, 2010
    Messages:
    646
    Likes Received:
    531
    The harvested files are in the Harvester Sessions folder inside the ScrapeBox folder.
     
  4. youssef93

    youssef93 Senior Member

    Joined:
    Sep 14, 2008
    Messages:
    828
    Likes Received:
    1,148
    Occupation:
    Student, Part-time Online Marketer
    Location:
    Egypt
    Hmmm... thanks aldragon and vickygarg, but your answers look contradictory, at least to me. Care to explain a bit more? Help from anyone else is appreciated too! :)
     
  5. dzoniij

    dzoniij Regular Member

    Joined:
    May 24, 2009
    Messages:
    469
    Likes Received:
    111
    The big question is: are these unique URLs/domains? I really doubt it... BTW, I used ScrapeBox for harvesting a few months back and harvested 300 million links... and what good was that when only 800K of them were unique domains...
     
  6. youssef93

    youssef93 Senior Member

    Joined:
    Sep 14, 2008
    Messages:
    828
    Likes Received:
    1,148
    Occupation:
    Student, Part-time Online Marketer
    Location:
    Egypt
    Definitely a hell of a lot of duplicates, but that's not the problem :)
     
  7. Gibbzee

    Gibbzee Regular Member

    Joined:
    Jun 17, 2009
    Messages:
    399
    Likes Received:
    142
    All the URLs will be saved in the Harvester Sessions folder in ScrapeBox, in individual files of 1,000,000 URLs each.

    The best thing to do afterwards is to open the new DupeRemover addon for ScrapeBox, merge all these files together and save that list. Next, with the same addon, select that list and choose whether to remove duplicate URLs or duplicate domains. If you remove duplicate domains, you will probably have less than a million left. (A rough script equivalent is sketched below.)

    It may freeze while it's doing this, by the way; just leave it to do its thing and it should respond when it has finished.
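
    (If you'd rather merge and dedupe outside ScrapeBox, here's a minimal Python sketch. It assumes the session files are plain-text URL lists in a folder named "Harvester Sessions", which is an assumption about your layout, and it keeps one URL per domain:)

    Code:
    import glob
    from urllib.parse import urlparse

    # Merge every harvester session file, keeping the first URL seen per domain.
    # The "Harvester Sessions/*.txt" path is an assumed folder layout.
    seen_domains = set()
    with open("merged_unique.txt", "w", encoding="utf-8") as out:
        for path in sorted(glob.glob("Harvester Sessions/*.txt")):
            with open(path, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    url = line.strip()
                    domain = urlparse(url).netloc.lower()
                    if domain and domain not in seen_domains:
                        seen_domains.add(domain)
                        out.write(url + "\n")

    The set lookup keeps this roughly linear, though holding millions of domains in memory will still eat a fair amount of RAM.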
     
  8. youssef93

    youssef93 Senior Member

    Joined:
    Sep 14, 2008
    Messages:
    828
    Likes Received:
    1,148
    Occupation:
    Student, Part-time Online Marketer
    Location:
    Egypt
    Thanks a lot!

    Can you comment on questions #1 and #2?

    Thanks once again :D
     
  9. Gibbzee

    Gibbzee Regular Member

    Joined:
    Jun 17, 2009
    Messages:
    399
    Likes Received:
    142
    For question number 1, you just have to keep going. It will tell you which keywords it harvested successfully and which it didn't, so if you want to retry the ones that failed, save the failed keywords and run a new harvest with just those (see the sketch below).
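
    (ScrapeBox exports that failed-keyword list for you, but for illustration, working out the leftovers yourself is just a set difference; all_keywords.txt and completed_keywords.txt are hypothetical stand-ins for your own exports:)

    Code:
    # Keywords you queued minus keywords that completed = keywords still to harvest.
    # Both input file names are hypothetical stand-ins for your own exports.
    def load(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    failed = load("all_keywords.txt") - load("completed_keywords.txt")
    with open("failed_keywords.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(failed)))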

    For question number 2, I can't really give you an answer because I don't know. I think it's normal though.
     
  10. Jared255

    Jared255 Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    May 10, 2009
    Messages:
    1,908
    Likes Received:
    1,664
    Location:
    Boston, MA
    1) Stop, export the keywords you didn't harvest, get new proxies, and do a new harvesting session with the unused keywords.

    2) Just differences between Google and Yahoo; not much you can do about it.

    3) It will be in the Harvester Sessions folder, as explained above.

    To combine all the files into one massive 16M-URL file, use this
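
    (Whatever tool that referred to, a straight concatenation is easy to script by hand; a minimal sketch, again assuming the session files sit in a "Harvester Sessions" folder:)

    Code:
    import glob
    import shutil

    # Stream-concatenate every session file into one big list without loading it into RAM.
    # "Harvester Sessions/*.txt" is an assumed folder layout.
    with open("combined_16m.txt", "wb") as out:
        for path in sorted(glob.glob("Harvester Sessions/*.txt")):
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
            out.write(b"\n")  # guard against files missing a trailing newline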