What computer resources are needed to harvest/scrape links all day?!

Discussion in 'Black Hat SEO' started by THUNDERELVI, Jun 10, 2014.

  1. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    Hello guys, I hope you are all doing well.
    I was hoping some of you could help me out. I want to harvest/scrape links with ScrapeBox 24/7. I have plenty of keywords and footprints to scrape with for a long time, and I plan to use 30 semi-dedicated proxies (from buyproxies.org) with around 50-60 threads/connections.

    What do you think the optimal resources would be? In terms of RAM, processor cores and internet speed.
    The machine will not be used for any other tasks, just to leave ScrapeBox to do its job all the time.
    Finally, can you recommend any cheap VPS with those kinds of resources and a good reputation?
    Thank you!
     
  2. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    BUMP! Come on guys, no opinions on this matter?
     
  3. wowfactorinme

    wowfactorinme Newbie

    Joined:
    Apr 29, 2014
    Messages:
    13
    Likes Received:
    2
    More than anything else, you will need more proxies that can handle a project like this. Get a VPS that supports Gscraper with their proxies and you would be good to go.
     
    • Thanks x 1
  4. Zombie Pop

    Zombie Pop Jr. VIP Premium Member

    Joined:
    Dec 18, 2013
    Messages:
    360
    Likes Received:
    121
    I scrape with ScrapeBox (even using multiple instances) 24/7 using 40 private proxies on a VPS (three 3.0 GHz cores and 2 GB RAM) and the Automator plugin. I have the random delay set between 3 and 8 seconds, and I don't burn up my proxies, but it does take a long time to scrape with so few proxies and the delays. Check out newipnow's BST. I use their proxies. Whatever you buy, they will double it if you post in their thread.

    I own Gscraper too and it's lightning fast if you can afford their proxy service. I don't use it because I am on a budget (not selling services to anyone, just doing my own SEO) and I use those same proxies in GSA SER. Otherwise I would use Gscraper and their proxy service for fast scraping of targets.

    I don't know if you want to scrape to find targets to post to with software or not, but here's a trick for building a massive list in conjunction with scraping, if that's what you want to scrape for:

    http://www.blackhatworld.com/blackh...ow-easily-build-huge-sites-lists-gsa-ser.html
     
    • Thanks x 1
  5. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    The cheapest and relatively fast method is using a DSL connection.
    This is the workflow:
    1) Set the threads to X - you need to calculate it based on your bandwidth.
    2) Scrape until Google blocks you, usually after a few thousand searches. You MUST NOT use cookies.
    3) Disconnect/reconnect your DSL modem/router to change the IP. I do this automatically with a simple script (a rough sketch of one way to do it is below).
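
    For reference, a minimal sketch of what such a reconnect script could look like, assuming a Windows box where the DSL line is a dial-up/PPPoE entry controlled with rasdial. The connection name and credentials are placeholders, not the actual script from this post - adjust them for your own setup:

        # change_ip.py - rough sketch of an automatic IP-change script (assumed setup, not the original)
        import subprocess
        import time

        CONNECTION = "DSL"        # name of the PPPoE/dial-up entry - placeholder
        USERNAME = "isp_user"     # ISP login - placeholder
        PASSWORD = "isp_pass"     # ISP password - placeholder

        def change_ip():
            # Drop the line so the ISP hands out a new dynamic IP on reconnect.
            subprocess.call(["rasdial", CONNECTION, "/disconnect"])
            time.sleep(5)
            subprocess.call(["rasdial", CONNECTION, USERNAME, PASSWORD])
            time.sleep(10)        # give the connection a moment to come back up

        if __name__ == "__main__":
            change_ip()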

    Change the user agent to IE6 - Google returns content about 5 times smaller compared to, for example, Chrome.
    You can also change the user agent to one of the older mobile browsers to get WAP (XML) content back, for example -> Alcatel-BE3/1.0 UP/4.1.8h
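
    If you are rolling your own scraper rather than using ScrapeBox, the user agent trick is just a request header. A minimal illustrative sketch in Python (the search URL and query here are only examples, not a recommended endpoint):

        # Fetch one results page with an old IE6 user agent and gzip to keep the HTML small.
        import gzip
        import io
        import urllib.request

        headers = {
            "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",  # IE6
            # "User-Agent": "Alcatel-BE3/1.0 UP/4.1.8h",  # old WAP browser for XML-style output
            "Accept-Encoding": "gzip",
        }
        req = urllib.request.Request("https://www.google.com/search?q=example+footprint", headers=headers)
        with urllib.request.urlopen(req, timeout=15) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.GzipFile(fileobj=io.BytesIO(body)).read()
        print(len(body), "bytes of HTML")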

    Not sure if ScrapeBox can handle this with some kind of plugin, but it works well as a custom solution.

    I have been doing this for years without any problems. Recently I had to scrape the results of 20,000,000 searches - it took me around 120 hours (5 days) with a 12 Mbit DSL connection.
     
    • Thanks x 2
  6. Groen

    Groen Regular Member

    Joined:
    Nov 7, 2009
    Messages:
    397
    Likes Received:
    221
    Really... Scrapebox?

    Get Gscraper (it's a lot faster than ScrapeBox) and good port-scanned proxies and scrape 80-200 million URLs a day. Forget all about Gscraper's proxy service as it sucks donkey balls. I'm running Gscraper alongside other SEO tools on a powerful dedi, but I doubt you'll need more than a few GB of RAM and a core or two.
     
    • Thanks x 1
  7. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    Thanks for answering, guys!
    @wowfactorinme: Their proxy subscription is good, but not as good as it used to be, I guess because more and more people started using it. And I have other tools to quickly scrape for working public proxies every day, so besides the private ones I will use those as well, but thanks for your suggestion.

    @zombie pop: Thanks a lot for your reply mate and your suggestions. How many links do you scrape per day with those settings?
    Thanks for the link, I have read that thread before :) I actually want to scrape targets for GSA so I can then just identify and post using GSA. Also, what VPS are you using, if I may ask?

    @theMagicNumber: Those are pretty good numbers. I have a script as well to easily get a new dynamic IP, but since I changed my ISP I now have a static IP, which sucks, and I am not sure why my ISP does that lol since static IPs are more expensive to get anyway. The user agent trick is a really good idea though, thanks.
    If I may ask, how many threads do you use with a 12 Mbit/sec connection and what are your computer's specs?

    @Groen: I have tested them both (Gscraper the free version) and I think they are the same in terms of speed if you use the same sets of proxies. Everyone says Gscraper is faster, but that is only if you use their proxy service, which is pretty overpriced IMO.
    How many proxies do you use?
     
  8. xxpansion

    xxpansion Newbie

    Joined:
    Sep 25, 2009
    Messages:
    18
    Likes Received:
    1
    I've heard numerous people say to move away from Gscraper proxies. I'll be trying that soon and will report back on the difference.
     
    • Thanks x 1
  9. Groen

    Groen Regular Member

    Joined:
    Nov 7, 2009
    Messages:
    397
    Likes Received:
    221
    They are not the same in terms of speed. I'm scraping about 10-15 times faster with Gscraper than ScrapeBox with the exact same proxies and the same number of threads. Also, with Gscraper you don't have to deal with it constantly crashing and needing to be stopped to reload proxies.

    Also, the free version of Gscraper is ancient and doesn't do the paid version any justice at all. Gscraper is very slow with their proxy service, trust me. I wouldn't even pay $10/month for that junk.
     
    • Thanks x 1
  10. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    Alright then, I will buy a copy of Gscraper and test it, thanks for the suggestion mate.
    Yeah I agree, it's pretty overpriced.
     
  11. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    ~40-50 threads, and the hardware is nothing special either: a quad-core Q9550 + Samsung 840 Pro SSD.
    I was scraping around 3000-3500 pages per minute, 10 results per page, with the IE6 user agent and gzip compression.
    It took me around 5 days to scrape 20M pages.
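
    Those figures hang together if you run the arithmetic - a quick check using only the numbers quoted in this post:

        # Back-of-the-envelope check of the quoted throughput
        pages = 20_000_000          # 20M searches at 10 results per page = 20M result pages fetched
        rate_per_min = 3000         # lower end of the 3000-3500 pages/minute figure
        hours = pages / rate_per_min / 60
        print(round(hours, 1))      # ~111 hours, in line with the ~120 hours / 5 days quoted earlier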
     
    • Thanks x 1
  12. sudorank

    sudorank Power Member

    Joined:
    Jun 24, 2013
    Messages:
    640
    Likes Received:
    473
    Occupation:
    Web Developer
    Location:
    Swansea, UK
    Quad core, 4 GB+ RAM and lots of proxies :cool:

    Simple!
     
    • Thanks x 1
  13. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,400
    Likes Received:
    3,691
    Three years ago I was scraping with a single core CPU and 1 gig of RAM. The internet connection must be good though.
     
    • Thanks x 1
  14. sudorank

    sudorank Power Member

    Joined:
    Jun 24, 2013
    Messages:
    640
    Likes Received:
    473
    Occupation:
    Web Developer
    Location:
    Swansea, UK
    I remember those days ha ha! I still have a Pentium 4 with 2GB knocking around the house for GSA today :cool:
     
  15. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    And why has it changed today? I mean, scraping is the same; it mostly depends on proxies and the internet connection.
    Yeah I still have 2 of those old computers, 1 Pentium 4 with 2GB and 1 dual core with 4GB, but I cannot put them to good use, because with my internet connection I can barely even download a video or a movie lol.
     
  16. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,400
    Likes Received:
    3,691
  17. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,202
    Likes Received:
    1,725
    Gender:
    Male
    Location:
    W3
    Ahaa ok, I know what you mean.
    But with just 1 core and 1 GB of RAM, were you able to scrape many URLs per day?