Are my ScrapeBox settings jacked.. WTF?

Discussion in 'Black Hat SEO Tools' started by meatro, Nov 2, 2011.

  1. meatro

    meatro BANNED BANNED

    Joined:
    Nov 21, 2009
    Messages:
    568
    Likes Received:
    997
    So I fired up ScrapeBox for the first time in a while today.

    I have about 15 private proxies and 400 premium shared proxies, which end up as about 50 usable proxies altogether.

    I've given up completely on scraping Google as it basically seems to be a waste of time. If it doesn't take forever, my IP is banned quickly or I only get a few thousand URLs.

    I'm searching for a specific forum footprint and a few hundred different keywords. My settings are pretty basic, 'use proxies,' '10 connections,' Yahoo! only.. It doesn't seem to matter how many connections I use either.

    My footprint is simply "powered by vbulletin" and then a generic keyword. I simply scraped keywords related to "home, garden, computers, pets, jewelry and cars."

    Are there some new limitations? I used to scrape 30-50,000 URLs no problem. Do I need more proxies? WTH?
     
  2. BHopkins

    BHopkins Moderator Staff Member Moderator Jr. VIP

    Joined:
    Dec 31, 2010
    Messages:
    2,316
    Likes Received:
    1,389
    Gender:
    Male
    Occupation:
    ORM and SEO company owner
    Location:
    California
    Yahoo kinda sucks with advanced search operators, in my opinion. If you have 400 shared proxies, how many of them pass the Google test?

    With 400 proxies, you should easily be able to scrape 1mil+ links. I set my connections to about half the number of proxies I have. So with 400 proxies, I would use 150-200 connections.
     
  3. meatro

    I've signed up for a couple more premium shared proxy providers that I found on page 3-4 of Google, so hopefully they're not as popular. Out of about 1,000 altogether, around 75-100 seem to be consistently passing Google.

    I've upped the connections to 30 for each, scraped the same keywords (home, garden, automotive, computers, pets, music, etc.) then scraped those as well.. So there's thousands of keywords.

    Seems to be doing much, much better. Thank you. Almost to 60,000 results and still getting 75-100 URLs/s using both Google and Yahoo with even more operators. (and yes, Yahoo sucks very badly for operators. I don't think inurl even works.)

    Hopefully this is my answer, because these 3 premium lists combined are cheaper than my private proxies.

    Thanks again. :)
     
  4. BHopkins

    I only scrape with shared/public proxies and post with both public and private proxies. With enough shared/public proxies, I can usually scrape more than 1k URLs/s. If you have time to wait, 75-100 URLs/s should be plenty.
     
  5. dannyduberstein

    dannyduberstein Junior Member

    Joined:
    Nov 1, 2011
    Messages:
    189
    Likes Received:
    105
    You should never use private proxies for scraping. There is no reason to waste good proxies you've paid money for on scraping. Use public proxies to scrape. Just use something like ProxyFire to find public proxies.
     
  6. GoldenGlovez

    GoldenGlovez Moderator Staff Member Moderator Jr. VIP

    Joined:
    Mar 23, 2011
    Messages:
    701
    Likes Received:
    1,713
    Location:
    Guangdong, China
    This is wrong advice. You can certainly harvest Google with as few as 10 private proxies. Just make sure to set a 4-5 second delay between connections (longer for operator-heavy footprints).
     
  7. meatro

    I feel like a retard.

    One big problem.. Like I said, I scrape the keyword suggestion tools. But I wasn't transferring the found keywords to the main list. I used my general keywords (home, garden, auto, etc), scraped the suggestions, then, instead of transferring to the main keyword list, I was transferring to the scraper's keyword list.. Which was clearing all of my shorter keywords.

    So I was ending up with only very long tail keywords like:
    inurl:"member.php" +"powered by vbulletin" +"signature" pink ipad 2 for sale outside of chicago

    So obviously... There weren't many results. :p
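The mix-up described above can be sketched in a few lines of Python. This is only an illustration and not ScrapeBox's actual behavior or API: the function names `merge_keywords` and `build_queries` and the sample lists are hypothetical. It contrasts replacing the main keyword list with the scraped suggestions against appending to it, and shows each keyword being combined with the footprint.

```python
def merge_keywords(main_list, suggestions):
    """Append suggestions to the main list, keeping order and dropping duplicates."""
    seen = set()
    merged = []
    for kw in list(main_list) + list(suggestions):
        cleaned = kw.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


def build_queries(footprint, keywords):
    """Combine one footprint with every keyword, one query per keyword."""
    return [f"{footprint} {kw}" for kw in keywords]


# Hypothetical sample data: short seed keywords plus scraped suggestions.
seeds = ["home", "garden", "pets"]
suggestions = ["pink ipad 2 for sale outside of chicago", "garden"]

# The mistake: the suggestions overwrite the list, so the short seeds are lost.
replaced = list(suggestions)

# The intended result: the suggestions are added to the seeds.
merged = merge_keywords(seeds, suggestions)

queries = build_queries('"powered by vbulletin"', merged)
```

With the merged list, the short, broad keywords stay in the query set alongside the long-tail suggestions, so the footprint is still paired with terms that are likely to match many pages.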

    Thanks for your guys' help.
     
  8. BHopkins

    I wouldn't say that was "wrong" advice, just different from yours. When I harvest, I'm looking for huge numbers: I aim for 700k (beyond that, SB crashes for me). With a delay set, it would take days or weeks to get to 700k. His advice is exactly what I would need, since I'm harvesting more than a few hundred or a few thousand results.
     
  9. GoldenGlovez

    I harvest 15,000,000 URLs a day just from Google on private proxies alone. Using just 10 can still net you around 3 million a day. I'd look into what's causing your 700k crash.
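A rough back-of-the-envelope way to compare the figures in this exchange, as a sketch only. It assumes each proxy sends one request, waits the configured delay, and repeats around the clock. The `urls_per_request` value is a made-up average, not something measured or stated in this thread, and response time, bans, and duplicates are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def urls_per_day(proxies, delay_seconds, urls_per_request):
    """Estimate daily URLs if every proxy makes one request per delay interval."""
    requests_per_proxy = int(SECONDS_PER_DAY // delay_seconds)
    return proxies * requests_per_proxy * urls_per_request


def days_to_reach(target_urls, proxies, delay_seconds, urls_per_request):
    """Estimate how many days the same setup needs to collect target_urls."""
    return target_urls / urls_per_day(proxies, delay_seconds, urls_per_request)


# Assumed values: 10 proxies, 5 second delay, 20 new URLs per request on average.
estimate = urls_per_day(proxies=10, delay_seconds=5, urls_per_request=20)
days_for_700k = days_to_reach(700_000, proxies=10, delay_seconds=5, urls_per_request=20)
```

Under these assumed values the estimate lands in the same ballpark as the "around 3 million a day" figure, and 700k would take well under a day. The result scales directly with the assumed URLs per request, so it is an order-of-magnitude check rather than a prediction.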
     
  10. dannyduberstein

    Interesting. I've always been told never to use private proxies. I'll have to try that out.
     
  11. BHopkins

    Really, with a delay of a couple of seconds? I've never tested it, I just blow through proxies with 125-250 connections.

    I'll see if I can find an error log that shows what's causing mine to crash. I just assumed I was harvesting too many at a time.
     
  12. meatro

    When I first started with ScrapeBox, I used 12 private proxies and that was that. I never once had problems getting 403'd on any of them and I scraped constantly almost all day long. Never wanted to deal with finding public proxies that 403 1/4 way through the job and never finish.

    That was 2 years ago. I haven't really been scraping anything for the past several months and a lot has changed. :)

    ProxyFire is awesome, though. I had a list of over 400 earlier. They still get banned very quickly, but at least it scrapes more than before.

    Stupid question.. Where are the delay settings? The only delays I can find are for the blog poster. Not interested in that.

    Thanks again.
     
    Last edited: Nov 3, 2011
  13. BHopkins

    I came back to this post to ask the same question! I looked for a harvesting delay setting but couldn't find anything.
     
  14. linkr

    linkr Newbie

    Joined:
    Feb 18, 2010
    Messages:
    15
    Likes Received:
    4
    Just set enough delay (5 sec) and you should be just fine.
     
  15. BHopkins

    I PM'ed GG and he showed me that in the top right corner of the comment poster there is a "Delay" box that works for harvesting as well.
     
  16. meatro

    Thanks.. I came to that conclusion myself. I went through all the add-ons, settings, etc. and finally set that thing to 4s and saw that it worked for harvesting.

    Funny, I've never used that thing because.. Well, I've never blasted blog comments with SB. I only used it for harvesting and like I said, I've not run into these problems until now.

    I was pulling lists of several hundred thousand URLs back to back. Now I'm left wondering how folks are doing it once. :\