scrapebox google harvester going very slow

Discussion in 'Black Hat SEO' started by nonai, Jan 23, 2014.

  1. nonai

    nonai Power Member

    Joined:
    Oct 10, 2013
    Messages:
    524
    Likes Received:
    63
    10 semi-private proxies
    maximum connections: 3 for Yahoo, 3 for Google
    been scraping for 24 hours
    Yahoo: over 1 million URLs
    Google: only 100,000

    The Google harvester is going very, very slow, like 1 URL per minute?

    Does anyone know why this is happening and how to fix it?
     
  2. innozemec

    innozemec Jr. VIP Jr. VIP

    Joined:
    Aug 19, 2011
    Messages:
    5,288
    Likes Received:
    1,799
    Location:
    www.Indexification.com
    That is really slow indeed, and there is certainly some problem.

    In most cases it is because of the proxies. Try running a test scrape without proxies for a few seconds and see what the rate is. Also check your settings in case you changed something by mistake.
     
  3. ija61

    ija61 Senior Member

    Joined:
    Mar 2, 2011
    Messages:
    960
    Likes Received:
    634
    Gender:
    Male
    Occupation:
    The first SEO economist:)
    Location:
    Romania
    Your proxies are banned. You are using too many connections for 10 shared proxies.

    I have 30 private proxies, and in the best scenario I go up to 2 connections. The recommended ratio is 1:10 (one connection per 10 proxies), but with the newest changes at Google this should be increased to 1:15.
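
To make the ratio concrete, here is a minimal Python sketch. It assumes the ratio means "one connection per N proxies"; the function name `max_connections` and its parameter names are made up for illustration and are not ScrapeBox settings.

```python
def max_connections(proxy_count: int, proxies_per_connection: int = 10) -> int:
    """Rule-of-thumb connection limit for a given number of proxies.

    Assumption: a ratio of 1:10 means one connection per 10 proxies.
    The result never drops below 1, since a scrape needs at least
    one connection to run at all.
    """
    if proxy_count < 1 or proxies_per_connection < 1:
        raise ValueError("both arguments must be at least 1")
    return max(1, proxy_count // proxies_per_connection)


if __name__ == "__main__":
    # 10 proxies at 1:10, then 30 proxies at 1:10 and at the stricter 1:15
    print(max_connections(10))
    print(max_connections(30))
    print(max_connections(30, proxies_per_connection=15))
```

By this rule of thumb, 10 proxies support a single connection, so 3 connections per engine is well over the suggested limit, while 30 proxies at 1:15 works out to the 2 connections mentioned above.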

    When the proxies get banned by a search engine, the software will still keep running until all the tasks are completed.

    Check out loopline's videos on YouTube.
     
  4. netmatrix

    netmatrix Regular Member

    Joined:
    Mar 21, 2010
    Messages:
    303
    Likes Received:
    83
    Location:
    Midst of the Matrix... Exploiting Glitches
    Yeah, sounds like a proxy issue. They've most likely been banned by Google.
     
  5. xenergy81

    xenergy81 Junior Member

    Joined:
    Jul 6, 2009
    Messages:
    105
    Likes Received:
    6
    Occupation:
    Full time Online Marketer & Software Engineer
    Location:
    IPv4
    Some advanced Google search operators in a footprint (e.g. inurl:) burn proxies much faster than others. As other members mentioned, your ratio of proxies to connections will determine how good your scrape results are.
    Don't use shared proxies, because you never know what other people have done with them; they have probably been abused already.
     
  6. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,371
    Likes Received:
    1,799
    Gender:
    Male
    As noted above, your proxies are banned, but that's not the only reason it's happening. It's going slow because some of your proxies are banned and ScrapeBox has a retry setting. If you go to settings, you can set the retries for the custom harvester as well as for the multi-threaded harvester.

    So if you have retries set to, say, 3, then each time ScrapeBox tries to harvest with a banned proxy and it doesn't work, it tries again with another proxy; if that one's banned, it tries again. If you have retries set to 20, it could try again up to 20 times. It picks a random proxy each time rather than working through an array and eliminating them (there is good reason for this, which I won't go into here). So if 7 of your 10 proxies are banned, it could technically work through 20 retries without ever touching a good proxy, and then start the process over with the next keyword. The retries combined with banned proxies are why your URLs per second are slow: the time spent on all the failed proxy attempts is counted too, and that throws the average off.
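
The effect of random proxy selection can be estimated with a short Python sketch. It assumes each attempt picks uniformly at random from the whole proxy list, banned ones included, and that "retries" is the total number of attempts per keyword; ScrapeBox's actual internals may differ. The function names are invented for illustration.

```python
import random


def p_all_attempts_banned(banned: int, total: int, attempts: int) -> float:
    """Chance that every attempt lands on a banned proxy.

    Assumption: each attempt is an independent uniform pick from all
    proxies, so the chance of one bad pick is banned / total.
    """
    if total < 1 or not 0 <= banned <= total or attempts < 0:
        raise ValueError("invalid proxy counts or attempt count")
    return (banned / total) ** attempts


def simulate(banned: int, total: int, attempts: int,
             trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the closed-form value above."""
    rng = random.Random(seed)
    wasted = 0
    for _ in range(trials):
        # Proxy indices 0 .. banned-1 stand in for the banned proxies.
        if all(rng.randrange(total) < banned for _ in range(attempts)):
            wasted += 1
    return wasted / trials


if __name__ == "__main__":
    for attempts in (3, 20):
        exact = p_all_attempts_banned(7, 10, attempts)
        print(f"{attempts} attempts: {exact:.4%} chance of never hitting a good proxy")
```

Under these assumptions, with 7 of 10 proxies banned, roughly a third of keywords burn all 3 attempts on banned proxies, while at 20 attempts that outcome is possible but rare (well under 1%). The bigger cost of a high retry count is the time spent on each failed attempt along the way.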

    The volume of URLs scraped, though, is the important number. 100K with 10 proxies set to 3 connections tells me you are probably not using advanced operators, or you wouldn't have got that far (guessing). But as noted, set connections to 1, or else add a delay, and go to settings and untick "use custom harvester" and untick "use multi threaded harvester".

    Using shared proxies is also fine. I do it, and I have probably close to 200 shared proxies from various providers. If the provider is solid, they will typically try to match you up with other shared users who are not doing what you're doing. That's why they ask what you're using them for when you purchase them. I use MPP and Buyproxies.org shared proxies and always have great success harvesting with them.

    As for retries, it's good to have them set to a fair number, but 3 is the default and probably good with private/shared proxies. With public proxies you could run it up, but running it higher than 3 on private/shared proxies doesn't make sense.
     
  7. nonai

    nonai Power Member

    Joined:
    Oct 10, 2013
    Messages:
    524
    Likes Received:
    63
    excellent answer. not sure if I processed it all, but really great answer.
     
  8. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,371
    Likes Received:
    1,799
    Gender:
    Male
    Glad you found it helpful. :) I tend to "dump" info, but you really don't need to process it all. As long as you can scrape and you understand the ratio of connections to the number of proxies you have, you're good. Stick with it, and you will find your sweet spot.
     
  9. ѕє∂σηιc

    ѕє∂σηιc Registered Member

    Joined:
    May 27, 2013
    Messages:
    67
    Likes Received:
    10
    OOPS posted in the wrong thread by accident!
     
    Last edited: Oct 4, 2014
  10. AquaticGamer

    AquaticGamer Jr. VIP Jr. VIP

    Joined:
    Apr 13, 2013
    Messages:
    4,069
    Likes Received:
    1,512
    Gender:
    Male
    Location:
    http://www.AQSocials.com
    Why are you bumping a thread that is almost a year old?