Proxies going bad faster than I can scrape them...

Discussion in 'Black Hat SEO Tools' started by Ranko Jones, Jul 15, 2011.

  1. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    So I am scraping now with custom footprints and getting nice lists up to 120k+ easily.

    Trouble is, though, it'll be running through the list (timeout 10s, connections maxed at 100), and by the time I have enough (1k+) and go to check them, often more than half are dead.

    I just scraped 11k good proxies, and when I went to check them I ended up with 300ish by the 3rd pass :(.

    Maybe I'll try setting the timeout a bit lower at 20 sec, since I know a lot are dropped for failing the timeout, but quite a few are also lost to blocking.

    I'm not sure how to get round this bottleneck since if I scrape more proxies then there will just be more errors at the backend...
     
  2. VanillaH

    VanillaH Regular Member

    Joined:
    Dec 23, 2009
    Messages:
    323
    Likes Received:
    266
    Public proxies die fast; if you can, go buy ten good private proxies. They last for weeks.
     
  3. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    I'm not using private for harvesting/backlink checking tho...

    Publics are the way to go for this, esp. since I learned to lower the timeout and increase max conns. Just shooting a post to see if I can refine that technique any more.
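
    For anyone following along, the two knobs mentioned here (timeout and max connections) could look roughly like this in Python. This is only a sketch, not what any particular tool does internally: `probe_http`, `check_all`, and the test URL `http://example.com/` are made-up names and values for illustration, and the defaults just mirror the 10s / 100 connections from the first post.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def probe_http(proxy, timeout=10.0, test_url="http://example.com/"):
    """Return True if one HTTP GET through `proxy` ("host:port") succeeds.

    test_url is an assumed placeholder; any stable page would do.
    """
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Timeouts, refused connections and malformed replies all count as dead.
        return False


def check_all(proxies, probe=probe_http, max_connections=100):
    """Probe every proxy concurrently and return the ones that passed.

    max_connections caps how many probes run at the same time.
    """
    proxies = list(proxies)
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        results = list(pool.map(probe, proxies))
    return [p for p, ok in zip(proxies, results) if ok]
```

    The probe is passed in as a parameter so the timeout can be tuned separately from the connection cap, e.g. `check_all(batch, probe=lambda p: probe_http(p, timeout=5.0))`.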
     
  4. VanillaH

    VanillaH Regular Member

    Joined:
    Dec 23, 2009
    Messages:
    323
    Likes Received:
    266
    You didn't mention you were talking about harvesting, so I'm sorry. :D But, yeah, public proxies die fast; I usually scrape only small lists.
     
  5. Autumn

    Autumn Elite Member

    Joined:
    Nov 18, 2010
    Messages:
    2,197
    Likes Received:
    3,044
    Occupation:
    I figure out ways to make money online and then au
    Location:
    Spamville
    Your numbers are fairly typical from my experience. You can expect 1-10% of what you scrape to work, tending towards the lower end. It's not uncommon to test 100K an hour to get 1000 working.
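
    To put numbers on that, here is a tiny sketch of the yield arithmetic. `yield_percent` is just a helper name made up for this example; the figures are the ones quoted in this thread.

```python
def yield_percent(working: int, tested: int) -> float:
    """Share of tested proxies that turned out to work, as a percentage."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * working / tested


# 1000 working out of 100K tested
print(f"{yield_percent(1000, 100_000):.1f}%")  # prints 1.0%

# 300ish left out of 11k, as in the first post
print(f"{yield_percent(300, 11_000):.1f}%")  # prints 2.7%
```

    Both figures land inside the 1-10% band, towards the lower end.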

    It also depends on where you're scraping from; if you're scraping from Google rather than getting the latest listing on specific proxy sites, a LOT of proxies you scrape will be long dead.

    Keep a list of every proxy that has ever worked for you and retest it every time you run a batch.
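
    A minimal sketch of that bookkeeping, assuming the archive is a plain text file with one `host:port` per line. The file layout and the function names (`load_proxies`, `build_batch`, `update_archive`) are assumptions for illustration only.

```python
from pathlib import Path


def load_proxies(path: Path) -> set[str]:
    """Read one proxy per line; a missing file is treated as an empty archive."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def build_batch(fresh: set[str], archive: set[str]) -> set[str]:
    """Everything to test this run: the new scrape plus every past worker."""
    return fresh | archive


def update_archive(path: Path, archive: set[str], working: set[str]) -> set[str]:
    """Add this run's working proxies to the archive and write it back.

    Entries are never removed, so a proxy that is down today still gets
    retested on the next run.
    """
    merged = archive | working
    path.write_text("\n".join(sorted(merged)) + "\n")
    return merged
```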

    If you want less hammered proxies, take your list of archived working proxies and calculate the most frequent IP ranges and ports. Then use nmap to scan up and down those ranges, on the most popular ports, to find the proxies that aren't on the lists. :)
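
    The counting step could be sketched like this. Grouping by /24 is an assumption made for the example (the post only says "IP ranges"), and `top_ranges_and_ports` is a made-up name.

```python
from collections import Counter
import ipaddress


def top_ranges_and_ports(proxies, n=3):
    """Count the most frequent /24 ranges and ports in "host:port" entries.

    Entries that do not parse as IPv4:port are skipped.
    """
    ranges = Counter()
    ports = Counter()
    for entry in proxies:
        host, _, port = entry.strip().rpartition(":")
        try:
            addr = ipaddress.IPv4Address(host)
            port_num = int(port)
        except ValueError:
            continue
        # strict=False lets us pass a host address and get its enclosing /24
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        ranges[str(net)] += 1
        ports[port_num] += 1
    return ranges.most_common(n), ports.most_common(n)
```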
     
  6. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    Hmm, you say retest proxies that were working? I presumed that once they were dead they were dead forever. Or do they sometimes come back from the dead for some more fun?
     
  7. Autumn

    Autumn Elite Member

    Joined:
    Nov 18, 2010
    Messages:
    2,197
    Likes Received:
    3,044
    Occupation:
    I figure out ways to make money online and then au
    Location:
    Spamville
    Proxies go up and down like yoyos, but you will always get an extra few that are still working off your old lists. Sometimes a proxy might just not be responding for the particular second that you test it; someone else might be doing something bandwidth intensive on that proxy that makes it appear dead; some proxies are misconfigured network equipment at businesses that gets turned off at the weekends etc.
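
    One way to allow for that flakiness is to write a proxy off only after several failed attempts spaced apart, instead of on the first miss. A rough sketch, with `is_alive` and its parameters invented for illustration; `probe` stands for whatever single-shot check you already use.

```python
import time
from typing import Callable


def is_alive(proxy: str, probe: Callable[[str], bool],
             attempts: int = 3, delay: float = 0.0) -> bool:
    """Return True as soon as any attempt succeeds.

    The proxy is treated as dead only if every attempt fails, so one bad
    second (or a briefly saturated proxy) does not get it dropped.
    """
    for i in range(attempts):
        if probe(proxy):
            return True
        if i < attempts - 1:
            time.sleep(delay)  # wait before trying again
    return False
```

    Retries within a single run will not catch equipment that is off for the whole weekend; that case is what retesting the archived list on every run is for.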
     
  8. KREAM

    KREAM Junior Member

    Joined:
    Nov 21, 2009
    Messages:
    128
    Likes Received:
    5
    I'm a newbie, but what is a proxy?