
New Proxy Harvester in Scrapebox 1.15.39

Discussion in 'Proxies' started by LakeForest, Mar 9, 2012.

  1. LakeForest

    LakeForest Supreme Member

    Joined:
    Nov 11, 2009
    Messages:
    1,269
    Likes Received:
    1,802
    Location:
    Location Location
    ...makes me want to cry.

    1 step forward, 2 steps back... This might be the first version of Scrapebox that makes me roll back to a previous version.

    Grievances

    1. Can't add proxy sources from the clipboard to the Harvester (F***)
    2. The Harvester feels like an afterthought, as it's the third option in the "Import Proxies" menu.
    3. In the Harvester, you need to highlight entries and select remove to delete them. Checking them and pressing remove doesn't work.
    4. Speed has changed from being reported in ms to "Fast, Medium, Slow": Fast = under 1500 ms, Medium = 1501-3000 ms, Slow = over 3000 ms. (Talk about speed being relative...)
    5. Doesn't report the time of the last operation. If you want to compare runs, get a stopwatch.
    6. "Save Proxies" by default saves the entire list you put in; all 9000 were saved to file or put into Scrapebox. You can't save only Anonymous or only Google Passed, which makes the filtering much less useful.
    6b. I don't know whether it's good or not that it tests Anonymous and Google Passed simultaneously, but the counts passing both tests are relatively consistent now, as opposed to 1000 G-Passed and 300 IP Passed in 1.15.38. I'm not sure what bearing this has on overall testing speed, though.
    7. I dare you to put 300,000+ proxies into the new harvester. I dare you.
    8. Filtering the completed test with "Keep Anonymous Proxies" reduces the number of anonymous proxies in the list. Perhaps this is duplicate removal? Filtering by Google Passed doesn't reduce the number of Google Passed, though.
    9. I hope the new tester filters duplicates automatically, because I can't find a button to remove dups anywhere in the Proxy Manager.
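Grievance 4's ms-to-label mapping is simple enough to sketch. This is just my reading of the thresholds quoted above, not ScrapeBox's actual code, and the function name is made up:

```python
def speed_label(response_ms: int) -> str:
    """Bucket a proxy's response time into the new harvester's
    coarse labels, using the thresholds quoted in the post."""
    if response_ms <= 1500:
        return "Fast"
    if response_ms <= 3000:
        return "Medium"
    return "Slow"

# The raw ms value carries strictly more information than the label:
# a 100 ms proxy and a 1500 ms proxy are both just "Fast".
```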

    Positives (I guess):

    1. Shows the proxy's country.
    2. Does an anonymity check (but no idea at what level).
    3. More descriptive as to why a proxy failed the check (the old harvester used to do this before 1.15.38).
    4. Saw unusual port numbers passing tests and working in scrapes. Fun!!
    5. 200 connections is fast.
    6. Can mark proxy sources in the harvester as SOCKS sources.
    7. Scraper is responding... so there's that... or did I speak too soon?
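On positive 2: proxy anonymity is conventionally graded in three tiers by what a judge server sees in the request headers. A sketch of that generic convention (not necessarily the check ScrapeBox runs; the function and tier names are mine):

```python
def anonymity_level(seen_headers: dict, real_ip: str) -> str:
    """Grade a proxy by the headers a test server receives.
    transparent: your real IP leaks through a forwarding header.
    anonymous:   proxy hides your IP but announces itself.
    elite:       no proxy fingerprint at all."""
    joined = " ".join(str(v) for v in seen_headers.values())
    if real_ip in joined:
        return "transparent"
    if any(h in seen_headers for h in ("Via", "X-Forwarded-For", "Proxy-Connection")):
        return "anonymous"
    return "elite"
```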

    Test

    I ran 4 instances testing a list of 9000 proxies harvested in 1.15.38, because I CAN COPY/PASTE THE SOURCES there. 3 instances were version 1.15.39 with connections set at 25, 100, and 200 (default is 25; does the Proxy Harvester's Maximum Connections setting not count anymore?), plus 1 instance of 1.15.38 running at 100 requests.

    25 connections was painfully slow. 200 connections didn't even finish testing the list every time; ~100 proxies were stuck on "Testing Proxy..."
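Proxies wedged on "Testing Proxy..." is the classic symptom of a per-proxy check without a hard timeout. A minimal sketch of a checker that bounds both concurrency and per-check failures (check_fn is a hypothetical stand-in; I have no idea how ScrapeBox is built internally):

```python
from concurrent.futures import ThreadPoolExecutor

def check_all(proxies, check_fn, connections=100, timeout_s=30):
    """Run check_fn over proxies with at most `connections` in flight.
    check_fn must enforce timeout_s itself (e.g. via a socket timeout);
    a check that can block forever pins a worker forever, no matter
    how many connections you configure."""
    def safe(proxy):
        try:
            return bool(check_fn(proxy, timeout_s))
        except Exception:
            return False  # treat errors and timeouts as a failed proxy
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # pool.map preserves input order, so results line up with proxies
        return dict(zip(proxies, pool.map(safe, proxies)))

# e.g. with a stub checker that "passes" anything on port 8080:
# check_all(["1.1.1.1:8080", "2.2.2.2:3128"], lambda p, t: p.endswith(":8080"))
```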

    Results:
    (100 Proxy Harvester Max. Connections and 30s Timeout)

    25 connections: 2172 Anonymous(filtered to 1244) and 1119 Google Passed

    100 connections: 2036 Anonymous(filtered to 1088) and 1065 Google Passed

    200 connections: 2205 Anonymous(filtered to 1369) and 1127 Google Passed (wat)

    200 connections with Proxy Harvester Max. Connection reduced to 50 from 100: 2124 Anonymous(filtered to 1050) and 1040 Google Passed

    100 Requests/Maximum Connections (1.15.38): 483 Google Passed and 356 that also IP Passed.

    - This is the first proxy checker I've used where reducing connections didn't substantially increase results.
    - The number of Google Passed results was reduced after Anonymous filtering.

    Scraping

    I scraped "franklin" (heyyy it's franklin) with the 356 proxies from the old harvester and got 98 results in 1.15.38 and 398 in 1.15.39.

    Scraping "franklin" with the 1127 proxies from the new harvester (200 connections) placed into 1.15.38 made the scrape fail almost immediately with 0 results; I got 398 scraped URLs in 1.15.39. Testing the 1127 proxies in 1.15.38, I got 170 that G-Passed and IP Passed and WAS able to scrape, but got 98 results again.

    Lately, in all Google scrapers, I've been noticing a heavy bias toward results with the keyword in the domain, even without a footprint. If anyone has suggestions as to why this is or how to overcome it, please share.

    Update: wtf... I just put "franklin" in quotes in 1.15.38 with proxies harvested from 1.15.38 and got 566 results. I then scraped in 1.15.39, with franklin both in and out of quotes, using 1000+ proxies tested and "working" in 1.15.39's tester, and got 0 results in two instances and 99 in the other.

    Proxy Test Consistency

    In 1.15.39, retesting has been giving pretty consistent results in terms of what the Proxy Manager says are good proxies. 1.15.38 is notorious for results getting thrown around the board until you test 3+ times.

    Further Tests

    I'll be mixing up speeds and working proxies from 1.15.38 and .39, as well as rolling back versions, to test some more.

    I'm going to sleep. Scrapebox's proxy issue is, at the very least, being worked on diligently, and the desire to improve is clear. Fix what's broken, but please don't touch what works.
     
  2. kokoloko75

    kokoloko75 Elite Member

    Joined:
    Jan 1, 2011
    Messages:
    1,628
    Likes Received:
    1,935
    Occupation:
    Design director
    Location:
    Paris (France)
    Last edited: Mar 9, 2012
  3. LakeForest

    LakeForest Supreme Member

    The feature labeling sources as SOCKS sources is actually kind of neat.

    If there are 400 proxies in a source and 200 are SOCKS, it only loads the 200.

    I tested 1000 straight from the harvester and got 0 working in 1.15.39, but 60 passed Google and 40 passed IP in 1.15.38.

    I'm not too well versed in SOCKS, but the ones that worked did scrape in 1.15.38, while scraping failed in 1.15.39.
     
  4. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 2, 2008
    Messages:
    10,246
    Likes Received:
    8,704
    I have 3 copies of ScrapeBox: 1.38, 1.39 (which I'm going to try today),
    and a really old version as a backup. Results to follow.
     
  5. LakeForest

    LakeForest Supreme Member

    The earliest I can roll back to is 1.15.31, and I'm seriously considering it, to test some more and to remember what it was like when things made sense.
     
  6. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    I got stuck when I did a small test on 1.39.
    I tested 7k and was like, OK, how do I save the Google
    passed proxies... ermmmm
     
  7. okok2323

    okok2323 Newbie

    Joined:
    Mar 7, 2011
    Messages:
    10
    Likes Received:
    1
    Are duplicates removed automatically? Obviously there's no button, and it's very frustrating.
     
  8. kokoloko75

    kokoloko75 Elite Member

    Yes, auto-removed.
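Whatever the manager does internally, an order-preserving dedupe is cheap to do yourself before importing, e.g.:

```python
def dedupe(proxies):
    """Drop duplicate proxies, keeping first-seen order
    (dict.fromkeys preserves insertion order in Python 3.7+)."""
    return list(dict.fromkeys(p.strip() for p in proxies))

print(dedupe(["1.1.1.1:80", "2.2.2.2:8080", "1.1.1.1:80"]))
# → ['1.1.1.1:80', '2.2.2.2:8080']
```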

    Beny
     
  9. mazgalici

    mazgalici Supreme Member

    Joined:
    Jan 2, 2009
    Messages:
    1,489
    Likes Received:
    881
    Home Page:
    I hate it because
    some proxies come back as timeouts even when they're OK...
    and you can't export the failed ones anymore...
     
    Last edited: Mar 9, 2012
  10. kokoloko75

    kokoloko75 Elite Member

    I agree, the cleanup function was good...

    Beny
     
  11. mazgalici

    mazgalici Supreme Member

    1.15.40 is out, but it's just a small fix.
     
  12. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    Making multiple folders:
    1.38, 1.39, 1.40,
    and I even have a 1.29 folder.

    Also, the question is: with all the updates, what numbers
    are people getting compared to before?

    Just did a test: 386 passed Google on v1.40.
    Testing the same 386 on v1.38, 250 passed Google;
    less than 3 mins later, 136 dropped off.

    Put the same 386 back in v1.40 and got 282.
    --------------------------------------------------------

    Just got 1200 Google passed out of v1.40,
    put them straight into the harvester, and lots
    of them are coming back blocked right away.

    Placed the same 1200 in v1.38, and almost all
    are failing the Google pass.

    Same list: v1.40 says 1200 passed Google;
    v1.38 says only 200 passed Google.
     
    Last edited: Mar 9, 2012
  13. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    See for yourself: the same list posted in v1.40 shows 887 Google proxies
    out of 1200. Time lapse to scrape attempt: 2 mins.

    Attempting to scrape with the same saved list showed right away that
    the results are wrong: 80% of the proxies are blocked from the go.



    Final note: testing the same list of 1200 proxies
    in version 1.38 showed that only just over 100 passed
    the Google check, but those that did were usable to scrape.
    Result from the v1.38 test of the same 1200: 146 passed Google.
    That's a massive difference from 800+, meaning 600+ falsely
    passed proxies on version 1.40.
     
    Last edited: Mar 9, 2012
  14. LakeForest

    LakeForest Supreme Member

    A list of 1234 proxies put into v1.15.38/.39/.40:

    .38: 81 G-Passed/66 IP Passed

    .39: 793 Anon/0 G-Passed

    .40: 807 Anon/582 G-Passed

    wat

    Scrape Test (using the 66 working proxies tested in .38):
    - Each "/" marks a new scrape using all engines.

    .38: 0 scraped/628 (488 google)/531 (191 google)

    .39: 0 scraped/142 (0 google)/ 622 (381 google)

    .40: 0 scraped(weird)/691 (642 google)/ 626 (286 google)

    Scrape Test (using the 582 G-Passed found in .40):
    - Yeah, I just threw the list of 582 in without testing again. Each "/" marks a new scrape using all engines. The results were absurd:

    .38: 0 scraped/50 (0 google)/183 (0 google)

    .39: 145 scraped/50 (0 google)/190 (0 google)

    .40: 974 scraped(only google)/248(50 google)/340(197 google)

    ----------------------------------------------------------------
    Well, .38 and .40 results are much more reliable than .39, and they're more functional versions all things considered. Results may be all over the board, but it's moving somewhere.

    I'll test SOCKS next
     
    Last edited: Mar 9, 2012
  15. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    How is .40 reliable when it tells me 800 proxies
    pass the Google test, but when I scrape with them
    70% are blocked from the start?

    The same list of 800 tested in 1.38 shows what
    I already knew about v1.40: 70% failed the Google test
    and only just over 100 worked.

    How is this better? At least when I test a list in 1.38,
    what it says passed Google will scrape.
    With version 1.40, I'm getting at least 70% of any passed
    list already blocked when using it to scrape.
     
    Last edited: Mar 9, 2012
  16. LakeForest

    LakeForest Supreme Member

    Scraping in .40 isn't terrible; the little update did more than just SOCKS, and .40 is much more stable and consistent than .39. Good update.

    I don't like that in the .39/.40 harvester you can't add SOCKS sources by typing in a URL like you could in .38. You can mark sources as SOCKS, but you should still be able to type in a URL.

    Test list of 1046 SOCKS proxies:

    .38: 176 G-Passed/123 IP Passed

    .39: 20 Anonymous/0 G-Passed

    .40: 102 Anonymous/66 G-Passed

    Scrape (using 123 from .38):

    .38: 298 (199 google)/504 (204 google)/833 (98 google, 795 Bing)

    .39: 1136 (304 google, 600+ Bing)/790 (490 google)/1049 (0 google, 750 Bing)

    .40: 398 (198 google)/1278 (529 google, 749 Bing)/942 (0 google, 742 Bing)

    Scrape (using 66 from .40):

    .38: 149 (99 google)/749 (199 google, 550 Bing)/1439 (791 google, 449 Bing)

    .39: 199 (100 google)/350 (0 google, 300 yahoo)/499 (299 google, 200 yahoo)

    .40: 1041 scraped (982 google)/150 (100 google)/848 (399 google, 349 Bing)

    In the results, if I didn't mention the other search engines, it's because their results were either very similar to each other or very low in number. AOL produced 0 results in all tests.

    Something I noticed about scraping with SOCKS: the results aren't so concentrated on having the keyword in the domain. I wonder if anyone else has been noticing the keyword-in-domain inconsistency.

    Also, Bing liiiikes SOCKS
     
  17. LakeForest

    LakeForest Supreme Member

    :\ I'm just trying not to shit on Scrapebox's parade. In every test I conducted, .39 caused more problems than .38 (I can't believe I'm labeling .38 as what I consider a "stable" proxy checker; that's how backwards .39 got).

    .40 did show some improvement, primarily in scraping.

    .38, I cannot believe I'm saying this, is what I'll be using to test proxies until things improve.
     
  18. proxygo

    proxygo Jr. VIP Jr. VIP Premium Member

    Hey, I agree, it's great they're trying, but I provide
    a proxy service for ScrapeBox. Imagine if I test 1000
    proxies: it tells me 800 passed Google, but the minute
    I scrape with them, 70% are blocked right from the go,
    right after the Google test. I'd get massive complaints
    about unresponsive proxies.
    What use is a test where 70% of what it says
    passed has actually failed before you use them?

    At least with v1.38, what it says passes the Google
    test actually works
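One plausible reading of the whole thread: a proxy can "pass Google" on the test URL yet hit the block/CAPTCHA page the moment it scrapes, so the pass test and the scraper disagree. A checker that wanted to catch this would have to recognize the block page itself. A rough sketch (the marker strings are my guesses at Google's block page, not anything ScrapeBox actually does):

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: does a Google response look like the block/CAPTCHA
    page rather than real results? Markers are guesses."""
    markers = ("/sorry/", "unusual traffic", "captcha")
    text = body.lower()
    return status_code in (403, 429, 503) or any(m in text for m in markers)
```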