
Captcha Breaker vs Captcha Sniper - A Texas Death Cage Match

Discussion in 'Black Hat SEO' started by TheEditor, Mar 26, 2013.

  1. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    Looking for captcha solving software? There are 2 options: Captcha Sniper and Captcha Breaker. Captcha Sniper has been around longer, is cheaper, and is run by a two-man team that includes well-known BHW guy Mikey (mikeybobikey). Captcha Breaker is the new kid on the block. CB comes from Sven @ GSA, the maker of the famous Search Engine Ranker software. Sven's designated rep here is the indefatigable S4ntOs. Below are the results of 48 hours of testing I did with both solvers.

    Let's cover the basics. The testing was done on my VPS with Berman Hosting. I am a happy Berman customer and if you're looking for a VPS they deserve consideration. Each solver was run in conjunction with the aforementioned Search Engine Ranker software. If you aren't familiar with it you probably should be. For this project I ran a campaign on 4 sites with 3 tiers apiece.

    Time for some numbers. Captcha Breaker vs Captcha Sniper mano a mano: whose cuisine reigns supreme?

    Captcha Breaker alone for 24 hrs:

    SER reports 32420 link submissions, 557 verified
    SER reports 17438 captchas were assigned to CB, 2378 to AskMeBot*
    CB says it "recognized" 11730 of 14938 with an average time of 0.247.

    Captcha Sniper alone for 24 hrs:

    SER reports 30978 link submissions, 606 verified
    SER reports 21262 captchas sent to CS, 2266 to AskMeBot*
    CS says it solved 10788 with an average time of ~0.538

    * AskMeBot.com offers a pay-as-you-go solution for so-called "text captchas". Questions like "Is the sky blue?". Cost is $5 per month and so far I'm pretty happy with it.

    Before I analyze I gotta say I expected CS to win this. I'm not a CS fanboy. My expectation simply derives from CS having been around a lot longer. Captcha solving is a decidedly non-trivial task. You don't see coders hacking around with stuff like this for the sheer fun of it. Experience counts for something.

    A look at the forest says this contest is a tie. The submitted link count goes to CB by about 5%. The verified count goes to CS by about 9%. I consider both to be trivial differences that could easily be reversed were I to run another test. For me a tie means CB is the effective winner. CB is the new kid on the block and to get a tie against more experienced opposition suggests that in the future it will probably overtake CS or at least stay even.
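
    To double-check those percentages, here is a minimal Python sketch that recomputes the gaps from the four SER figures quoted above. The helper name `pct_lead` is made up for illustration; only the numbers come from the test.

```python
# Figures copied from the SER reports quoted above (24 hours per solver).
cb = {"submitted": 32420, "verified": 557}
cs = {"submitted": 30978, "verified": 606}


def pct_lead(winner: int, loser: int) -> float:
    """Relative lead of `winner` over `loser`, in percent."""
    return (winner - loser) / loser * 100


# CB leads on submissions, CS leads on verified links.
submitted_lead = pct_lead(cb["submitted"], cs["submitted"])
verified_lead = pct_lead(cs["verified"], cb["verified"])

# Verified links as a share of submissions, per solver.
cb_rate = cb["verified"] / cb["submitted"] * 100
cs_rate = cs["verified"] / cs["submitted"] * 100

print(f"CB lead on submissions: {submitted_lead:.1f}%")
print(f"CS lead on verified:    {verified_lead:.1f}%")
print(f"Verified rate CB / CS:  {cb_rate:.2f}% / {cs_rate:.2f}%")
```

    On these figures the submission gap works out to roughly 4.7% and the verified gap to roughly 8.8%. With only a few hundred verified links per run, a gap that size could plausibly flip on a rerun.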

    My brother-in-law sometimes calls me Mr Qualifier, because he says I can't answer even the simplest of questions in under 35 words. Let's qualify these results a bit. There are a large number of captcha types. I'm sure that at some types CS is better than CB and vice versa. Test results are in some measure dependent on the types of captchas encountered on the platforms and targets. I'd bet that on my first tier my projects probably don't share so much in common with the average SER user. On the lower tiers my stuff is going to look a lot more like yours. Your mileage may vary.

    Now let's look at the trees.

    The SER stat bar says it assigned nearly 4K more captchas to CS than CB. I have no idea why this would be. Given the overall results I'm not sure it matters. Maybe Sven/S4ntOs can offer a guess on this. Average solve time looks like a big edge to CB. Maybe. Depends on how the solving time is calculated by each program. SER isn't telling us how fast each solver returns an answer. These times come from the solvers themselves. Since the big picture numbers are similar I'm guessing the solve time differential isn't really as large as it appears at first glance. Or as my BIL would put it, I'm putting a qualifier on this.

    I had meant to make this exercise a 12 hour test, not 24. I screwed up in the CS testing phase. After testing for 12 hours I didn't take the proper steps in SER to change over to CB and ended up getting another 12 of CS. I threw up the numbers for the 2nd 12-hour block and things looked up so much for CS that Sven suggested (half-jokingly I think) that Mikey had contacted me to help goose things along.

    That didn't happen. Mikey has spent some time with me personally and I'm very appreciative. But no help was offered on this exercise. Still the improvement was stark enough to take a look. In the first 12 hours CS racked up 12055/227 (submitted/verified), in the 2nd 12 the numbers were 18923/379. Big improvement. More than 50%.

    I've been stopping SER and resetting its stats and futzing with the captcha settings every 12 hours. When I started it was on a schedule of 10:15am EST to 10:15pm EST. Now with the time at breaks spent futzing I'm up to about 11:30 as the break point. What I realized is that for EACH solver the prime time was from the AM to the PM. Overnight numbers are relatively weak.

    This result may be *far* more important than which solver is used. It suggests that perhaps - I qualify everything don't I?! - the VPS's CPU should be dedicated to other tasks at some point in the overnight hours. If I'm seeing results this strong I'll bet you will too. Implications? I'm using SER to scrape for new targets. Maybe what I should be doing is turning Scrapebox loose on the search engines to find new targets during those hours when SER is less productive. If SER is producing a third fewer submissions during a large 12 hour block it stands to reason that the performance hit is even more severe in some subsection of that block. That would be the time to run SB to get fresh targets. With the target scraping done for it SER would be even more productive during the fertile hours of the day.
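
    One way to find that weak subsection would be to log submissions per hour and look for the quietest stretch. The sketch below is only an illustration of the idea: the hourly counts are made-up numbers (I only recorded 12-hour totals), and `weakest_window` is a hypothetical helper, not anything SER provides.

```python
from typing import Optional, Sequence, Tuple


def weakest_window(hourly: Sequence[int], width: int) -> Tuple[int, int]:
    """Return (start_hour, total) for the `width`-hour window with the
    fewest submissions. Windows are allowed to wrap around midnight."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly counts")
    if not 1 <= width <= 24:
        raise ValueError("width must be between 1 and 24")
    best_start = 0
    best_total: Optional[int] = None
    for start in range(24):
        total = sum(hourly[(start + i) % 24] for i in range(width))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# HYPOTHETICAL submissions per hour (index = hour of day, 0-23).
# These numbers are invented for illustration, not taken from the test.
hourly_counts = [
    900, 800, 700, 650, 700, 850,          # 00:00-05:59
    1000, 1200, 1400, 1500, 1600, 1650,    # 06:00-11:59
    1700, 1700, 1650, 1600, 1550, 1500,    # 12:00-17:59
    1450, 1400, 1300, 1200, 1100, 1000,    # 18:00-23:59
]

start, total = weakest_window(hourly_counts, 4)
print(f"Quietest 4-hour window starts at {start:02d}:00 ({total} submissions)")
```

    With real hourly numbers in place of the invented ones, the start hour it returns would be a candidate slot for running SB instead of SER.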

    Testing isn't done yet. GSA SER is capable of sending failed captchas to a *second* provider. Next up I will look at how CS & CB can work together. In the meantime I'm sure you have some questions. Fire away!

    Oh, one more thing. The thread title. I actually saw a live Texas Death Cage match once. At Cobo Hall in Detroit. My grandfather wanted to go, and my dad offered - very, very reluctantly - to take him along with me and my brother. The feature event pitted Chief Jay Strongbow against someone I don't recall. Am pretty sure Strongbow won. He was carried from the ring on a stretcher, rose from the dead and hoisted the stretcher over his head and conked his opponent. My dad wanted to disappear. Don't think he ever willingly spoke of it again.
     
    • Thanks Thanks x 11
  2. Z0mbie

    Z0mbie Regular Member

    Joined:
    Jun 24, 2012
    Messages:
    339
    Likes Received:
    151
    "TheEditor" - a befitting name :p
     
  3. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    Not sure how to take that!

     
  4. mikie46

    mikie46 Jr. VIP Jr. VIP

    Joined:
    Aug 6, 2008
    Messages:
    1,454
    Likes Received:
    1,102
    These statistics mean nothing. You're banking on the Solved stats? It's a known fact that CS and CB cannot determine SOLVED. It's only a number and does not reflect the true # solved.

    You can't judge which is better based on that. There is just not enough information. There are no checks that say, YES that was the exact matched image.
     
  5. Z0mbie

    Z0mbie Regular Member

    Joined:
    Jun 24, 2012
    Messages:
    339
    Likes Received:
    151
    A compliment. :) I like detailed.

    Did you scrape or did you use a prescraped list?
     
  6. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    Of course there is no guarantee. But I listed the numbers anyway, because they are there. If I didn't someone would want to know what they are. And CB doesn't really list "solved". They list "recognized". Which I note in the summary. For me the key numbers are "submitted" and "verified".

     
  7. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    Well then I thank you.

    I let SER do the scraping. It's quite possible that if I had done the scraping myself the average solving time would have become important. Scraping and submitting is a multi-step process. Captcha solving is a small if important part of that.

     
  8. Z0mbie

    Z0mbie Regular Member

    Joined:
    Jun 24, 2012
    Messages:
    339
    Likes Received:
    151
    If you let SER do the scraping then the experiment is pointless IMO.
    The average solve time is important. The bottom line is how long it takes each software to process a list of XXX randomly mixed URLs and with how much accuracy they do it.

    Maybe you can run another experiment. Use the list already scraped. Import it as target URLs. Or I can scrape a small 10-50k list for you using mixed footprints and you can run those.
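
    A rough sketch of what a fixed-list comparison could look like. Note the swap: this times a solver over a set of labelled captchas rather than a URL list run through SER, because accuracy needs known answers to check against. `benchmark` and `fake_solver` are hypothetical names, and `fake_solver` is only a stand-in so the example runs; it is not CS or CB.

```python
import time
from typing import Callable, Iterable, Tuple


def benchmark(solve: Callable[[bytes], str],
              samples: Iterable[Tuple[bytes, str]]) -> Tuple[float, float]:
    """Run `solve` over labelled samples.

    Returns (elapsed_seconds, accuracy), where accuracy is the share of
    samples whose answer exactly matched the known label."""
    correct = 0
    total = 0
    started = time.perf_counter()
    for image, expected in samples:
        total += 1
        if solve(image) == expected:
            correct += 1
    elapsed = time.perf_counter() - started
    accuracy = correct / total if total else 0.0
    return elapsed, accuracy


def fake_solver(image: bytes) -> str:
    """Stand-in solver for the demo: just upper-cases the bytes."""
    return image.decode("ascii").upper()


# HYPOTHETICAL labelled samples: (captcha bytes, known correct answer).
samples = [(b"abc", "ABC"), (b"xyz", "XYZ"), (b"q1w", "Q1W"), (b"bad", "GOOD")]

elapsed, accuracy = benchmark(fake_solver, samples)
print(f"{accuracy:.0%} correct in {elapsed:.4f}s")
```

    Feeding both solvers the same list in the same order would put total time and accuracy on a comparable footing, which the self-reported stats are not.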
     
    • Thanks Thanks x 1
  9. dennica

    dennica Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 17, 2012
    Messages:
    820
    Likes Received:
    197
    Home Page:
    I use CS and CB... but I get better results using CS.
     
  10. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    I don't think it's *pointless*. Many - and maybe most - SER users prefer to have it do the scraping. Average solve time might be important. Maybe not. Depends on how it's calculated.

    CB reports a much faster solve time. My survey of this is not exhaustive, but Mikey gets a little defensive about this so I assume it's universally true. Certainly I found that too. But we don't know how it's calculated. Suppose for instance that CS starts its timing clock the moment SER hands it a captcha. And suppose CB doesn't start the clock until the captcha has been classified as something it can solve in the first place. Obviously CS would then report a longer time.

    Since it's unlikely either author will open up the source code we can't know for sure if the two progs calculate this in a comparable fashion.
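
    To make the supposition concrete, here is a small Python sketch of the two clock conventions applied to the same batch. Everything in it is hypothetical: the timings are invented, and neither function is claimed to be how CS or CB actually measures anything.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Captcha:
    classify_s: float  # seconds spent deciding what kind of captcha it is
    solve_s: float     # seconds spent on the actual recognition
    solvable: bool     # whether the solver recognised the type at all


def avg_from_handoff(batch: List[Captcha]) -> float:
    """Clock starts the moment the captcha is handed over.
    Every captcha counts, including ones that get rejected."""
    return mean(
        c.classify_s + (c.solve_s if c.solvable else 0.0) for c in batch
    )


def avg_after_classification(batch: List[Captcha]) -> float:
    """Clock starts only after classification.
    Only captchas judged solvable are counted."""
    return mean(c.solve_s for c in batch if c.solvable)


# HYPOTHETICAL timings for one and the same batch of four captchas.
batch = [
    Captcha(classify_s=0.30, solve_s=0.25, solvable=True),
    Captcha(classify_s=0.30, solve_s=0.20, solvable=True),
    Captcha(classify_s=0.30, solve_s=0.00, solvable=False),
    Captcha(classify_s=0.30, solve_s=0.30, solvable=True),
]

print(f"clock from hand-off:         {avg_from_handoff(batch):.3f}s")
print(f"clock after classification:  {avg_after_classification(batch):.3f}s")
```

    Same work, same machine, and the first convention reports nearly double the second. That is why the reported averages alone can't settle which solver is faster.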

     
  11. mikie46

    mikie46 Jr. VIP Jr. VIP

    Joined:
    Aug 6, 2008
    Messages:
    1,454
    Likes Received:
    1,102
    I own both CB and CS. CS huh? It seems to me the opposite is true. But that's my opinion.
     
  12. mmulder1985

    mmulder1985 Registered Member

    Joined:
    Dec 16, 2008
    Messages:
    92
    Likes Received:
    12
    I often see CS count something as solved when it doesn't match the captcha at all.
     
  13. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    Precisely. We don't know how CS counts something as solved. Nor do we know what "recognized" means in CB. I reported the numbers because that is what I found. To me the most important numbers are "submitted" and "verified".

     
  14. IMTopgun

    IMTopgun Jr. VIP Jr. VIP Premium Member

    Joined:
    Sep 5, 2011
    Messages:
    645
    Likes Received:
    201
    Location:
    Texas
    What numbers you do have show a virtual tie. You cannot declare a winner based on these numbers. The age of the two programs should not be a factor for consideration. Support, staff, frequency of updates, etc. might be more of a factor than how long each has been in existence. Both CS and CB rely on data received from attempted solves to determine when updates are necessary, etc. The one that updates more frequently would then be your expedited winner IMHO. A virtual tie tells me they both do what they say they will based upon their coding. Any way you look at it, both are useful services.
     
  15. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    I'm in basic agreement. Support, cost, expectations of *future* support I consider to be very important.

    Frequency of updates? Well, that depends on the updates themselves. I consider CS's lack of self-updating to be a real issue. The updating process itself ain't real smooth.

    Age of software in and of itself doesn't seem important. But CS had something like a 15 month lead on CB. To gain rough parity with CS in such a short period of time does bode well for the future. It's not hard to extrapolate from such quick gains.

     
  16. blackmoss

    blackmoss Regular Member

    Joined:
    Nov 20, 2012
    Messages:
    255
    Likes Received:
    23
    I would like to add that I tried the CB demo and I personally own CS. Having used both, I found CB to be the faster and overall more accurate program.
    I've tried switching between the two. My tests were not as extensive as yours, but from what I observed CB is definitely the better program, and I will be purchasing it as soon as I have the spare resources for it.

    One other feature of CB is its SDK, which lets users brute force a captcha so that CB finds the most effective way of solving it more accurately. This is all automated, while for CS you actually have to manually tweak the settings.

    Interface-wise, CB offers a more "visually appealing" or "cleaner" interface. This is a personal opinion and some may disagree, but to me it feels "higher-end" visually. Not that we get to stare at it much, since it is minimized most of the time or running on the VPS. It's just a personal observation.
     
    Last edited: Mar 26, 2013
  17. keizer

    keizer Regular Member

    Joined:
    Oct 22, 2008
    Messages:
    376
    Likes Received:
    396
    What version of CS did you use? V3 or the upcoming version in test phase?
     
  18. Narrator

    Narrator Power Member

    Joined:
    Oct 5, 2010
    Messages:
    507
    Likes Received:
    396
    Occupation:
    Internet Marketing
    Location:
    /dev/null
    I have a big difference between solve time for the two. But my LPM is about the same so I figured they just calculate it differently. For CS I have all types with less than 30% solve rate unchecked.

     
    Last edited: Mar 26, 2013
  19. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    I used v3. I've played a little with the upcoming version but don't have it on my VPS and I didn't think it fair to use it in this test. Hopefully it'll be out soon.

     
  20. TheEditor

    TheEditor Regular Member

    Joined:
    Aug 20, 2007
    Messages:
    425
    Likes Received:
    206
    This is an odd one. I used to have larger avg solve times with CS than I quoted in this test, but I haven't had the VPS for very long and the machine I was using before is *quite* a bit slower. I wish I'd paid closer attention but I have the feeling my CS solve time went down when I bought the AskMeBot service. That shouldn't make any sense whatsoever. And it would be easy to test. But I'm not going to do it.