So I attempted a CAPTCHA breaker

Discussion in 'YouTube' started by vexusdev, Jan 7, 2009.

  1. vexusdev

    vexusdev BANNED BANNED

    Joined:
    Dec 13, 2008
    Messages:
    284
    Likes Received:
    107
    So.. I've read a bunch of times on here that the images YouTube uses in its CAPTCHA repeat a lot.. so I thought, hey, why not, I'll see what I can do.

    I opened Delphi and created a simple program that populates a folder with any number of YouTube's CAPTCHA images (I did 10,000). Then it grabs a new CAPTCHA (say you were creating an account) and compares a small section of that image (20 pixels in the middle) against the 10,000 pre-downloaded images.

    Let me tell you, not one has matched so far.


    Any ideas? Kind of disappointing after putting some time into this.. ;/


    EDIT: I lied, I had only tried that with 2,000 images. I just moved it up to 8,000 and it finally solved one! OMG success! :) It takes about a second for each thousand images.
     
    Last edited: Jan 7, 2009
  2. fatboy

    fatboy Elite Member

    Joined:
    Aug 13, 2008
    Messages:
    1,618
    Likes Received:
    3,227
    Occupation:
    Retired
    Location:
    Old Peoples Home
    Just a thought - for each of the CAPTCHAs you have downloaded, can you take an MD5 of the image? Are they different?

    Then your account creator could grab the image, MD5 it, and compare it against the ones you have already hashed.

    Only speculation, as I wouldn't know where to start on a CAPTCHA breaker!
     
  3. vexusdev

    Honestly I didn't take the MD5 (the hash in the image URL) into account, because even if you refresh, it stays the same and the image changes.. so maybe it's just a session? I wonder what its real value is. Does anyone know what it's there for?

    Edit: try removing the hash, it still generates an image.

    Or is the hash where it keeps the value of the image, and then it compares?
     
  4. Barbacamanitu

    Barbacamanitu Jr. VIP Jr. VIP Premium Member

    Joined:
    May 8, 2008
    Messages:
    500
    Likes Received:
    91
    He means you get all of the .jpgs and hash their data. When you get an image on YouTube, download it and hash it the same way. Now, instead of comparing 20 pixels to another 20 pixels, you compare one string to another string.
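
A minimal sketch of the "compare one string to another" idea, assuming Python and its standard `hashlib` module. The names `md5_hex`, `build_index`, `lookup`, and the stand-in byte strings are hypothetical and only illustrate the lookup. Real use would read each saved file in binary mode. A digest only matches when the bytes are identical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def build_index(files: dict) -> dict:
    """Map digest -> file name for every stored file."""
    return {md5_hex(data): name for name, data in files.items()}


# Stand-in byte strings (hypothetical); these are not real image files.
stored = {
    "a.jpg": b"\xff\xd8 first image bytes",
    "b.jpg": b"\xff\xd8 second image bytes",
}
index = build_index(stored)


def lookup(data: bytes):
    """Return the stored name with identical bytes, or None if unseen."""
    return index.get(md5_hex(data))


print(lookup(b"\xff\xd8 first image bytes"))   # identical bytes: a.jpg
print(lookup(b"\xff\xd8 unseen image bytes"))  # no identical file: None
```

The dictionary lookup replaces the per-image loop, so the cost no longer grows with the number of stored files, but it finds byte-for-byte duplicates only.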
     
  5. Arthas

    Arthas BANNED BANNED

    Joined:
    Jan 5, 2009
    Messages:
    637
    Likes Received:
    322
    I think he's suggesting that you create a hash for each of the images you have downloaded, and then when a CAPTCHA appears, create a hash for that and compare it against all your recorded hashes.
     
  6. fatboy

    That's exactly what I mean :D
     
  7. vexusdev

    I would compare the hashes.. the only problem is I don't think they are unique to the image. Like I said, it's always a random image no matter what's in the URL.
     
  8. Arthas

    I'm not talking about any hashes in the URL. I'm talking about generating hashes based on the image files you download.
     
  9. WickednDivine

    WickednDivine Executive VIP Premium Member

    Joined:
    Jul 29, 2008
    Messages:
    468
    Likes Received:
    344
    For practical purposes MD5s are unique (an accidental collision between two different files is vanishingly unlikely). If the MD5s are different, the files are different. Plain and simple.
     
  10. Arthas

    Dude, if the MD5 way works how I think it should, then the current tedious and difficult method of breaking CAPTCHAs will be obsolete. We'll be famous!
     
  11. vexusdev

    http://www.youtube.com/cimg?c=-EKfyfCZllUw7usistktTpcu_tW5v0h42QDZmT5VAGQL0547UMXYs4NMxkyPe-cj


    http://www.youtube.com/cimg?c=


    Refresh both so you understand what I'm saying.

    I'm pretty sure that if you were to get that same exact picture again, it would not have that same MD5.

    We won't know unless I make a program that tries it out, I guess. lol oh fun
     
  12. Arthas

    Hmm. Does file metadata affect the hash? If so, then it might be possible to make all your files have the exact same metadata, thus basing the hash solely on the content of the image?
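
One way to check the metadata question, sketched in Python with only the standard library. The file names and byte strings are made up for illustration. The sketch assumes the hash is taken over the file's bytes, in which case file name and timestamps play no part, while anything stored inside the file (an embedded comment or tag, for instance) does.

```python
import hashlib
import os
import tempfile


def file_md5(path: str) -> str:
    """Hash the raw bytes of a file; name and timestamps are never read."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


payload = b"same image bytes"  # stand-in content (hypothetical)

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.jpg")
    second = os.path.join(tmp, "renamed_copy.jpg")
    tagged = os.path.join(tmp, "tagged.jpg")

    for path in (first, second):
        with open(path, "wb") as fh:
            fh.write(payload)
    # Give the copy a different access/modification time (Sep 2001).
    os.utime(second, (1_000_000_000, 1_000_000_000))

    # Same content plus extra bytes stored inside the file itself.
    with open(tagged, "wb") as fh:
        fh.write(payload + b" embedded comment")

    same_despite_name_and_time = file_md5(first) == file_md5(second)
    embedded_bytes_change_hash = file_md5(first) != file_md5(tagged)

print(same_despite_name_and_time)   # True
print(embedded_bytes_change_hash)   # True
```

So normalising file names or dates would not help. Only making the file bytes themselves identical would.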
     
  13. vexusdev

    Who knows, could be anything in that damn thing.. I'm sure it's not just the image word.
     
  14. Arthas

    Well, you could always do it the old-fashioned way... segment the image and detect the letters by pixels.
     
  15. blazed

    blazed Junior Member

    Joined:
    Aug 15, 2008
    Messages:
    178
    Likes Received:
    119
    I think OCR is a better way to go than MD5, because one pixel could be off and that's a new hash altogether. Lots of CAPTCHAs have a noise level you can set, so if it's putting in just a few random pixels and the letters are the same, the MD5s won't match up.
     
  16. WickednDivine

    The URL has absolutely nothing to do with the image hash. You need to download the raw BINARY image data and do an md5sum on it.

    I have a hard time believing that these images are ever exactly the same. Especially since they are almost certainly randomly generated by a PHP script of some kind. Even if you change one bit in the image, the MD5 sum would be totally different.
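
The one-bit point can be checked directly. The following is a small Python sketch using the standard `hashlib` module. The input bytes are arbitrary stand-in data, not a real image, and `flip_bit` and `differing_bits` are helper names invented for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 128 digest bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = bytes(range(256)) * 4   # 1 KiB of stand-in data
altered = flip_bit(original, 0)    # same data with a single bit changed

digest_a = md5_hex(original)
digest_b = md5_hex(altered)

print(digest_a)
print(digest_b)
print(differing_bits(digest_a, digest_b), "of 128 digest bits differ")
```

Because the two digests share no usable similarity, an exact hash cannot tell "almost the same image" from "a completely different image", which is why added noise defeats the lookup.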
     
  17. Arthas

    I want to code CAPTCHA breakers.. such a task though.
     
  18. vexusdev

    Exactly, I was just going by what a few said.. I knew the outcome probably wouldn't be good.
     
  19. the_demon

    the_demon Jr. Executive VIP

    Joined:
    Nov 23, 2008
    Messages:
    3,177
    Likes Received:
    1,563
    Occupation:
    Search Engine Marketing
    Location:
    The Internet
    I used Photoshop and broke down the MySpace CAPTCHA to basic readable black-and-white text by running a certain series of filters. If you did this, you could then apply simple OCR. I assume my method could be applied to YouTube as well. Not sure if I have the method saved on my laptop or on an HDD in another state. Send me a PM, and if I can find the text file that describes the process I'll send it to you.
     
  20. Arthas

    PM that Photoshop thing to me, 'cause ya know you can script things in Photoshop.