
Anyone that solves this is AMAZING (JS regex)

Discussion in 'General Programming Chat' started by JesusBack, Dec 18, 2010.

  1. JesusBack

    JesusBack Executive VIP Premium Member

    Joined:
    Sep 15, 2010
    Messages:
    1,159
    Likes Received:
    1,284
    Occupation:
    Almost done :D
    Location:
    {calm|cool|collected}
    http://members.virtualtourist.com/m/j/

    They use a JavaScript-generated random number as their captcha, but I haven't been able to extract it.

    If someone manages to get it, I'll add it to the free bot I already have up here.
     
  2. arbydee2

    arbydee2 Regular Member

    Joined:
    Mar 20, 2010
    Messages:
    413
    Likes Received:
    223
    Location:
    127.0.0.1
    Home Page:
  3. JesusBack

  4. arbydee2

    Because the numbers aren't showing in the source code.
     
  5. JesusBack

    Your browser is updating it in real time; for some reason my browser and curl won't do that.
     
  6. JesusBack

    Edit: I'm being told Python can do it via beautiful soap or something.
     
  7. MaDeuce

    MaDeuce Newbie

    Joined:
    Oct 24, 2008
    Messages:
    45
    Likes Received:
    16
    Location:
    Austin, TX
    Browsers, e.g., Firefox, do not always display the entire source when viewing page source. Similarly, when you save source that you are viewing, you may not get the entire page source. This can cause you to chase your tail until you figure out this 'feature'.

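    Curl will get the entire source. Just run it from the command line and save the output to a file. Edit/view the file, and you should see all the inline JS, plus the <script src=...> tags for the included .js files; curl doesn't download those automatically, but you can fetch each one the same way.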
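    Since curl only saves the one URL you give it, a small helper can list the external scripts a saved page references, so you know what else to download. This is a minimal regex-based sketch that assumes reasonably well-formed <script src> tags; the function name listScriptSources, the example.com page URL and the script paths are hypothetical.

```javascript
// Hypothetical helper: list the external scripts referenced by saved HTML,
// resolved to absolute URLs so each one can be fetched separately.
// Regex-based, so it assumes reasonably well-formed <script src="..."> tags.
function listScriptSources(html, baseUrl) {
  const sources = [];
  const re = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["'][^>]*>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    // Resolve relative paths such as "/dwr/engine.js" against the page URL.
    sources.push(new URL(match[1], baseUrl).href);
  }
  return sources;
}

// Hypothetical page fragment: two external scripts and one inline script.
const page = `
  <script src="/dwr/engine.js"></script>
  <script type="text/javascript" src='js/MP1.1.7_all.js'></script>
  <script>var inline = true;</script>`;

console.log(listScriptSources(page, "http://example.com/join/"));
```

    Inline <script> blocks have no src attribute, so they are skipped; their code is already in the saved file.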

    I've done a good bit of research on the subject, and Python + javascript for this purpose is a tough nut to crack.

    If you haven't done so already, you really ought to check out Selenium. It gives you full access to the JavaScript in a page and even offers a debugger-like interface so you can inspect/change variables. You can even insert JavaScript code into the DOM. The icing on the cake is that you can drive it from the command line, making it pretty cool for web automation tasks.

    --Ma
     
  8. MaDeuce

    It's BeautifulSoup. Don't blame me, I didn't name it.

    BeautifulSoup is THE tool for HTML and XML parsing. It really shines when working with malformed source.

    Python + BeautifulSoup make an awesome web automation platform.

    However, BeautifulSoup doesn't do javascript.

    --Ma
     
  9. JesusBack

    I don't have a problem getting the source; I just can't track down where the JS updates the span with a random number...
     
  10. MaDeuce

    Agreed, Python is Turing-complete, so... I don't want to hijack the thread, but what are you suggesting? I haven't come across any combination of tools that is really usable off the shelf. Jython/Rhino aren't complete (e.g., you can execute JS from within Python, but you can't call a function defined in JS from Python). python-spidermonkey is a work in progress as well. If something exists that really gives full access to JS from within Python, it would quickly become part of my toolkit. If you know of something and share it, it would make my weekend. Thanks.

    --Ma
     
  11. MaDeuce

    Ah, I see. Get the Venkman javascript debugger for FF. Load the page. Set a breakpoint on the js function that does the randomization. When you hit the breakpoint, look at the backtrace. It will at least show you how the function got called.

    --Ma
     
  12. JesusBack

    Thanks. I had no luck with Firebug, so I guess I'll try this.
     
  13. roachZ

    roachZ Newbie

    Joined:
    Jun 22, 2009
    Messages:
    14
    Likes Received:
    73
    Occupation:
    Developer
    Location:
    The Netherlands
    Just get the content of the span with id "cu".
     
  14. JesusBack

    It updates automatically in a normal browser, but curl or urllib only fetch the raw source and don't run the JS, so the number is never filled in.
     
  15. boo blizzi

    boo blizzi Regular Member

    Joined:
    May 28, 2009
    Messages:
    361
    Likes Received:
    267
    this is nice nerdy talk...sure wish i knew what yall were saying...i guess i gotta get back to work...btw Madeuce are u a woman??...if so...that is sooo sexy u know this shit
     
  16. Donnie Darko

    Donnie Darko Regular Member

    Joined:
    Aug 22, 2007
    Messages:
    229
    Likes Received:
    356
    Location:
    USA
    I looked through all the source files, and the only thing I could find that might be doing what you're talking about is this chunk of code from engine.js:

    Code:
    /** The original page id sent from the server */
    dwr.engine._origScriptSessionId = "E38EB46A656C70C10825E26A6092F8A0";
    
    /** The session cookie name */
    dwr.engine._sessionCookieName = "JSESSIONID"; // JSESSIONID
    
    /** Is GET enabled for the benefit of Safari? */
    dwr.engine._allowGetForSafariButMakeForgeryEasier = "false";
    
    /** The script prefix to strip in the case of scriptTagProtection. */
    dwr.engine._scriptTagProtection = "throw 'allowScriptTagRemoting is false.';";
    
    /** The default path to the DWR servlet */
    dwr.engine._defaultPath = "/dwr";
    
    /** The read page id that we calculate */
    dwr.engine._scriptSessionId = null;
    
    /** The function that we use to fetch/calculate a session id */
    dwr.engine._getScriptSessionId = function() {
      if (dwr.engine._scriptSessionId == null) {
        dwr.engine._scriptSessionId = dwr.engine._origScriptSessionId + Math.floor(Math.random() * 1000);
      }
      return dwr.engine._scriptSessionId;
    };
    
    The server writes E38EB46A656C70C10825E26A6092F8A0 into the script as the original page id, and then a random number from 0 to 999 is appended to it (string concatenation, not addition) here:

    dwr.engine._scriptSessionId = dwr.engine._origScriptSessionId + Math.floor(Math.random() * 1000);


    Not sure if this is it, though.
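    For reference, the engine.js function above can be restated as a standalone function. This is a sketch: the name makeScriptSessionId and the injectable random parameter are illustrative additions, not part of DWR, and DWR's caching of the result in _scriptSessionId is left out. It shows that the + concatenates, so the 0-999 suffix is tacked onto the end of the server-supplied id.

```javascript
// Restatement of dwr.engine._getScriptSessionId (engine.js) as a plain
// function. The name and the injectable `random` parameter are illustrative.
function makeScriptSessionId(origScriptSessionId, random = Math.random) {
  // Math.floor(random() * 1000) is an integer from 0 to 999. Because the
  // left operand is a string, "+" concatenates instead of adding.
  return origScriptSessionId + Math.floor(random() * 1000);
}

const orig = "E38EB46A656C70C10825E26A6092F8A0";
console.log(makeScriptSessionId(orig, () => 0.4567)); // E38EB46A656C70C10825E26A6092F8A0456
console.log(makeScriptSessionId(orig)); // orig followed by 0-999, different each run
```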
     
  17. Grizzy

    Grizzy Senior Member

    Joined:
    Nov 11, 2008
    Messages:
    919
    Likes Received:
    999
    Here it is.

    On line 376 of MP1.1.7_all.js you'll see this:
    Code:
    $D.ready(function(){if($B.hasClass("legacySignUp")){var a=$("input[name='sow']").val();$("#cu").html(parseInt(a,16))}});
    It's taking the value of the hidden <input name="sow"> field and using parseInt(a, 16) to turn that hex string into an integer (the captcha), which it then writes into the #cu span.
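    A quick way to see what the radix argument does is to run parseInt on the hidden value JesusBack posts further down (17515), with and without it. This is a sketch for Node or a browser console; the malformed inputs at the end are made-up examples.

```javascript
// What parseInt(a, 16) in MP1.1.7_all.js does to the hidden "sow" value:
// the string is read as hexadecimal, not decimal.
const sow = "17515";

console.log(parseInt(sow, 10)); // 17515 (decimal reading: not the captcha)
console.log(parseInt(sow, 16)); // 95509 (hex reading: the number shown in #cu)

// parseInt stops at the first character that is not a valid digit for the
// radix, so a malformed value gives a partial number or NaN, never an error.
console.log(parseInt("17z15", 16)); // 23
console.log(parseInt("zz", 16)); // NaN
```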
     
  18. Donnie Darko

    Good job man, that's it. However, that "sow" field is a hidden field on the page, and its value changes every time the page refreshes. So we would also need to figure out where the "sow" value comes from. Or I suppose it could just be fetched from the page?
     
  19. Grizzy

    Yeah, exactly: scrape the page > parse the sow > convert to int. All done :)
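    Those three steps can be sketched in JavaScript against HTML that has already been downloaded with curl or urllib. Assumptions: the function name readSowCaptcha and the sample value 1e240 are hypothetical, and the regex expects the name attribute to come before value, as in the markup quoted in this thread. A real HTML parser (BeautifulSoup on the Python side) would be more robust than a regex.

```javascript
// Hypothetical helper: extract the hidden "sow" value from already-downloaded
// HTML and convert it the same way the page's script does.
// Assumes name="sow" appears before value="..." inside the <input> tag.
function readSowCaptcha(html) {
  const match = html.match(
    /<input\b[^>]*\bname=["']?sow\b["']?[^>]*\bvalue=["']?([0-9a-fA-F]+)["']?/i
  );
  if (match === null) {
    throw new Error('no <input name="sow"> found in page');
  }
  // Convert the hex string to an integer, i.e. parseInt(value, 16).
  return parseInt(match[1], 16);
}

// Hypothetical page fragment; 0x1e240 is 123456 in decimal.
const sample = '<form><input type=hidden name="sow" value="1e240"></form>';
console.log(readSowCaptcha(sample)); // 123456
```

    The \b after sow keeps a differently named field such as name="sower" from matching.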
     
  20. JesusBack

    <input type=hidden name="sow" value="17515">
    Please Enter this number 95509
    I don't see how they're getting 95509 out of 17515 lol...
    What a weird formula.
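    The "formula" is the hex conversion from MP1.1.7_all.js: parseInt(a, 16) reads the digits of the sow value 17515 as base-16 digits, so the number shown in the span works out as:

```latex
\mathtt{17515}_{16} = 1\cdot 16^{4} + 7\cdot 16^{3} + 5\cdot 16^{2} + 1\cdot 16^{1} + 5\cdot 16^{0}
                    = 65536 + 28672 + 1280 + 16 + 5
                    = 95509
```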