Question - Can I have a PHP cURL script use existing browser login credentials?

Discussion in 'PHP & Perl' started by dtang4, Feb 14, 2012.

  1. dtang4 (Regular Member; joined Apr 7, 2010; messages: 291; likes received: 43)

    I am looking to scrape a login-based website using PHP cURL.

    What I am hoping to do is log in to the website manually via the browser. Then, once I am logged in, I would like to run my PHP script and have it use the existing browser cookie credentials for the site.

    The reason I don't want to complicate the script by having it log in to the site is that the site's login form has a CAPTCHA.

    So, my question is -- how do I have my PHP cURL script use the existing browser login credentials?

    Thanks.
     
  2. jazzc (Moderator, Staff Member, Jr. VIP; joined Jan 27, 2009; messages: 2,468; likes received: 10,148)

    That's an interesting question. It may work if you set up a cookie file in the cURL script, then manually copy the cookie from the browser into that cookie file, and then make cURL use it.

    But it's quite insane. Take a look at your other thread; I'll refer you to an example there of how to do a login with cookies in cURL.
     
  3. dtang4 (Regular Member)

    Thanks for the responses.

    I took a look at the other thread.

    However, my situation has a bit of a twist: the CAPTCHA field on the login form. To log in through my cURL script, I would need to submit the username, password, and CAPTCHA value, which I don't think I'll be able to do.

    If I just copy the cookie file associated with my target site into my local web server directory, do I just use the following line to read from it?
    PHP:
    curl_setopt($ch, CURLOPT_COOKIEJAR, "PATH TO COOKIE FILE");
    Thanks.
     
  4. jazzc (Moderator, Staff Member, Jr. VIP)

    I can't help you with the cookie transfer, never done it, I don't even know if it will work.
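
For reference, the manual cookie transfer discussed above could be sketched roughly as follows. This is only a sketch under stated assumptions, not something tested against any particular site: it assumes the cookie names and values are copied by hand from the browser, and the helper names `buildCookieHeader` and `fetchWithBrowserCookies` are hypothetical. Note that `CURLOPT_COOKIEJAR` names the file cURL writes cookies to, while `CURLOPT_COOKIEFILE` names the file it reads from, and that file has to be in cURL's own cookie-file format rather than the browser's internal storage. Passing the cookies as a string through `CURLOPT_COOKIE` may therefore be the simpler route.

```php
<?php
// Hypothetical helper: turn name => value pairs copied from the browser
// into the "name=value; name2=value2" string that CURLOPT_COOKIE expects.
function buildCookieHeader(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: fetch a page while sending the copied cookies.
// Returns the page body, or false if the request fails.
function fetchWithBrowserCookies(string $url, array $cookies, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_COOKIE         => buildCookieHeader($cookies),
        // Assumption: some sites tie a session to the browser's user agent,
        // so the script sends the same one as the browser.
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```

Whether the site accepts the copied session depends on how it validates sessions (for example by IP address or user agent), so this may or may not work for a given target.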
     
  5. tripper_john_md (Newbie; joined Feb 21, 2011; messages: 40; likes received: 35; location: Southern Germany)

    Use your script as a proxy: fetch the login form and captcha via cURL, so that your script already has a session and a cookie.
    Post your login data and the captcha to your script, let it do the login, and then do whatever you want to do.
    This will need a fair amount of extra coding if it's e.g. reCAPTCHA and not a simple captcha...

    PHP:
    <?php
    $c = curl_init();
    $cookie = '/tmp/cookie.txt';

    if (empty($_POST['captcha'])) {
        //first request: fetch the login form so the script gets its own session and cookie
        curl_setopt_array($c, array(
            CURLOPT_URL            => 'url',
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_COOKIEFILE     => $cookie,
            CURLOPT_COOKIEJAR      => $cookie,
            CURLOPT_RETURNTRANSFER => 1
        ));
        $data = curl_exec($c);
        curl_close($c);

        //replace the form-tag's action attribute with this proxy file
        //might need preg_replace if the form-url isn't always the same
        $data = str_replace('action="original_form_url"', 'action="this_file"', $data);
        echo $data;
    }
    else {
        //second request: forward the submitted fields (including the captcha) to the real form
        //urlencode() keeps special characters in the values from breaking the query string
        $post_data = array();
        foreach ($_POST as $k => $v) {
            $post_data[] = urlencode($k) . '=' . urlencode($v);
        }
        $post_data = implode('&', $post_data);

        curl_setopt_array($c, array(
            CURLOPT_URL            => 'original_form_url',
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_COOKIEFILE     => $cookie,
            CURLOPT_COOKIEJAR      => $cookie,
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_POSTFIELDS     => $post_data
        ));
        $data = curl_exec($c);

        //you should be logged in
        //do what you want to do...
    }
    ?>
    Hope you get the idea... :)
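
The comment in the script above mentions that `preg_replace` may be needed when the form URL is not always the same. A minimal sketch of that idea follows; the function name `rewriteFormAction` and the regular expression are assumptions made for illustration, not part of the original script, and a regex like this only copes with simple, well-formed markup where the `action` value is quoted.

```php
<?php
// Hypothetical helper: point the action attribute of every <form> tag
// at the proxy script, whatever the original action URL was.
function rewriteFormAction(string $html, string $proxyUrl): string
{
    $result = preg_replace_callback(
        // group 1: everything up to "action=", group 2: the quote character,
        // group 3: the original URL (discarded)
        '#(<form\b[^>]*?\saction\s*=\s*)(["\'])(.*?)\2#is',
        function (array $m) use ($proxyUrl) {
            return $m[1] . $m[2] . htmlspecialchars($proxyUrl, ENT_QUOTES) . $m[2];
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```

In the script above this would stand in for the `str_replace` call, for example `$data = rewriteFormAction($data, $_SERVER['SCRIPT_NAME']);`.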
     
    Last edited: Feb 15, 2012
  6. dtang4 (Regular Member)

    Wow, that's a pretty ingenious idea.

    I just tried implementing this, but can't get it to work properly. I'm noticing that my cookie info isn't being saved into cookie.txt. Not sure if that's the only issue right now.

    Any thoughts on why the cookie.txt isn't being modified?

    Thanks. +rep given.

     
  7. tripper_john_md (Newbie)

    I use several variations of this script, but none of them need captcha solving, so they do an automatic log in.

    In my opinion it's just the session handling. Everything should be fine as soon as the cookie is working.

    Is /tmp writable? Did you try a different directory?

    Add the following to curl_setopt_array to see what the remote server says:
    CURLOPT_HEADER => true

    Are you logging into your own script? If not, try adding a Referer (CURLOPT_REFERER => "url"); maybe it gets checked. It is also good for cloaking your semi-manual login.

    Try adding CURLINFO_HEADER_OUT => true to see what your script sends to the server; you can read it afterwards with curl_getinfo($c, CURLINFO_HEADER_OUT).

    Try to get as much information as possible: 'echo' every little step, and try creating cookie.txt first and setting its permissions to 777 with chmod.
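
The checks suggested above could be collected into a small debugging sketch like the one below. The helper names `cookieJarProblems` and `fetchWithDebug` are hypothetical, and the sketch assumes the cookie path is given as an absolute path (for example built from `__DIR__`), since a relative path is resolved against the working directory, which under a web server is not necessarily the script's own directory.

```php
<?php
// Hypothetical helper: list reasons why cURL might be unable to write the cookie jar.
function cookieJarProblems(string $path): array
{
    $problems = array();
    $dir = dirname($path);
    if (!is_dir($dir)) {
        $problems[] = "directory does not exist: $dir";
    } elseif (file_exists($path) && !is_writable($path)) {
        $problems[] = "file is not writable: $path";
    } elseif (!file_exists($path) && !is_writable($dir)) {
        $problems[] = "directory is not writable: $dir";
    }
    return $problems;
}

// Hypothetical helper: run one request and return both sides of the exchange.
function fetchWithDebug(string $url, string $cookiePath): array
{
    $c = curl_init($url);
    curl_setopt_array($c, array(
        CURLOPT_COOKIEFILE     => $cookiePath,
        CURLOPT_COOKIEJAR      => $cookiePath,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // response headers are included in the returned string
        CURLINFO_HEADER_OUT    => true, // request headers are recorded for curl_getinfo()
    ));
    $response = curl_exec($c);
    $sent = curl_getinfo($c, CURLINFO_HEADER_OUT);
    curl_close($c);
    return array('sent' => $sent, 'received' => $response);
}
```

Calling `cookieJarProblems(__DIR__ . '/cookie.txt')` before the request and printing the array returned by `fetchWithDebug` should show whether a `Set-Cookie` header arrives at all and whether a `Cookie` header is sent back.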
     
  8. dtang4 (Regular Member)

    Hmm, still not working. I am testing it locally on my Windows laptop, so permissions should all be set.

    Would you mind taking a look at my actual script? I can pay you for your help if you can get the login to work.

    My other option, I'm thinking, is to copy over the cookie data as jazzc suggested.
     
  9. tripper_john_md (Newbie)

    Just send me a message, I will take a look and try to get it to work.

    I haven't got enough posts to send PMs and don't know whether this affects replies too, so please include some way for me to answer. ;)
     
  10. tripper_john_md (Newbie)

    I mailed you the working script.
    Since I hate having a problem, finding someone in a forum with the exact same question, and seeing that nobody posted an answer, here is my solution:

    cURL on Windows did not save the cookie. I could not find the file anywhere, and even some workarounds with absolute paths did not work.
    So we create the cookie ourselves: the response headers include a "Set-Cookie" header, and with a small regular expression it's not hard to extract it.

    Then we send another cURL request to fetch a captcha for our script's session, including our cookie information. Otherwise the displayed captcha would belong to our browser's session rather than the script's, since the browser would request the image directly and there would be two separate sessions. We save this captcha locally and display it inside our form. Now we can perform the login.

    After the successful login the site sets some new cookies; we catch them and add them to our file. We will send them with every request from now on.
    Now we are logged in and can do whatever we want to do.

    This script needs to be adapted for every new site, since every login form works differently. But it's a good starting point. :)

    PHP:
    <?php
    $c = curl_init();

    //password is empty, no login yet
    if (empty($_POST['password'])) {
        curl_setopt_array($c, array(
            CURLOPT_URL            => 'url',
            CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)',
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_HEADER         => TRUE,
            CURLOPT_REFERER        => 'referer'
        ));
        $data = curl_exec($c);

        //curl doesn't write cookies, so we do it ourself
        //$sessid[0] is the full match, e.g. "PHPSESSID=abc123"
        preg_match('#PHPSESSID=([a-z0-9]+)#', $data, $sessid);
        $sessid = $sessid[0];
        $cookie = fopen('cookie.txt', 'w');
        fwrite($cookie, $sessid);
        fclose($cookie);

        //replace the form-tag's action attribute with this proxy file
        //might need preg_replace if the form-url isn't always the same
        $data = str_replace('login.php', $_SERVER['SCRIPT_NAME'], $data);

        //replace the captcha with our local image
        $data = str_replace('original.captcha', 'captcha.png', $data);

        //get captcha for this session and save it locally
        //otherwise the shown captcha is not for
        //this script but for our real browser session
        //include our selfmade cookie-data
        $f = fopen('captcha.png', 'wb');
        curl_setopt_array($c, array(
            CURLOPT_URL     => 'url/original.captcha',
            CURLOPT_COOKIE  => file_get_contents('cookie.txt'),
            CURLOPT_HEADER  => false,
            CURLOPT_REFERER => 'referer'
        ));
        $captcha = curl_exec($c);
        curl_close($c);
        fwrite($f, $captcha);
        fclose($f);

        //display the new form
        echo $data;
    }
    else {
        //generate data-string to post to form
        //urlencode() keeps special characters in the values from breaking the query string
        $post_data = array();
        foreach ($_POST as $k => $v) {
            $post_data[] = urlencode($k) . '=' . urlencode($v);
        }
        $post_data = implode('&', $post_data);

        //read our selfmade cookie to get the response for our script's session
        $cookie = file_get_contents('cookie.txt');
        curl_setopt_array($c, array(
            CURLOPT_URL            => 'form_url',
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_COOKIE         => $cookie,
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_POSTFIELDS     => $post_data,
            CURLOPT_HEADER         => TRUE,
            CURLOPT_REFERER        => 'referer'
        ));
        $data = curl_exec($c);

        //several new cookies, get the values and save them
        if (strstr($data, 'Set-Cookie')) {
            preg_match('#cookievalue=([a-z0-9]+)#', $data, $match);
            $match = $match[0];
            $f = fopen('cookie.txt', 'w');
            fwrite($f, $cookie . '; ' . $match . ';');
            fclose($f);
        }

        //we are logged in, now move on and get everything you want

        //close connection in the end
        curl_close($c);
    }
    ?>
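
The script above picks out one named cookie per request with a site-specific pattern. As a possible generalization, the sketch below collects every `Set-Cookie` header from a raw response. The function name `extractCookiePairs` and the regular expression are assumptions made for illustration, not part of the original solution; the sketch ignores cookie attributes such as path and expiry, and it would also pick up a body line that happens to start with `Set-Cookie:`.

```php
<?php
// Hypothetical helper: collect every cookie set in a raw HTTP response
// (headers plus body, as returned by curl_exec() when CURLOPT_HEADER is true).
// Returns "name=value" strings keyed by cookie name.
function extractCookiePairs(string $rawResponse): array
{
    $pairs = array();
    // Match "Set-Cookie: name=value" at the start of a line;
    // everything after the first ';' (path, expires, ...) is ignored.
    preg_match_all(
        '/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi',
        $rawResponse,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        // A later header overwrites an earlier one with the same name.
        $pairs[$m[1]] = $m[1] . '=' . $m[2];
    }
    return $pairs;
}
```

The value for `CURLOPT_COOKIE` would then be `implode('; ', extractCookiePairs($data))`, which could be written to cookie.txt in place of the single matched value.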
     
  11. dtang4 (Regular Member)

    Works like a charm. You are a genius.