CURL class that works like simple HTML DOM?

Discussion in 'PHP & Perl' started by Saulyx, Sep 4, 2012.

  1. Saulyx

    Saulyx Junior Member

    Joined:
    Jan 10, 2010
    Messages:
    107
    Likes Received:
    5
    So I've been using both cURL and simple_html_dom for a while. For anyone who is not familiar with Simple HTML DOM: it lets you walk through elements with ease, without the hassle of regexes, explode() calls and so on.

    E.g.
    PHP:
    $html = file_get_html($obj->loc);

    $item['title'] = $html->find('#Prod-Name h1', 0)->plaintext;
    However, as far as I'm aware it does not support cookies the way cURL does. Is there something out there that does?


    Would be interested to hear people's experience with this kind of screen scraping/bot creation.
     
  2. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,148
    Curl fetches, simplehtmldom processes. The cookies belong to the fetching part, not the processing.
     
  3. boomboomer

    boomboomer Executive VIP

    Joined:
    Feb 7, 2008
    Messages:
    705
    Likes Received:
    865
    You can handle cookies with CURL, fetch the data with CURL or file_get_contents and process the data with DOMDocument/simple_html_dom.
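    As a rough sketch of the DOMDocument route: fetch with cURL (cookies kept in a file), then query the markup with DOMXPath. The helper names fetchWithCookies() and extractProductTitle() are made up for illustration, and the XPath is meant as an equivalent of the '#Prod-Name h1' selector from the first post, assuming the page really has that structure.

```php
<?php
// Sketch only: both function names are hypothetical helpers, not part of any library.

// Fetch a page with cURL, storing and re-sending cookies via $cookieFile.
function fetchWithCookies(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved earlier
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is closed
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    return $body === false ? '' : $body; // empty string on failure
}

// Parse HTML with DOMDocument and return the text of the first <h1>
// found inside the element with id "Prod-Name", or null if there is none.
function extractProductTitle(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="Prod-Name"]//h1');

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}
```

    Usage would be something like extractProductTitle(fetchWithCookies($url, "c.txt")). The cookies only ever matter in the fetch step; the parser just receives a string.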

    Which part is posing a problem? If you could explain where you're getting stuck with a simple example, you might get the right solution.
     
  4. keval007

    keval007 Junior Member

    Joined:
    Jun 12, 2012
    Messages:
    145
    Likes Received:
    26
    Occupation:
    Web Scraper & PHP Developer
    Hi,

    Here is the code for you.

    Code:
    <?php

    include "simple_html_dom.php";

    // Fetch a URL with cURL, keeping cookies in a file.
    function curlReq($reqURL)
    {
        $cookie_file_path = "c.txt"; // where cookies get stored

        // Create the cookie file, emptying it if it already exists
        // (cookies from earlier calls are discarded).
        $fp = fopen($cookie_file_path, "w");
        fclose($fp);

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $reqURL); // URL passed as a function argument
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); // read cookies from this file
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);  // write cookies to it on close
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

        $result = curl_exec($ch); // response body, or false on failure
        curl_close($ch);

        return $result;
    }

    $htmlDoc = curlReq("http://url-that-uou-want-to-scrape.html"); // calling the function which handles the cURL request

    $html = new simple_html_dom();
    $html->load($htmlDoc); // Simple HTML DOM now uses the result scraped by cURL.

    ?>

    The function makes the cURL request and handles the cookies; proxy support can be added with the CURLOPT_PROXY option. The HTML document it returns is then fed to the Simple HTML DOM object.
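    If a proxy is needed, one possible approach is to set CURLOPT_PROXY on the handle. A minimal sketch, in which the helper name curlOptions() and the proxy address are placeholders invented for illustration:

```php
<?php
// Hypothetical helper: builds the option array to pass to curl_setopt_array().
function curlOptions(string $url, string $cookieFile, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back when the handle closes
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];

    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy; // "host:port", e.g. "127.0.0.1:8080" (placeholder)
    }

    return $options;
}

// Usage (needs network access, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, curlOptions("http://example.com/", "c.txt", "127.0.0.1:8080"));
// $page = curl_exec($ch);
// curl_close($ch);
```

    Keeping the options in one array makes it easy to switch the proxy per request without touching the rest of the fetch code.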
     