CURL class that works like simple HTML DOM?

Discussion in 'PHP & Perl' started by Saulyx, Sep 4, 2012.

  1. Saulyx

    Saulyx Junior Member

    Joined:
    Jan 10, 2010
    Messages:
    107
    Likes Received:
    5
    So I've been using both cURL and simple_html_dom for a while. For anyone who is not familiar with Simple HTML DOM: it lets you walk through a page's elements with ease, without the hassle of regexes, explode() calls and so on.

    E.g.
    PHP:
    $html = file_get_html($obj->loc);

    $item['title'] = $html->find('#Prod-Name h1', 0)->plaintext;
    However, as far as I'm aware it does not support cookies the way cURL does. Is there something out there that does?


    I would be interested to hear people's experience with screen scraping/bot creation.
     
  2. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,891
    Likes Received:
    12,791
    Occupation:
    Potentate
    Location:
    Asuncion
    cURL fetches, simple_html_dom processes. The cookies belong to the fetching part, not the processing part.
     
  3. boomboomer

    boomboomer Executive VIP

    Joined:
    Feb 7, 2008
    Messages:
    717
    Likes Received:
    885
    You can handle cookies with CURL, fetch the data with CURL or file_get_contents and process the data with DOMDocument/simple_html_dom.

    Which part is posing a problem? If you could explain where you're getting stuck with a simple example, you might get the right solution.
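
    To illustrate the processing half with DOMDocument, here is a minimal sketch. It assumes the page has already been fetched (with cURL or file_get_contents); the `$markup` string is made-up sample HTML, and the XPath expression is one possible equivalent of the `#Prod-Name h1` selector from the first post, not the only way to write it.

```php
<?php
// Hypothetical markup standing in for a page that was already fetched.
$markup = '<html><body><div id="Prod-Name"><h1> Example Widget </h1></div></body></html>';

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// XPath counterpart of the CSS selector '#Prod-Name h1'.
$nodes = $xpath->query('//*[@id="Prod-Name"]//h1');

// Take the first match, if any, and strip surrounding whitespace.
$title = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $title, PHP_EOL;
```

    DOMDocument ships with PHP, so nothing extra needs to be included; simple_html_dom is mostly a matter of preferring CSS-style selectors over XPath.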
     
  4. keval007

    keval007 Junior Member

    Joined:
    Jun 12, 2012
    Messages:
    144
    Likes Received:
    26
    Occupation:
    Web Scraper & PHP Developer
    Hi,

    Here is the code for you.

    Code:
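    <?php

    include "simple_html_dom.php";

    // Fetch a URL with cURL, keeping cookies in a file between requests.
    function curlReq($reqURL)
    {
        $cookie_file_path = "c.txt"; // where cookies get stored

        // Create the cookie file only if it does not exist yet. Opening it in
        // "w" mode on every call would wipe cookies saved by earlier requests.
        if (!file_exists($cookie_file_path)) {
            touch($cookie_file_path);
        }

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $reqURL);                  // URL passed as the function argument
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); // read cookies from this file
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);  // write cookies back to it
        curl_setopt($ch, CURLOPT_HEADER, 0);                     // leave response headers out of the result
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);          // follow redirects
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);             // return the body instead of printing it

        $result = curl_exec($ch);
        curl_close($ch);

        return $result; // page body as a string, or false on failure
    }

    // Call the function that handles the cURL request.
    $htmlDoc = curlReq("http://url-that-you-want-to-scrape.html");
    if ($htmlDoc === false) {
        die("Request failed");
    }

    $html = new simple_html_dom();
    $html->load($htmlDoc); // Simple HTML DOM now parses the result scraped by cURL.

    ?>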

    The function makes the cURL request and takes care of the cookies; a proxy can be added the same way by setting CURLOPT_PROXY. The HTML document the function retrieves is then fed to the Simple HTML DOM object.
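
    For the proxy part, which the code above does not actually set, here is a rough sketch of how the same request could take an optional proxy and report failures. The names `buildCurlOptions` and `fetchPage`, the 10-second timeout, and the proxy address in the comments are assumptions made for illustration; they are not part of the code posted above.

```php
<?php
// Hypothetical helper: builds the option array for a cookie-aware request.
function buildCurlOptions(string $url, string $cookieFile, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // and saved back to it afterwards
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,          // assumed value, tune as needed
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy;      // e.g. "127.0.0.1:8080" (placeholder)
    }
    return $options;
}

// Hypothetical wrapper: returns the page body or throws on failure.
function fetchPage(string $url, string $cookieFile, ?string $proxy = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $cookieFile, $proxy));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $body;
}

// Usage sketch (placeholder URLs, left commented out so nothing is fetched):
// fetchPage("http://example.com/login", "c.txt");            // stores session cookies
// $page = fetchPage("http://example.com/account", "c.txt");  // sends them back
```

    Because both calls share the same cookie file, whatever the first response sets is sent along with the second request, which is the behaviour file_get_html on its own does not give you.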
     