CURL class that works like simple HTML DOM?

Saulyx · Sep 4, 2012

So I've been using both cURL and simple_html_dom for a while. For anyone who is not familiar with Simple HTML DOM: it lets you walk through a page's elements with ease, without the hassle of regexes, exploding strings and so on.

E.g.

PHP:

    $html = file_get_html($obj->loc);
    $item['title'] = $html->find('#Prod-Name h1',0)->plaintext;

However, as far as I'm aware, this does not support cookies the way cURL does. Is there something out there that does?

I'd be interested to hear people's experience with screen scraping/bot creation.

jazzc · Sep 4, 2012

cURL fetches, simple_html_dom processes. Cookies belong to the fetching part, not the processing.
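To make that split concrete: because cookies are a fetching concern, they can even be sent without cURL, using plain file_get_contents() with an HTTP stream context. This is a minimal sketch that assumes you already know the cookie value the site expects. The helper name cookieContext and the cookie session_id=abc123 are hypothetical placeholders. The processing side does not change: the fetched string can go straight into simple_html_dom's str_get_html() or into DOMDocument.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a Cookie header.
// Cookie values are passed through unchanged (no encoding is applied).
function cookieContext(array $cookies)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }

    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ]);
}

// "session_id" / "abc123" are placeholder values for illustration only.
$ctx = cookieContext(['session_id' => 'abc123']);

// Fetching is where the cookie matters (real request, so left commented out):
// $page = file_get_contents('http://example.com/', false, $ctx);
// Processing is unchanged, e.g. $html = str_get_html($page); with simple_html_dom.
```

This only covers cookies you already have. If the site sets cookies during the session (logins, redirects), cURL's cookie jar is the easier tool, because it stores and resends them for you.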

boomboomer · Sep 4, 2012

You can handle cookies with cURL, fetch the data with cURL or file_get_contents, and process the data with DOMDocument/simple_html_dom.

Which part is posing a problem? If you could explain where you're getting stuck with a simple example, you might get the right solution.
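For the DOMDocument route mentioned above, here is a rough sketch of the processing step from the first post, find('#Prod-Name h1', 0)->plaintext, done with PHP's built-in DOM extension and DOMXPath instead of simple_html_dom. The function name productTitle is hypothetical, and the CSS selector is translated by hand into the XPath //*[@id="Prod-Name"]//h1. The function takes an HTML string, so it works the same whether the page came from cURL or file_get_contents.

```php
<?php
// Hypothetical helper: return the text of the first <h1> inside the element
// with id="Prod-Name", or null if there is none.
function productTitle($html)
{
    $doc = new DOMDocument();

    // Scraped pages are rarely valid HTML; keep libxml's parser warnings
    // internal instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Hand-written XPath equivalent of the CSS selector "#Prod-Name h1".
    $nodes = $xpath->query('//*[@id="Prod-Name"]//h1');

    if ($nodes === false || $nodes->length === 0) {
        return null; // element not found
    }

    return trim($nodes->item(0)->textContent);
}

$page = '<html><body><div id="Prod-Name"><h1> Blue Widget </h1></div></body></html>';
echo productTitle($page), "\n"; // prints "Blue Widget"
```

simple_html_dom is more convenient because it accepts CSS selectors directly. DOMDocument ships with PHP and copes well with large pages, but you have to write the XPath yourself.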

keval007 · Sep 4, 2012

Hi,

Here is the code for you.

Code:

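    <?php

    include "simple_html_dom.php";

    // Fetch a URL with cURL, sending and saving cookies in a cookie file.
    function curlReq($reqURL)
    {
        $cookie_file_path = "c.txt"; // where cookies get stored

        // Create the cookie file if it doesn't exist yet. Don't truncate it
        // on every call, or cookies from earlier requests are lost.
        if (!file_exists($cookie_file_path)) {
            touch($cookie_file_path);
        }

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $reqURL);                  // URL passed as a function argument
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); // send cookies stored earlier
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);  // save cookies the server sets
        curl_setopt($ch, CURLOPT_HEADER, 0);                     // body only, no response headers
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);          // follow redirects
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);             // return the page instead of printing it

        $result = curl_exec($ch); // false on failure
        if ($result === false) {
            echo "cURL error: " . curl_error($ch);
        }
        curl_close($ch);

        return $result;
    }

    $htmlDoc = curlReq("http://url-that-uou-want-to-scrape.html"); // calling the function that handles the cURL request
    if ($htmlDoc === false) {
        die("Could not fetch the page.");
    }

    $html = new simple_html_dom();
    $html->load($htmlDoc); // Simple HTML DOM now uses the result scraped by cURL

    ?>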

The function handles the cURL request, including cookies, which are read from and saved to c.txt between requests. It does not set up a proxy; for that you would add a CURLOPT_PROXY option. The HTML document the function retrieves is then fed to the Simple HTML DOM object.
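The curlReq() function above does not actually configure a proxy. Here is a minimal sketch of one way to add one: collect the options in an array and pass it to curl_setopt_array(), adding CURLOPT_PROXY only when a proxy is given. The helper name scrapeOptions and the proxy address 127.0.0.1:8080 are hypothetical placeholders. Authenticated or SOCKS proxies would need further options not shown here.

```php
<?php
// Hypothetical helper: cURL options for a scraping request that keeps cookies
// in $cookieFile and is optionally routed through $proxy ("host:port").
function scrapeOptions($url, $cookieFile, $proxy = null)
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies stored earlier
        CURLOPT_COOKIEJAR      => $cookieFile, // written out when the handle is released
        CURLOPT_HEADER         => false,       // body only
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_RETURNTRANSFER => true,        // return the page as a string
    ];

    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy; // e.g. "127.0.0.1:8080" (placeholder)
    }

    return $options;
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, scrapeOptions('http://example.com/', 'c.txt', '127.0.0.1:8080'));
// $page = curl_exec($ch);
// curl_close($ch);
```

Keeping the options in one array makes it easy to switch proxies per request while the fetching code itself stays the same.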

BlackHat World