php curl

blackbob1

I'm using PHP cURL to do some webpage scraping (can't post the URL). It works on the root domain name, but I seem to be having a problem with the page I actually want to scrape, and I can only think it's related to the URL structure. I can't and don't really want to post the exact URL, but it looks like this: [domain].co.uk/word-word-123-c.asp. Would this cause a problem with cURL, and how do I get around it?

Note:
I don't use PHP
 
I'm a little lost. What exactly is the problem? PM me if you don't want to post the URL and I'll help you out. I have plenty of experience scraping sites.
 
With cURL you can fetch any kind of page, no matter what its URL structure is, if that's what you are asking.
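
To find out why one page fails while the root domain works, it helps to have cURL report both the transport error and the HTTP status code. Below is a minimal sketch of that idea. The function name fetch_page() and the User-Agent string are my own placeholders, not anything from the original post, and the timeout and redirect limits are arbitrary assumptions.

```php
<?php
// Sketch: fetch a page with cURL and report what came back.
// fetch_page() and the User-Agent value are illustrative placeholders.
function fetch_page(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed limit
        CURLOPT_TIMEOUT        => 20,   // seconds, assumed limit
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
    ]);

    $body   = curl_exec($ch);                              // false on transport failure
    $error  = curl_error($ch);                             // '' when no transport error occurred
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no HTTP response arrived

    return [
        'body'   => $body === false ? null : $body,
        'status' => $status,
        'error'  => $error,
    ];
}

// Usage (not run here; the URL is the placeholder from the original post):
// $result = fetch_page('http://[domain].co.uk/word-word-123-c.asp');
// echo "status={$result['status']} error={$result['error']}\n";
```

A status such as 403 or 404 with an empty error string would point at the site or the path rather than at cURL itself.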
 
Apologies if this is incorrect, but shouldn't this be in the scripting section? You may get more help from people who know PHP there.

Back on topic:
There's a ton of scraping scripts out there that can probably do what you want. You may have some luck finding one similar to what you need and then pulling the code out of it. If you don't know PHP, a great way to learn is to look at working code and figure out how to make it do what you want.

Best of luck!
 
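An easy approach could be
PHP:
// The scheme (http:// or https://) is required; without it PHP looks for a local file.
$html = file_get_contents("http://[domain].co.uk/word-word-123-c.asp");
but cURL gives you more control (headers, redirects, timeouts, error reporting).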
 
Try your code on a simple site first before you move to the site you really want to scrape. Maybe there is some other stuff causing trouble on that site you want to scrape.
 
Try executing curl or wget from the command line. If something on the site is blocking you, you'll be able to see your errors immediately without having to go through wherever the web server is logging them. Alternately, start reading through your error logs (if you're using cPanel, they should be easily accessible to you either directly through cPanel or in FTP).
 
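It would be a good idea to tell us what your problem is. But if the URL you want to scrape is secure (HTTPS), you can add the following options. Note that they turn off certificate verification, so use them for testing only.
Code:
// Skip the check that the certificate matches the host name
curl_setopt($cn, CURLOPT_SSL_VERIFYHOST, 0);
// Skip the check of the certificate chain
curl_setopt($cn, CURLOPT_SSL_VERIFYPEER, 0);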
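
The two options above switch certificate checking off. If the HTTPS failure is really a missing CA bundle on the server, an alternative is to leave verification on and tell cURL where the trusted certificates are. This is only a sketch of that alternative, not what the poster suggested. The helper name verified_ssl_options() and the bundle path are hypothetical.

```php
<?php
// Sketch: cURL options that keep TLS certificate verification enabled.
// verified_ssl_options() is an illustrative helper; $caBundle is a
// hypothetical path to a PEM file of trusted CA certificates.
function verified_ssl_options(?string $caBundle = null): array
{
    $options = [
        CURLOPT_SSL_VERIFYPEER => true, // check the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // check the certificate matches the host name
    ];
    if ($caBundle !== null) {
        $options[CURLOPT_CAINFO] = $caBundle; // use this bundle instead of the default
    }
    return $options;
}

// Usage (assumes $cn is an existing cURL handle, as in the post above):
// curl_setopt_array($cn, verified_ssl_options('/path/to/cacert.pem'));
```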
 
You could use the get_file_contents just to see if you get an error. I have a curl function but don't get an error from it if the website is protected.

Rick
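
For the "just to see if you get an error" check, PHP's HTTP stream wrapper can be told to return the body even on 4xx/5xx responses, and the status line can then be read from the collected headers. A rough sketch follows. It assumes allow_url_fopen is enabled, and status_from_headers() is a made-up helper name, not a built-in.

```php
<?php
// Sketch: pull the final HTTP status code out of the header lines PHP collects.
// status_from_headers() is an illustrative helper, not a built-in function.
function status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Each redirect hop adds another "HTTP/1.1 301 ..." line; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Usage (not run here; the URL is the placeholder from the original post):
// $context = stream_context_create(['http' => [
//     'ignore_errors' => true, // return the body even on 4xx/5xx responses
//     'timeout'       => 20,   // seconds, assumed limit
// ]]);
// $html = @file_get_contents('http://[domain].co.uk/word-word-123-c.asp', false, $context);
// $code = status_from_headers($http_response_header ?? []);
// echo $html === false ? "Request failed\n" : "HTTP status: $code\n";
```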
 
get_file_contents


Did you mean file_get_contents?
 
It would help if you told us what the actual error is; otherwise nobody can really help you :)

edit: should have checked the date of the OP :P
 