php curl proxy

Discussion in 'PHP & Perl' started by lanbo, Oct 23, 2011.

  1. lanbo

    lanbo Jr. VIP Jr. VIP

    Joined:
    Aug 23, 2009
    Messages:
    3,599
    Likes Received:
    615
    Home Page:
    Does anyone have a working cURL script with proxy support?

    I found tons online, but none of them work properly.
     
  2. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,636
    Likes Received:
    11,315
    Occupation:
    Pusillanimous Knitter
    Location:
    Buenos Aires
    Curl script to do what? "Curl script" can be anything....
     
  3. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    Joined:
    Oct 22, 2008
    Messages:
    1,930
    Likes Received:
    1,945
    Home Page:
    PHP:
    <?php
    $max_connect_timeout = 3;
    $max_timeout = 10;
    $curl_defaults = array(
        CURLOPT_HEADER => 0,
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_AUTOREFERER => 1,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_CONNECTTIMEOUT => $max_connect_timeout,
        CURLOPT_TIMEOUT => $max_timeout,
        CURLOPT_VERBOSE => 0,
        // Certificate checks are switched off here
        CURLOPT_SSL_VERIFYHOST => 0,
        CURLOPT_SSL_VERIFYPEER => 0
    );

    // $silent was an undefined variable in the original post; here it is an optional parameter
    function Return_Content_From_URL($url, $proxy, $port, $loginpassw, $silent = 0) {
        global $curl_defaults;
        $ch = curl_init();
        curl_setopt_array($ch, $curl_defaults);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_PROXYPORT, $port);
        curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP); // a constant, not the string "HTTP"
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        if ($loginpassw != "0:0") {
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $loginpassw);
        }
        $html = curl_exec($ch);
        $err = curl_errno($ch);
        curl_close($ch);
        if ($err != 0) {
            // The original called Echo_Curl_Error(), a helper that was not posted;
            // curl_strerror() (PHP 5.5+) is used in its place
            $curl_error = curl_strerror($err);
            if ($silent == 0) { echo "<b>Error Connecting To Proxy: $proxy. CURL Error: $err: " . $curl_error . "</b><br />"; }
            return false;
        }
        return $html;
    }

    $url = "http://www.google.com/";
    $proxy = "127.0.0.1";
    $port = "8080";
    $loginpassw = "Username:Password"; // Enter 0:0 For IP Authentication Proxies
    echo Return_Content_From_URL($url, $proxy, $port, $loginpassw);
    ?>
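
    The function above takes the proxy host and port as separate arguments. Assuming the proxies are stored as "host:port" lines (an assumption, the script itself does not say how they are stored), a small helper along these lines could split them first. The name split_proxy is hypothetical.

```php
<?php
// Hypothetical helper: splits "host:port" (spaces around the colon allowed)
// into array(host, port). Returns null when the line does not match.
function split_proxy($line)
{
    if (!preg_match('/^\s*([^\s:]+)\s*:\s*(\d{1,5})\s*$/', $line, $m)) {
        return null;
    }
    return array($m[1], (int) $m[2]);
}
```

    The two parts can then be passed on, for example list($host, $port) = split_proxy($line); followed by Return_Content_From_URL($url, $host, $port, "0:0").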
     
  4. zbigbz

    zbigbz Newbie

    Joined:
    Apr 30, 2011
    Messages:
    38
    Likes Received:
    16
    Thanks for the script; it clarified a couple of points about using proxies that I was having problems with. However, I have a new issue: MOST, but not all, of my attempts are returning a CURL 7 "Couldn't connect to host" error. I have used a number of different "fresh" public proxies and haven't noticed any pattern as to which ones fail vs. succeed. I have run my script on a couple of different servers and have the same problem: sometimes I get back the web page I am asking for, sometimes I get a connection timeout, but most of the time I get the CURL 7 failed-to-connect error.

    My research on the CURL 7 error suggests that my server may have a firewall that's blocking the proxy and that I have to enable the proxy from my cPanel.

    My problem is that I am running on shared servers, and the FAQs and forums for those servers say that they do not allow you to change the firewall settings, so that I don't mess things up for the other users on the shared server. They say I have to upgrade to a dedicated server to be able to do this.....

    My question for the experts here at BHW is: have I correctly identified the problem with the CURL 7 error, and if so, can anyone recommend a reasonably priced dedicated server to sign up with? Or are there any other suggestions for running this on a shared server?
     
  5. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    cURL error #7 (CURLE_COULDNT_CONNECT) = failure to connect to the host or proxy.

    Reasons for this to occur are:
    Authentication IP not setup correctly;
    Incorrect User/Pass for this type of authentication;
    Firewall issues (Ask provider to add the port of your proxies to the allowed outgoing connections list);
    Proxy is down.
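
    As a rough sketch of how a script might report these cases, the helper below maps a few libcurl error numbers to a short description. The function name is hypothetical, and the numbers are written as literals (with the libcurl constant names in comments) so the snippet does not depend on the cURL extension being loaded.

```php
<?php
// Hypothetical helper: maps a few libcurl error numbers to a likely cause.
function describe_proxy_failure($errno)
{
    switch ($errno) {
        case 0:
            return 'no error';
        case 5:  // CURLE_COULDNT_RESOLVE_PROXY
            return 'could not resolve the proxy host name';
        case 7:  // CURLE_COULDNT_CONNECT
            return 'could not connect to the host or proxy';
        case 28: // CURLE_OPERATION_TIMEDOUT
            return 'operation timed out';
        default:
            return 'other cURL error ' . $errno;
    }
}
```

    The value to pass in is whatever curl_errno($ch) returned for the failed request.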
     
  6. zbigbz

    zbigbz Newbie

    Thanks for the feedback, gimme4free.

    As I said, these proxies are ones I'm scraping from public sources like proxyfire and hidemyass, so I have no control over their setup and I have no usernames/passwords. I could try to add each proxy to my shared hosting accounts, but there are hundreds of different proxies and I don't know how well that will go over. I can see getting booted off the shared host.....

    The simple answer, that the proxy may be down, could be what's going on since these are public proxies, but almost all of them????

    I've seen some examples using the CURLOPT_HTTPPROXYTUNNEL option which your script doesn't use - would tunneling help?
     
  7. soull

    soull Junior Member

    Joined:
    May 19, 2011
    Messages:
    151
    Likes Received:
    35
    PHP:
    <?php
    /*
    * Checks whether the proxy is valid; works both via AJAX and as a function call
    */
    function test_proxy($url, $proxy_host, $proxy_ident, $timeout)
    {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);

        curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

        if (preg_match('`^https://`i', $url))
        {
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        }

        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($ch, CURLOPT_PROXY, $proxy_host);

        if ($proxy_ident != "")
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxy_ident);

        $page_content = curl_exec($ch);

        curl_close($ch);

        //echo $page_content;
        if ($page_content == false) { echo "0"; return false; } else { echo "1"; return true; }
    }

    if (!isset($_GET["url"]))
        $_GET["url"] = 'http://www.google.com';

    if (!isset($_GET["proxy_ident"]))
        $_GET["proxy_ident"] = '';

    if (!isset($_GET["timeout"]))
        $_GET["timeout"] = 5;

    if (isset($_GET["proxy_host"]))
    {
        test_proxy($_GET["url"], $_GET["proxy_host"], $_GET["proxy_ident"], $_GET["timeout"]);
    }

    ;)
     
  8. zbigbz

    zbigbz Newbie

    Thanks soull and gimme4free. MOST of the "fresh" public proxies I was scraping turned out to be invalid/dead according to soull's script; the ones that passed worked 99% of the time through gimme4free's code.

    BHW to the rescue again!
     
  9. relaxin

    relaxin Junior Member

    Joined:
    Aug 13, 2007
    Messages:
    100
    Likes Received:
    25
    Occupation:
    CEO
    Most public proxies go down within minutes, if not seconds. The best you can do is to include a testing routine before using a proxy.

    Personally, I like to put all proxies in a database. When I scrape URLs from g***, for example, my script grabs one proxy at random from the database and tests whether it works by downloading a simple text file I've created on my server, containing text like "text proxy". Then I use strpos() to check whether the downloaded content has "text proxy" in it. If strpos() finds the text (it returns a position, never true, so the result has to be compared with !== false), the proxy is good and the script continues executing other routines. If strpos() returns false, the script deletes that proxy from the database and grabs another one. So it loops through the proxies until it finds a working one or there is no proxy left.
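
    The routine described above could be sketched roughly as follows. The function names, the $fetch callback (which stands in for the actual cURL download through a proxy) and the in-memory array (which stands in for the database table) are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: true when the downloaded body contains the marker text.
function body_has_marker($body, $marker)
{
    // strpos() returns an integer offset (possibly 0) or false, so compare with !== false.
    return is_string($body) && strpos($body, $marker) !== false;
}

// Picks proxies at random until one passes the marker test.
// Failing proxies are removed from $proxies (passed by reference).
// Returns the working proxy, or null when none is left.
function pick_working_proxy(array &$proxies, callable $fetch, $marker)
{
    while (count($proxies) > 0) {
        $key = array_rand($proxies);
        $proxy = $proxies[$key];
        if (body_has_marker($fetch($proxy), $marker)) {
            return $proxy;
        }
        unset($proxies[$key]); // stands in for deleting the row from the database
    }
    return null;
}
```

    In a real script, $fetch would wrap a cURL call that downloads the test file through the given proxy and returns the body, or false on failure.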

    As for CURLOPT_HTTPPROXYTUNNEL, I like to set it to 0 or false to avoid complications, especially with public proxies:
    PHP:
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);  //no tunneling
     
    Last edited: Nov 4, 2011
  10. zbigbz

    zbigbz Newbie

    I'm doing something very similar to this, except that instead of grabbing just one proxy at random when needed, I have been collecting all of the proxies on a page once an hour or so. If, as you say, most public proxies are dead almost immediately, then I'm just wasting my time and database transactions! I'll look into changing my proxy collection approach. Thanks for the suggestion.

    I'm also going to set tunneling OFF explicitly to avoid the "complications" you mention.
     
  11. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    You can actually speed this process up by setting CURLOPT_NOBODY, so that only the headers are requested, and checking the status of the page instead of downloading it:
    PHP:
    function url_exists($url){
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_FAILONERROR, true);
        curl_setopt($handle, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") );
        curl_setopt($handle, CURLOPT_NOBODY, true);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($handle, CURLOPT_TIMEOUT, 5);
        $connectable = curl_exec($handle);
        curl_close($handle);
        $err = 0;
        $err = curl_errno($ch);
        curl_close($ch);
        if ($err!=0){
            // Error Connecting To Proxy
            return false;
        }
        if (preg_match('/200 OK/i',substr_replace($connectable,'',30))) {
            return true;
        }else{
            // Error Connecting To URL
            return false;
        }
    }
    This way it checks whether it can:
    1) Connect to the proxy;
    2) Get a 200 OK response for the page.

    So it doesn't waste any bandwidth or time actually downloading the content from the page.
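
    A variation on the same idea, as a sketch only: instead of matching "200 OK" in the header text, the numeric status can be read with curl_getinfo(). The $proxy argument (an assumed "host:port" string) and both function names are hypothetical and not part of the function above.

```php
<?php
// Hypothetical helper: treats any 2xx status as success.
function is_success_status($code)
{
    return $code >= 200 && $code < 300;
}

// Sketch: headers-only request through a proxy; true on a 2xx response.
function url_exists_via_proxy($url, $proxy)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_PROXY, $proxy);
    curl_setopt($handle, CURLOPT_NOBODY, true);          // ask for headers only
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($handle, CURLOPT_TIMEOUT, 5);
    curl_exec($handle);
    $err = curl_errno($handle);                          // read before closing the handle
    $code = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    return $err === 0 && is_success_status($code);
}
```

    Reading the status code this way avoids depending on the exact wording of the status line.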
     
  12. zbigbz

    zbigbz Newbie


    Thanks gimme4free, this code gave me another way to compare results with soull's code. However, there are a couple of errors (or perhaps tests for noobs like me :naughty:) in what you posted.

    Your code sneaks in a curl_close($handle) BEFORE your test for errors, and the error test and final close use $ch instead of $handle.

    The code as I got it to work looks like this:

    PHP:
    function url_exists($url){
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_FAILONERROR, true);
        curl_setopt($handle, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") );
        curl_setopt($handle, CURLOPT_NOBODY, true);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($handle, CURLOPT_TIMEOUT, 5);
        $connectable = curl_exec($handle);
        // NO NOT YET - have to get errno first - curl_close($handle);
        $err = 0;
        $err = curl_errno($handle); // USE $handle NOT $ch   $err = curl_errno($ch);
        curl_close($handle);  // USE $handle NOT $ch   curl_close($ch);
        if ($err!=0){
            // Error Connecting To Proxy
            return false;
        }
        if (preg_match('/200 OK/i',substr_replace($connectable,'',30))) {
            return true;
        }else{
            // Error Connecting To URL
            return false;
        }
    }
    The good news is that both programs identify the proxies the same, either good or bad.

    In addition, I like your code better because it doesn't have anything to do with big Goo (or any other website) and may help keep me off their radar, however I wasn't able to verify your "faster" claim - they both seemed quite fast. I agree with your claim as logically transmitting less data back and forth should be faster but I haven't bothered to get before and after time stamps to calculate if there really is a difference. I'll leave that exercise to some other student.....
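
    For anyone who does want to take those before and after time stamps, a minimal sketch with microtime() might look like this. The helper name time_call is hypothetical.

```php
<?php
// Hypothetical helper: runs $fn once and returns array(result, elapsed seconds).
function time_call(callable $fn)
{
    $start = microtime(true);
    $result = $fn();
    return array($result, microtime(true) - $start);
}
```

    It could be wrapped around each check in turn, for example list($ok, $seconds) = time_call(function () { return url_exists('http://www.google.com/'); }); and the two $seconds values compared over a batch of proxies rather than a single run.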
     
  13. zbigbz

    zbigbz Newbie

    I finally got back to this and have another question about curl. I'm still having proxies fail on me. Looking closely, it appears that my curl requests still have my REAL server name in them.

    The generic azenv.php script shows that my curl request is being sent with HTTP_HOST = my REAL server name.

    It shows I'm successfully passing through the proxy ID, user agent, etc., but the HTTP_HOST has my fingerprints all over it.

    I can't see how to set the HTTP_HOST to the referer I'm trying to be. It doesn't seem to matter whether I set CURLOPT_AUTOREFERER to true or false, or what I put in CURLOPT_REFERER.

    I've played with PHP's ini_set and have changed some things related to location and time, but can't seem to get my real server name suppressed.

    Any thoughts/suggestions?


     
  14. Tensegrity

    Tensegrity Elite Member

    Joined:
    Apr 22, 2009
    Messages:
    1,846
    Likes Received:
    976
    FWIW, I just ran a test with gimme4free's url_exists function and I don't see a referrer being passed. Which curl script are you running? One you wrote or one that was posted here?
     
  15. zbigbz

    zbigbz Newbie

    Thanks for the followup, Tensegrity.

    I am running the Return_Content_From_URL shown above with two minor changes - the first is to NOT close the CURL connection until AFTER I've tested for errors and gotten the $curl_error text, the second is that instead of calling big G, I call a copy of azenv.php (the source is available at several places on the web) with a couple of additional parameters being displayed.

    The azenv.php code I used is:

    foreach ($_SERVER as $header => $value) {
        if (strpos($header, 'REMOTE') !== false || strpos($header, 'HTTP') !== false ||
            strpos($header, 'REQUEST') !== false) {
            echo $header.' = '.$value."\n";
        }
        //look at some other header param's too
        if (strpos($header, 'REFER') !== false || strpos($header, 'FORWARD') !== false ||
            strpos($header, 'VIA') !== false || strpos($header, 'SERVER') !== false) {
            echo $header.' = '.$value."\n";
        }
    }


    The output from this call using 140.211.15.75 : 80, a recently working reportedly elite proxy from hidemyass dot com , returned:

    HTTP_HOST = MyRealDomain
    HTTP_USER_AGENT = Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
    HTTP_ACCEPT = */*
    HTTP_VIA = 1.1 2010.foss4g dot org
    HTTP_VIA = 1.1 2010.foss4g dot org
    HTTP_CONNECTION = close
    SERVER_SIGNATURE = Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_fcgid/2.3.6 mod_perl/2.0.5 Perl/v5.8.8 Server at MyRealDomain.com Port 80
    SERVER_SOFTWARE = Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_fcgid/2.3.6 mod_perl/2.0.5 Perl/v5.8.8
    SERVER_NAME = MyRealDomain dot com
    SERVER_ADDR = IpOfMyRealDomain dot com
    SERVER_PORT = 80
    REMOTE_ADDR = 140.211.15.75
    SERVER_ADMIN = webmaster at myrealsomain
    REMOTE_PORT = 57171
    SERVER_PROTOCOL = HTTP/1.1
    REQUEST_METHOD = GET
    REQUEST_URI = /h1/azenv.php
    REQUEST_TIME = 1328577878

    These standard php parameters show MyRealDomain and its real IP at least 4 times to anyone interested, even though the CURL successfully sent the proxy 140.211.15.75 as shown by the reported REMOTE_ADDR.

    So without trying, big G could easily see MyRealDomain and MyRealIP and quickly catch me in the "Are you a human?" trap.

    What am I missing???
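
    One possible reading of the output above, offered as an interpretation rather than something confirmed in the thread: HTTP_HOST is filled from the Host header of the request, which is the host name in the URL being fetched, and the SERVER_* entries describe the machine that runs azenv.php. Because the test page is hosted on the poster's own domain, those values would show that domain regardless of which proxy carried the request. What identifies the caller is REMOTE_ADDR plus forwarding headers such as HTTP_X_FORWARDED_FOR. A small sketch of a check along those lines follows; the function name and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: lists the entries of a $_SERVER-style array through
// which the caller's real IP is visible to the target site.
function headers_leaking_ip(array $server, $realIp)
{
    $leaks = array();
    foreach ($server as $name => $value) {
        // HTTP_* entries come from the request headers and REMOTE_ADDR is the
        // connecting peer; SERVER_* entries describe the receiving host itself.
        $fromRequest = strpos($name, 'HTTP_') === 0 || $name === 'REMOTE_ADDR';
        if ($fromRequest && strpos((string) $value, $realIp) !== false) {
            $leaks[] = $name;
        }
    }
    return $leaks;
}
```

    Calling headers_leaking_ip($_SERVER, $myServerIp) inside the test page, with $myServerIp set to the IP of the machine that sends the cURL requests, would return an empty array when the proxy passes nothing identifying along.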
     
  16. luuvan

    luuvan Newbie

    Joined:
    Feb 12, 2012
    Messages:
    1
    Likes Received:
    0
    Code:
    CURLOPT_HEADER => 0
    
    should probably be => 1.
    Code:
    $err = 0;
    	$err = curl_errno($ch);
    	if ($err!=0){
    
    should simply be if($html===false)

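    Also, this script has no cookie support :p

    // tmpfile() returns a file handle, but CURLOPT_COOKIEFILE expects a file name, so tempnam() is used
    $cookiefile = tempnam(sys_get_temp_dir(), 'cookie');
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);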
     
    Last edited: Feb 13, 2012
  17. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    These functions were taken out of a much larger script that checks for individual CURL errors and uses cookies; those parts may not be needed for some people's uses though :)
     
  18. Bouga

    Bouga Newbie

    Joined:
    Jul 24, 2013
    Messages:
    1
    Likes Received:
    0
    I am experiencing the same problem. I have tried to hide my IP through proxies, but they still get my server IP.
    Has anyone found a solution to this?