
Google scraper problem... need help

Discussion in 'PHP & Perl' started by rabies69, Jan 17, 2012.

  1. rabies69

    rabies69 Registered Member

    Hello, look at this script to scrape Google results.


    It's not working with a proxy. When I remove the two lines below, it works for a short time, and after that the results come back blank for a few hours.

    PHP:
    CURLOPT_PROXY => "193.138.185.51", // Proxy host if needed
    CURLOPT_PROXYPORT => "3128", // Proxy port if needed
    What is the problem? Is there any other scraper available?
     
  2. BlueZero

    BlueZero Power Member

    Are you sure that your proxy is working?

    Also, Google shows a captcha when there are too many requests from one IP.
    This could be why you are getting blank results.
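
A blank result and a captcha page are easy to confuse, so it can help to classify each response before parsing it. The sketch below is only an illustration: `classify_response` is a hypothetical helper, and the status codes (429, 503) and the text markers it looks for are assumptions about what a block page tends to contain, not a documented Google contract.

```php
<?php
// Hypothetical helper: decide whether a response looks empty, blocked, or usable.
// The status codes and text markers are assumptions, not a documented contract.
function classify_response($http_code, $body) {
    // curl_exec() returns false on failure; an empty string means nothing came back.
    if ($body === false || $body === '') {
        return 'empty';
    }
    // Rate limiting is commonly signalled with 429 or 503.
    if ($http_code == 429 || $http_code == 503) {
        return 'blocked';
    }
    // Case-insensitive search for typical captcha wording.
    if (stripos($body, 'captcha') !== false || stripos($body, 'unusual traffic') !== false) {
        return 'blocked';
    }
    return 'ok';
}

// Typical use after a cURL request (no request is made here):
// $html = curl_exec($ch);
// $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
// echo classify_response($code, $html);
```

If the result is 'blocked', waiting or switching to another proxy is more useful than retrying at once from the same IP.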
     
  3. hip_hop_x

    hip_hop_x Jr. VIP Premium Member

    Using that, you are allowed to see only 1000 results per IP, and Google will probably show a captcha.
     
  4. gimme4free

    gimme4free Executive VIP Jr. VIP Premium Member

    Here are a couple of cURL functions I wrote a long time ago that still work fine with cookies and proxies:
    PHP:
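    <?php
    // Timeouts in seconds. The original post does not show where these are set;
    // the values below are assumed examples, adjust them as needed.
    $max_connect_timeout = 10;
    $max_timeout = 30;

    // cURL function defaults
    $curl_defaults = array(
        CURLOPT_HEADER => 0,
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_AUTOREFERER => 1,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_CONNECTTIMEOUT => $max_connect_timeout,
        CURLOPT_TIMEOUT => $max_timeout,
        CURLOPT_VERBOSE => 0,
        // Note: these two settings disable SSL certificate checks.
        CURLOPT_SSL_VERIFYHOST => 0,
        CURLOPT_SSL_VERIFYPEER => 0
    );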
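    // Fetch $url through the configured proxy, keeping one cookie file per account.
    // $proxy, $port, $proxytype, $loginpassw, $referrer and $silent are not defined
    // in the post; they are assumed here to be globals set by the calling script.
    function Return_Content_From_URL($url, $accountid){
        global $curl_defaults, $proxy, $port, $proxytype, $loginpassw, $referrer, $silent;
        $ch = curl_init();
        curl_setopt_array($ch, $curl_defaults);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2) Gecko/20100115 Firefox/3.6");
        curl_setopt($ch, CURLOPT_URL, $url);
        // Only send a referrer when one has been set.
        if (!empty($referrer)){curl_setopt($ch, CURLOPT_REFERER, $referrer);}
        curl_setopt($ch, CURLOPT_PROXYPORT, $port);
        // CURLOPT_PROXYTYPE expects a CURLPROXY_* constant, not a string.
        if ($proxytype == "SOCKS5"){curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);}else{curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);}
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        // "0:0" means the proxy needs no login.
        if (!empty($loginpassw) && $loginpassw != "0:0"){
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $loginpassw);
        }
        // Read and write cookies in the same per-account file.
        $cookie_file = str_replace('\\','/',dirname(__FILE__)).'/cookies/'.$accountid.'.txt';
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
        $html = curl_exec($ch);
        $err = curl_errno($ch);
        curl_close($ch);
        if ($err != 0){
            $curl_error = Echo_Curl_Error($err);
            if ($silent == 0){echo "<b>Error Connecting To Proxy With Account ID: $accountid & Proxy: $proxy. CURL Error: $err: ".$curl_error."</b><br />";}
            return false;
        }
        return $html;
    }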
    // Translate a libcurl error number into a readable message.
    function Echo_Curl_Error($err){
        $error = array();
        $error[1] = "The URL you passed to libcurl used a protocol that this libcurl does not support.";
        $error[2] = "Very early initialization code failed. This is likely to be an internal error or problem.";
        $error[3] = "The URL was not properly formatted.";
        $error[5] = "Couldn't resolve proxy. The given proxy host could not be resolved.";
        $error[6] = "Couldn't resolve host. The given remote host was not resolved.";
        $error[7] = "Failed to connect() to host or proxy.";
        $error[15] = "An internal failure to lookup the host used for the new connection.";
        $error[22] = "The HTTP server returned an error code of 400 or above (only reported when CURLOPT_FAILONERROR is set).";
        $error[26] = "There was a problem reading a local file or an error returned by the read callback.";
        $error[28] = "Operation timeout. The specified time-out period was reached according to the conditions.";
        $error[52] = "Nothing was returned from the server, and under the circumstances, getting nothing is considered an error.";
        $error[56] = "Failure with receiving network data.";
        $error[60] = "Peer certificate cannot be authenticated with known CA certificates.";
        // Fall back to a generic message for codes not listed above.
        return isset($error[$err]) ? $error[$err] : "Unknown cURL error.";
    }
    ?>
     
  5. randomnumbers

    randomnumbers Newbie

    Which type of proxy is it?

    Try changing the GET URL to a "what is my IP" service and see which IP is returned.
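
The check suggested above could be sketched roughly as follows. All function names here are hypothetical, the timeouts are arbitrary, and `https://api.ipify.org` is used only as an example of a service that returns the caller's IP as plain text; any similar service should do. The proxy host and port in the usage comment are the ones from the first post.

```php
<?php
// Hypothetical helper: cURL options for one request through a proxy.
function proxy_check_options($check_url, $proxy_host, $proxy_port) {
    return array(
        CURLOPT_URL => $check_url,
        CURLOPT_PROXY => $proxy_host,
        CURLOPT_PROXYPORT => (int)$proxy_port,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_CONNECTTIMEOUT => 10, // assumed timeout, in seconds
        CURLOPT_TIMEOUT => 20,        // assumed timeout, in seconds
    );
}

// Returns the IP the remote service saw, or false if the request failed.
function ip_seen_through_proxy($check_url, $proxy_host, $proxy_port) {
    $ch = curl_init();
    curl_setopt_array($ch, proxy_check_options($check_url, $proxy_host, $proxy_port));
    $body = curl_exec($ch);
    $err = curl_errno($ch);
    curl_close($ch);
    if ($err != 0 || $body === false) {
        return false;
    }
    return trim($body);
}

// True when the proxied request succeeded and shows a different IP than your own.
function proxy_hides_ip($direct_ip, $proxied_ip) {
    return $proxied_ip !== false && $proxied_ip !== '' && $proxied_ip !== $direct_ip;
}

// Usage (makes a network request, so it is left commented out):
// $seen = ip_seen_through_proxy('https://api.ipify.org', '193.138.185.51', 3128);
// echo $seen === false ? "Proxy request failed\n" : "Site sees: $seen\n";
```

If the request fails, or the returned IP is your own, the proxy is dead or transparent, and the scraper is effectively running from your real IP.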