Simple problem

Discussion in 'PHP & Perl' started by Chris Devon, Jul 10, 2008.

  1. Chris Devon

    Chris Devon Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 2, 2008
    Messages:
    507
    Likes Received:
    192
    I'm having some trouble with a PHP script I'm writing (I'm a PHP newbie). I need to write the array that preg_match_all returns to a text file, and I need your help. I forgot to mention that the script is an AlltheWeb search results link scraper. Thanks in advance.

    Here is the code:

    Code:
    <?php
    // get the HTML
    
    $target_url = "http://alltheweb.com/search?advanced=1&cat=web&jsact=&_stype=norm&type=all&q=site%3Ayahoo.com&itag=crv&l=en&ics=utf-8&cs=iso88591&wf[n]=3&wf[0][r]=%2B&wf[0][q]=&wf[0][w]=&wf[1][r]=%2B&wf[1][q]=&wf[1][w]=&wf[2][r]=-&wf[2][q]=&wf[2][w]=&dincl=&dexcl=&geo=&doctype=&dfr[d]=1&dfr[m]=1&dfr[y]=1980&dto[d]=9&dto[m]=7&dto[y]=2008&hits=100";
    $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL,$target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
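    $html = curl_exec($ch);
    // curl_exec returns false on failure, so stop before trying to parse
    if ($html === false) {
        die('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);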
    
    // find the links ('#' is the delimiter so the '/' in '</a></span>' needs no escaping)
    preg_match_all('#<span class="resTitle"><a class="res" href="(.*?)" >(.*?)</a></span>#s', $html, $matches, PREG_SET_ORDER);
    
    // print the links to the screen
    foreach ($matches as $above) {
    
    	$hyper = $above[0];
    	echo $hyper;
    	echo "<br/>";
    	$scrlink = $above[1];
    	echo $scrlink;
    	echo "<br/>";
    	$anchortext = $above[2];
    	echo $anchortext;
    	echo "<br/>";
    
    	}
    
    ?> 
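
For the file-writing part of the question, one possible approach is sketched below. It assumes the same result markup as the pattern above; the helper name save_links, the output file name links.txt, and the sample HTML are made up for illustration and are not from the original script. It collects one "URL, tab, anchor text" line per match and writes them all with file_put_contents.

```php
<?php
// Hypothetical helper (not part of the original script): extract the result
// links from $html and write one "URL<TAB>anchor text" line per match to $path.
function save_links(string $html, string $path): int
{
    // '#' is the delimiter, so the '/' in the closing tags needs no escaping
    $pattern = '#<span class="resTitle"><a class="res" href="(.*?)" >(.*?)</a></span>#s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $lines = [];
    foreach ($matches as $m) {
        // $m[1] is the URL, $m[2] the anchor text (inner tags stripped)
        $lines[] = $m[1] . "\t" . strip_tags($m[2]);
    }

    // Write everything in one call; LOCK_EX takes an exclusive lock while writing
    $text = $lines ? implode(PHP_EOL, $lines) . PHP_EOL : '';
    if (file_put_contents($path, $text, LOCK_EX) === false) {
        throw new RuntimeException("Could not write to $path");
    }
    return count($lines);
}

// Usage with made-up sample markup standing in for the fetched page
$sample = '<span class="resTitle"><a class="res" href="http://example.com/a" >Page A</a></span>'
        . '<span class="resTitle"><a class="res" href="http://example.com/b" >Page <b>B</b></a></span>';
$count = save_links($sample, sys_get_temp_dir() . '/links.txt');
```

In the real script, $html from curl_exec would be passed in place of $sample. To add to an existing file instead of overwriting it, the flags could be changed to FILE_APPEND | LOCK_EX.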
     
  2. drdankmendez

    drdankmendez Junior Member

    Joined:
    May 30, 2008
    Messages:
    194
    Likes Received:
    316
    Location:
    In front of my computer
    • Thanks x 1
  3. Chris Devon

    Chris Devon Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 2, 2008
    Messages:
    507
    Likes Received:
    192
    Thanks for your help. Problem solved.
     
  4. barigain

    barigain Junior Member

    Joined:
    Aug 23, 2012
    Messages:
    100
    Likes Received:
    12
    When you're scraping sites, you usually don't tell everyone. Why not show just the problem you're having? Then maybe someone will have a solution.
     
  5. Rokebono

    Rokebono Senior Member

    Joined:
    Jan 28, 2013
    Messages:
    1,120
    Likes Received:
    1,672
    Location:
    The solution was found 5 years ago.
     
    • Thanks x 2