How to follow (and reveal) nofollow links with PHP cURL?

Discussion in 'Other Languages' started by Herna, May 10, 2016.

  1. Herna

    Herna Newbie

    Joined:
    May 10, 2016
    Messages:
    4
    Likes Received:
    0
    Location:
    España
    Hi,
    I would like to grab all the links on a web page with PHP cURL and XPath.

    My code works, but when a link is nofollow, my script prints "nofollow" :confused:

    How can I force my script to follow this kind of link?

    Regards,
     
  2. AdvancedDevelopment

    AdvancedDevelopment BANNED BANNED

    Joined:
    Mar 23, 2016
    Messages:
    91
    Likes Received:
    28
    We will need to see your code to help you with this. If you post it here or PM me, I'll see if I can post a solution for you :)
     
  3. Herna

    Herna Newbie

    Yeah, sure, but I was unable to paste it in my first post.

    Here is the code:
    PHP:
    $url = 'myurl';
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
    $html = curl_exec($curl);
    curl_close($curl);

    if (!$html) {
        die("something's wrong!");
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);

    // Select every <a> element in the document
    $links = $xpath->query('//a');

    print_r($links);
    $nblinks = $links->length;

    for ($i = 0; $i < $nblinks; $i++) {
        print_r($links->item($i));
        // Prints the value of the element's first attribute, whichever one that is
        print_r($links->item($i)->attributes->item(0)->value);
    }
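
A possible explanation for the "nofollow" output, offered as a sketch rather than a confirmed diagnosis: `attributes->item(0)` returns the first attribute of the element, so a link written as `<a rel="nofollow" href="...">` would yield the `rel` value instead of the URL. Reading the attribute by name with `getAttribute('href')` avoids depending on attribute order. The sample markup and the `example.com` URLs below are made up for illustration and stand in for the page fetched with cURL.

```php
<?php
// Hypothetical sample markup standing in for the HTML returned by curl_exec().
$html = '<p><a rel="nofollow" href="http://example.com/a">A</a> '
      . '<a href="http://example.com/b" rel="nofollow">B</a> '
      . '<a href="http://example.com/c">C</a></p>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$found = [];
// Only <a> elements that actually carry an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $found[] = [
        // What the original loop printed: the first attribute's value.
        'first_attribute' => $a->attributes->item(0)->value,
        // Attributes read by name, independent of their order in the markup.
        'href' => $a->getAttribute('href'),
        'rel'  => $a->getAttribute('rel'), // empty string when rel is absent
    ];
}

print_r($found);
```

Note that `rel="nofollow"` is only a hint aimed at search engines. cURL does not treat such links differently, and `CURLOPT_FOLLOWLOCATION` only concerns HTTP redirects, so "following" a nofollow link just means requesting its `href` like any other URL.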
     
  4. AdvancedDevelopment

    AdvancedDevelopment BANNED BANNED

    Try this and let me know if it works. I removed some of your unneeded code; this should grab all the links.

    Code:
    <?php

    $url = '';

    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
    $html = curl_exec($curl);
    curl_close($curl);

    if (empty($html)) {
        die("something's wrong!");
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    // Collect the href and the text of every <a> element
    $links = array();

    foreach ($dom->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }

    // At the top level of a script, return ends execution here and prints nothing
    return $links;

    ?>
    
     
  5. Herna

    Herna Newbie

    Thanks AdvancedDevelopment,

    So the advice is to prefer getElementsByTagName over XPath?

    I tried to print / echo / var_dump the $links variable without success :(
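
One possible reason nothing shows up, depending on where the print statement was placed: in PHP, a `return` at the top level of a script ends execution of that file, so any `print_r`, `echo`, or `var_dump` placed after `return $links;` never runs. A minimal sketch of one way around this is to move the extraction into a function and print its result. The helper name `extractLinks()`, the `nofollow` flag, and the sample markup are assumptions made for illustration, not part of the code posted above.

```php
<?php
/**
 * Collect the URL, text, and nofollow status of every <a> element.
 * extractLinks() is a hypothetical helper name used for this sketch.
 */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // rel may hold several space-separated tokens, e.g. "nofollow noopener".
        $rel = preg_split(
            '/\s+/',
            strtolower(trim($link->getAttribute('rel'))),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        $links[] = [
            'url'      => $link->getAttribute('href'),
            'text'     => trim($link->nodeValue),
            'nofollow' => in_array('nofollow', $rel, true),
        ];
    }

    // Inside a function, return hands the array back to the caller.
    return $links;
}

// Made-up markup standing in for the HTML returned by curl_exec().
$sample = '<a href="http://example.com/a" rel="nofollow noopener">A</a>'
        . '<a href="http://example.com/b">B</a>';

$links = extractLinks($sample);
print_r($links);
```

Either `getElementsByTagName('a')` or the XPath query `//a` selects the same elements here, so the choice between them is largely a matter of taste. Each `url` value can then be passed to `curl_init()` in the same way as the original `$url` if the linked pages need to be fetched.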