Scrape page and find if a followed link exists

Discussion in 'PHP & Perl' started by Runatic, Apr 30, 2013.

  1. Runatic

    Runatic Newbie

    Joined:
    Apr 28, 2013
    Messages:
    15
    Likes Received:
    3
    I'm playing around with using DOM for parsing HTML. How would I go about scraping a page to find out whether there is a followed link to my site on it?

    Thanks. I'm new to BHW but have been reading for 2 days now and am impressed with the knowledge that is in here.
     
  2. hustleharderrrr

    hustleharderrrr Junior Member

    Joined:
    Jan 2, 2012
    Messages:
    109
    Likes Received:
    33
    Location:
    Bugatti Pure Sang
    I would get the page using cURL, then parse the resulting HTML for the anchor containing your URL with preg_match, then check whether that anchor has a rel="nofollow" attribute, again with preg_match.
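
    A rough, untested sketch of that approach (yoursite.com, example.com, and the regexes are placeholders to adapt, not anything final):

    Code:
    <?php
    // fetch the page with cURL
    $ch = curl_init('http://example.com/page-to-check'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    
    // grab every anchor tag whose href contains your domain
    if (preg_match_all('#<a\s[^>]*href=["\']?[^"\'>\s]*yoursite\.com[^>]*>#i', $html, $m)) {
        foreach ($m[0] as $anchor) {
            // followed = the tag carries no rel="nofollow"
            $followed = !preg_match('#rel=["\']?[^"\'>]*nofollow#i', $anchor);
            echo $anchor . ' => ' . ($followed ? 'followed' : 'nofollow') . "\n";
        }
    }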
     
  3. Runatic

    Runatic Newbie

    Joined:
    Apr 28, 2013
    Messages:
    15
    Likes Received:
    3
    Thanks. That sounds like a good plan. However, I hired someone on Elance to write a script for me.
     
  4. JayEs

    JayEs Newbie

    Joined:
    May 3, 2013
    Messages:
    10
    Likes Received:
    1
    Occupation:
    dev
    This library makes DOM parsing pretty easy. It has similarities with jQuery, if you are familiar with that...


    http://simplehtmldom.sourceforge.net

    pseudo example:

    Code:
    // load the library first: include 'simple_html_dom.php';
    $html = file_get_html('http://example.com/page-to-check'); // placeholder URL
    foreach ($html->find('a') as $element) {
        echo $element->href . '<br>';
        // do some preg_match here against your URL
    }
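
    To answer the original question directly, you can also read each anchor's rel attribute instead of reaching for preg_match. A quick sketch, assuming yoursite.com as a placeholder and that the library exposes attributes as properties (empty/false when absent):

    Code:
    foreach ($html->find('a') as $a) {
        if (strpos($a->href, 'yoursite.com') === false) continue; // not your link
        $rel = (string)$a->rel; // empty when the anchor has no rel attribute
        echo $a->href . ' is ' . (stripos($rel, 'nofollow') === false ? 'followed' : 'nofollow') . '<br>';
    }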
    
     
    Last edited: May 3, 2013
  5. Wister_fr

    Wister_fr Registered Member

    Joined:
    Sep 6, 2012
    Messages:
    62
    Likes Received:
    23
    Location:
    Internet
    +1 for Simple HTML DOM Parser :) Even though the class has some problems, it will work fine for you.
     
  6. JayEs

    JayEs Newbie

    Joined:
    May 3, 2013
    Messages:
    10
    Likes Received:
    1
    Occupation:
    dev
    An alternative would be to parse the content with jQuery, e.g. like this:

    Code:
    grrr, tried to post example code here but the moderation system keeps blocking me :-/
    You could, e.g., store all the found links via an AJAX request to another script that saves them in a database. Then, after finishing, get an unscraped URL from the database via AJAX and call the above script -> name.php?url=... Little bot-like... ;)


    P.S.: echo file_get_contents( isset($_REQUEST['url']) ? $_REQUEST['url'] : 'start url' ), and after that include jQuery and parse with it.
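
    A rough sketch of that idea (proxy.php is just a placeholder name; the jQuery CDN URL is one example of many):

    Code:
    <?php
    // proxy.php (placeholder name) - echo the target page, then append
    // jQuery and a small script that reads the links client-side
    $url = isset($_REQUEST['url']) ? $_REQUEST['url'] : 'http://example.com/'; // start url
    echo file_get_contents($url);
    ?>
    <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
    <script>
    // list every href found on the echoed page; an $.ajax() call to a
    // storage script could go here instead to build the little bot
    jQuery('a').each(function () {
        console.log(jQuery(this).attr('href'));
    });
    </script>
    Note that relative links on the echoed page will point at your own server unless you inject a <base> tag for the original URL.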
     
    Last edited: May 4, 2013