
[GET] Trendsbuzz scraper

Discussion in 'PHP & Perl' started by xpwizard, Nov 24, 2011.

  1. xpwizard

    Needed this for a side project today, and why not share :)

    Features:
    - Scrapes the Trendsbuzz homepage, giving you the hottest trends for 9 different sites (90 trends in total).
    - Can display the results or save them to a file.
    - Can be run via cron to build an ongoing list of trends.
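The scrape itself comes down to one regex over the homepage HTML: it grabs the text of every link whose href ends in ".html". Below is a minimal sketch of that step as a standalone function, run against an invented HTML snippet (the real Trendsbuzz markup may differ, and `extract_trends` is a hypothetical name). Unlike scraper.php, it also decodes HTML entities and drops repeated trends:

```php
<?php
// Hypothetical helper (not in scraper.php): same pattern as the script,
// plus entity decoding and de-duplication. The sample markup is invented.
function extract_trends(string $html): array
{
    // Lazy (.+?) stops at the first closing </a>; /i matches case-insensitively
    preg_match_all('/html">(.+?)<\/a>/i', $html, $matches);

    // Decode entities ("Foo &amp; Bar" -> "Foo & Bar") and trim whitespace
    $trends = array_map(
        fn(string $t): string => trim(html_entity_decode($t, ENT_QUOTES, 'UTF-8')),
        $matches[1]
    );

    // Drop empty strings and repeats, keeping first-seen order
    return array_values(array_unique(array_filter($trends, 'strlen')));
}

$sample = '<li><a href="/t/foo.html">Foo &amp; Bar</a></li>'
        . '<li><a href="/t/baz.html">Baz</a></li>';
print_r(extract_trends($sample));
```

To run it on the live page, pass it the body returned by curlGET() instead of `$sample`.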

    Requirements:
    - cURL enabled
    - If saving to a file, the account running the script must have permission to create folders and files
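Both requirements can be checked up front. This `check_requirements()` helper is a hypothetical sketch, not part of scraper.php: it collects every problem instead of dying on the first one, and it uses `extension_loaded('curl')` where the script tests `function_exists('curl_version')` (both detect the cURL extension).

```php
<?php
// Hypothetical helper (not in scraper.php): report every missing requirement at once.
function check_requirements(string $logDir): array
{
    $problems = [];

    // The cURL extension provides curl_init(), curl_exec(), etc.
    if (!extension_loaded('curl')) {
        $problems[] = 'curl';
    }

    // Create the log folder if it is missing, then confirm it is readable and writable
    if (!is_dir($logDir) && !@mkdir($logDir, 0755, true)) {
        $problems[] = 'mkdir';
    } elseif (!is_readable($logDir) || !is_writable($logDir)) {
        $problems[] = 'permissions';
    }

    return $problems;
}
```

For example: `$missing = check_requirements(getcwd() . '/logs'); if ($missing) { die('Missing: ' . implode(', ', $missing)); }`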

    How to use:
    - Choose whether to output text or save to a file ($file_type)
    - If you save to a file, define a file name ($file_name)
    - Files will be saved into a subfolder named "logs", created in the current working directory
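Each day's file name is date('d-m-y') followed by an underscore and your $file_name, so a scrape on 24 Nov 2011 lands in logs/24-11-11_trends.txt. Below is a small sketch of that naming rule; `daily_log_path()`, its optional timestamp argument, and the UTC time zone are assumptions added so the output is predictable (the script itself uses the server's local date and time).

```php
<?php
// Sketch of the scraper's naming rule: <base>/logs/<dd-mm-yy>_<file name>.
// Assumption: pin the time zone so every run agrees on which day it is.
date_default_timezone_set('UTC');

function daily_log_path(string $baseDir, string $fileName, ?int $timestamp = null): string
{
    // Same format string as scraper.php; defaults to "now" like the script
    $stamp = date('d-m-y', $timestamp ?? time());
    return rtrim($baseDir, '/') . '/logs/' . $stamp . '_' . $fileName;
}

echo daily_log_path('/home/me/scraper', 'trends.txt', gmmktime(12, 0, 0, 11, 24, 2011)), "\n";
// prints /home/me/scraper/logs/24-11-11_trends.txt
```

The d-m-y format does not sort by date when listing the folder; a y-m-d format would, if you ever change it.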

    Optional:
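    - Can be run via cron. Trends are saved into one file per day, and trends already in that day's file are skipped. Because the "logs" folder is created relative to the working directory, cd into the script's folder in your cron command.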
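Duplicate detection in scraper.php is a case-insensitive substring search (stripos) over the whole day's file, so a short trend such as "Apple" would be skipped once "Apple iPhone" had been logged. Here is a sketch of a stricter alternative that compares whole lines instead; `append_new_trends()` is a hypothetical name, and this is not what the script currently does.

```php
<?php
// Hypothetical helper (not in scraper.php): append only trends that are not
// already a whole line in $file. strtolower() makes the comparison
// case-insensitive for ASCII letters.
function append_new_trends(string $file, array $trends): int
{
    // Index the lines already logged today, lowercased for comparison
    $seen = [];
    if (is_file($file)) {
        foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $seen[strtolower($line)] = true;
        }
    }

    $fp = fopen($file, 'ab');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $file for appending");
    }

    $added = 0;
    foreach ($trends as $trend) {
        $key = strtolower($trend);
        if (!isset($seen[$key])) {
            fwrite($fp, $trend . "\n");
            $seen[$key] = true; // also skips repeats within this batch
            $added++;
        }
    }
    fclose($fp);

    return $added;
}
```

It returns how many trends were written, so `append_new_trends($file, $matches[1])` could stand in for the fwrite loop.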


    Code:
    PHP:
    <?php
    /*
     *      Trendsbuzz Scraper: scraper.php
     *      Copyright 2011 Adam <xpwizard @ Blackhatworld.com>
     *      Date: 2011-11-24
     * 
     *                         READ ME
     * 
     * - This file can be run via cron as long as "file" is chosen as the
     *      output. Trends will then be logged into one file per day and
     *      appended to the end of that file.
     *      Duplicates will not be appended to the file.
     * 
     * - Choosing "text" is good for one-off scrapes where you just want
     *      to copy and paste the output.
     * 
     * - This script only scrapes the homepage of Trendsbuzz.
     * 
     */


    // choose output type
    $file_type = "file";    // either "file" or "text"

    // if you chose file, define filename
    $file_name = "trends.txt";



    /*********** DO NOT EDIT BELOW THIS LINE ***********/

    // check for Curl
    if (!function_exists('curl_version')) { die("Curl is not enabled. Contact your host to enable Curl."); }

    // check whether folder exists
    if (!is_dir(getcwd()."/logs")) { mkdir(getcwd()."/logs", 0755, true); }

    // check file permissions
    if (!is_writable(getcwd()."/logs/") || !is_readable(getcwd()."/logs/")) { die("Folder does not have correct permissions."); }

    // if file, open file
    if ($file_type == "file") {
        $file = getcwd()."/logs/".date('d-m-y')."_$file_name";
        $fp = fopen($file, 'ab+');
        if ($fp === false) { die("Could not open log file."); }
    }

    // get trendsbuzz homepage
    $res = curlGET("http://trendsbuzz.com");
    // make sure request isn't blank
    if (empty($res)) { die("Request was blank. Make sure trendsbuzz is online."); }

    // regex to find all trends
    preg_match_all("/html\">(.+?)<\/a>/i", $res, $matches);

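    // display/log results
    if ($file_type == "file") {
        // read today's log by path: mode 'ab+' leaves $fp at the end of the file, so fread($fp) would return nothing
        $file_check = file_get_contents($file);
        foreach ($matches[1] as $val) {
            // append only trends not yet in today's file (case-insensitive substring match)
            if (stripos($file_check, $val) === false) {
                fwrite($fp, "$val\n");
                $file_check .= "$val\n"; // also skip repeats within this scrape
            }
        }
        fclose($fp);
        echo "Scrape successful";
    } else {
        foreach ($matches[1] as $val) {
            // escape scraped text before printing it as HTML (existing entities are left as-is)
            echo htmlspecialchars($val, ENT_QUOTES, 'UTF-8', false) . "<br>";
        }
    }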

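    // cURL function for GET requests
    function curlGET($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // send the visitor's User-Agent; it is not set under cron, so fall back to a generic (arbitrary) string
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0';
        curl_setopt($ch, CURLOPT_USERAGENT, $ua);
        curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
        $res = curl_exec($ch);
        curl_close($ch);
        return $res;
    }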

    ?>
     
    Last edited: Nov 24, 2011