  1. #1
    Join Date
    Dec 2009
    Posts
    12
    Reputation
    7
    Thanks
    8
    Thanked 40 Times in 5 Posts

    PPV SE Scraper

    Modify to fit your own needs (proxies, url length, pages scraped etc), enjoy.


    Code:
    <?php
    class scraper
    {
        var $ch;
        var $result;
        
        function __construct(){}
        
        private function init()
        {    
            $ch = curl_init();
            $this->ch = $ch;
            
            $agent = array(    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6',
                            'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
                            'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)',
                            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)',
                            'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.1; .NET CLR 1.1.4322)',
                            'Opera/9.20 (Windows NT 6.0; U; en)',
                            'Opera/9.00 (Windows NT 5.1; U; en)',
                            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50',
                            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0',
                            'Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02 [en]',
                            'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20060127 Netscape/8.1' );
                        
            shuffle($agent);
                    
            curl_setopt($this->ch,CURLOPT_RETURNTRANSFER,1);
            curl_setopt($this->ch,CURLOPT_FOLLOWLOCATION,1);
            curl_setopt($this->ch,CURLOPT_USERAGENT,$agent[0]);
            curl_setopt($this->ch,CURLOPT_TIMEOUT,10);
            curl_setopt($this->ch,CURLOPT_COOKIESESSION,1);
            curl_setopt($this->ch,CURLOPT_SSL_VERIFYHOST,0);
            curl_setopt($this->ch,CURLOPT_SSL_VERIFYPEER,0);
            
            return;
        }
        
        private function get($url)
        {
            curl_setopt($this->ch,CURLOPT_URL,$url);
            curl_setopt($this->ch,CURLOPT_POST,0);
            
            $s = curl_exec($this->ch);
    
            return $s;        
        }
        
        /* parse related */
        private function parse_all($source,$tag1,$tag2)
        {
            $source=str_replace($tag1,'<tiny:parse>',$source);
            $source=str_replace($tag2,'</tiny:parse>',$source);
            
            preg_match_all('#<tiny:parse>(.*?)</tiny:parse>#',$source,$result);
    
            return($result[1]);        
        }
        
        function go($keyword)
        {
            $this->init();
            
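            $fin = array(); // collected URLs; initialised so an empty result set is safe
            $start = 0;
            while($start<200) // 20 result pages, 10 results each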
            {
                $s = $this->get('https://www.google.com/search?hl=en&q='.urlencode($keyword).'&start='.$start.'&sa=N');
                $urls = $this->parse_all($s,'<h3 class=r><a href="','" class=l>');
    
                if(is_array($urls) && count($urls)>0)
                {
                    foreach($urls as $url)
                    {
                        if(strlen($url)>40) $fin[] = $url;
                    }
                }
                
                $start = $start + 10;
            }
            
            $fin = array_unique($fin); // array_unique returns a new array, so keep the result
            
            foreach($fin as $result) echo $result.'<br />';
        }
    }
    
    $q = isset($_GET['q']) ? $_GET['q'] : ''; // avoid an undefined-index notice when q is missing
    
    if(trim($q)=='')
    {
        echo 'You must provide a query';
    }else{
        $scr = new scraper();
        $scr->go($q);
    }
    ?>
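
For anyone who wants to try the extraction step without hitting Google, here is a minimal sketch of what parse_all is doing: pull out everything between two delimiters. It is not the code from the class above. The helper name extract_between and the sample HTML are made up for illustration, and it uses preg_quote on the delimiters instead of the str_replace marker trick, so treat it as one possible variant rather than the original method.

```php
<?php
// Hypothetical standalone helper (not part of the scraper class above):
// returns every substring found between $open and $close in $source.
function extract_between($source, $open, $close)
{
    // preg_quote escapes regex metacharacters in the delimiters, including
    // the '#' pattern delimiter; the 's' flag lets '.' match newlines.
    $pattern = '#' . preg_quote($open, '#') . '(.*?)' . preg_quote($close, '#') . '#s';
    preg_match_all($pattern, $source, $matches);

    return $matches[1];
}

// Made-up sample markup in the same shape the scraper looks for.
$html = '<h3 class=r><a href="http://example.com/a" class=l>A</a></h3>'
      . '<h3 class=r><a href="http://example.org/b" class=l>B</a></h3>';

$urls = extract_between($html, '<h3 class=r><a href="', '" class=l>');

print_r($urls);
```

If the page markup changes, only the two delimiter strings passed to the helper need updating.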

  2. The Following 2 Users Say Thank You to StackingDough For This Useful Post:

    fear91 (12-30-2009), lenhan555 (12-30-2009)

  3. #2
    fear91 (Regular Member)
    Join Date
    Dec 2007
    Posts
    360
    Reputation
    11
    Thanks
    55
    Thanked 83 Times in 67 Posts

    Re: PPV SE Scraper

    Good Share!

  4. #3
    fistuk (Newbies)
    Join Date
    Dec 2008
    Posts
    23
    Reputation
    10
    Thanks
    20
    Thanked 68 Times in 10 Posts

    Re: PPV SE Scraper

    Thanks a lot for the share.

    Not that I mind, but why reinvent the wheel?
    Laser URL is a great tool, and it's free...

  5. #4
    terebl7 (Newbies)
    Join Date
    Jul 2009
    Posts
    11
    Reputation
    10
    Thanks
    4
    Thanked 1 Time in 1 Post

    Re: PPV SE Scraper

    How can I use it? Thanks.
