
*BHW Exclusive Freebie* - Related Content, Phone # , Email Scraper by Amrak

Discussion in 'Black Hat SEO Tools' started by amrak, Mar 14, 2012.

  1. amrak

    amrak Registered Member

    Joined:
    Jun 29, 2010
    Messages:
    74
    Likes Received:
    21
    Hey Everyone!

    I figured I'd take some time out of my day to write this handy-dandy script and share it exclusively with BHW :) ... Of course while keeping it 100% free!

    Just felt like showing some appreciation to the community, since BHW rocks :D

    Let me know if you have any questions...

    ----------------------------------------------------------------------------------------------------------------------

    *Related Content, Phone # , Email Scraper* by Amrak


    ----------------------------------------------------------------------------------------------------------------------

    Description:

    A web-based script written in PHP that makes scraping what's important simple and fast. It scrapes the results pages of the top three search engines (Google, Yahoo, Bing), up to 10 full pages each, then automatically parses and displays the emails and phone numbers related to the keyword you specify. It's extremely quick and can scrape and parse 30 pages of SERPs in seconds (make sure to use proxies to avoid bans from the sources). Enjoy!


    Features:

    - Collect email addresses, phone numbers, and related content for any keyword

    - Scrapes Google, Yahoo, and Bing simultaneously (up to 30 threads at a time!)

    - Randomly chooses from a built-in list of common User-Agents for extra anonymity.

    - Choose how many pages of SERPs to scrape per source

    - Anonymous HTTP Proxy Support

    - Randomly chooses new proxies for each thread (yes up to 30)

    - "Multi-threaded", blazing fast parallel Scraping using PHP's curl_multi

    - Super simple and easy to install, even on your Windows PC (WAMP stack)!

    - Under 200 lines of code!


    Example Uses:

    - Build targeted contact lists (and even sell them if you want :D)

    - Use for SMS or Email marketing (so many ways available)

    - Set up macros to auto-generate content for your BH needs

    - Basic filtering of emails/phone numbers from SERP snippets

    - And many more creative ways!


    Requirements:

    - PHP 5 with the DOM and cURL extensions enabled (the script uses DOMDocument and curl_multi)

    Installation:

    1. Copy the PHP code onto your server (paste the code into a file and give it a .php extension).
    2. Access the script in any browser
    3. Enjoy!



    Code:

    PHP:
    <?php
    /**************************************************************************\

        Related Content, Phone # , Email Scraper
        Copyright (C) 2012 Amrak @ BHW

        This program is free software: you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation, either version 3 of the License, or any 
        later version.

        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.

        You should have received a copy of the GNU General Public License
        along with this program.  If not, see <http://www.gnu.org/licenses/>.
        
    \***************************************************************************/
    error_reporting(0);
    ini_set('display_errors','0');
    class BHW_Contact_Scraper{

        public $results = '<h2>Harvested Results</h2>';

        public $query = '"contact us at"';

        private $_rawKeyword;

        private $_maxPages;

        private $_multiHandle;

        // one curl handle per URL, filled in by _setupHandles()
        private $_handles = array();

        private $_proxies = array();

        private $_userAgents = array(
            "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4",
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4",
            "Mozilla/5.0 (Windows; U; Windows NT 6.1; nl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
            "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.0.3) Gecko/2008092414 Firefox/3.0.3",
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB0.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; GACID=)"
        );
        
        public function __construct($keyword,$depth=1,$proxies=''){

            if($proxies!=''){
                $proxies = preg_replace('/\r/','',$proxies);
                $this->_proxies = explode("\n",$proxies);
            }

            $this->_maxPages = $depth;

            $this->_rawKeyword = $keyword;

            $query = urlencode($this->query.$keyword);

            //setup urls to check
            $urls = array();
            for($x=0;$x<$this->_maxPages;$x++){
                //google
                $urls[] = 'http://www.google.com/search?q='.$query.($x>0?'&start='.($x*10):'');
                //yahoo
                $urls[] = 'http://search.yahoo.com/search?p='.$query.($x>0?'&b='.(($x*10)+1):'');
                //bing
                $urls[] = 'http://bing.com/search?q='.$query.($x>0?'&first='.(($x*10)+1):'');
            }
            $this->_setupHandles($urls);

            $this->_crawlAll();

            $this->_process();
        }
        
        private function _setupHandles($urls){

            $this->_multiHandle = curl_multi_init();

            if(is_array($urls)){
                foreach($urls as $url){
                    $ch = curl_init();
                    $options = array();
                    $options[CURLOPT_URL]            = $url;
                    // pass the User-Agent string itself, not just its random index
                    $options[CURLOPT_USERAGENT]      = $this->_userAgents[mt_rand(0,count($this->_userAgents)-1)];
                    $options[CURLOPT_FOLLOWLOCATION] = 1;
                    $options[CURLOPT_RETURNTRANSFER] = 1;
                    $options[CURLOPT_TIMEOUT]        = 10;
                    $options[CURLOPT_CONNECTTIMEOUT] = 10;

                    //proxy support
                    if(count($this->_proxies) > 0){
                        $proxy = $this->_proxies[mt_rand(0,count($this->_proxies)-1)];
                        $p = explode(':',$proxy);
                        $options[CURLOPT_PROXY] = $p[0];
                        $options[CURLOPT_PROXYPORT] = $p[1];
                    }

                    curl_setopt_array($ch,$options);
                    $this->_handles[] = $ch;
                    curl_multi_add_handle($this->_multiHandle,$ch);
                }
            }
        }
        private function _crawlAll(){
            $mh = $this->_multiHandle;
            $active = null;
            do{
                $mrc = curl_multi_exec($mh,$active);
            }while ($mrc == CURLM_CALL_MULTI_PERFORM);
            //run in parallel
            while ($active && $mrc == CURLM_OK){
                if (curl_multi_select($mh) != -1){
                    do{
                        $mrc = curl_multi_exec($mh,$active);
                    }
                    while ($mrc == CURLM_CALL_MULTI_PERFORM);
                }
            }
        }
        
        private function _process(){

            $snippets = '';
            // initialise everything that is appended to or de-duplicated below
            $snips = array('g' => '', 'y' => '', 'b' => '');
            $numbers = array();
            $emails = array();

            foreach($this->_handles as $ch){
                $html = curl_multi_getcontent($ch);
                if($html == ''){
                    $e = curl_error($ch);
                    if(stristr($e,"Couldn't resolve proxy")){
                        echo "<h1 style=\"color:red;\">ERROR: $e</h1>";
                        return;
                    }
                    // nothing to parse for this handle
                    continue;
                }
                $curlInfo = curl_getinfo($ch);
                if(preg_match('/google/',$curlInfo['url'])){
                    $xquery = '//li[@class="g"]/div';
                    $t='g';
                }
                if(preg_match('/yahoo/',$curlInfo['url'])){
                    $xquery = '//div[@id="web"]/ol/li/div/div[@class="abstr"]';
                    $t='y';
                }
                if(preg_match('/bing/',$curlInfo['url'])){
                    $xquery = '//div[@id="results_container"]/div[@id="results"]/ul/li/div/p';
                    $t='b';
                }
                $dom = new DOMDocument();
                @$dom->loadHTML($html);
                $xpath = new DOMXPath($dom);
                $result = $xpath->query($xquery);
                for ($i = 0; $i < $result->length; $i++) {
                    $snippets .= ' '.$result->item($i)->nodeValue.' ';
                    $snips[$t] .= ' '.$result->item($i)->nodeValue.' ';
                }
            }

            //numbers
            preg_match_all('/[^0-9]([0-9]{3})[^0-9]*?([0-9]{3})[^0-9]*?([0-9]{4})[^0-9]/',$snippets,$result,PREG_PATTERN_ORDER);
            foreach($result[0] as $i => $n){
                $sets = array($result[1][$i],$result[2][$i],$result[3][$i]);
                $numbers[] = implode('-',$sets);
            }
            $numbers = array_unique($numbers);

            //emails
            preg_match_all('/([a-zA-Z_.0-9]{1,}?)@([a-zA-Z\-0-9]{2,}?)\.(([a-zA-Z\-0-9]{3})|(co\.uk))/',$snippets,$result,PREG_PATTERN_ORDER);
            foreach($result[0] as $i => $n){
                $emails[] = $result[1][$i].'@'.$result[2][$i].'.'.$result[3][$i];
            }
            $emails = array_unique($emails);

            // escape the keyword before echoing it back into the page
            $kw = htmlspecialchars($this->_rawKeyword);
            $this->results .= '<h3>Numbers Related to "'.$kw.'"</h3>'.implode('<br/>',$numbers);
            $this->results .= '<h3>Emails Related to "'.$kw.'"</h3>'.implode('<br/>',$emails);
            $this->results .= '<h3>Google Content Related to "'.$kw.'"</h3>'.$snips['g'];
            $this->results .= '<h3>Yahoo Content Related to "'.$kw.'"</h3>'.$snips['y'];
            $this->results .= '<h3>Bing Content Related to "'.$kw.'"</h3>'.$snips['b'];
            $this->results .= '<h3>Combined Raw SERP Content Related to "'.$kw.'"</h3>'.$snippets;
        }
        
    }

        $results = '';
        if(isset($_POST['submit'])){
            if(isset($_POST['keyword']) && !empty($_POST['keyword'])){
                $scraper = new BHW_Contact_Scraper($_POST['keyword'],$_POST['pagedepth'],$_POST['proxies']);
                $results = $scraper->results;
            }else{
    ?>
                <h1 style="color:red; font-weight: bold;">YOU FORGOT THE KEYWORD</h1>
            <?php }
        }
    ?>
            
    <html>
        <h2>Related Content, Phone # , Email Scraper by Amrak @ BHW</h2>
        
        <form action="http://<?php echo htmlspecialchars($_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'])?>" method="post">    
        
        <label for="keyword">Keyword:</label>
        <input style="width:358px;" name="keyword" type="text" value="<?php echo isset($_POST['keyword'])?htmlspecialchars($_POST['keyword']):'';?>" />
        
        <br/>
        Anonymous HTTP Proxies (optional):
        <br/>
        <textarea placeholder="host:port (one per line)" name="proxies" rows="5" cols="50"><?php echo isset($_POST['proxies'])?htmlspecialchars($_POST['proxies']):'';?></textarea>    
        <br/>        
        <label for="keyword">Pages Per Source:</label>
        <select  name="pagedepth">
        <?php for($x=1;$x<11;$x++){?>
            <option <?php echo ($_POST['pagedepth']==$x?'selected="selected"':'')?> value="<?php echo $x;?>"><?php echo $x;?></option>
        <?php }?>
        </select>(*USE WITH CAUTION*)
        <br />
        
        <br />    
        <input name="submit" type="submit" />
        </form>
        <br />
        <?php echo($results?$results:'');?>
    </html>
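
    For anyone reading the constructor, the paging parameters it appends can be sketched on their own. This is only an illustration: `page_offsets` is a hypothetical helper that is not in the script, and it assumes the offsets are the zero-based page number times 10 (plus 1 for Yahoo's `b` and Bing's `first`), which is how the URL-building loop reads.

    ```php
    <?php
    // Hypothetical helper, not part of the posted script.
    // $page is zero-based: 0 is the first results page.
    function page_offsets($page)
    {
        return array(
            'google' => $page * 10,     // value for &start=
            'yahoo'  => $page * 10 + 1, // value for &b=
            'bing'   => $page * 10 + 1, // value for &first=
        );
    }

    // Third page of results (zero-based page 2).
    print_r(page_offsets(2));
    ```

    The script itself leaves the parameter off entirely for the first page, so the values for page 0 are never sent.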


     
  2. NedieNed

    NedieNed Regular Member

    Joined:
    May 27, 2011
    Messages:
    282
    Likes Received:
    75
    Impressive work. Thanks for sharing with BHW.
     
  3. amrak

    amrak Registered Member

    Thanks! It feels good when I can help out. I've been coding to my advantage with marketing for years now and it only feels right to reciprocate on sources of income!
     
  4. vanessasweet

    vanessasweet Junior Member

    Joined:
    Jan 8, 2012
    Messages:
    184
    Likes Received:
    23
    Occupation:
    law school/webmaster
    Thanks, this is an awesome share.
     
  5. spenzo

    spenzo Senior Member

    Joined:
    Oct 20, 2009
    Messages:
    967
    Likes Received:
    553
    Wow... appreciate your contribution :)...
    rep+
     
  6. richim1

    richim1 Guest

    Great share dude...
    +1
     
  7. amrak

    amrak Registered Member

    I forgot to mention that the emails it generates aren't 100% accurate. Sometimes a couple of extra letters and periods come up, but it's rare...
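
    A likely reason for the stray characters, judging from the pattern in the script: the local part is allowed to start with a period (so the "..." in a snippet sticks to the address) and the TLD is forced to exactly three characters (so longer endings get cut short). Below is a rough sketch of a stricter check. `extract_emails` is a hypothetical name, the pattern is just one reasonable choice, and `filter_var()` with `FILTER_VALIDATE_EMAIL` is PHP's built-in validator.

    ```php
    <?php
    // Hypothetical helper, not part of the posted script.
    function extract_emails($text)
    {
        // Local part must begin and end with a letter or digit;
        // the domain may have several labels and a TLD of 2+ letters.
        $pattern = '/[a-z0-9](?:[a-z0-9._%+-]*[a-z0-9])?@(?:[a-z0-9-]+\.)+[a-z]{2,}/i';
        preg_match_all($pattern, $text, $matches);

        $emails = array();
        foreach ($matches[0] as $candidate) {
            // Second opinion from PHP's own validator.
            if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
                $emails[] = strtolower($candidate);
            }
        }
        // De-duplicate and re-index from 0.
        return array_values(array_unique($emails));
    }

    print_r(extract_emails('Write to ...info@example.info or SALES@mail.example.co.uk today.'));
    ```

    With that sample string it returns info@example.info and sales@mail.example.co.uk. It is still not perfect: if the page text runs straight on after the address with no space, the extra letters get swallowed into the TLD.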
     
  8. Execute

    Execute Supreme Member

    Joined:
    Aug 30, 2010
    Messages:
    1,349
    Likes Received:
    5,017
    Location:
    United Kingdom
    Brilliant contribution thanks!
     
  9. Bubba_Hotep

    Bubba_Hotep Newbie

    Joined:
    Jan 18, 2010
    Messages:
    30
    Likes Received:
    101
    Location:
    Metro Detroit
    Thanks for this share, it is appreciated
     
  10. amrak

    amrak Registered Member

    no problem dudes!

    rep is appreciated as usual, but I'd also like to ask if you guys can share your unique ways of using a tool like this, but only if you want to of course :D
     
  11. Bonafide Jones

    Bonafide Jones Registered Member

    Joined:
    Feb 1, 2012
    Messages:
    65
    Likes Received:
    10
    Occupation:
    Beast
    Location:
    Earth Server
    Thanks for the share!

    When I try to use a proxy this error pops up
    Warning: array_unique() [function.array-unique]: The argument should be an array in "/directory/etc."

    I used it without one and it worked, but only got admin emails :(

    Also, where's the part in the code that gives you access to my server so I can remove it? (just j/k :D but not really. I'm paranoid)

    Anyway thanks a lot for this I have absolutely NO list at all maybe this will help me get started!

    ***EDIT: never mind about the proxies. I was copying and pasting wrong. Got it now.
     
    Last edited: Mar 15, 2012
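
    On the array_unique() warning quoted in this post: PHP prints it when the function is handed something that is not an array. One plausible cause, assuming nothing matched (for example because a bad proxy returned empty pages), is that the list was never created before being de-duplicated. A minimal sketch of the defensive pattern, with a hypothetical helper name `unique_matches` that is not in the script:

    ```php
    <?php
    // Hypothetical helper, not part of the posted script.
    function unique_matches($pattern, $text)
    {
        // Create the list up front so an empty match set is still an array.
        $found = array();
        preg_match_all($pattern, $text, $m);
        foreach ($m[0] as $hit) {
            $found[] = $hit;
        }
        return array_values(array_unique($found));
    }

    var_dump(unique_matches('/[0-9]{3}/', ''));            // no matches, still an array
    var_dump(unique_matches('/[0-9]{3}/', '123 456 123')); // duplicates collapsed
    ```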
  12. amrak

    amrak Registered Member


    OK, you got that error because the proxy didn't resolve. I've updated the code to remove the error messages, but I also added an error message for when it can't resolve proxies. So just copy/paste the new code, overwriting your old copy.

    I've tested everything with and without proxies and everything looks good.

    As for the secret code, I know you're kidding, but do you honestly think I'd try to slip that by when the code is open for everyone to see? lol

    And all this tool does is parse a search engine result page for emails and phones in the description snippets. But from what I've seen so far, it does a good job at grabbing the related contact info...
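
    To make the "parse the description snippets" part concrete, here is a small sketch that runs an XPath query over a hard-coded HTML string, with no network involved. The markup and the `snippet_text` helper are made up for illustration; only the `//li[@class="g"]/div` query is taken from the script, and real result pages will not necessarily match it.

    ```php
    <?php
    // Hypothetical helper, not part of the posted script.
    function snippet_text($html, $xquery)
    {
        $dom = new DOMDocument();
        // Collect libxml parse warnings quietly instead of using @.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($dom);
        $parts = array();
        foreach ($xpath->query($xquery) as $node) {
            // nodeValue is the text of the node with all tags stripped.
            $parts[] = trim($node->nodeValue);
        }
        return implode(' ', $parts);
    }

    // Made-up sample markup standing in for a results page.
    $html = '<html><body><ol>'
          . '<li class="g"><div>Call 555-010-0199 today</div></li>'
          . '<li class="g"><div>Write to info@example.com</div></li>'
          . '</ol></body></html>';

    echo snippet_text($html, '//li[@class="g"]/div'), "\n";
    ```

    The combined text is what the phone and email patterns then run over.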

    Good luck. Let me know if you have any questions...
     
  13. amrak

    amrak Registered Member

    I just did an SMS blast to about 500 numbers I collected in a few minutes with this tool and made an $18.33 commission...

    Anyone else having fun? :D
     
  14. beaurock

    beaurock Newbie

    Joined:
    Apr 19, 2012
    Messages:
    12
    Likes Received:
    1
    Does this one still work?
     
  15. RushingWind

    RushingWind Elite Member

    Joined:
    Apr 6, 2013
    Messages:
    2,416
    Likes Received:
    3,333
    thanks for the share mate.