Instagram Web Scraper

Discussion in 'PHP & Perl' started by Professeur, Nov 1, 2016.

    I just found this a moment ago while browsing GitHub.

    Instagram Web Scraper by cosmocatalano (GitHub user)

    Page
    Code:
    https://gist.github.com/cosmocatalano/4544576


    Code

    PHP:
    <?php
    //returns a big old hunk of JSON from a non-private IG account page.
    function scrape_insta($username) {
        $insta_source = file_get_contents('http://instagram.com/'.$username);
        $shards = explode('window._sharedData = ', $insta_source);
        $insta_json = explode(';</script>', $shards[1]);
        $insta_array = json_decode($insta_json[0], TRUE);
        return $insta_array;
    }
    //Supply a username
    $my_account = 'cosmocatalano';
    //Do the deed
    $results_array = scrape_insta($my_account);
    //An example of where to go from there
    $latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
    echo 'Latest Photo:<br/>';
    echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
    echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
    /* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
    echo 'Taken at '.$latest_array['location']['name'].'<br/>';
    //Heck, let's compare it to a useful API, just for kicks.
    echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
    */
    ?>
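For anyone who wants to see what the two explode() calls are doing without hitting Instagram, here is a rough JavaScript sketch of the same split-and-decode step, run against a made-up HTML string. The name extractSharedData and the sample page are hypothetical, and it assumes the page still embeds `window._sharedData = {...};</script>`, which may no longer be true. Unlike the PHP above, it returns null when the marker is missing instead of failing.

```javascript
// Hypothetical helper: mirrors the split-and-decode logic of scrape_insta(),
// but returns null instead of failing when the expected markers are missing.
function extractSharedData(html) {
  var marker = 'window._sharedData = ';
  var start = html.indexOf(marker);
  if (start === -1) {
    return null; // marker not found (private page, redesign, error page...)
  }
  var jsonStart = start + marker.length;
  var end = html.indexOf(';</script>', jsonStart);
  if (end === -1) {
    return null; // script block does not end the way we expect
  }
  try {
    return JSON.parse(html.slice(jsonStart, end));
  } catch (e) {
    return null; // the text between the markers was not valid JSON
  }
}

// Made-up page source, only for illustration.
var samplePage = '<html><script>window._sharedData = ' +
  '{"entry_data":{"ProfilePage":[{"user":{"username":"example"}}]}}' +
  ';</script></html>';

var data = extractSharedData(samplePage);
console.log(data.entry_data.ProfilePage[0].user.username); // prints example
console.log(extractSharedData('<html>nothing here</html>')); // prints null
```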
    Other shares on the page:

    aboustayyef - "Here's a quick class to only get the image from an Instagram image url:"
    PHP:
    <?php

    class InstagramScraper
    {
        protected $content;

        function __construct($url)
        {
            $this->content = @file_get_contents($url);
        }

        public function image()
        {
            //returns the og:image URL, or null if the page has no matching meta tag
            if (preg_match('#<meta +property="og:image" +content="(http.+?\.jpg)"#', (string) $this->content, $result)) {
                return $result[1];
            }
            return null;
        }
    }
    ?>
    lmj0011 - "if you want to get back a certain number of images, just increment the $results_array like so:"
    PHP:
    $nodes = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'];
    //stop at 20 photos, or sooner if the page returned fewer
    $limit = min(20, count($nodes));
    for ($cnt = 0; $cnt < $limit; $cnt++)
    {
        $latest_array = $nodes[$cnt];

        echo 'Latest Photo:<br/>';
        echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
        echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
    }
    hardiksondagar - "I've made web scrapper to get user X's followers using javascript that stores followers and exports in csv files. All you need to do is to keep scrolling."
    Code:
    /**
     *
     * Instagram Follower Web Scrapper
     *
     * Steps to use.
     * 1. Open instagram user's profile in browser https://www.instagram.com/tvfpitchers/
     * 2. Open console ( press F12 in chrome ) and paste all the code below
     * 3. Click on followers button and load all the followers
     * 4. Call function downloadAsCsv() by writing "downloadAsCsv()" in console to download csv file containing user's all the followers .
     *
     * @author : Hardik Sondagar <[email protected]>
     *
     */
    
    var followers = [];
    
    (function(XHR) {
        "use strict";
    
        var stats = [];
        var timeoutId = null;
    
        var open = XHR.prototype.open;
        var send = XHR.prototype.send;
    
        XHR.prototype.open = function(method, url, async, user, pass) {
            this._url = url;
            open.call(this, method, url, async, user, pass);
        };
    
        XHR.prototype.send = function(data) {
            var self = this;
            var start;
            var oldOnReadyStateChange;
            var url = this._url;
    
            function onReadyStateChange() {
                if(self.readyState == 4 && url == 'https://www.instagram.com/query/') {
    
                  var response = JSON.parse(self.response);
                  followers = followers.concat(response.followed_by.nodes);
    
                }
    
                if(oldOnReadyStateChange) {
                    oldOnReadyStateChange();
                }
            }
    
            if(!this.noIntercept) {
                start = new Date();
    
                if(this.addEventListener) {
                    this.addEventListener("readystatechange", onReadyStateChange, false);
                } else {
                    oldOnReadyStateChange = this.onreadystatechange;
                    this.onreadystatechange = onReadyStateChange;
                }
            }
    
            send.call(this, data);
        }
    })(XMLHttpRequest);
    
    function downloadAsCsv() {
    
    
    
        var csvContent = "data:text/csv;charset=utf-8,";
    
        var header = "Username,Requested_by_viewer,Followed_by_viewer,Profile_pic_url,Full Name,is_verified,Id\n";
        csvContent += header;
    
        followers.forEach(function(infoArray, index){
    
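            // Collect the node's values in property order (no jQuery needed).
            var data = Object.keys(infoArray).map(function(key) {
                return infoArray[key];
            });

            // var keeps dataString from leaking into the global scope.
            var dataString = data.join(",");
            csvContent += dataString + "\n";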
    
        });
    
        var encodedUri = encodeURI(csvContent);
        var link = document.createElement("a");
        link.setAttribute("href", encodedUri);
    
        var pathArray = window.location.pathname.split('/');
    
        var milliseconds = (new Date).getTime();
        var filename = 'followers.'+ milliseconds +'.csv';
    
        if(pathArray && pathArray.length > 1) {
            filename = pathArray[1] + '.' + milliseconds +'.csv';
        }
        link.setAttribute("download",filename);
        document.body.appendChild(link); // Required for FF
    
        link.click(); // Triggers the download, saved under the filename set above.
    }
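
One caveat with joining the values on "," as the export does: a full name containing a comma, a double quote or a line break will shift the CSV columns. Below is a minimal sketch of quoting fields before joining, following the usual CSV convention (as described in RFC 4180). csvField and csvRow are hypothetical names, not part of hardiksondagar's script.

```javascript
// Hypothetical helpers, not part of the original script.
// Quote a single value so commas, quotes and line breaks survive in a CSV cell.
function csvField(value) {
  var text = (value === null || value === undefined) ? '' : String(value);
  if (/[",\r\n]/.test(text)) {
    // Double any embedded quotes, then wrap the whole field in quotes.
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Build one CSV line from an array of values.
function csvRow(values) {
  return values.map(csvField).join(',');
}

console.log(csvRow(['some_user', false, 'Doe, "JD" John', 42]));
// prints: some_user,false,"Doe, ""JD"" John",42
```

Inside downloadAsCsv() this would take the place of the plain join on each row's values.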
    None of the code posted on this thread is mine! All credit goes to:
    • cosmocatalano
    Code:
    https://gist.github.com/cosmocatalano
    • aboustayyef
    Code:
    https://gist.github.com/aboustayyef
    • lmj0011
    Code:
    https://gist.github.com/lmj0011
    • hardiksondagar
    Code:
    https://gist.github.com/hardiksondagar