Looking for Tumblr post URL scraper

Discussion in 'Black Hat SEO Tools' started by 335ix, Jun 30, 2013.

  1. 335ix

    335ix Junior Member

    Joined:
    Mar 23, 2011
    Messages:
    129
    Likes Received:
    73
    Home Page:
    Hey guys,

    Does anyone know of a simple post URL scraper for Tumblr? I need to get the URLs from my own accounts for mass reblogging.

    The only script I've found is this exporter, http://tumblr2wordpress.benapps.net/, but you need to clean up its output to get the URLs ... too much work :)
     
  2. 335ix

    335ix Junior Member

    Joined:
    Mar 23, 2011
    Messages:
    129
    Likes Received:
    73
    Home Page:
    Nobody? How do you get your URLs for mass promotion in, e.g., xtumble?
     
  3. sanmao

    sanmao Newbie

    Joined:
    May 28, 2013
    Messages:
    48
    Likes Received:
    5
    Home Page:
    Tell me the rules for how you want to scrape.
    Maybe I can make one for you (no promises, only if it's simple).
     
  4. 335ix

    335ix Junior Member

    Joined:
    Mar 23, 2011
    Messages:
    129
    Likes Received:
    73
    Home Page:
    Just found a solution :)

    I've cleaned up that Tumblr-to-WordPress exporter and now it returns the post URLs only :)

    Source below, just save it as index.php

    PHP:

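    <?php
    set_time_limit(120);

    // Accept either "xyz" or "xyz.tumblr.com" and reduce it to the bare blog name.
    $username = isset($_REQUEST['username']) && !empty($_REQUEST['username']) ? str_replace(".tumblr.com", "", strtolower($_REQUEST['username'])) : '';

    // No blog name yet: show the input form and stop.
    if (empty($username))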
    {
    ?>
    <html>
            <head>
                    <title>Tumblr's Post Scraper v 1.0 :D</title>
                    <style>
                            body
                            {
                                    font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif;
                            }
                            .sep
                            {
                                    color: #CCCCCC;
                            }
                    </style>
            </head>
            <body>

                    Original Tumblr blog name [<i><b>xyz</b></i><font color="#aaa">.tumblr.com,</font> not your email address or custom domain]:<br/><br/>
                    <form method="POST" action="">
                    <input type="text" id="username" name="username" size="40"/>
                    <input type="submit" value="       Export       "/>
                    <br/><br/>
                
                    </form>
                    <br/><br/>

                    <script type="text/javascript">document.getElementById('username').focus();</script>
                    <script type="text/javascript">
                            var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
                            document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
                    </script>
                    <script type="text/javascript">
                            var pageTracker = _gat._getTracker("UA-5335675-1");
                            pageTracker._trackPageview();
                    </script>
            </body>
    </html>
    <?php
    die();
    }
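    // The form never sends "type", so fall back to an empty string instead of raising a notice.
    $type = isset($_REQUEST["type"]) ? $_REQUEST["type"] : '';
    $i = 0;

    $posts = array();
    $feed = '';
    $allTags = array();

    // Page through the blog's XML feed, 50 posts per request, until all posts are collected.
    do
    {
        $url = 'http://' . $username . '.tumblr.com/api/read?start=' . $i . '&num=50';
        $file = file_get_contents($url);
        $feed = new SimpleXMLElement($file);
        $posts = array_merge($posts, $feed->xpath('posts//post'));
        $i = (int)$feed->posts->attributes()->start + 50;
    } while ($i <= (int)$feed->posts["total"]);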

    // The helpers below are left over from the original exporter.
    // The URL output further down does not depend on them, apart from the getAllTags() call.
    function formatForWP($str)
    {
        global $type;
        switch ($type)
        {
            case "wordpress.com":
                $str = formatVideoForWP(formatImageForWP($str));
        }
        return $str;
    }

    function formatImageForWP($str)
    {
        if (preg_match_all('/(<p>)?\s*(<img[^>]*\/?>)\s*(<\/p>)?/', $str, $matches))
        {
            for ($i = 0; $i < sizeof($matches[0]); $i++)
            {
                $str = str_replace($matches[0][$i], str_replace('/>', ' alt=""/>', $matches[2][$i]), $str);
            }
        }
        return $str;
    }

    function formatVideoForWP($str)
    {
        if (preg_match_all('/<object[\s\S]*src="([\S\s]*?)&[\s\S]*"[\s\S]*<\/object>/', $str, $matches))
        {
            // Loop over the matches themselves (the original counted the capture groups instead).
            for ($i = 0; $i < sizeof($matches[0]); $i++)
            {
                if (strpos($matches[1][$i], 'youtube.com') !== false)
                {
                    $str = str_replace($matches[0][$i], '[youtube=' . $matches[1][$i] . ']', $str);
                }
            }
        }
        return $str;
    }

    // Turn a string into a slug: lowercase, spaces to dashes, only a-z, 0-9 and "-" kept.
    function removeWeirdChars($str)
    {
        return trim(preg_replace('{(-)\1+}', '$1', preg_replace('/[^a-zA-Z0-9-]/', '', str_replace(' ', '-', strtolower(strip_tags($str))))), '-');
    }

    function getTags($post)
    {
        if ($post->attributes()->type)
        {
            echo "<category><![CDATA[" . $post->attributes()->type . "]]></category>\n";
            echo "\t\t<category domain=\"category\" nicename=\"" . $post->attributes()->type . "\"><![CDATA[" . $post->attributes()->type . "]]></category>\n";
        }
        else
        {
            echo "<category><![CDATA[Uncategorized]]></category>\n";
            echo "\t\t<category domain=\"category\" nicename=\"uncategorized\"><![CDATA[Uncategorized]]></category>\n";
        }
        if ($post->tag)
        {
            foreach ($post->tag as $tag)
            {
                echo "\t\t<category domain=\"tag\"><![CDATA[$tag]]></category>\n";
                echo "\t\t<category domain=\"tag\" nicename=\"" . removeWeirdChars($tag) . "\"><![CDATA[$tag]]></category>\n";
                addTag((string)$tag);
            }
        }
    }

    function addTag($tag)
    {
        global $allTags;
        if (!in_array($tag, $allTags))
            $allTags[] = $tag;
    }

    function getAllTags()
    {
        global $allTags;
        foreach ($allTags as $tag)
        {
            echo "\t<wp:tag><wp:tag_slug>" . removeWeirdChars($tag) . "</wp:tag_slug><wp:tag_name><![CDATA[$tag]]></wp:tag_name></wp:tag>\n";
        }
    }

    // Send the result as a downloadable file named after the blog.
    header('content-type: text/xml');
    header("content-disposition: attachment; filename=tumblr_$username.xml");

    ob_start();
    foreach ($posts as $post)
    {
        switch ((string)$post->attributes()->type)
        {
            // Every known post type is handled the same way: print its URL on its own line.
            case "regular":
            case "photo":
            case "quote":
            case "link":
            case "conversation":
            case "video":
            case "audio":
                echo $post->attributes()->url . "\n";
                break;
        }
    }
    $out = ob_get_contents();
    ob_end_clean();

    // Prints nothing here, because getTags() is never called and $allTags stays empty.
    getAllTags();
    echo $out;
    ?>
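
Most of the script above is left over from the exporter. The step that actually yields the post URLs is reading the `url` attribute of each `<post>` element in the feed XML. Here is a minimal sketch of just that step, assuming the same feed layout the script relies on (a `<posts>` element with `<post url="...">` children). The function name `extractPostUrls` and the sample XML are made up for illustration:

```php
<?php
// Hypothetical helper: pull the url attribute of every <post> out of one page
// of feed XML. Assumes the <tumblr><posts><post url="..."> layout used above.
function extractPostUrls($xml)
{
    $feed = new SimpleXMLElement($xml);
    $urls = array();
    foreach ($feed->xpath('posts//post') as $post) {
        // Cast to string so the result is a plain array of URLs, not XML objects.
        $urls[] = (string)$post['url'];
    }
    return $urls;
}

// Made-up sample data standing in for one page of the feed.
$sample = '<tumblr version="1.0"><posts start="0" total="2">'
        . '<post id="1" url="http://xyz.tumblr.com/post/1" type="photo"/>'
        . '<post id="2" url="http://xyz.tumblr.com/post/2" type="quote"/>'
        . '</posts></tumblr>';

echo implode("\n", extractPostUrls($sample)), "\n";
```

Returning an array rather than echoing inside the loop makes it easy to write the URLs to a text file or feed them straight into another tool.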

     
  5. Orlin.b

    Orlin.b Newbie

    Joined:
    Jul 4, 2013
    Messages:
    5
    Likes Received:
    5
    Location:
    Bulgaria
    Thanks bro, here's something from me:
    PHP:
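    <?php
    // URL of the Tumblr page to read, passed as ?lnk=...
    $link = $_GET['lnk'];
    $data = file_get_contents($link);

    // Grab the first post image: cut at this theme-specific markup, then read up to the closing quote.
    // The delimiter (including its whitespace) must match the theme's HTML exactly.
    $tmimageurl = explode(' <!-- START POSTS -->                
                            <li class="post group" style="padding:0">  
                                                        <section class="top media" style="display:block;"><img src="', $data);
    $tmimageurl = explode('"', $tmimageurl[1]);
    $tmimageurl = $tmimageurl[0];
    echo $tmimageurl;

    // Grab the text of the first paragraph in the post body; it is used as the file name.
    $tmimagetag = explode('<div class="cont"><p>', $data);
    $tmimagetag = explode('</p>', $tmimagetag[1]);
    $tmimagetag = $tmimagetag[0];
    echo '<br/>';
    echo $tmimagetag;

    // save img with the tag name
    $my_file = 'tumblrimg/' . $tmimagetag . '.jpg';
    $handle = fopen($my_file, 'w') or die('Cannot open file:  ' . $my_file);
    $data = file_get_contents($tmimageurl);
    fwrite($handle, $data);
    fclose($handle);
    // save end
    ?>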
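
Splitting on a literal chunk of theme markup only works while the theme's HTML stays byte-for-byte the same. Below is a sketch of a less brittle variant that uses PHP's `DOMDocument` and `DOMXPath` to find the first `<img>` inside a post. The `post` class and the sample HTML are assumptions taken from the snippet above, not something every Tumblr theme uses, and `firstPostImage` is a made-up name:

```php
<?php
// Hypothetical helper: return the src of the first <img> inside an element
// whose class list contains "post", or null if there is none.
function firstPostImage($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " post ")]//img/@src'
    );
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Made-up sample page modelled on the markup in the snippet above.
$sample = '<html><body><ul><li class="post group" style="padding:0">'
        . '<section class="top media"><img src="http://example.com/a.jpg"></section>'
        . '<div class="cont"><p>sometag</p></div></li></ul></body></html>';

echo firstPostImage($sample), "\n";
```

The XPath class test matches `post` as a whole word in the class list, so whitespace changes or extra classes in the theme do not break it.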
     
    Last edited: Jul 8, 2013
  6. gullsinn

    gullsinn Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 24, 2009
    Messages:
    2,429
    Likes Received:
    2,210
    Gender:
    Male
    Occupation:
    Jobless :D
    Location:
    Graveyard
    Home Page:
    If you have Scrapebox, you can do this with it.
     
  7. zenoGlitch

    zenoGlitch Executive VIP Jr. VIP Premium Member

    Joined:
    Jun 25, 2009
    Messages:
    963
    Likes Received:
    1,511
    Location:
    Thailand
    I'm talking to gimme4free now about implementing this feature to make it easier to collect your accounts' post URLs.