[RELEASE][FREE] Super-fast Tumblr Scraper (30 Pages in 20~ seconds)

Discussion in 'Black Hat SEO Tools' started by matessim, Nov 8, 2012.

  1. matessim

    matessim Junior Member

    Joined:
    Nov 22, 2008
    Messages:
    164
    Likes Received:
    72
    Occupation:
    Being funny and kind to puppies
    Location:
    UT 2003
    SUPER-FAST TUMBLR SCRAPING FOR ALL, REJOICE

    Hey guys!
    This thread is a follow-up to another thread I made here, where I showed off the non-concurrent version of this tool, which I wrote in Python. I rewrote it in Java last night and added some sweet concurrency. It currently benchmarks at 30 pages in around 20 seconds with 25 downloading threads and 10 scraping threads. It can probably do quite a bit more depending on your network and computer; if it's a problem, PM me and I'll tweak it.
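For anyone curious what "25 downloading threads and 10 scraping threads" can look like in code, here is a rough sketch of a two-pool pipeline. It is written in Python rather than taken from the Java jar, and `scrape_page` and `download_image` are made-up placeholders for "page URL to image URLs" and "image URL to bytes", so treat it as an illustration of the idea and not the tool's actual source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

SCRAPE_THREADS = 10     # thread counts quoted in the post
DOWNLOAD_THREADS = 25


def run_pipeline(
    page_urls: Iterable[str],
    scrape_page: Callable[[str], List[str]],   # hypothetical: page URL -> image URLs
    download_image: Callable[[str], bytes],    # hypothetical: image URL -> image bytes
) -> List[bytes]:
    """Scrape pages in one thread pool and download their images in another."""
    with ThreadPoolExecutor(SCRAPE_THREADS) as scrapers, \
            ThreadPoolExecutor(DOWNLOAD_THREADS) as downloaders:
        # Stage 1: every page is parsed for image URLs on a scraping thread.
        scrape_jobs = [scrapers.submit(scrape_page, url) for url in page_urls]

        # Stage 2: collect pages in submission order and queue each image
        # they list on the download pool while later pages are still scraping.
        download_jobs = []
        for job in scrape_jobs:
            for image_url in job.result():
                download_jobs.append(downloaders.submit(download_image, image_url))

        # Wait for all downloads; results keep the order they were queued in.
        return [job.result() for job in download_jobs]
```

Both callables are passed in, so the pipeline can be tried with fake functions before pointing it at a real site.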

    Note: it saves all the images to a directory called images, next to where the jar is stored. It names them serially from 1 to x and preserves the original file extension. BE SURE TO MOVE THE IMAGES OUT AFTER EVERY RUN OR THEY WILL BE OVERWRITTEN the next time you download a different Tumblr blog. I will fix this later, but I wanted to get this out to you guys ASAP.
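A small sketch of the naming scheme described above (serial number plus the original extension), again in Python and not lifted from the jar. `output_dir_for` is a hypothetical way around the overwrite problem that uses one subfolder per blog; nothing in the thread says the tool works that way.

```python
import os
from urllib.parse import urlparse


def serial_name(index: int, image_url: str) -> str:
    """Build '<index><original extension>' for a downloaded image."""
    path = urlparse(image_url).path          # ignore any ?query part
    extension = os.path.splitext(path)[1]    # '.jpg', '.png', or '' if none
    return f"{index}{extension}"


def output_dir_for(blog_url: str, root: str = "images") -> str:
    """Hypothetical overwrite fix: keep each blog's images in its own subfolder."""
    host = urlparse(blog_url).netloc or "unknown"
    return os.path.join(root, host)
```

With a per-blog folder, a second run against a different blog would write to a different directory instead of replacing 1.jpg, 2.jpg and so on.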

    Here is a quick and dirty video POC of me downloading ~450 corgi photos in 20 seconds:


    IF THE SCRAPER DOWNLOADS SOMETHING IT SHOULDN'T (NOT AN IMAGE POST: things like favicons, avatars, logos, w/e) PLEASE LET ME KNOW. So far in my testing it manages to ignore everything it's supposed to ignore.

    Instructions: You need Java SE 1.7 and your Path set up correctly (if it's not, Google "Java Path set up"). Open a CMD window and browse to wherever you have the file. To run it, use the command: java -jar <filename>, replacing <filename> with whatever you called the jar if you renamed it. If the name has a space in it, make sure you surround it with quotes like this: java -jar "Matessim Tumblr Ownage.jar"

    Most importantly, if you guys want updates and new versions, please reply here and tell me what's wrong and what you want. I will add what I'm asked for; I don't use Tumblr myself at all, and I built this tool for BHW, not for myself.

    Have Fun with this! ;)

    Mediafire Download Link
    VirusTotal

    And if you feel like donating $5 or so to aid a hungry programmer, PM me and I'll give you my PP address :).
     
    • Thanks x 24
    Last edited by a moderator: May 18, 2016
  2. DanTe_0101

    DanTe_0101 Senior Member

    Joined:
    Mar 2, 2012
    Messages:
    863
    Likes Received:
    706
    Location:
    Fucksville
    Thanks for the bot, but first you must upload a VT scan...
     
    • Thanks x 1
  3. matessim

    matessim Junior Member

    If I upload the scan myself, how does that make sense? I could fake it... Anyway, I'll update with a scan in a second.

    I'd also be glad if someone else could upload their own scan to show that mine is legitimate.

    EDIT: Scan
     
    Last edited: Nov 8, 2012
  4. DanTe_0101

    DanTe_0101 Senior Member

    You must upload a VirusTotal scan, NOT your PC scan lol...
    Go to virustotal.com, scan this file, and then post the scan URL here :)
     
    • Thanks x 1
  5. matessim

    matessim Junior Member

    Yeah... I know that, but I could just rename a random file, upload that, and claim it's this file (unless people here check the SHA256/MD5 hashes of their files, which I assume most don't).
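Checking the hash yourself is not hard. A minimal Python sketch using the standard `hashlib` module; the idea is to hash the jar you actually downloaded and compare the result with the hash listed on the scan report. The jar name in the usage comment is only an example.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to keep memory use low."""
    hasher = hashlib.new(algorithm)          # e.g. "sha256" or "md5"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Example usage (file name is illustrative):
# print(file_digest("tumblr.jar"))          # SHA-256
# print(file_digest("tumblr.jar", "md5"))   # MD5
```

If the digest of your copy matches the one on the report, the report is for the same bytes you are about to run.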

    Anyway, I uploaded the scan. Other people are also welcome to scan their files (or decompile the jar if you want to play with it; I didn't make any effort to obfuscate my code).

    EDIT: Well, I think it's safe to say my code scales. I just downloaded 2,000 images from the I Love Charts Tumblr blog in about 60 seconds (50 Mbps connection) with 500 networking threads. I did have a few timeouts, and I still don't have the retry mechanism; the corgi pictures in there are the timeouts. Note that this shouldn't happen with the release version: I used 20x as many networking threads (500 instead of 25) just to stress it. I don't want to allow more than 25 networking threads in the release version until I build in the retry mechanism.
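The retry mechanism mentioned above does not exist in the release yet, so the following is only one possible shape for it: retry a failing download a few times with a growing pause. It is a Python sketch; `with_retries`, its parameters, and the backoff values are all assumptions, not anything from the jar.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,   # injectable so it can be faked
) -> T:
    """Run action(); on a network-style error, wait and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            # TimeoutError and urllib's URLError are both OSError subclasses.
            if attempt == attempts:
                raise                      # out of tries: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))   # 0.5s, 1s, 2s, ...
```

Wrapping each image download in something like this would let a run with many threads recover from the occasional timeout instead of dropping the image.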
     
    Last edited: Nov 8, 2012
  6. SpecialOne

    SpecialOne Registered Member

    Joined:
    Jan 12, 2011
    Messages:
    65
    Likes Received:
    21
    Too bad I can't use it, and it seems like a good program... It sucks for people like me who don't have any programming skills (I am only familiar with HTML and CSS). I tried to install Java from Oracle and watched some YouTube videos about that "path" etc., but I don't understand a damn thing, so I decided to quit. Better that than ending up with a dead PC. Thx anyway!
     
  7. matessim

    matessim Junior Member

    What is the problem? If you're on Windows 7, go to the Start menu, type 'envi', click "Edit the system environment variables", then press the button at the bottom right that says Environment Variables. Path is in the lower box; add a ; and after it the path to your Java bin folder. In my case you go there and add:
    ;C:\Program Files (x86)\Java\jre7\bin

    to the Path.

    Also, it might already be set up for you. Open a cmd window (press Windows key + R, type cmd, and press Enter) and type "java -version". If it prints your Java version, you're good to go.
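If the Path edit still feels like magic, this little Python sketch shows all it amounts to: the Path is one long string of folders separated by ;, and you are adding one more folder to the end. `append_to_path` is a made-up helper for illustration (it only builds the new string, it does not change any system setting), and `java_on_path` does the same lookup that decides whether "java" works in cmd.

```python
import shutil


def append_to_path(path_value: str, new_dir: str, separator: str = ";") -> str:
    """Return path_value with new_dir added at the end, unless it is already listed."""
    entries = [entry for entry in path_value.split(separator) if entry]
    target = new_dir.rstrip("\\/").lower()       # Windows paths ignore case
    if any(entry.rstrip("\\/").lower() == target for entry in entries):
        return path_value                        # already there, nothing to add
    return separator.join(entries + [new_dir])


def java_on_path() -> bool:
    """True if a 'java' executable can be found through the current Path."""
    return shutil.which("java") is not None
```

For example, appending the Java bin folder to `C:\Windows;C:\Windows\System32` gives the same string with `;C:\Program Files (x86)\Java\jre7\bin` on the end.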

    Need more help? PM me; I can help you tomorrow over TeamViewer if you want (please try to fix it on your own first, since my time is pretty limited, but if that doesn't work, I'll help you).
     
  8. m0use

    m0use Registered Member

    Joined:
    Nov 2, 2012
    Messages:
    54
    Likes Received:
    49
    Sorry for being a newb about this.

    I followed everything and tried to run the proggy, but I got this error and it didn't save any pictures.


    Can anybody help?
     

    Attached Files:

  9. AzonGeek

    AzonGeek Jr. VIP Jr. VIP Premium Member

    Joined:
    May 6, 2012
    Messages:
    1,174
    Likes Received:
    510
    Thanks matessim
    Testing it, will come back for review
     
  10. SpecialOne

    SpecialOne Registered Member

    Hey Matessim! Believe it or not, I was able to make your program work, and it's not as hard as I previously thought. So I will try to write down some instructions to help others who, like me, don't know anything about programming.

    Instructions:


    1. Download his file from the first post, "Matessim tumblr Scraper.jar", to your desktop
    2. I renamed it to something easier: "tumblr.jar"
    3. Download Java Environment 7 from Oracle's official page (type "java se 1.7" into Google and click the first Oracle result ---> choose the Windows version)
    4. Install it
    5. Go to drive C: and create a new folder called "Scrape" or whatever you like ---> C:\Scrape
    6. Move "tumblr.jar" from the desktop to C:\Scrape
    7. Go to YouTube and find this video: "watch?v=RkycwpimOEc" (*mute the video while watching*)
    8. It basically shows you how to set up that Java path
    9. Go to Start ---> type CMD
    10. My cmd showed something like C:\Users\Your_Computer_Name>, and you need to navigate to C:\Scrape
    11. To navigate to that folder, type cd .. (yes, cd + space + two dots), then repeat that once more
    12. You should now be in C:\
    13. From C:\ you need to get to C:\Scrape, so type "cd scrape"
    14. Now you should see C:\Scrape in your CMD line
    15. Type java -jar tumblr.jar
    16. Type the Tumblr URL you wish to scrape images from and the number of pages
    17. Check the saved images in C:\Scrape\Images

    Thx for your help and your program, matessim!
     
    • Thanks x 4
  11. kinaks

    kinaks Junior Member

    Joined:
    Mar 27, 2012
    Messages:
    179
    Likes Received:
    44
    Location:
    Philippines
    Whoa, just watched the video and that's crazy fast! Thanks for the share, I'm going to try it :D

    keep it up mate!

    parley? :D
     
    • Thanks x 1
  12. uuhoever

    uuhoever Registered Member

    Joined:
    Sep 26, 2012
    Messages:
    88
    Likes Received:
    14
    Remember that when it asks for URL to scrape you have to write "http://scrapethis.tumblr.com"

    Don't forget the http://

    Got it to work. Blazing fast.

    Suggestion: Is there a way to find how many tumblr pages there are? Or can the program have a feature to scan the tumblr URL and set it so that it downloads everything?

    Thanks.
     
    • Thanks x 1
    Last edited: Nov 9, 2012
  13. queenmery

    queenmery Power Member

    Joined:
    Jan 18, 2011
    Messages:
    501
    Likes Received:
    30
    Occupation:
    Student
    Location:
    BANGLADESH
    Actually, what can I do with that? I don't have much idea about this kind of thing.
     
  14. matessim

    matessim Junior Member

    Just put in a ton of pages. It's fine.

    Yeah, you need the http://. The next release will have retry and will let you choose the download thread count. It will also let you choose the output directory.
     
    Last edited: Nov 9, 2012
  15. qwertys

    qwertys Registered Member

    Joined:
    Oct 21, 2011
    Messages:
    76
    Likes Received:
    10
    How do you put in a ton of pages? Space or comma separated or running multiple invocations of the app?
     
  16. matessim

    matessim Junior Member


    The program guides you when you run it and asks how many pages to scrape. "A ton" means a large number: the program will go to each of these pages and try to scrape them. If you enter a number larger than the number of pages the site has, that's fine; it won't crash and burn, it will just try to fetch those pages as well.
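For the "download everything" request earlier in the thread, one possible approach is to walk the pages in order and stop at the first one with no images, instead of guessing a page count. This Python sketch is not how the jar works (it fetches every page number you give it). It assumes blog pages live at `<blog>/page/<n>` and that pages past the end contain no image posts; `scrape_page` is a placeholder for whatever turns a page URL into image URLs.

```python
from typing import Callable, List


def normalize_blog_url(user_input: str) -> str:
    """Add the http:// prefix if the user left it off and drop a trailing slash."""
    url = user_input.strip().rstrip("/")
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url


def page_url(blog_url: str, page_number: int) -> str:
    """Assumed URL layout: <blog>/page/<n>."""
    return f"{normalize_blog_url(blog_url)}/page/{page_number}"


def scrape_until_empty(
    blog_url: str,
    scrape_page: Callable[[str], List[str]],   # hypothetical: page URL -> image URLs
    max_pages: int = 10000,                    # safety cap, still "a ton"
) -> List[str]:
    """Collect image URLs page by page, stopping at the first empty page."""
    images: List[str] = []
    for number in range(1, max_pages + 1):
        found = scrape_page(page_url(blog_url, number))
        if not found:
            break                              # assumed to mean we ran past the last page
        images.extend(found)
    return images
```

The trade-off is that this walks pages one at a time, so it gives up the concurrency that makes the tool fast; a mixed approach would scrape in batches and stop after an empty batch.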
     
    Last edited: Nov 10, 2012
  17. matessim

    matessim Junior Member

    If you are asking how I make money from giving this away... I don't.
     
  18. Archi

    Archi Newbie

    Joined:
    Sep 7, 2011
    Messages:
    23
    Likes Received:
    1
    Occupation:
    Driver Student
    Location:
    Germany/ B
    Works for me. rep+1. I have a few remarks though:

    On Tumblr there are often two different sizes for pictures: the front page view with ...500.jpg and the larger view with ...1280.jpg. This program takes the smaller size from the front page. Is there a way to implement downloading the full-size pictures?

    How many pages should I download? When I enter 30 pages I get 41 pictures; with 100 I get 111 on the same Tumblr site. From a quick overview there are 200 images per month, so 10000 pages should be enough for this particular site. But on another Tumblr page, 200 pages equal 2700 pictures.

    I also noticed the program "ends" without any notification or command prompt; it just hangs, so one has to close the window process and open a new one, typing everything again. Is there a way around this?
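One way the full-size request could be handled is to rewrite the size suffix in the image URL before downloading. This Python sketch assumes the file names really do end in `_500.jpg` and `_1280.jpg` as described in the post; `larger_variant` is a made-up helper, and since not every image necessarily has a 1280 version, a real downloader would want to fall back to the original URL if the larger one fails.

```python
import re

# Matches a trailing "_<digits>.<image extension>", e.g. "_500.jpg".
_SIZE_SUFFIX = re.compile(r"_(\d+)(\.(?:jpe?g|png|gif))$", re.IGNORECASE)


def larger_variant(image_url: str, size: int = 1280) -> str:
    """Swap the trailing size suffix for the requested size; leave other URLs alone."""
    return _SIZE_SUFFIX.sub(lambda match: f"_{size}{match.group(2)}", image_url)
```

URLs without a size suffix (avatars, logos and so on) come back unchanged, so the rewrite is safe to apply to everything the scraper finds.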
     
  19. Empower Network

    Empower Network Registered Member

    Joined:
    Sep 29, 2012
    Messages:
    82
    Likes Received:
    22
    Looks good... gonna download and test it from home later... I currently use another scraper, but it's not as fast as yours (if the claimed speed is correct)

    nice work and thanks in advance!
     
  20. matessim

    matessim Junior Member

    Thanks for the feedback, I'll make sure these problems get fixed.