How to scan for or discover removed tumblr blogs

templaries · Newbie · Joined Feb 24, 2012 · Messages: 15 · Reaction score: 0
Hello

I am looking for any method to discover removed tumblr blogs. I've outlined a basic strategy below, but I'd like to check here whether it would be a waste of time.

- List tumblr blogs for a keyword using Google or Bing.
- For each tumblr blog, add the URLs of other tumblr blogs found on it to a list.
- For each URL in that list, check whether it returns a 404 or another error code, or a message (as on Blogger) indicating that you can register the URL again.
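The steps above can be sketched in Python. This is a minimal illustration, not a tested crawler: the regex, function names, and the idea that an HTTP error marks a blog as removed are my assumptions (and, as later replies note, an error code alone does not prove the name is registrable).

```python
import re
import urllib.request
from urllib.error import HTTPError, URLError

# Assumed pattern for *.tumblr.com links found in a page's HTML.
TUMBLR_URL_RE = re.compile(r"https?://([a-z0-9-]+)\.tumblr\.com", re.IGNORECASE)

def extract_tumblr_subdomains(html):
    """Collect the distinct *.tumblr.com subdomains linked from a page."""
    return sorted({m.lower() for m in TUMBLR_URL_RE.findall(html)})

def looks_removed(subdomain, timeout=10):
    """Return True if the blog answers with an HTTP error (e.g. 404).

    An error code suggests the blog is gone, but it does NOT
    guarantee the name can be registered again.
    """
    url = "https://%s.tumblr.com/" % subdomain
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return False
    except HTTPError:
        return True
    except URLError:
        return None  # network problem, not a verdict
```

For example, feeding a scraped page into `extract_tumblr_subdomains` yields the seed list for the next crawl round, and `looks_removed` implements the 404 check on each candidate.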

Regards
 
My method -

Use Ahrefs or Majestic to check for any linked pages on a tumblr subdomain.

Then take ALL the subdomains you can find (they HAVE to be indexed and have links pointing to them for Ahrefs to find them) and run them all through a broken link checker.
 
Good idea, but as I understand it, this method needs a premium account for one of those services. On the other hand, how do you tell Ahrefs what to look at? Do you start with a set of tumblr blogs?

I'd like to know whether there are tumblr blogs that contain links to removed tumblr blogs, the way Blogger lists blogs in each profile.

Sorry for my English.
 
if you have scrapebox or gscraper or any scraper just do a search for:

site:tumblr.com

add a bunch of keywords and scrape

then trim all urls, remove duplicates and run them through a live checker (you can use either the sb addon or xenu link sleuth)
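The "trim all urls, remove duplicates" step can be sketched in Python; the function name is mine, not a Scrapebox feature, and `www.tumblr.com` is excluded on the assumption that only blog subdomains matter:

```python
from urllib.parse import urlparse

def trim_to_roots(urls):
    """Reduce scraped URLs to unique *.tumblr.com root URLs."""
    roots = set()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.endswith(".tumblr.com") and host != "www.tumblr.com":
            roots.add("http://%s/" % host)
    return sorted(roots)
```

The deduplicated roots are what you then feed to the live checker.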

select all the dead ones, run the pr checker and there you go, you have yourself a bunch of high pr tumblrs
 
So all the URLs are extracted from Google, which means proxies are needed. I think the point is to get a set of tumblr blogs as a seed and then scrape them looking for other tumblr blogs inside (in the posts, in blogroll areas, etc.).

I haven't used Scrapebox yet; for the moment I make my own bots to do these jobs.
 
Hi,

I do this with a three-step process.

1. Scrapebox. Find a dead tumblr blog. Find keywords on it. Create a footprint. That's really simple. Scrape all the blogs. Filter: remove duplicates, keep only *.tumblr.com domains, etc.
2. Live check them all in Scrapebox. The dead ones are *potentially* available.
3. Run them through TumblingJazz to see which ones can actually be registered.

Pretty easy.
 
So Scrapebox takes links from Google and also from the tumblr blogs it is processing? I suppose Scrapebox could return thousands of tumblr blogs in that process.

Does a tumblr blog return a 404 error code when it is available?

Edit: Yes, they do. (black-hat-seo/646896-tumblr-returns-404-meanwhile-not-available-registration.html)
 
you only need scrapebox.

scrape the shit out of tumblr > remove dupes > use alive check > sort out the 404 urls > use the vanity checker.

-=-
 
So Scrapebox takes links from Google and also from the tumblr blogs it is processing? I suppose Scrapebox could return thousands of tumblr blogs in that process.

Does a tumblr blog return a 404 error code when it is available?

Edit: Yes, they do. (black-hat-seo/646896-tumblr-returns-404-meanwhile-not-available-registration.html)

Yes, with Scrapebox I usually get at least 100k tumblr URLs, which turn into 50k unique tumblr accounts. Most of them are still alive; only close to 5% are dead, and of that 5%, less than half are available for registration. If you can build your own bot, you'd better do it, as the vanity checker inside Scrapebox is not that trustworthy.

And yes, you need tons of proxies to scrape properly. I suggest adding a huge list of public proxies - that's what I'm doing, and I'm reaching 180-200 URLs/sec.
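A rough sketch of how that kind of throughput is approached: rotate through the proxy list and run the checks concurrently. `check_fn` and the proxy format are placeholders for whatever checker and proxies you actually use.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def check_all(urls, proxies, check_fn, workers=50):
    """Run check_fn(url, proxy) over all URLs, cycling through proxies.

    Results come back in the same order as the input URLs.
    """
    proxy_cycle = itertools.cycle(proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(check_fn, u, next(proxy_cycle)) for u in urls]
        return [f.result() for f in futures]
```

With 50 workers and a big enough public proxy list, the per-request latency stops being the bottleneck; the real limit becomes how many of those public proxies are actually alive.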
 
Thank you, that's the answer I was looking for. I'm going to find a lot of proxies then. It's scary that out there are many players with thousands of proxies fishing for tumblr blogs, hehe.
 
after spending a week with scrapebox and gscraper i gave up. i think there is software sold on the forum that does a better job of identifying the blogs available for registration - a 404 does not mean it is available!!!!
 
after spending a week with scrapebox and gscraper i gave up. i think there is software sold on the forum that does a better job of identifying the blogs available for registration - a 404 does not mean it is available!!!!

that software is like a kid compared to SB (it just has too many bugs)

i scrape 900k+ urls, then trim to root and remove dupe domains.
then i use the alive check (more than half of the 404 urls are available to be registered).
after some hours i have a list of 100+ available tumblrs.

it's not that difficult; it all depends on the way you scrape.
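The workflow described across this thread reads as one pipeline. Here is a minimal sketch with the two checks injected as placeholder functions; there are two checks precisely because, as pointed out above, a 404 alone does not prove the name can be re-registered:

```python
def find_available(urls, is_dead, is_registrable):
    """Filter scraped URLs down to names that look registrable.

    is_dead(url) -> bool is the alive/404 check; is_registrable(url)
    -> bool is the signup probe (e.g. a vanity/registration checker).
    Both are placeholders for whichever tool you trust.
    """
    unique = sorted(set(urls))        # remove dupes
    dead = [u for u in unique if is_dead(u)]
    return [u for u in dead if is_registrable(u)]
```

Keeping the two predicates separate also makes the pipeline easy to test with stub functions before pointing it at live proxies.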
-=-
 