
Footprints for scraping Tumblr accounts

Discussion in 'Black Hat SEO' started by ensema, Apr 24, 2014.

  1. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    I've been scraping inactive Tumblr blogs with Scrapebox for the past few days. So far I've got 10 x PR1, 10 x PR2 and 1 x PR3. I've been using the following method, which I found somewhere on here...

    1) Get a keyword list, or scrape one with Scrapebox
    2) Use footprint site:tumblr.com/post/
    3) Scrape
    4) Trim to root
    5) De-Duplicate
    6) Run results through vanity checker addon
    7) Export available blogs
    8) Check PR
    9) Register blogs with PR1+
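
    Steps 4 and 5 above ("Trim to root" and "De-Duplicate") can be sketched outside Scrapebox as follows. This is only a minimal illustration using Python's standard library; the function names `trim_to_root` and `dedupe_roots` are made up for this example and are not part of Scrapebox.

```python
from urllib.parse import urlsplit

def trim_to_root(url: str) -> str:
    """Reduce a scraped URL to scheme + host (the blog root)."""
    url = url.strip()
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"

def dedupe_roots(urls):
    """Trim every URL to its root and keep the first root seen per host."""
    seen = set()
    roots = []
    for url in urls:
        root = trim_to_root(url)
        host = urlsplit(root).netloc
        if host and host not in seen:
            seen.add(host)
            roots.append(root)
    return roots

if __name__ == "__main__":
    scraped = [
        "http://example.tumblr.com/post/123/some-title",
        "http://example.tumblr.com/post/456/another-title",
    ]
    print(dedupe_roots(scraped))
```

    De-duplicating on the host rather than on the full URL is an assumption here; it treats the http and https versions of the same blog as one entry.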

    Now if you have done this you will have noticed Tumblr's standard 404 page. It has the same little bit of text every time...

    There's nothing here.

    Whatever you were looking for doesn't currently exist at this address. Unless you were looking for this error page, in which case: Congrats! You totally found it.


    So, just as you would use the following footprint to search for WordPress comments:

    site:.edu "You can leave a response, or trackback"

    Can we not use:

    site:tumblr.com "There's nothing here"


    Or a footprint with any other part of the standard text on their 404 page.

    I've tried it and it didn't just return available blogs so I'm either dumb, missing something, unlucky, or all three.
     
  2. StraussCan

    StraussCan Regular Member

    Joined:
    Aug 24, 2013
    Messages:
    393
    Likes Received:
    128
    Occupation:
    Pen testing
    You are all three, I guess.
     
  3. ScrapeboxWorker

    ScrapeboxWorker Regular Member

    Joined:
    Jul 23, 2012
    Messages:
    465
    Likes Received:
    266
    Home Page:
    First of all use
    Second thing is, not all blogs are available to register.

    1. Scrape
    2. Remove dup domains
    3. Alive check
    4. Save only Dead blogs
    5. Check PR / PA
    6. Remove the ones with low PR / PA
    7. Try to register

    Keep in mind that not all blogs will keep their PR. Most of the time you have to rebuild the content from archive.org, and most of the blogs have been spammed.
     
  4. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    Thanks

    Appreciate the pointers, dude. I understand some aren't available to register, but why use the alive check over the vanity check? I've tried to do the archive.org bit where possible too.

    I'm still wondering if there is a way to use what's on the standard 404 page to pick up only dead blogs?
     
  5. ScrapeboxWorker

    ScrapeboxWorker Regular Member

    Joined:
    Jul 23, 2012
    Messages:
    465
    Likes Received:
    266
    Home Page:
    Why the alive check first and the vanity checker later? Think about it for a while and then tell me, lol.
     
  6. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    lol... but if the vanity checker shows it as an available name then isn't that good enough, or are some just completely dead?
     
  7. ScrapeboxWorker

    ScrapeboxWorker Regular Member

    Joined:
    Jul 23, 2012
    Messages:
    465
    Likes Received:
    266
    Home Page:
    You can't register a blog that is ALIVE, man.
     
  8. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    LOL, I get that. I'm not that simple, dude... but if the vanity name checker says it's taken then surely the blog must be alive and owned by someone else? That would render the alive check pointless.
     
  9. ScrapeboxWorker

    ScrapeboxWorker Regular Member

    Joined:
    Jul 23, 2012
    Messages:
    465
    Likes Received:
    266
    Home Page:
    The alive check is 100x faster than the vanity check... it's a so-called filter, so you won't waste so much time.
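
    The filter idea here (run the cheap check on everything, then the expensive check only on what survives) can be sketched like this. Both `fast_is_dead` and `slow_is_available` are hypothetical placeholder callables standing in for the alive check and the vanity check; neither is a real Scrapebox API.

```python
def two_stage_filter(urls, fast_is_dead, slow_is_available):
    """Run the cheap check on every URL, the slow check only on survivors."""
    # Stage 1: cheap filter, keeps only blogs that look dead.
    candidates = [url for url in urls if fast_is_dead(url)]
    # Stage 2: expensive check, run on the much smaller candidate list.
    return [url for url in candidates if slow_is_available(url)]

if __name__ == "__main__":
    urls = ["http://a.tumblr.com/", "http://b.tumblr.com/", "http://c.tumblr.com/"]
    dead_blogs = {"http://b.tumblr.com/", "http://c.tumblr.com/"}   # made-up data
    free_names = {"http://b.tumblr.com/"}                            # made-up data
    slow_calls = []

    def slow_check(url):
        slow_calls.append(url)  # record how often the slow check runs
        return url in free_names

    print(two_stage_filter(urls, dead_blogs.__contains__, slow_check))
    print(len(slow_calls), "slow checks instead of", len(urls))
```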
     
  10. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    OK, cool... I'm rerunning everything I scraped in the past few days through the alive checker now. The first few times I used the alive checker it was CRAAAZY slow; now it's like the thing's on speed.
     
  11. ScrapeboxWorker

    ScrapeboxWorker Regular Member

    Joined:
    Jul 23, 2012
    Messages:
    465
    Likes Received:
    266
    Home Page:
    Remember to set 200 as the OK status and to follow redirects (relocation).
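
    As a rough illustration of what "200 for OK, follow the relocation" means for an alive check, here is a minimal sketch using Python's standard library, not the Scrapebox addon itself. `urllib.request.urlopen` follows redirects by default, so the status returned is the one at the end of the redirect chain. The names `final_status` and `split_alive_dead`, and the choice to count every non-200 result (including network errors) as dead, are assumptions made for this example.

```python
import urllib.error
import urllib.request

def final_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status after redirects, or 0 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return 0

def split_alive_dead(urls, status_of=final_status):
    """Treat only a final 200 as alive; everything else goes to the dead list."""
    alive, dead = [], []
    for url in urls:
        (alive if status_of(url) == 200 else dead).append(url)
    return alive, dead
```

    Passing a different `status_of` callable makes it possible to try the split on saved results without sending any requests.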
     
  12. freerider

    freerider Newbie

    Joined:
    Aug 30, 2012
    Messages:
    31
    Likes Received:
    1
    You are right, the footprint you suggested doesn't work. You don't have to search for any 404 text; just search for any word and hope you get lucky.
     
  13. ensema

    ensema Registered Member

    Joined:
    Jul 6, 2012
    Messages:
    96
    Likes Received:
    32
    I ran my scraped data through the alive checker and then the vanity checker and got a slightly different set of results. Doing it this way I found a few extra PR2 and PR1 blogs. So I was wrong: the alive checker is worth it.
