Anyone doing any massive analysis of domains out there?

Discussion in 'Domain Names & Parking' started by killer2021, Mar 12, 2011.

  1. killer2021

    killer2021 Regular Member

    Sep 9, 2010
    I have a list of 130+ million domains. It contains all the active .com, .net, .org, .biz, .info, and .us domains as of Feb 2011.

    I want to do massive data mining on these lists. So far, the major task seems to be clearing out all the crap domains: primarily domains that redirect, point to servers that are down, or are parked.

    I've written a simple tool that pings each domain and deletes it from the list if there is no response. Generally, if a domain doesn't respond, it doesn't have any content.
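    A minimal sketch of that kind of liveness check, not the actual tool. One swap worth naming: it tries a TCP connection to port 80 instead of an ICMP ping, on the assumption that a web server can be up while its firewall drops pings. The names is_reachable and filter_reachable, the port, the timeout, and the worker count are all hypothetical choices for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts alike.
        return False


def filter_reachable(domains, check=is_reachable, workers=50):
    """Keep only the domains for which check(domain) is true, in input order."""
    domains = list(domains)
    # Threads are enough here because the work is almost entirely network waits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, domains))
    return [d for d, ok in zip(domains, results) if ok]
```

    With a list this size you would probably feed filter_reachable one chunk of the file at a time rather than loading all 130+ million names at once.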

    I've also written a tool that checks whether each domain is parked, but it isn't 100% accurate.
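    One way such a parked-domain check could work is plain keyword matching on the page HTML. This is only a sketch of that heuristic: the marker phrases below are guesses at typical parking-page wording, not a vetted list, and parked_score and looks_parked are made-up names. It will misfire in both directions, which fits the "not 100% accurate" caveat.

```python
# Hypothetical marker phrases; a real list would need tuning against samples.
PARKING_MARKERS = (
    "this domain is for sale",
    "buy this domain",
    "domain is parked",
    "parked free",
    "related searches",
)


def parked_score(html):
    """Count how many distinct marker phrases appear in the page text."""
    text = html.lower()
    return sum(1 for marker in PARKING_MARKERS if marker in text)


def looks_parked(html, threshold=1):
    """Treat the page as parked once enough marker phrases are present."""
    return parked_score(html) >= threshold
```

    Raising threshold trades missed parked pages for fewer false positives on real sites that happen to use one of the phrases.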

    The primary reason I want to filter out all the crap is that I want to pull each domain's registration and expiration dates. However, doing that for 130+ million domains is quite a challenge!

    The only realistic way I could accomplish this is by completely automating the process. For example, a tool that goes like this:
    Is the domain responding to ping? If no, then delete.
    Is the domain parked? If yes, then delete.
    Does the domain redirect? If yes, then delete.
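    The three checks above can be sketched as one short-circuiting filter. This is an illustration under assumptions, not an existing tool: passes_filters and redirects_elsewhere are hypothetical names, the individual checks are passed in as callables, and "redirect" is assumed to mean "ends up on a different host", ignoring a leading www.

```python
from urllib.parse import urlsplit


def _bare_host(host):
    """Lowercase a host name and strip a trailing dot and a leading 'www.'."""
    host = (host or "").lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def redirects_elsewhere(domain, final_url):
    """True if final_url (after following redirects) is on a different host."""
    return _bare_host(urlsplit(final_url).hostname) != _bare_host(domain)


def passes_filters(domain, is_up, is_parked, is_redirect):
    """Apply the three checks in order, stopping at the first failure."""
    if not is_up(domain):
        return False  # no response: drop
    if is_parked(domain):
        return False  # parked: drop
    if is_redirect(domain):
        return False  # redirects to another host: drop
    return True
```

    Running the cheapest check first means the slower ones only touch domains that survive it. For final_url, the standard library's urllib.request.urlopen(...).geturl() gives the URL reached after redirects.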

    If the domain passes all three checks, the tool pulls its WHOIS data and stores it in an Excel file.
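    A rough sketch of the WHOIS step, with assumptions flagged. WHOIS is a plain-text protocol on TCP port 43: send the domain name followed by CRLF and read until the server closes the connection. The server name below is an assumption that only applies to .com/.net, the field labels vary by registry and are guesses, and all function names are hypothetical. It writes CSV rather than a native Excel file, since a single Excel worksheet tops out at 1,048,576 rows.

```python
import csv
import io
import socket

# Assumed label spellings; registries differ, so expect to extend these.
CREATED_LABELS = ("creation date", "created on", "registered on")
EXPIRES_LABELS = ("registry expiry date", "expiration date", "expires on")


def whois_query(domain, server="whois.verisign-grs.com", timeout=10.0):
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois_dates(text):
    """Return (created, expires) as raw strings, or None where not found."""
    created = expires = None
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue
        label, value = label.strip().lower(), value.strip()
        if created is None and label in CREATED_LABELS:
            created = value
        elif expires is None and label in EXPIRES_LABELS:
            expires = value
    return created, expires


def write_rows(rows, fh):
    """Write (domain, created, expires) tuples as CSV with a header row."""
    writer = csv.writer(fh)
    writer.writerow(["domain", "created", "expires"])
    writer.writerows(rows)
```

    WHOIS servers generally rate-limit queries, so at this scale the lookup step is likely to be the real bottleneck rather than the filtering.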

    I've looked for tools that do this, but I haven't had much luck, to say the least. I need an all-in-one tool. I haven't found one, so I figure I'll have to write my own.

    Ultimately, I'd like to have an Excel file with the registration and expiration dates of all active, non-junk domains.
    Last edited: Mar 12, 2011
  2. Douffy

    Douffy Newbie

    Sep 22, 2011
    That is a cool idea. Did you manage to make it happen? If so, please drop me a PM, as I would be interested and I probably won't see this thread again.