I have the list of 130+ million domains from premiumdrops.com. It contains every active .com, .net, .org, .biz, .info, and .us domain as of February 2011, and I want to do some large-scale data mining on it.

So far the major task seems to be clearing out the junk domains: mainly domains that redirect, domains that point to servers that are down, and parked domains. I've written a simple tool that pings each domain and deletes it from the list if there is no response. Generally, if a domain doesn't respond, it doesn't have any content. I've also written a tool that checks whether a domain is parked, but it isn't 100% accurate.

The main reason I want to filter out the junk is that I want to pull each domain's registration and expiration dates, and doing that for 130+ million domains is quite a challenge. The only realistic way to do it is to automate the whole process with a tool that works like this: Does the domain respond to ping? If not, delete it. Is the domain parked? If so, delete it. Does the domain redirect? If so, delete it. If the domain passes all three checks, pull its whois data and store it in an Excel file.

I've looked for tools that do this, but I haven't had much luck. I need an all-in-one tool, and since I haven't found one, I figure I'll have to write my own. Ultimately I'd like an Excel file with the registration and expiration dates of all active, non-junk domains.
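The three-checks-then-whois flow could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: the function names (`surviving_domains`, `parse_whois_dates`, `query_whois`, `write_csv`) are made up for illustration, the ping, parked, and redirect checks are passed in as callables so the existing tools can be plugged in, the whois field labels in the regular expressions are guesses that will not match every registry, and `whois.verisign-grs.com` only covers .com and .net (the other TLDs need their own whois servers). The output is a CSV file, which Excel can open, rather than a native Excel file.

```python
import csv
import re
import socket
from typing import Callable, Iterable, Iterator, Optional, Tuple

Dates = Tuple[Optional[str], Optional[str]]

# Assumed field labels; real whois output differs between registries.
CREATED_RE = re.compile(
    r"^\s*(?:Creation Date|Created On|Domain Registration Date):\s*(\S.*)$",
    re.I | re.M,
)
EXPIRES_RE = re.compile(
    r"^\s*(?:Registry Expiry Date|Expiration Date|Domain Expiration Date):\s*(\S.*)$",
    re.I | re.M,
)


def parse_whois_dates(text: str) -> Dates:
    """Return (registered, expires) as raw strings, or None where not found."""
    created = CREATED_RE.search(text)
    expires = EXPIRES_RE.search(text)
    return (
        created.group(1).strip() if created else None,
        expires.group(1).strip() if expires else None,
    )


def query_whois(domain: str, server: str = "whois.verisign-grs.com",
                timeout: float = 10.0) -> str:
    """Send a plain whois query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def surviving_domains(
    domains: Iterable[str],
    responds: Callable[[str], bool],
    is_parked: Callable[[str], bool],
    redirects: Callable[[str], bool],
    lookup: Callable[[str], Dates],
) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (domain, registered, expires) for domains that pass all checks."""
    for domain in domains:
        domain = domain.strip().lower()
        if not domain or not responds(domain):
            continue  # blank line or no response: drop
        if is_parked(domain) or redirects(domain):
            continue  # parked or redirecting: drop
        created, expires = lookup(domain)  # only reached by "good" domains
        yield domain, created, expires


def write_csv(rows: Iterable[Tuple[str, Optional[str], Optional[str]]],
              path: str) -> None:
    """Write the surviving rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["domain", "registered", "expires"])
        writer.writerows(rows)
```

With real checks plugged in, a run might look like `write_csv(surviving_domains(open("domains.txt"), my_ping, my_parked_check, my_redirect_check, lambda d: parse_whois_dates(query_whois(d))), "dates.csv")`, where the file name and the three `my_*` functions are placeholders for the existing tools. Because the whois lookup happens last, it only runs for domains that survive the cheaper checks. Two caveats: whois servers typically limit how many queries they accept from one client, so a run over millions of domains would need throttling, and an Excel worksheet holds at most 1,048,576 rows, so a larger result would have to be split across files.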