Need a simple data matching program made

Discussion in 'Hire a Freelancer' started by newone32, Dec 7, 2013.

    Hi. I am looking for a simple data matching program. I have a master list of about 7 million records with Fname, Lname, Address, County Code and City that I want to match against other lists with Fname, Lname, Address and City. The files to be matched are user generated, so they contain errors and some plain bad data, and not every record actually has a match. I have my own system but need this stepped up. I need to be able to run at least 100k records against the 7 million at once, and hopefully far more.

    Here are my requirements:

    • I need to be able to swap out the master list of 7 million names from time to time, so a simple way of pointing the program to the new, updated files is critical.
    • The program should be able to clean the data before matching. I am very familiar with this step and can provide a detailed description of what needs to be done. It is not a whole lot: road to rd, street to st, removing middle initials, etc.
    • I prefer fuzzy matching algorithms to Soundex matching because the errors are usually typing errors, but if there were a way for it to know that Steve is short for Stephen, that would be great.
    • The matching should be weighted or run in a set order. A big problem is name duplication: there are a thousand John Smiths, so you narrow from the whole set to the county level and then try to match. The catch is that the data to be matched has no county code, so you have to backtrack from the closest matched city. This is the approach that has worked best for me, but I am very open to hearing other ideas.
    • When a match occurs, a percentage or likelihood score is needed, and a unique identifier needs to be carried over from the master list.
    • CSV file format
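
    To make the requirements above concrete, here is a minimal sketch of one way the pipeline could look: clean the fields, find the closest city, narrow to that city's county, then score candidates with weights. It is an illustration only, not a finished design. The `ID` column, the lookup tables, the weights (0.4 / 0.2 / 0.4) and the 0.8 threshold are all assumptions, and it uses Python's standard `difflib` for the fuzzy comparison, which would probably be too slow for 100k records against 7 million without a faster matching library.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher, get_close_matches

# Hypothetical lookup tables; real ones would be much longer.
ABBREVIATIONS = {"road": "rd", "street": "st", "avenue": "ave"}
NICKNAMES = {"steve": "stephen", "bill": "william"}

def tokens(text):
    """Lowercase, replace punctuation with spaces, split into words."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()

def clean_plain(text):
    return " ".join(tokens(text))

def clean_address(text):
    # road -> rd, street -> st, and so on.
    return " ".join(ABBREVIATIONS.get(w, w) for w in tokens(text))

def clean_first_name(text):
    words = tokens(text)
    # Keep the first word, drop later single-letter words (middle initials).
    words = words[:1] + [w for w in words[1:] if len(w) > 1]
    return " ".join(NICKNAMES.get(w, w) for w in words)

def similarity(a, b):
    """Fuzzy similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def build_index(master_rows):
    """Group cleaned master records by county; remember each city's counties."""
    city_counties = defaultdict(set)
    by_county = defaultdict(list)
    for row in master_rows:
        county = row["County Code"]
        city_counties[clean_plain(row["City"])].add(county)
        by_county[county].append((
            row["ID"],  # assumed name for the unique identifier column
            clean_first_name(row["Fname"]),
            clean_plain(row["Lname"]),
            clean_address(row["Address"]),
        ))
    return city_counties, by_county

def match_record(record, index, threshold=0.8):
    """Return (master_id, score) for the best candidate, or None."""
    city_counties, by_county = index
    # The incoming record has no county code, so backtrack from the closest city.
    closest = get_close_matches(clean_plain(record["City"]), list(city_counties), n=1)
    if not closest:
        return None
    fname = clean_first_name(record["Fname"])
    lname = clean_plain(record["Lname"])
    address = clean_address(record["Address"])
    best = None
    for county in city_counties[closest[0]]:
        for master_id, m_fname, m_lname, m_address in by_county[county]:
            # Assumed weights: last name and address count more than first name.
            score = (0.4 * similarity(lname, m_lname)
                     + 0.2 * similarity(fname, m_fname)
                     + 0.4 * similarity(address, m_address))
            if best is None or score > best[1]:
                best = (master_id, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

    The rows are plain dictionaries keyed by column name, so they could come straight from `csv.DictReader`. Swapping the master list would then only mean calling `build_index` on a different file, and the score times 100 gives the match percentage.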

    Please respond or PM if you are interested.
    Dec 7, 2013
    I can create this bot. Shoot me a PM.
    This is an easy job. I have done hundreds of such programs before.

    Talk to me on Skype: botrockets