
Programmer's survey

Discussion in 'General Programming Chat' started by timothywcrane, May 4, 2011.

  1. timothywcrane

    timothywcrane Power Member

    Joined:
    Apr 25, 2009
    Messages:
    590
    Likes Received:
    236
    Occupation:
    Internet Promotion Management
    Location:
    USA
    Home Page:
    I am surveying the opinion of programmers, script kiddies, and all cut and paste coders (the category I fall into).

    I have a little over 350,000 CSV files, each with 5 columns of data, and I only need the data in column two.

    I obviously need a way to parse the files programmatically. Using WinAutomation or Sikuli to open each file, remove columns, save, and then move the file, looping over a quarter of a million times, would take years and would crash under the load daily.

    So far I have looked at a pure Python solution, PHP, and even bash under GnuWin.

    I simply need to parse the second column into a separate file and dump the rest. I do not even need incremental naming, since I can use an autorenaming tool if needed.

    What language would you recommend for doing this task as raw and gritty as possible? Any help would be appreciated. I can RTFM after that and get the job done.

    Thanks
     
  2. darshan1994

    darshan1994 BANNED BANNED

    Joined:
    Oct 9, 2009
    Messages:
    654
    Likes Received:
    318
    If the data looks like this:
    name,email,etc,etc
    and the fields are separated by commas, which I am sure they will be, just download Excel, open the file in there, highlight the column you want, and copy and paste. Done.


    However, for the 350k files you have, you can easily do this in any third-generation language using things like string split. The part that I think will take long is looping through each file: if they have random names, it is harder to make your program read them. If they are named like 1.txt, 2.txt, it's pretty easy.

    My recommendation is Java or C# (I already do such things in Java, so I know it's doable pretty easily).
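A minimal Python sketch of the string-split idea above, assuming comma-separated files with no quoted fields. The function names, the `*.csv` pattern, and the folder arguments are illustrative and not from the thread. Listing the directory means the file names do not need to follow any scheme such as 1.txt, 2.txt.

```python
from pathlib import Path


def second_field(line, sep=","):
    """Return the second field of a delimited line, or None if it has fewer than two fields."""
    parts = line.rstrip("\r\n").split(sep)
    return parts[1] if len(parts) > 1 else None


def extract_folder(src_dir, out_dir, pattern="*.csv"):
    """Write column two of every matching file in src_dir to a same-named file in out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    # glob() lists whatever is in the folder, so arbitrary file names are fine.
    for path in sorted(src.glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fin, \
             (out / path.name).open("w", encoding="utf-8") as fout:
            for line in fin:
                value = second_field(line)
                if value is not None:
                    fout.write(value + "\n")
        count += 1
    return count
```

A plain split breaks on quoted fields that contain the separator, so this only fits data known to be free of them.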
     
    Last edited: May 4, 2011
  3. timothywcrane

    timothywcrane Power Member

    Joined:
    Apr 25, 2009
    Messages:
    590
    Likes Received:
    236
    Occupation:
    Internet Promotion Management
    Location:
    USA
    Home Page:
    "using WinAutomation or Sikuli to open each file, remove columns, save, then move file, in an endless loop for over a quarter of a million times would take years and never fail to crash under the load daily."

    What would you like to get paid to manually open almost half a million CSV files, remove four of the five columns, resave the file, move it to another folder, then go on to the next one?

    I understand it seems so simple, and so I thought also, but now I have carpal tunnel and might even look into simply importing the CSVs into MySQL. But as I have gsplit them from much larger files (too large for Excel or OOo, and Gnumeric was far too slow), there are now no headings in the top row for easy importing.

    Don't get me wrong, your idea was also my first, but I have since learned to regret and hate ever knowing rightclick-d, right arrow, rightclick-d ...
     
  4. drey2k

    drey2k Power Member

    Joined:
    Jan 4, 2009
    Messages:
    551
    Likes Received:
    169
    Occupation:
    Finance guy
    Location:
    USSR 1943
    You can do this pretty easily with a VBA macro inside Excel... no need to find a complex solution with a language foreign to Excel.
     
  5. hackNstuff

    hackNstuff BANNED BANNED Premium Member

    Joined:
    Jun 10, 2010
    Messages:
    136
    Likes Received:
    15
    People use Perl all the time for things like this; for text manipulation it's one of the most efficient methods. How large are the files you are looking at? I have a script sitting around that I could easily modify to do what you're talking about. PM me if you want help; I could set one of my servers on this and have it done in no time. I'd estimate it would take 15 minutes of setup and maybe a couple of hours to process automatically.
    No autorenaming; if you're going to automate it, do it right.

    //loop through each file in the folder
    //parse file for data, save to external file
    //on completion, move original file to secondary folder
    //start the loop again
    You can run something like that in batches of 10 at a time and if it seems stable keep upping the number or just let it run until it can't find anything to do. Easy work, you have me curious now.
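The loop outlined above could be sketched in Python roughly as follows. This is not the Perl script mentioned in the post. The folder arguments, the function names, and the default batch size of 10 are assumptions for illustration, and the files are assumed to be comma-separated.

```python
import csv
import shutil
from itertools import islice
from pathlib import Path


def process_batch(src_dir, out_dir, done_dir, batch_size=10):
    """Process up to batch_size CSV files and return how many were handled."""
    src, out, done = Path(src_dir), Path(out_dir), Path(done_dir)
    out.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    # Take the batch up front so files are not moved while the listing is being read.
    batch = list(islice(src.glob("*.csv"), batch_size))
    for path in batch:
        # Parse the file and save column two to an external file.
        with path.open(newline="", encoding="utf-8", errors="replace") as fin, \
             (out / path.name).open("w", newline="", encoding="utf-8") as fout:
            writer = csv.writer(fout)
            for row in csv.reader(fin):
                if len(row) > 1:
                    writer.writerow([row[1]])
        # On completion, move the original to the secondary folder.
        shutil.move(str(path), str(done / path.name))
    return len(batch)


def run_until_empty(src_dir, out_dir, done_dir, batch_size=10):
    """Start the loop again until there is nothing left to do; return the total handled."""
    total = 0
    while True:
        handled = process_batch(src_dir, out_dir, done_dir, batch_size)
        if handled == 0:
            return total
        total += handled
```

Because originals are moved only after their output is written, an interrupted run can simply be restarted. Once small batches look stable, a larger `batch_size` cuts down on repeated directory listings.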
     
  6. timothywcrane

    timothywcrane Power Member

    Joined:
    Apr 25, 2009
    Messages:
    590
    Likes Received:
    236
    Occupation:
    Internet Promotion Management
    Location:
    USA
    Home Page:
    Sounds like a reasonable solution. I have been using Sikuli, a Jython visual IDE, and the processing time and overhead were just killing me. I sent you a PM. Now off to read up on some Perl.

    turned out to have the right answer.
    Code:
    http://www.mind-pioneer.com/services/719_Text_file_parser.html
    I just had to install the GUI and use the correct replace string for tab-delimited files. I will still send a file over to hackNstuff, as the pure code way to go is THE way to go, and I am always burning to learn more in that direction.
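For the pure code route, the tab-delimited detail can be handled by passing a delimiter to Python's standard `csv` module. This is a small sketch and not how the GUI tool above works; the helper name `column` is made up for illustration.

```python
import csv
import io


def column(text, index=1, delimiter="\t"):
    """Return one column from delimited text; index 1 is the second column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip rows that are too short to have the requested column.
    return [row[index] for row in reader if len(row) > index]


print(column("a\tb\tc\n1\t2\t3\n"))            # ['b', '2']
print(column('x,"y, z",w\n', delimiter=","))   # ['y, z']
```

The second call shows why a real CSV reader can be safer than a plain string split: the quoted field keeps its embedded comma.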
     
    Last edited: May 5, 2011
  7. gnote

    gnote Registered Member

    Joined:
    Mar 10, 2009
    Messages:
    80
    Likes Received:
    6
    Occupation:
    Programmer
    Location:
    USA
    Exporting to CSV would make it simple to parse in any language.

    In .NET you can easily load spreadsheet files using data connections:

    Code:
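    using System.Data;
    using System.Data.OleDb;
    using System.IO;

    // File and worksheet names are placeholders; Jet 4.0 with "Excel 8.0" reads .xls workbooks.
    var fileName = Path.Combine(Directory.GetCurrentDirectory(), "fileNameHere");
    var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);

    // Load the whole worksheet into a named table of the DataSet.
    var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
    var ds = new DataSet();

    adapter.Fill(ds, "anyNameHere");

    DataTable data = ds.Tables["anyNameHere"];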
    
    
    Then you simply enumerate the data table you filled.

    There are a few different ways:
    http://stackoverflow.com/questions/15828/reading-excel-files-from-c
     
  8. timothywcrane

    timothywcrane Power Member

    Joined:
    Apr 25, 2009
    Messages:
    590
    Likes Received:
    236
    Occupation:
    Internet Promotion Management
    Location:
    USA
    Home Page:
    Thank you for all of the responses. I have it figured out using a Perl-based tool. I am now having a problem figuring out how to import CSV data into MySQL when the file name to be imported is the index field. lol. It is always something...
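One possible approach to the file-name-as-index problem, sketched in Python: merge the single-column files into one CSV whose first column is the source file name (without extension), so a single bulk import such as MySQL's `LOAD DATA INFILE` can load key and value together. The two-column layout, the function name, and the use of the file stem as the key are all assumptions, not something stated in the thread.

```python
import csv
from pathlib import Path


def merge_with_filename(src_dir, merged_path, pattern="*.csv"):
    """Write (file stem, value) rows for every single-column file in src_dir."""
    target = Path(merged_path).resolve()
    rows = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for path in sorted(Path(src_dir).glob(pattern)):
            if path.resolve() == target:
                continue  # never read the merged output back in
            with path.open(newline="", encoding="utf-8", errors="replace") as fin:
                for row in csv.reader(fin):
                    if row:  # skip blank lines
                        writer.writerow([path.stem, row[0]])
                        rows += 1
    return rows
```

The merged file then has one row per value, keyed by the file it came from, which avoids issuing a separate import per file.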
     
  9. timothywcrane

    timothywcrane Power Member

    Joined:
    Apr 25, 2009
    Messages:
    590
    Likes Received:
    236
    Occupation:
    Internet Promotion Management
    Location:
    USA
    Home Page:
  10. CraigLackie

    CraigLackie Newbie

    Joined:
    May 22, 2011
    Messages:
    13
    Likes Received:
    1
    Occupation:
    Freelancer/CEO
    Location:
    Exeter
    I'd say Excel or similar would be fine for this.

    Craig