
[Script] Splitting text file using Windows PowerShell

Discussion in 'General Scripting Chat' started by seocompanymail, Feb 23, 2014.

  1. seocompanymail

    seocompanymail Regular Member

    I wanted to split a 60 MB text file but did not want to download software just for that.

    So, looking around the web, I found the script below. It counts the lines in your text file, then asks how many lines you want per output file. It then reads the original file and writes it out to a series of numbered text files (splitlog1.txt, splitlog2.txt, and so on).

    I ran it on my laptop (i3 CPU, 2 GB of RAM). Splitting a 60 MB file into 50,000-line files took 8 minutes.

    You run the script using Windows PowerShell.

    Code:
    #############################################
    # Split a log/text file into smaller chunks #
    #############################################
    # 
    # WARNING: This will take a long while with extremely large files and uses lots of memory to stage the file 
    # 
     
    # Set the baseline counters 
    # 
    # Set the line counter to 0 
    $linecount = 0 
    # Set the file counter to 1. This is used for the naming of the log files 
    $filenumber = 1 
     
    # Prompt user for the path 
    $sourcefilename = Read-Host "What is the full path and name of the log file to split? (e.g. D:\mylogfiles\mylog.txt)" 
     
    # Prompt user for the destination folder to create the chunk files 
    $destinationfolderpath = Read-Host "What is the path where you want to extract the content? (e.g. d:\yourpath\)" 
     
    Write-Host "Please wait while the line count is calculated. This may take a while. No really, it could take a long time." 
     
    # Find the current line count to present to the user before asking the new line count for chunk files 
    Get-Content $sourcefilename | Measure-Object | ForEach-Object { $sourcelinecount = $_.Count } 
     
    #Tell the user how large the current file is 
    Write-Host "Your current file is $sourcelinecount lines long" 
     
    # Prompt user for the size of the new chunk files 
    $destinationfilesize = Read-Host "How many lines will be in each new split file?" 
     
    # The answer is read as a string, so convert it to an integer 
    # Set the upper boundary (maximum line count to write to each file) 
    $maxsize = [int]$destinationfilesize  
     
    Write-Host "File is $sourcefilename - destination is $destinationfolderpath - new file line count will be $destinationfilesize" 
     
    # Read the source file line by line, append each line to the current chunk file 
    # and increment the line counter. When the counter reaches $maxsize, move on to 
    # the next chunk file (see the If block below) and reset the counter. 
    Get-Content $sourcefilename | ForEach-Object { 
     Add-Content -Path (Join-Path $destinationfolderpath "splitlog$filenumber.txt") -Value $_ 
      $linecount ++ 
      If ($linecount -eq $maxsize) { 
        $filenumber++ 
        $linecount = 0 
      } 
    } 
     
    # Clean up after your pet 
    [gc]::collect()  
    [gc]::WaitForPendingFinalizers() 
     
  2. solventnine

    solventnine Junior Member

    Alternatively, install Cygwin and run the following command in its command-line interface (n is the number of lines you want in each output file):

    Code:
    split -l n filename.txt
    
    Optionally, you can pass a prefix to split, and it will name the output files PREPENDaa, PREPENDab, ... PREPENDzz (the default prefix is x):

    Code:
    split -l n filename.txt PREPEND
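
    To make the two commands above concrete, here is a small sketch you could try in a Cygwin (or any Linux) shell. The file name sample.txt, the prefix part_ and the chunk size of 4 are made up for illustration; swap in your own file and line count.

```shell
#!/bin/sh
# Hypothetical demo: sample.txt, the 'part_' prefix and the chunk size
# are illustrative assumptions, not taken from the original posts.
set -eu

# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 10-line sample file (a stand-in for the real log file).
seq 1 10 > sample.txt

# Split into chunks of at most 4 lines each, using 'part_' as the prefix.
split -l 4 sample.txt part_

# Show the line count of each resulting chunk.
wc -l part_*
```

    With a 10-line input and 4 lines per chunk, this should leave part_aa and part_ab with 4 lines each and part_ac with the remaining 2. Concatenating the chunks in name order (cat part_*) gives back the original file.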
    
    The advantage of using Cygwin is that it gives you access to a bunch of other Linux tools, too.
     