Parsing data from one row into multiple rows in CSV file

At one of my clients I had a requirement to parse out a single records into multiple rows as the incoming file had extra columns added for every new occurrence of a unique data, SSIS 2008 does not automatically parse it into multiple records and it doesn’t read a varying column records, SSIS 2012 you can load a varying column records.

 

So to achieve what we what we need to first parse the data from one row into multiple rows using a script task then load the data into table using Dataflow.

 

For demonstration purpose let’s create a new CSV and call it SplitText.csv

 

Data in csv:

1, 2, 3, 4,repeat1,repeat2,repeat3,

5,6,7,8,repeat4,

1,3,5,7,repeat5,repeat6,

 

So as you can see my data has first 4 columns which are constants/uniqueness of records and the rest are repeating columns which we need to split so that my final output looks like this

1, 2, 3, 4,repeat1,

1, 2, 3, 4,repeat2,

1, 2, 3, 4,repeat3,

5,6,7,8,repeat4,

1,3,5,7,repeat5,

1,3,5,7,repeat6,

 

Now let’s create a new package and add a Script Task and select Script Language as “Visual Basic” and then click on edit the script task and paste the below code

 

' Microsoft SQL Server Integration Services Script Task

' Write scripts using Microsoft Visual Basic 2008.

' The ScriptMain is the entry point class of the script.

 

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.IO

 

 

 

 

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _

<System.CLSCompliantAttribute(False)> _

Partial Public Class ScriptMain

                Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 

                Enum ScriptResults

                                Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success

                                Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure

                End Enum

               

    Dim fileLoc As String = "\OutputSplitCSVRow.csv" '***Specify the output file location here

    Private Sub FileCreation()

        Dim fs As IO.FileStream = Nothing

        If (Not IO.File.Exists(fileLoc)) Then

            fs = IO.File.Create(fileLoc)

 

            Using fs

            End Using

        End If

    End Sub

    Public Sub Main()

        '

        ' Add your code here

        '

        FileCreation()

 

        Dim Column_Count As Integer = 5 '***Specify the Total expected column here for me it is 4 Uniqueness columns+1 repeat column

        Dim FILE_NAME As String = "\SplitText.csv" '***Specify the source file location here

        Dim TextLine As String = ""

        If System.IO.File.Exists(FILE_NAME) = True Then

            Dim objReader As New System.IO.StreamReader(FILE_NAME)

            Do While objReader.Peek() <> -1

                'TextLine = TextLine & objReader.ReadLine() & vbNewLine

                TextLine = objReader.ReadLine()

 

                Dim MyArray() As String = TextLine.Split(",")

                If MyArray.Length - 1 = Column_Count Then

                    If File.Exists(fileLoc) Then

                        Using sw As StreamWriter = New StreamWriter(fileLoc, True)

                            sw.WriteLine(TextLine.ToString())

 

                        End Using

                    End If

                Else

                    For j As Integer = 0 To (MyArray.Length - 1 - Column_Count)

                        Dim SplitRows As String = ""

                        For i As Integer = 0 To Column_Count - 2

                            SplitRows = SplitRows + MyArray(i).ToString() + ","

                        Next

                        SplitRows = SplitRows + MyArray(Column_Count - 1 + j).ToString() + ","

                        If File.Exists(fileLoc) Then

                            Using sw As StreamWriter = New StreamWriter(fileLoc, True)

                                sw.WriteLine(SplitRows.ToString())

 

                            End Using

                        End If

                    Next

                End If

 

 

            Loop

            'MsgBox(TextLine.ToString())

            'System.IO.File.Move("\sample1.rvr", "\sample1.txt")

        Else

            MsgBox("File Does Not Exist")

        End If

 

 

        Dts.TaskResult = ScriptResults.Success

 

    End Sub

End Class

 

In the above script modify the file locations and columns count according to your needs.

 

Once you run the above script the output file data should be SSIS Dataflow Source friendly.

 

Add-on/Enhancement: If there are multiple files which needs to be processed then consider using foreach loop and feed the foreach loop file names to a variable and use this variable in the script task to assign the value to FILE_NAME variable above, and write the output file name of script task (fileLoc) to a variable and use this variable for SSIS Dataflow flat file source connection string.