At one of my clients I had a requirement to parse out a single records into multiple rows as the incoming file had extra columns added for every new occurrence of a unique data, SSIS 2008 does not automatically parse it into multiple records and it doesn’t read a varying column records, SSIS 2012 you can load a varying column records.
So to achieve what we what we need to first parse the data from one row into multiple rows using a script task then load the data into table using Dataflow.
For demonstration purpose let’s create a new CSV and call it SplitText.csv
Data in csv:
1, 2, 3, 4,repeat1,repeat2,repeat3,
5,6,7,8,repeat4,
1,3,5,7,repeat5,repeat6,
So as you can see my data has first 4 columns which are constants/uniqueness of records and the rest are repeating columns which we need to split so that my final output looks like this
1, 2, 3, 4,repeat1,
1, 2, 3, 4,repeat2,
1, 2, 3, 4,repeat3,
5,6,7,8,repeat4,
1,3,5,7,repeat5,
1,3,5,7,repeat6,
Now let’s create a new package and add a Script Task and select Script Language as “Visual Basic” and then click on edit the script task and paste the below code
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic 2008.
' The ScriptMain is the entry point class of the script.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
Enum ScriptResults
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum
Dim fileLoc As String = "\OutputSplitCSVRow.csv" '***Specify the output file location here
Private Sub FileCreation()
Dim fs As IO.FileStream = Nothing
If (Not IO.File.Exists(fileLoc)) Then
fs = IO.File.Create(fileLoc)
Using fs
End Using
End If
End Sub
Public Sub Main()
'
' Add your code here
'
FileCreation()
Dim Column_Count As Integer = 5 '***Specify the Total expected column here for me it is 4 Uniqueness columns+1 repeat column
Dim FILE_NAME As String = "\SplitText.csv" '***Specify the source file location here
Dim TextLine As String = ""
If System.IO.File.Exists(FILE_NAME) = True Then
Dim objReader As New System.IO.StreamReader(FILE_NAME)
Do While objReader.Peek() <> -1
'TextLine = TextLine & objReader.ReadLine() & vbNewLine
TextLine = objReader.ReadLine()
Dim MyArray() As String = TextLine.Split(",")
If MyArray.Length - 1 = Column_Count Then
If File.Exists(fileLoc) Then
Using sw As StreamWriter = New StreamWriter(fileLoc, True)
sw.WriteLine(TextLine.ToString())
End Using
End If
Else
For j As Integer = 0 To (MyArray.Length - 1 - Column_Count)
Dim SplitRows As String = ""
For i As Integer = 0 To Column_Count - 2
SplitRows = SplitRows + MyArray(i).ToString() + ","
Next
SplitRows = SplitRows + MyArray(Column_Count - 1 + j).ToString() + ","
If File.Exists(fileLoc) Then
Using sw As StreamWriter = New StreamWriter(fileLoc, True)
sw.WriteLine(SplitRows.ToString())
End Using
End If
Next
End If
Loop
'MsgBox(TextLine.ToString())
'System.IO.File.Move("\sample1.rvr", "\sample1.txt")
Else
MsgBox("File Does Not Exist")
End If
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
In the above script modify the file locations and columns count according to your needs.
Once you run the above script the output file data should be SSIS Dataflow Source friendly.
Add-on/Enhancement: If there are multiple files which needs to be processed then consider using foreach loop and feed the foreach loop file names to a variable and use this variable in the script task to assign the value to FILE_NAME variable above, and write the output file name of script task (fileLoc) to a variable and use this variable for SSIS Dataflow flat file source connection string.