XML Task in SSIS

Using XML Task we can perform the following operations and based on the operation selected the option in the task editor changes accordingly

· Validate: Validates the XML document against a Document Type Definition (DTD) or XML Schema definition (XSD) schema.
· XSLT: Performs XSL transformations on XML documents to convert the file based on the XSLT script
· XPATH: Performs XPath queries and evaluations.
· Merge: Merges two XML documents.
· Diff: Compares two XML documents.
· Patch: Applies the output from the Diff operation to create a new document.

Of all the operations performed by this task, the most commonly used operation is XSLT, let’s have a look at the way it’s done.

XSLT Operation of XML task in SSIS:
The most common causes to use XSLT Operation in SSIS are listed below
1) In SSIS when we try to import data from XML File using XML Source transform in Data Flow of the package, it automatically generates an XSD file but if there are multiple namespaces in the XML file, it fails to generate XSD file. so the work around is to remove the name spaces from XML files before processing the XML files 
2) Few XML Files have multiple nodes and to pull the data into the same table from all those nodes we will have to merge join both the outputs from the nodes instead we can actually merge the nodes into a single node using XSLT file and then use XML Source transform in Data Flow to load data into the table.

So let’s start removing namespaces from files to demonstrate the XSLT Operation in XML Task. 

Remove Namespaces from XML in SSIS
Steps:
1) Create a new XSLT file (removenamespace.xslt) and paste the below script(Note: this code works universally to XML Files)
   <?xml version="1.0" encoding="utf-8" ?>
   <xsl:stylesheet version="1.0"         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="xml" indent="no" /> 
    <xsl:template match="/|comment()|processing-instruction()">    
    <xsl:copy>      
    <xsl:apply-templates />    
    </xsl:copy>  
    </xsl:template>  
    <xsl:template match="*">    
    <xsl:element name="{local-name()}">      
    <xsl:apply-templates select="@*|node()" />    
    </xsl:element>  
    </xsl:template>  
    <xsl:template match="@*">    
    <xsl:attribute name="{local-name()}">      
    <xsl:value-of select="." />    
    </xsl:attribute>  
    </xsl:template> </xsl:stylesheet>

2) Create a new package then drag and drop "XML Task" into the control flow
3) Double click on the XML Task and go to general properties and select Operation Type as XSLT and set the source xml file and destination file (create file as option in connection string) and the XSLT file in second operation as shown in the below image


4) Execute the package and it should create a new file without namespaces.

XPATH Operation of XML task in SSIS: 
Now that we have an XML file from XSLT operation let’s fetch some data using XPATH operation in XML Task, usually it is used to get a value of element or attribute or list of nodes into a string variable.

Steps:
1) For demo purpose let’s create a new Output.xml file from below content.
Output.xml file:
<?xml version="1.0" encoding="utf-8"?>
<Sales SaleDate="2013-10-28" NumberOfOrders="2">
 <SalesOrder OrderDate="2005-07-01T00:00:00" SalesOrderNumber="SO43700" TaxAmt="55.9279" SubTotal="699.0982" Freight="17.4775">
  <Detail OrderQty="1" UnitPrice="699.0982" LineTotal="699.098200">'Sam'</Detail>
 </SalesOrder>
 <SalesOrder OrderDate="2007-12-24T00:00:00" SalesOrderNumber="SO60686" TaxAmt="200.9856" SubTotal="2512.3200" Freight="62.8080">
  <Detail OrderQty="1" UnitPrice="2443.3500" LineTotal="2443.350000"></Detail>
  <Detail OrderQty="1" UnitPrice="4.9900" LineTotal="4.990000"></Detail>
  <Detail OrderQty="1" UnitPrice="8.9900" LineTotal="8.990000"></Detail>
  <Detail OrderQty="1" UnitPrice="54.9900" LineTotal="54.990000"></Detail>
 </SalesOrder>
</Sales>
 2) Drag drop the XML Task into the control flow and select the "OperationType" as XPATH
 3) Source in XML Task will be the Output.xml file which we generated, create a new variable "varOutputVariable" and set the data type to string and in the “SecondOperand” specify the operation that needs to be done as shown in below image


In the above image we specify SecondOperand as "/Sales/SalesOrder[2]/Detail/@UnitPrice"  that means we are asking SSIS Package to picks up second child of SalesOrder in above XML and the corresponding value in the attribute UnitPrice of Detail, so when we run the package the output in the variable varOutputVariable will be "443.3500\n4.9900\n8.9900\n54.9900".
If we want value of detail let’s say the first child then query in SecondOperand will be "/Sales/SalesOrder[1]/Detail" and the output in the variable varOutputVariable would be "Sam".The above two scripts for SecondOperand are just for example, please search online for XPATH Language or XML Path Language for more info.

Add-on/Enhancement:
The source and destination files can be made dynamic using expressions in XML Task and can process multiple files.