Using XML Task we
can perform the following operations and based on the operation selected the
option in the task editor changes accordingly
· Validate: Validates the XML document against a
Document Type Definition (DTD) or XML Schema definition (XSD) schema.
· XSLT: Performs XSL transformations on XML documents to
convert the file based on the XSLT script
· XPATH: Performs XPath
queries and evaluations.
· Merge: Merges two XML documents.
· Diff: Compares two XML documents.
· Patch: Applies the output from the Diff operation to
create a new document.
Of all the
operations performed by this task, the most commonly used operation is
XSLT, let’s have a look at the way it’s done.
XSLT Operation of XML task in SSIS:
The most common causes to use XSLT Operation in SSIS
are listed below
1) In SSIS when we try to import data from XML File
using XML Source transform in Data Flow of the package, it automatically generates an XSD file
but if there are multiple namespaces in the XML file, it fails to generate XSD file. so
the work around is to remove the name spaces from XML files before processing
the XML files
2)
Few XML Files have multiple nodes and to pull the data into the same table from
all those nodes we will have to merge join both the outputs from the nodes
instead we can actually merge the nodes into a single node using XSLT file and
then use XML Source transform in Data Flow to load data into the table.
So let’s start
removing namespaces from files to demonstrate the XSLT Operation in XML
Task.
Remove Namespaces from XML in SSIS
Steps:
1) Create a new XSLT file (removenamespace.xslt)
and paste the below script(Note: this code works
universally to XML Files)
<?xml
version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"
indent="no" />
<xsl:template
match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates
/>
</xsl:copy>
</xsl:template>
<xsl:template
match="*">
<xsl:element
name="{local-name()}">
<xsl:apply-templates
select="@*|node()" />
</xsl:element>
</xsl:template>
<xsl:template
match="@*">
<xsl:attribute
name="{local-name()}">
<xsl:value-of select="."
/>
</xsl:attribute>
</xsl:template> </xsl:stylesheet>
2) Create a new
package then drag and drop "XML Task" into the control flow
3) Double click on
the XML Task and go to general properties and select Operation Type as XSLT and
set the source xml file and destination file (create file as option in
connection string) and the XSLT file in second operation as shown in the below
image
4) Execute the
package and it should create a new file without namespaces.
XPATH Operation
of XML task in SSIS:
Now
that we have an XML file from XSLT operation let’s fetch some data using XPATH
operation in XML Task, usually it is used to get a value of element
or attribute or list of nodes into a string variable.
Steps:
1) For demo purpose
let’s create a new Output.xml file from below content.
Output.xml file:
<?xml version="1.0"
encoding="utf-8"?>
<Sales SaleDate="2013-10-28"
NumberOfOrders="2">
<SalesOrder
OrderDate="2005-07-01T00:00:00" SalesOrderNumber="SO43700" TaxAmt="55.9279"
SubTotal="699.0982"
Freight="17.4775">
<Detail OrderQty="1" UnitPrice="699.0982"
LineTotal="699.098200">'Sam'</Detail>
</SalesOrder>
<SalesOrder
OrderDate="2007-12-24T00:00:00" SalesOrderNumber="SO60686" TaxAmt="200.9856"
SubTotal="2512.3200"
Freight="62.8080">
<Detail OrderQty="1" UnitPrice="2443.3500"
LineTotal="2443.350000"></Detail>
<Detail OrderQty="1" UnitPrice="4.9900"
LineTotal="4.990000"></Detail>
<Detail OrderQty="1" UnitPrice="8.9900"
LineTotal="8.990000"></Detail>
<Detail OrderQty="1" UnitPrice="54.9900"
LineTotal="54.990000"></Detail>
</SalesOrder>
</Sales>
2) Drag drop
the XML Task into the control flow and select the "OperationType"
as XPATH
3) Source in
XML Task will be the Output.xml file which we generated, create a new variable
"varOutputVariable" and set the data type
to string and in the “SecondOperand” specify the
operation that needs to be done as shown in below image
In
the above image we specify SecondOperand as
"/Sales/SalesOrder[2]/Detail/@UnitPrice" that means we are asking SSIS Package
to picks up second child of SalesOrder in above
XML and the corresponding value in the attribute UnitPrice of
Detail, so when we run the package the output in the variable varOutputVariable will
be "443.3500\n4.9900\n8.9900\n54.9900".
If
we want value of detail let’s say the first child then query in SecondOperand will be "/Sales/SalesOrder[1]/Detail"
and the output in the variable varOutputVariable would
be "Sam".The above two scripts for SecondOperand are just for example, please search
online for XPATH Language or XML Path Language for more info.
Add-on/Enhancement:
The source and destination files can be
made dynamic using expressions in XML Task and can process multiple files.