Ratko Ćosić - lamentations of one programmer

ponedjeljak, 04.08.2008.

SQL Server 2005/08 Integration Services



Take a look at these links for further info:
- MSDN Integration Services site
- Integration Services Datasheet
- Integration Services White Paper

SQL Server 2005 brings us new things to play around with'em, and one of these things is SSIS (SQL Server Integration Services) - a successor of DTS (Data Transformation Services) in SQL Server 2000.



Well, the first thing buggered me when I've installed 2K5 was that I didn't find Import/Export Wizard. Well, you don't have it anymore! At least, not inside Enterprise Manager, oops, Management Studio (Express). You'll find some similar tool inside new SQL Server 2005/08 "Visual Studio" (it's called BIDS - Business Intelligence Development Studio).

As such, SSIS has some practical usage, such as:

- importing and exporting data (as told),
- transformations of data,
- automating maintenance tasks,
- executing sql scripts,
- sending e-mail notifications (now through SMTP instead of dreadful MAPI),
- FTP-ing files,
- interacting with message queues (remember MSMQ?),
- launching "Analysis Services" processes (think of it as OLAP, for start),
- data mining (remember dwarfs?),
- manipulating XML files,
- querying WMI queries (pinging computers and low-level events).

So, when using SSIS? The answer is whenever you want some job concerning data transfer or modifications or manipulation of these mentioned objects to take place MORE THAN ONCE. There is no necessity to play around with it if you want to extract the data from an Excel file and put it in one (new) table and do it just once. You would normally just copy/paste it into the "Open table" view in SQL Management Studio.

I will show you the usage of it in one practical example that I needed to implement...

As a pre-step, enable "SQL Server Integration Services" service inside SQL Server Configuration Manager, and make sure you installed BIDS during the SQL Server installation.

If you want to create some SSIS package, you normally open the BIDS studio and start with the creating of new project/solution, as you would normally do with C#/VB.NET. On first look, BIDS is nothing much more as ordinarily Visual Studio and you will find it very easy to work in. When you create new Integration Services Project, you'll see four special folders created: Data Sources, Data Source Views, SSIS Packages (containing one package named Package.dtsx), and Miscellaneous.
Package file will be automatically opened in Control Flow view. Except this view, you'll see Data Flow, Event Handlers and Package Explorer. Let's stick to the first view for now.

As starters, you should define some of the Source Adapters inside Connection Managers pane. You can choose between: DataReader (works with ADO.NET connections), FlatFile (csv's), OLE DB (extract data using any valid OLE DB Provider, as SQL Server itself), Raw File (used only as destination to write intermediary results of partly processed data - frankly, I dunno what the beep it is), XML (extracts the data from an XML document), Dimension Processing (process dimensions in data warehouse tables), Partition Processing (processes Analysis Services partitions), Recordset (oh, an old Methusalem ...), and SQL Mobile (importing using the high-speed bulk insert interface for your precious Blueberry).

I've created for this example an Excel Connection Manager, two SQL Server 2005 database connections, and a MS Access Database connection.

Excel Connection Manager - the most important property is ExcelFilePath, which could be parameterized (you should avoid using hardcoded file path, as always).

SQL Server 2005/08 Database - the most important property is normally ConnectionString (with usual format: Data Source=[server name][instance name];Initial Catalog=[database name];Provider=SQLNCLI.1;Integrated Security=SSPI;Auto Translate=False;).

MS Access Database - also fill out its ConnectionString, usually formatted as: uid=[user name];Dsn=MS Access Database;dbq=[path to the *.mdb];defaultdir=[working directory (make it equal as path to the mdb file)];driverid=25;fil=MS Access;maxbuffersize=2048;pagetimeout=5;.

Now, choose one of the containers (Sequence Container, For Loop Container, Foreach Loop Container). The containers are self-explanatory, so I'll not explain it.

After that, you should start to define your element inside the package just like you would do in DTS package before. Pick it from the toolbox and drag and drop it on the Data Flow surface.

For the list of all the tasks you can use, you can see the complete toolbox as you open the package:



As you've seen, I first choose a sequence container, named it after the table name, and then dragged and dropped the following elements (tasks):

- Execute SQL Task - a and most useful task created for execution of various T-SQL code,
- Data Flow Task - needed for importing/exporting data from one into the other(s) source(s).

to perform the following actions:

- truncating destination table (Execute SQL Task),
- dropping constrains of dest table (Execute SQL Task),
- transferring data from source to the destination (Data Flow Task),
- filling GUIDs for non-matched records (Execute SQL Task),
- adding constrains to dest table (Execute SQL Task).

The package would look something similar as the following picture (for this case, a left container):



If you double-click a Data Flow Task you can see that it's assembled from sub-elements. Let's pick the first transfer data task:



I've choose XML Source Task to define my source file as XML file. This is nothing special except it relies on the given XML data source (file path you should configure using configuration - as explained later on this post)

This task actually serves to extract data from an XML data file by using next task: Data Conversion task. This task is very clever as you can see more thouroughly if you double-click it. On the top it has a table with available input columns, and on the bottom it shows these columns being transformed (renamed, data type changed, excluded, etc.). Also, if you click on the button named "Configure Error Output.." you can define in which cases the trasformation will fail and there you have the opportunity to gracefully react on failure or just continue with your work.

After XML columns being transformed, I've put in action Derived Column Transformation Task, which derives the column values using expressions consisted of column names, variables, functions and operators. For example, I trim the column values and perform concatenations of some of input columnt. Very neat stuff.

After that, I've used Aggregate Transformation Task, which uses one of the aggregate functions (as group by, count, or count distinct) practically to normalize used table. With this you can avoid double rows as input.

Then, one of the good new advantages of SSIS - Lookup Transformation Task is used. This task, as documented, enables the performance of simple equi-joins between the input and a reference data set. What it means actually is that it can seach for certain linkage in a destination table and use that linkage to perform some special actions. I've used it, for example, to match the primary key of destination table to unique combination of source columns, so I would now that the import is consistently performed its job.

And that's practically all!
You can then play around with it by put the breakpoints and "debug" this package as you would with your .NET code. You can trigger its execution for multiple times, use parameters, disable containers, and so on.

Configurable settings

What is very practical is to make the values configurable, especially file paths of the data files, resulting files, emails etc. This is trivial, because SSIS uses Package Configuration Organizer which allows you to set (as you had an opportunity before in Dynamic Properties in DTS 2000) package properties based on machine environment variables, XML configuration files, registry entries, etc.

How you can do it? Open the package you want, right-click on Data Flow surface, and click on "Package Configurations..." option. This opens the Package Configurations Organizer which allows you to define one of the configs you'll play around. Add one, pick the package configuration type from the combo. The values are:

- XML configuration file
- Environment variable
- Registry entry
- Parent package variable
- SQL Server

When you go to the next wizard's page, you'll be astonished with myriads of options you can select and define as configurable: variables and properties of package containers, event handlers, connection managers (specially practical), and global variables and properties - myriads!!

For example, you can select your Excel or SQL Server connection manager, and then select ConnectionString or ExcelFilePath, InitialCatalog or similar and then mark it as configurable:



You will most likely use this functionality prior to setting up a package in the Package Deployment Wizard, so that deployments can be run from different target machines.

Apart from this, you also can use variables scoped to a package or to a task. Also, variable has its data type and a value, of course.

So, this is all. At least, for now.
Enjoy in SSIS and BIDS!

Kind regards,
Ratko.

- 09:05 - Comments (0) - Print - #

<< Prethodni mjesec | Sljedeći mjesec >>

Creative Commons License
Ovaj blog je ustupljen pod Creative Commons licencom Imenovanje-Nekomercijalno-Dijeli pod istim uvjetima.