Friday, January 30, 2009

SSIS Gotchas

SQL Server Integration Services is very finicky (read “gives indecipherable errors and will make your life miserable”). I wrote some packages using VS2008 / SQL Server 2008. Following is a list of gotchas I ran across. Note that there’s no frigging way you can figure any of these out from the error messages alone:
  • Restoring a database renders connections in packages unusable (even though everything looks good and testing the connection is fine). Thus, every time you restore a database that a package is pointing to (which I was doing repeatedly during testing), you must go into the package, add a new connection from a data source (by right clicking in the connection manager area of each package) and add in the same source that’s already being used (giving it a different name). Then you must go into each task item in the package and point to the new connection. If that task had mappings (as from a flat file to a table), you must re-map everything. Enjoy!
  • If you’re executing a package from .Net code, only the .Net connection types will run.
    • There are 3 different forms of essentially the same connection type: “Native OLE DB”, “.Net providers for OleDb”, and “.Net Providers”
    • The “.Net Providers” didn’t seem to work at all; the “Native OLE DB” providers would not execute when attempting to run the packages from a web page on a different server; both the “Native OLE DB” and “.Net providers for OleDb” worked from my local machine.
    • It seems the proper choice for calling packages from code is “.Net providers for OleDb”.
    • The “.Net providers for OleDb” will not execute from code directly from within visual studio (i.e., you can’t right click on the package and click execute) so you can't debug directly.
    • The .Net connections ignore their timeout property. They simply won't work when running a long query.
    • Enjoy!
  • You can't include a package that deals with a table that doesn't yet exist.
    • So you can’t have a package that refers to a table that hasn’t yet been created. The package will give errors all over the place. So, for instance (IIRC), you can’t deploy a package that refers to a table that will be created by another package b/c it won’t validate on deployment.
  • Deployment of packages is a nightmare. There are myriad issues surrounding the demonic property of packages, “ProtectionLevel”.
    • Under most circumstances, you would want to make your connection string properties configurable at the server. That way, you can move the packages around or point them to different servers w/o having to redeploy them. In my case, I had no end of problems with this (and the packages need only run annually) so I just deployed and ran this way:
      • For deployment, I would just do file | Save copy of [package] as and copy the package to the server running integration services. This would just embed the connection settings and not rely on any package configuration files. Then, for executing the packages from code, I would just refer to the server I copied the packages to.
    • At one point, I had a package that had a bad path in one of its connection’s properties. This seemed to prevent all packages from working.
    • Most destructive: Changing the connections and re-deploying (as described above) doesn't overwrite the connections. So I changed the connections from production back to QA, verified everything multiple times, redeployed (which prompts for confirmation whether I want to overwrite the existing packages), and ran a package. Much to my surprise & chagrin, the package executed against production! In this case, the existing packages must be deleted. In Reporting Services, there is some “overwrite data sources” property, but I couldn’t find this in SSIS.
  • Another disturbing issue: packages run where they’re called from. If, in code, you execute some SQL against a particular server, that SQL will execute on the server. Confusingly (b/c I was importing and exporting data), when you execute a package, it executes on the machine where it was called, not on the server where the package exists!
  • Warnings on package objects show up all over the place. There's no figuring out why they don't consistently work. E.G., login failure when there is nothing wrong with the connection. I think setting the property “BypassPrepare” to true helped.
  • Unlike the predecessor technology, DTS, there is no way to switch a connection vis-a-vis data flow objects w/o losing mappings. Enjoy!
  • Names of all new packages are Package1 – you must rename them and delete file Package1 manually.
  • If you have an extra line break at the end of a script task, it crashes. You can copy/paste that script into management studio and it will work fine.
  • Closing / reopening the project sometimes causes connection references from tasks to be lost.
  • Yet another disturbing issue: I was not able to get it down to a reproducible set of steps, but the project was reverting my script changes. I found that I had to check my script changes into sourcesafe immediately after saving every time (I suspect that the reverting took place somewhere in deploying).
The fun part is that you can’t compare versions of packages in sourcesafe b/c a simple change to a script task changes all sorts of stuff in the dtsx file.

1 comment:

Anonymous said...

I too loathe the errors that come out of SSIS. The remapping of fields is baffling to me, microsoft really needs to put together a more solid ETL tool.