Using .NET Control Structures to Fully Automate BimlScript

gravatar

Scott Currie

Using .NET Control Structures to Fully Automate BimlScript

published 08.14.15

last updated 10.05.15


Part of lesson Using .NET Control Structures to Fully Automate BimlScript.

Share

Introduction

In the previous lesson, we arrived at a solution using only a handful of control and text code nuggets to partially automate our staging layer. Change a variable, build, repeat - each table taking only seconds to add to our solution. In this lesson, we'll learn how to use loops and conditionals in our control nuggets to remove the manual intervention for each table. We will be able to create one list of tables and let automation do the rest for us.

Creating a List of Tables

Before getting started, let's recall the code sample from the end of the previous lesson:

<# var tableName = "DimAccount"; #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=SOURCE_SERVER;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
    </Connections>
    <Packages>
        <Package Name="Copy <#=tableName#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=tableName#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=tableName#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

That tableName variable is our friend and our enemy. It's our friend, because it saves us the work of customizing our Biml file in multiple places for each table. It's our enemy, because we need to make edits to this file for every single table in our solution. What if we could create a collection variable that includes ALL table names at the beginning of the file, and then iterate over that collection to create each of our packages. That sounds like a great idea! Let's start by adding just the collection variable, which we learned how to do in the C# primer:

<# var allTableNames = new[] { "DimAccount", "DimCurrency", "DimCustomer", "DimDate", "DimDepartmentGroup", "DimEmployee", "DimGeography", "DimOrganization", "DimProduct", "DimProductCategory", "DimProductSubcategory" }; #>
<# var tableName = "DimAccount"; #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=SOURCE_SERVER;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
    </Connections>
    <Packages>
        <Package Name="Copy <#=tableName#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=tableName#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=tableName#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Looping over the Table List

At this point, we have created the variable, but it isn't being used. Let's remedy that. As we saw in the C# Primer, we can use the foreach loop to iterate over a collection. So we will add a foreach loop surrounding the Package definition. Furthermore, we can now eliminate the separate tableName variable, since we will now be assigning that from within the foreach loop.

<# var allTableNames = new[] { "DimAccount", "DimCurrency", "DimCustomer", "DimDate", "DimDepartmentGroup", "DimEmployee", "DimGeography", "DimOrganization", "DimProduct", "DimProductCategory", "DimProductSubcategory" }; #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=SOURCE_SERVER;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
    </Connections>
    <Packages>
        <# foreach (var tableName in allTableNames) { #>
        <Package Name="Copy <#=tableName#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=tableName#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=tableName#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

Believe it or not, we're done! If you build this script, you will generate one package for each of the tables listed in your allTableNames variable.

Automatically Loading Table Names

It's great that we can now build all of our packages at once, but this is still more work than I'd like to do. Specifically, if the source system ever adds a new table, I have to remember to come back and edit my collection. Furthermore, if I ever want to point this BimlScript at a different source system, I have to retrieve a table list and change the allTableNames variable initialization. It turns out that we can solve this problem very easily by using a BimlScript utility method.

We're going to learn much more about BimlScript utility methods in the next lesson, but there is one utility method that is so useful in this scenario, that I'm going to give you a preview. It's called ImportDB.

ImportDB enables you to connect to a live database server, and produce objects containing all of its schema information - tables, columns, indexes, etc., though in our case we'll only use the table names.

Reorganizing our BimlScripts into multiple files

It will be much easier to use ImportDB if we first reorganize our code a little bit. Specifically, I would like to move the connection definition into its own file. This has several advantages, but the two biggest are:

  1. I can put all of my configuration information (including the connection definitions) into a single Environment.biml file. This will be the only file I need to edit when I wish to retarget my solution to other systems.
  2. Having the connections in their own files makes them easier to reference from BimlScript.

Rearranging our code, we now have the following:

Environment.biml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=SOURCE_SERVER;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
    </Connections>
</Biml>
LoadPackages.biml
<# var allTableNames = new[] { "DimAccount", "DimCurrency", "DimCustomer", "DimDate", "DimDepartmentGroup", "DimEmployee", "DimGeography", "DimOrganization", "DimProduct", "DimProductCategory", "DimProductSubcategory" }; #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var tableName in allTableNames) { #>
        <Package Name="Copy <#=tableName#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=tableName#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=tableName#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

From this point forward I will omit Environment.biml from examples, because it will not change.

IMPORTANT

Once you separate your Biml code into multiple files, if you are using BIDSHelper, you must be sure to multi-select all of the desired files before trying to build.

RootNode and the Biml API

The first thing we need to do is to access our source connection so that we can retrieve the table names using ImportDB. One of the really neat features in BimlScript is that it actually allows you to access the current project and everything defined in your project from within your code nuggets. The RootNode property corresponds to the Biml element in your source files - except that it doesn't contain the object defined in just one Biml file, it contains the union of all objects defined in all Biml files.

The RootNode object is part of the Biml API. Biml, in compiler jargon, is referred to as a homoiconic language. This basically just means that there is a 1:1 mapping from elements in the language to types and properties in the API. Just as there is a Connections collection element under the Biml root element in the XML source code, there is a Connections collection property in the RootNode object. (The same goes for Tables, Packages, etc.) The collection properties can be enumerated or you can index into them by name. So to retrieve the connection named Source, you would just write:

<# var sourceConnection = RootNode.Connections["Source"]; #>

For convenience, we're actually going to do something a little different, though. Biml supports many types of connections (e.g. flat file connections, SMTP connections, etc.). Only database connections actually support ImportDB. As a result, we would have to cast the source connection to the AstDbConnectionNode Biml API type in order to use it with the ImportDB method. But we have a shortcut for that. While homoiconicity mandates that RootNode have a Connections collection, it does not prevent it from having additional convenience properties. One such convenience property is DbConnections, which returns a collection of only the database collections. Consequently, we can write:

<# var sourceConnection = RootNode.DbConnections["Source"]; #>

Once we have the sourceConnection, we can use it to call ImportDB. Let's do that now in context of the larger sample:

<# var sourceConnection = RootNode.DbConnections["Source"]; #>
<# var importResults = sourceConnection.ImportDB(); #>

<# var allTableNames = new[] { "DimAccount", "DimCurrency", "DimCustomer", "DimDate", "DimDepartmentGroup", "DimEmployee", "DimGeography", "DimOrganization", "DimProduct", "DimProductCategory", "DimProductSubcategory" }; #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var tableName in allTableNames) { #>
        <Package Name="Copy <#=tableName#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=tableName#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=tableName#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

The only thing left to do is to remove allTableNames and instead iterate over the TableNodes collection contained within the importResults:

<# var sourceConnection = RootNode.DbConnections["Source"]; #>
<# var importResults = sourceConnection.ImportDB(); #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var table in importResults.TableNodes) { #>
        <Package Name="Copy <#=table.Name#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=table.Name#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=table.Name#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

And we're done! Now there is no need to even create a list of tables. We just automatically retrieve the list of tables from the server itself!

A couple of notes before proceeding:

  1. Note that since we are iterating over table objects (that also contain column and other information), we switched our variable name to "table" and we accessed the name property of that using "table.Name"
  2. The ImportDB method actually provides several different versions or overloads, which allow us to set a variety of options. We could filter out certain schemas or tables. We could filter out views, ignore foreign keys, or perform a wide variety of other customizations. For simplicity, we'll keep the code as is with a full schema load.

Further Considerations

At this point, we could start to add additional functionality.

Want some event handlers for error logging? Add them and rebuild - they will then be in all packages.

Want to add some conditional logic to insert DataConversion components in some cases? We learned how to do that in the C# Primer.

Instead of going down the path of adding potentially interesting, but ultimately artificial code sequences, we are going to keep this sample realistic, and continue with the current simple version of the sample.

Conclusion

We're just beginning to use the full power of code nuggets in concert with C# code snippets and BimlScript utility methods to fully automate our Biml solutions. In the next lesson, we'll take a look at more of the Biml utility methods and see how they can help take our coding to the next level.

P.S. Final Code Sample

Below is the final code sample, in case you'd like to build it yourself. Remember to change your source and target connection strings!

Environment.biml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=SOURCE_SERVER;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;" />
    </Connections>
</Biml>
LoadPackages.biml
<# var sourceConnection = RootNode.DbConnections["Source"]; #>
<# var importResults = sourceConnection.ImportDB(); #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var table in importResults.TableNodes) { #>
        <Package Name="Copy <#=table.Name#>" ConstraintMode="Parallel">
            <Tasks>
                <Dataflow Name="DFT_CopyData">
                    <Transformations>
                        <OleDbSource Name="Retrieve Rows" ConnectionName="Source">
                            <DirectInput>SELECT * FROM <#=table.Name#></DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Load Rows" ConnectionName="Target">
                            <ExternalTableOutput Table="<#=table.Name#>" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

Finished?

Complete the lesson Using .NET Control Structures to Fully Automate BimlScript:

You are not authorized to comment. A verification email has been sent to your email address. Please verify your account.

Comments

gravatar

Laurens

11:54am 02.22.17

I tried to use the rootNode.DbConnections property, but it seems to keep returning an empty list. I've boiled it down to a very minimal example.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Server=TNDS002;Initial Catalog=Selligent;Integrated Security=SSPI;Provider=SQLNCLI11;" />
    </Connections>
<# var sourceConnection = RootNode.DbConnections["Source"]; #>
<# var importResults = sourceConnection.ConnectionString; #>
</Biml>

Upon validation, this reports a NullReferenceException for the sourceConnection.ConnectionString call. Any ideas what might be going wrong here?

gravatar

Varigence

5:56pm 02.22.17

In order for your BimlScript code nugget to access your OleDbConnection, the OleDbConnection must be defined in a separate file. For example:

File1.biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Server=TNDS002;Initial Catalog=Selligent;Integrated Security=SSPI;Provider=SQLNCLI11;" />
    </Connections>
</Biml>

File2.biml:

<# var sourceConnection = RootNode.Connections.First(); #>

Then to build this, you would either make both files "Live" in Mist/BimlStudio of multi-select both files when generating packages in BimlExpress.

The reason for this is how the Biml compiler needs to work. The Biml compiler will parse the BimlScript file and convert it into a dynamically generated C# program that generates the "flat" Biml code that is then used to generate tables, packages, etc. When that C# program with the RootNode.Connections.First() statement is being run, the Biml code that contains the connection definition does not yet exist. Specifically, that Biml code will be part of the output of the dynamically generated C# program.

While that may be a bit complicated, the solution is simple. When you want to access a Biml object through the Biml API in a code nugget, just make sure that Biml object is defined in an earlier file. If you need additional layers to your BimlScripts, you can use the template directive with the tier attribute to control the order in which BimlScript files are compiled.