Data Profiling Script

gravatar

Peter Avenent

This is a simple script that will create all the DataProfiling required using Biml

published 05.20.14

last updated 05.20.14


Share

Lay of the land

Firstly let me point you to the articles I referenced to create the Biml scripts and if you want to know anything about DataProfiling go no further than SSIS Junkie aka @jamiet blog series. Once you understand the basics of the Data Profiling Task you might want to look at filtering the list of tables and for that you can refer to the Import a filtered list of tables using a static list of table names snippet by Scott Currie.

The Code

<!-- Connections.biml -->   
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <AdoNetConnection Name="AW" Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Integrated Security=True;"/> 
    </Connections>
</Biml>

<!-- DataProfiling.biml -->
<#  var rootFolder = @"C:\Temp\";
    var sourceConnection = (AstAdoNetConnectionNode)RootNode.Connections["AW"];
    var importResult = sourceConnection.ImportDB("", "", ImportOptions.ExcludeForeignKey | ImportOptions.ExcludeColumnDefault | ImportOptions.ExcludeViews);#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
    <#  foreach (var tableNode in importResult.TableNodes) { #>
        <FileConnection Name="<#=tableNode.SsisSafeScopedName#>.xml" FileUsageType="CreateFile" FilePath="<#=rootFolder#><#=tableNode.SsisSafeScopedName#>.xml" />
    <#  } #>
    </Connections>
    <Packages>
    <#  foreach (var tableNode in importResult.TableNodes) { #>
        <Package Name="ProfileTable <#=tableNode.SsisSafeScopedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey">
            <Tasks>
                <DataProfiling Name="DPT - <#=tableNode.SsisSafeScopedName#>" DelayValidation="true" ForcedExecutionValueDataType="Empty" OverwriteDestination="true">
                    <ProfileRequests>
                        <ColumnNullRatioProfileRequest Name="NullRatioReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnStatisticsProfileRequest Name="StatisticsReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnLengthDistributionProfileRequest Name="LengthDistReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnValueDistributionProfileRequest Name="ValueDistReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnPatternProfileRequest Name="PatternReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                    <#  if (tableNode.PreferredKey !=null){#>
                        <CandidateKeyProfileRequest Name="KeyReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>">
                            <KeyColumns>
                            <#  foreach (AstTableKeyColumnNode keyColumn in tableNode.PreferredKey.Columns) {#>
                                <Column><#=keyColumn.Column.Name#></Column>
                            <#  } #>
                            </KeyColumns>
                        </CandidateKeyProfileRequest>
                    <#  } #>
                        <FunctionalDependencyProfileRequest Name="FDReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" DependentColumn="*">
                            <DeterminantColumns>
                                <Column>*</Column>
                            </DeterminantColumns>
                        </FunctionalDependencyProfileRequest>
                    </ProfileRequests>
                    <FileOutput ConnectionName="<#=tableNode.SsisSafeScopedName#>.xml" />
                </DataProfiling>
            </Tasks>
            <Connections>
                <Connection ConnectionName="AW" />
                <Connection ConnectionName="<#=tableNode.SsisSafeScopedName#>.xml"/>
            </Connections>
        </Package>
    <#  } #>
        </Packages>
</Biml>
You are not authorized to comment. A verification email has been sent to your email address. Please verify your account.

Comments

gravatar

Craig

8:00am 07.28.14

Hi Peter Using this script in Mist or BIDSHelper, I get the error below

Object reference not set to an instance of an object. at Varigence.Hadron.Extensions.SchemaManagement.ConnectionExtensions.ImportDB in :line 0

Is anyone else getting this? Could this be an environmental thing? Thanks

gravatar

Craig

9:52am 08.20.14

Hi Peter

Thought I'd let you know that I seemed to have found a solution to my error previously reported. I needed to split the script into two .biml files and also add an imports directive to the second.

Script 1 is below

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <AdoNetConnection Name="AW" Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Integrated Security=True;"/> 
    </Connections>
</Biml>

Script 2 is below

<#@ template tier="2" #>
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>

<#  var rootFolder = @"C:\Temp\";
    var sourceConnection = (AstAdoNetConnectionNode)RootNode.Connections["AW"];
    var importResult = sourceConnection.ImportDB("", "", ImportOptions.ExcludeForeignKey | ImportOptions.ExcludeColumnDefault | ImportOptions.ExcludeViews);#>


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
    <#  foreach (var tableNode in importResult.TableNodes) { #>
        <FileConnection Name="<#=tableNode.SsisSafeScopedName#>.xml" FileUsageType="CreateFile" FilePath="<#=rootFolder#><#=tableNode.SsisSafeScopedName#>.xml" />
    <#  } #>
    </Connections>
    <Packages>
    <#  foreach (var tableNode in importResult.TableNodes) { #>
        <Package Name="ProfileTable <#=tableNode.SsisSafeScopedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey">
            <Tasks>
                <DataProfiling Name="DPT - <#=tableNode.SsisSafeScopedName#>" DelayValidation="true" ForcedExecutionValueDataType="Empty" OverwriteDestination="true">
                    <ProfileRequests>
                        <ColumnNullRatioProfileRequest Name="NullRatioReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnStatisticsProfileRequest Name="StatisticsReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnLengthDistributionProfileRequest Name="LengthDistReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnValueDistributionProfileRequest Name="ValueDistReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                        <ColumnPatternProfileRequest Name="PatternReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" />
                    <#  if (tableNode.PreferredKey !=null){#>
                        <CandidateKeyProfileRequest Name="KeyReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>">
                            <KeyColumns>
                            <#  foreach (AstTableKeyColumnNode keyColumn in tableNode.PreferredKey.Columns) {#>
                                <Column><#=keyColumn.Column.Name#></Column>
                            <#  } #>
                            </KeyColumns>
                        </CandidateKeyProfileRequest>
                    <#  } #>
                        <FunctionalDependencyProfileRequest Name="FDReq" ConnectionName="AW" SchemaId="<#=tableNode.Schema.Name#>" TableId="<#=tableNode.Name#>" DependentColumn="*">
                            <DeterminantColumns>
                                <Column>*</Column>
                            </DeterminantColumns>
                        </FunctionalDependencyProfileRequest>
                    </ProfileRequests>
                    <FileOutput ConnectionName="<#=tableNode.SsisSafeScopedName#>.xml" />
                </DataProfiling>
            </Tasks>
            <Connections>
                <Connection ConnectionName="AW" />
                <Connection ConnectionName="<#=tableNode.SsisSafeScopedName#>.xml"/>
            </Connections>
        </Package>
    <#  } #>
        </Packages>
</Biml>

Cheers Craig