SSIS Reporting Pack – High Level Review

I always appreciate the CodePlex tools out there that help us database developers and administrators make our lives easier. One of my favourite CodePlex tools is SSIS Reporting Pack, written by Jamie Thomson (blog | twitter). The reason is simple: I have used SSIS 2012 a lot this year at the company I work for, and SSIS Reporting Pack makes it easier for me to see the progress of the SSIS projects I have scheduled to run.

In addition, when there is an issue (e.g. an SSIS package hangs due to an ASYNC_NETWORK_IO wait on the source database server), it’s much easier to see when it happened and what the last messages were. Of course, in this example the reports wouldn’t tell you that there was an ASYNC_NETWORK_IO issue on the source database server, but the ability to visually see what’s happening and quickly narrow down the cause is the key. The tool is written using SSRS, which means I can also export the reports to Excel, PDF, etc. if I want to.

The SSRS reports access the SSISDB database of the server specified in the “SSISCatalog.rds” shared data source, so they can only track as far back as the data retained in SSISDB. If you have multiple SSISDB instances, you will need to deploy a separate copy of the SSIS Reporting Pack reports for each instance.
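If you are not sure how far back that is, the catalog exposes its retention setting as a property; a minimal sketch of checking it, run against the SSISDB database:

    -- How many days of operational history does SSISDB retain?
    -- The reports can only go back this far.
    SELECT property_name, property_value
    FROM SSISDB.catalog.catalog_properties
    WHERE property_name = N'RETENTION_WINDOW';

Now, let’s talk about my favourite features within SSIS Reporting Pack.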

 

Dashboard

It’s simple and easy to see which SSIS projects have been running on the server and their execution times. This gives me the ability to quickly see how my deployed projects have been running recently.

From here you can go to “Most Recent Execution”, “Most Recent Failed Execution”, “Folders”, “Executions” and “Longest Executions”.

SSIS Reporting Pack - Dashboard


By the way, if you have any suggestions on what else to put on the dashboard, please log them here at CodePlex.

 

Executions Report

This gives me a quick view of how the executed packages performed within the selected period. I can also narrow it down to show only a specific project or folder. This is a more powerful version of the dashboard, with more information to start doing analysis; for example, seeing how often a project fails and whether there is any pattern. You can drill through to the project execution that failed (or ran successfully), which brings up the Execution Report.
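Outside the report, a comparable ad hoc query can be run against the documented catalog.executions view in SSISDB; a minimal sketch (the folder and project names are hypothetical):

    -- Recent executions for one project, to spot failure patterns.
    -- Status codes: 4 = failed, 7 = succeeded.
    SELECT execution_id, folder_name, project_name, package_name,
           status, start_time, end_time
    FROM SSISDB.catalog.executions
    WHERE folder_name  = N'MyFolder'      -- hypothetical folder
      AND project_name = N'MyProject'     -- hypothetical project
      AND start_time  >= DATEADD(DAY, -7, SYSDATETIMEOFFSET())
    ORDER BY start_time DESC;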

 

SSIS Reporting Pack - Executions


 

Execution Report

The Execution page provides a detailed list of event messages associated with the execution. It’s a great way to debug at the package and/or task level, and drilling down from the package execution to the task level is easy from this page. In addition, you can also browse the previous or next execution of the same package, which is very handy if you have deployed a new version of the package and would like to analyse how they compare.
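The same messages are exposed through the documented catalog.event_messages view; a hedged sketch of pulling the warnings and errors for one execution (the execution_id value is hypothetical):

    -- Warnings and errors for one execution, most recent first.
    -- message_type: 110 = warning, 120 = error.
    SELECT message_time, message_source_name, message
    FROM SSISDB.catalog.event_messages
    WHERE operation_id = 12345            -- hypothetical execution_id
      AND message_type IN (110, 120)
    ORDER BY message_time DESC;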

SSIS Reporting Pack - Execution


 

Wrap Up

So, if you are using SSIS in SQL Server 2012, give SSIS Reporting Pack a go; you can download it from http://ssisreportingpack.codeplex.com/

The latest version, v0.4, is stable. An SSRS report server is not required, as you can run the reports from SSDT; however, publishing them to an SSRS report server is a better idea, because running the reports can sometimes take a long time.

If you have used SSIS Reporting Pack, please share your thoughts and comments here.

 

Further Reading

SSIS Reporting Pack v0.4 – Execution Report updated by Jamie Thomson

SSIS Reporting Pack v0.2 now available by Jamie Thomson

SSIS Reporting Pack – Installation Overview at Codeplex

 

Short Review on .ispac SSIS Deployment

My last two articles on SSIS have been related to bugs. That’s not because I am looking for its faults all the time. I like using SSIS; I have been using it since SQL Server 2005, more so in 2008, and even more in 2012. I think SSIS has limitations but also great advantages. For most of the projects that I have worked on, it simplifies quite a lot of things for me.

Matt Mason (twitter | blog) has discussed branching/development strategies and how they fit into the revamped SSIS in SQL Server 2012 in a couple of articles. Read them here:

Thoughts on Branching Strategies for SSIS Projects

Can I deploy a single SSIS package from my project to the SSIS Catalog?

 

So here’s my thought on the new .ispac method of deployment in SQL Server 2012, in response to the latter post by Matt.

After using SSIS in SQL Server 2012 for a moderately sized data warehouse ETL for over six months, I have become more and more appreciative of the .ispac method of deployment. Building and deploying it, as if it were a C# project, makes more and more sense to me.
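For reference, the whole-project deployment can also be scripted against the documented catalog.deploy_project procedure; a minimal sketch, assuming a hypothetical file path, folder and project name:

    -- Read the built .ispac and deploy the whole project to the catalog.
    DECLARE @ProjectBinary varbinary(max) =
        (SELECT BulkColumn
         FROM OPENROWSET(BULK N'C:\Builds\MyEtl.ispac', SINGLE_BLOB) AS ispac);

    EXEC SSISDB.catalog.deploy_project
        @folder_name    = N'MyFolder',    -- hypothetical folder (must already exist)
        @project_name   = N'MyEtl',       -- hypothetical project name
        @project_stream = @ProjectBinary;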

There have been a few occasions in the past when I thought deploying just a single package would be useful; for example, when there is a need to import simple text files via one SSIS package on an ad hoc basis. However, the more I think about it, the less I find it a “good” enough reason for single-package deployment. In fact, if such a package gets run reasonably frequently, or may be reusable, I would create a project called “Ad hoc Import” consisting of a collection of “common” ad hoc SSIS packages. In my experience, I don’t always know at conception how often an SSIS package will be executed. It could be a one-off, throw-away package; or it could start as a one-off and then continue to be executed every so often. More often than not, my “first thought” one-off packages have become reusable (re-runnable) enough that they are worthy of being maintained as a project and deployed to the Integration Services catalog.

I think having (or being forced into) project deployment just makes us think twice about how we organise our packages and projects in the Integration Services catalog, which in my opinion is not a bad thing at all.

Also, yes, I do use version control for all my SSIS projects; I use Team Foundation Server 2010. Although the Integration Services catalog in SQL Server 2012 keeps information about the deployed versions, this is not good enough to replace a version control tool, especially in a team development environment.

Most of the issues that I have encountered and submitted to Microsoft Connect in regard to administering and executing deployed SSIS projects have workarounds (albeit risky ones), so I am all for project deployment. By the way, more info on these issues:

http://connect.microsoft.com/SQLServer/feedback/details/751220/ssisdb-package-parameter-error-in-previous-version-project-lsn-carried-forward-to-the-future-deployed-versions. Also see: SSISDB Bug: Package Parameter Mapping

http://connect.microsoft.com/SQLServer/feedback/details/743735/ssisdb-catalog-get-parameter-values-with-null-able-package-name-input-parameter. Also see: Get Parameter Values from SSISDB Catalog: a bug?

I am also keen to hear what others think about single-package deployment and/or their experience with the new project deployment method in SQL Server 2012.


Hybrid Mode in Tabular BI Semantic Model – Part 2

This is Part 2 of the Hybrid Mode in Tabular BI Semantic Model series, where we will look at design tips, a few factors to consider when implementing Hybrid Mode, and a summary of the pros and cons of Hybrid Mode.

Part 1 of the Hybrid Mode in Tabular BI Semantic Model series is located here.

 

Partitioning for Hybrid Mode

DirectQuery only supports one partition. However, in Hybrid Mode you can additionally define a set of mutually exclusive In-Memory partitions to serve In-Memory data access. The DirectQuery partition and the In-Memory partitions can overlap; however, the Processing Option of the DirectQuery partition must be set to “Never process this partition”; otherwise, the “A duplicate attribute key has been found when processing” error will be returned, as shown below.

Processing Hybrid Tabular Model with Duplicate Key Attribute Error

More information about Partitioning and DirectQuery mode: http://msdn.microsoft.com/en-us/library/hh230965.aspx

Note: the results of the DirectQuery partition and the In-Memory partitions cannot be combined automatically. A separate connection must be made for each data access mode, and the query results can potentially be combined manually or programmatically. For example, SSRS reports can potentially use both the DirectQuery and In-Memory data access types of the same Tabular database to retrieve real-time and in-memory data. More on this in another post.

 

Design Tips

1. Only expose the subset of the data that is required for real-time access.

Requirement example: when performing lead analysis, detailed analysis of the past month is required in real time using PowerView.

Design solution: configure the DirectQuery partition with a date filter so that it retrieves only the past month of data, as sketched below.

Analysts can then retrieve the required up-to-date data for the past month via PowerView to perform accurate analysis.
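A sketch of what that DirectQuery partition’s source query might look like (the table and column names are hypothetical):

    -- DirectQuery partition: expose only the past month of leads
    -- for real-time analysis.
    SELECT *
    FROM dbo.FactLead
    WHERE LeadDate >= DATEADD(MONTH, -1, CAST(SYSDATETIME() AS date));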

 

2. Only expose a subset of data that is required for in-memory analysis

By default the In-Memory partition is set to be the same as the DirectQuery partition, i.e. it’s an exact copy. If there is a requirement to make only a subset of the source data generally available via In-Memory (i.e. accessed via Excel), then that subset of the source data is a good candidate for an In-Memory partition. Multiple partitions can be defined to efficiently process the data into the In-Memory part of the Hybrid Tabular Model, as sketched below.
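For example, the In-Memory side could be carved into mutually exclusive partitions along date boundaries (again with hypothetical table and column names):

    -- In-Memory partition "FactLead 2011"
    SELECT * FROM dbo.FactLead
    WHERE LeadDate >= '20110101' AND LeadDate < '20120101';

    -- In-Memory partition "FactLead 2012"
    SELECT * FROM dbo.FactLead
    WHERE LeadDate >= '20120101' AND LeadDate < '20130101';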

 

3. Set DirectQuery partition to “Never process this partition”

This is to avoid processing errors during design/development in SSDT.

 

4. The larger the combined size of the In-Memory partitions, the more memory the server will need.

Although the compression in the Tabular Model is great, the source data still needs to be processed and loaded into memory, which can be quite heavy on resources. VertipaqPagingPolicy can be tweaked if the data in the In-Memory partitions does not fit in memory, to make use of virtual memory by paging data out to the system pagefile. Marco Russo has a brilliant article on the VertipaqPagingPolicy modes and memory properties.

 

5. Optimise at the source level first, i.e. the SQL Server database.

Use columnstore indexes and partitioning techniques where necessary, especially on the tables that feed the Tabular model. (This tip is applicable to any Tabular Model mode.)
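For instance, a nonclustered columnstore index (the read-only variety available in SQL Server 2012) on a hypothetical fact table feeding the model:

    -- Columnstore index over the columns the Tabular model reads.
    -- Note: in SQL Server 2012 this makes the table read-only
    -- until the index is disabled or dropped.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactLead
    ON dbo.FactLead (LeadDate, ProductKey, CustomerKey, Amount);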

 

6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.

Unlike partitions in the SQL Server relational database engine, which can improve performance at query time, partitions in the Tabular Model are strictly for optimising processing performance. Having said that, as the combined total size of the tabular model partitions grows, query performance may decline a little, though not in a linear manner; remember that the xVelocity engine uses an outstanding compression algorithm. More about partition and processing strategy can be found below. Further reading on partitions can be found here: http://msdn.microsoft.com/en-us/library/hh230976.aspx.


Factors to consider when adopting Hybrid Mode

There are a few factors to consider when adopting Hybrid Mode. It is a clever way of attacking the “real time” and “flexibility” issues, but it is not suitable for every scenario.

Consistency

Although Hybrid Mode can offer the “real-time” feature of DirectQuery, the semantic differences between the xVelocity In-Memory Analytics Engine and the SQL Server engine should be carefully considered and investigated. The results when accessing data through the DirectQuery mode and the In-Memory mode of a Hybrid tabular model may differ because of these semantic differences. See more information on semantic differences here.

In Part 1, I mentioned that a DAX query issued to the DirectQuery data access is translated to an equivalent SQL query. DirectQuery relies on the SQL Server Database Engine, whilst In-Memory uses the xVelocity In-Memory Analytics Engine; the two are semantically different and may return different results.

Scalability

One of the reasons for choosing a DirectQuery design is scalability, i.e. relying on optimisation performed at the SQL Server level, with minimal additional resources required for DirectQuery. Enabling the Hybrid data access mode means enabling both DirectQuery mode and In-Memory mode. Since In-Memory mode requires resources, particularly memory and CPU, enabling Hybrid Mode requires the same considerations on memory and CPU allocation as In-Memory. Using Hybrid Mode without careful consideration of partitioning, and of how much data is to be processed into the In-Memory Tabular Model, may defeat the purpose of switching to a DirectQuery design (exposed as a Hybrid Mode Tabular Model).

Security

Securing data when it is accessed through the DirectQuery mode and the In-Memory mode of a Hybrid tabular model is not trivial. Please note that row-level/dynamic security is not offered in Hybrid Mode, as it follows the DirectQuery design. One of the benefits of using DirectQuery in the first place may be to take advantage of existing permission definitions in the SQL Server data source.

Most complex permissions defined in the SQL Server data source cannot be preserved when data is accessed through the In-Memory mode of a Hybrid Tabular Model. As an example of a “complex” (not so complex in this case) permission defined in the SQL Server data source: user A only has read access to transaction records dated 2012 and newer, while user B has read access to transaction records dated 2010 and older.
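A sketch of how such a permission might be expressed at the source, for example as a view filtered on the calling login (the table, view and column names are hypothetical):

    -- Per-user date-range permissions, resolved at query time.
    -- DirectQuery picks up the caller's rows; In-Memory processing
    -- would only ever see the rows of the one processing credential.
    CREATE VIEW dbo.vFactTransaction
    AS
    SELECT t.*
    FROM dbo.FactTransaction AS t
    JOIN dbo.UserDateAccess AS a
      ON  a.LoginName = SUSER_SNAME()   -- who is querying right now
      AND t.TransactionDate BETWEEN a.AccessFrom AND a.AccessTo;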

In this example, In-Memory will not be able to capture user A’s and user B’s permissions, because:

1. The impersonation setting for In-Memory data processing of a Hybrid Tabular Model is specific to one credential (a Windows account or the service account); all data processed into In-Memory is based on that supplied credential.

2. Calculated columns are not available in Hybrid Mode (as it follows the DirectQuery design), so it would be impossible to define the permission at a row-by-row level to mimic the SQL Server data source’s.

This means that uniformity of complex permission definitions, regardless of the data access mode of a Hybrid tabular model, is not possible.

Partitioning Strategy and Processing Frequency

The partitioning strategy and the processing of the In-Memory mode can be aligned to business requirements. For example, if the business requires only the past 12 months of data to be accessed via Excel (connecting via In-Memory), the In-Memory partition(s) can be defined such that only the past 12 months’ worth of data is processed into the Hybrid tabular model. Please note that the DirectQuery partition can overlap the In-Memory partitions, so it is possible for the DirectQuery partition to contain all of the data, some of the data, or a superset/subset of the In-Memory partition(s).

An example of Partition Design in Hybrid Mode

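A sketch of such a pair of partition queries, where the DirectQuery partition is a superset of the In-Memory partition (hypothetical table and column names):

    -- DirectQuery partition: all history, queried live.
    SELECT * FROM dbo.FactTransaction;

    -- In-Memory partition: only the rolling 12 months is processed into memory.
    SELECT *
    FROM dbo.FactTransaction
    WHERE TransactionDate >= DATEADD(MONTH, -12, CAST(SYSDATETIME() AS date));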

 

Summary

As outlined above, the architecture of Hybrid Mode is quite clever, but it will not solve every real-time vs flexibility issue. The advantages and limitations of Hybrid Mode can be summarised as follows.

Hybrid Mode Advantages

1. Greater choice of client tools (compared to pure DirectQuery mode), i.e. Excel and other MDX client tools can access the In-Memory partition(s).

2. When diligently partitioned, fewer resources are required for caching, processing and querying the In-Memory partitions.

3. Flexibility in accessing real-time data using the DirectQuery data access, e.g. from PowerView and SSRS.

Hybrid Mode Limitations

Inherited design constraints of DirectQuery:

1. Restricted DAX functions (as it needs to follow the DirectQuery design constraints). For example, TOTALYTD and SAMEPERIODLASTYEAR are not available in DirectQuery and, as such, are not available in Hybrid Mode either.

2. No row-level security. Even if it is possible to implement this at the source database level, there will be inconsistency in the data returned between In-Memory and DirectQuery.

3. No Calculated Columns

Inherited In-Memory drawbacks:

1. Memory requirements to store the compressed In-Memory data.

2. Processor requirements to process source data into the tabular database and memory.

Data inconsistency limitations:

1. Stale data in In-Memory alongside up-to-date data in DirectQuery, which may confuse users.

2. Semantic differences between the SQL Server Database Engine (used by the DirectQuery mode) and the xVelocity In-Memory Analytics Engine (used by the In-Memory mode).

 

Other Options

Hybrid Mode still carries the memory and processor burden of the default In-Memory mode. When designed and configured carefully, the default In-Memory mode may be able to achieve real-time results in much the same manner as Hybrid Mode, with less complexity. That, of course, is a topic for another discussion. So, stay tuned!

 

Wrap Up

Hybrid Mode mainly offers both real-time access and client tool flexibility. However, it comes with a price. This Part 2 of the Hybrid Mode in Tabular BI Semantic Model series suggests including only the relevant data in the tabular model for both In-Memory and DirectQuery. Consistency, scalability, security, partitioning strategy and processing frequency are important factors to consider when implementing Hybrid Mode. The limitations of Hybrid Mode described here are valid as at the time of writing.

If you are implementing a Tabular Model database, I’d love to hear your thoughts. Please leave your comments about your experience with the Tabular Model or with the Hybrid Mode posts.

 

Edit - 25 July 2012: 

Added more explanation on VertipaqPagingPolicy reference.

Added more explanation in Section 6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.