This is Part 2 of the Hybrid Mode in Tabular BI Semantic Model series. It covers design tips, a few factors to consider when implementing Hybrid Mode, and a summary of its pros and cons.
Part 1 of the Hybrid Mode in Tabular BI Semantic Model is located here.
Partitioning for Hybrid Mode
DirectQuery only supports one partition. In Hybrid Mode, however, you can additionally define a set of mutually exclusive In-Memory partitions to serve In-Memory data access. The DirectQuery partition and the In-Memory partitions can overlap; however, the Processing Option of the DirectQuery partition must be set to “Never process this partition”, otherwise processing fails with the error “A duplicate attribute key has been found when processing”.
More information about Partitioning and DirectQuery mode: http://msdn.microsoft.com/en-us/library/hh230965.aspx
Note: the results of the DirectQuery partition and the In-Memory partitions are not combined automatically. A separate connection must be made for each data access type, and the query results can then be combined manually or programmatically. For example, an SSRS report can use both the DirectQuery and In-Memory data access types of the same Tabular database to retrieve real-time and in-memory data. More on this in another post.
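As a rough illustration of combining the two result sets programmatically, the sketch below merges rows that a client has already fetched over two separate connections. It assumes the client selects the data access type with the DirectQueryMode connection string property; the server name, database name, row shape and the merge_results helper are all hypothetical and only there for illustration.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical server and database names. The DirectQueryMode property is
# assumed to be how the client picks the data access type of a Hybrid model.
BASE = "Data Source=localhost\\TABULAR;Initial Catalog=LeadAnalysis"
IN_MEMORY_CONN = BASE + ";DirectQueryMode=InMemory"
DIRECT_QUERY_CONN = BASE + ";DirectQueryMode=DirectQuery"

Row = Dict[str, object]


def merge_results(in_memory_rows: Iterable[Row],
                  direct_query_rows: Iterable[Row],
                  key: str) -> List[Row]:
    """Combine two result sets, letting real-time rows win on a key clash."""
    merged: Dict[Hashable, Row] = {row[key]: row for row in in_memory_rows}
    for row in direct_query_rows:
        # The DirectQuery row is the up-to-date one, so it replaces the cached row.
        merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    cached = [{"LeadId": 1, "Status": "Open"}, {"LeadId": 2, "Status": "Open"}]
    live = [{"LeadId": 2, "Status": "Won"}, {"LeadId": 3, "Status": "Open"}]
    for row in merge_results(cached, live, key="LeadId"):
        print(row)
```

Preferring the DirectQuery row on a key clash is a design choice for this sketch; a report that needs to show both the stale and the current value would keep the two sets side by side instead.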
1. Only expose a subset of the data that is required for real-time access.
Requirement example: when performing Lead Analysis, detailed analysis of the past month is required in real-time using PowerView.
Design solution: configure the DirectQuery partition with a date filter so that it retrieves only the past month's data.
Analysts can then retrieve up-to-date data for the past month via PowerView and perform accurate analysis.
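One way to express such a date filter is in the SQL query that defines the DirectQuery partition. The sketch below builds that query text, assuming a SQL Server source and a rolling window computed with GETDATE() at query time; the table name dbo.FactLead, the column name LeadDate and the helper function are hypothetical.

```python
def direct_query_partition_sql(months_back: int = 1,
                               table: str = "dbo.FactLead",
                               date_column: str = "LeadDate") -> str:
    """Build a T-SQL partition query limited to a rolling window of recent months."""
    if months_back < 1:
        raise ValueError("months_back must be at least 1")
    # DATEADD with a negative offset moves the window start back from today.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} >= DATEADD(month, -{months_back}, CAST(GETDATE() AS date))"
    )


if __name__ == "__main__":
    print(direct_query_partition_sql())
```

A fixed date literal would also work but would need to be redefined as time passes, which is why this sketch uses an expression relative to the current date.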
2. Only expose a subset of data that is required for in-memory analysis
By default, the In-Memory partition is the same as the DirectQuery partition, i.e. it’s an exact copy. If only a subset of the source data needs to be generally available via In-Memory (e.g. accessed from Excel), that subset is a good candidate for an In-Memory partition. Multiple partitions can be defined to process the data efficiently into the In-Memory part of the Hybrid Tabular Model.
3. Set DirectQuery partition to “Never process this partition”
This avoids processing errors during design/development in SSDT.
4. The larger the combined size of the In-Memory partitions, the more memory the server needs.
Although the compression in Tabular Model is great, the source data still needs to be processed and loaded into memory, which can be quite heavy on resources. If the data in the In-Memory partitions does not fit in memory, VertipaqPagingPolicy can be tweaked to make use of virtual memory, paging data out to the system pagefile. Marco Russo has a brilliant article on VertipaqPagingPolicy modes and Memory Properties.
5. Optimise at source level first, i.e. SQL Server database.
Use columnstore indexes and partitioning techniques where necessary, especially on the tables that are sources for the Tabular model. (This tip applies to any Tabular Model mode.)
6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.
Unlike partitions in the SQL Server relational database engine, which can improve performance at query time, partitions in Tabular Model are strictly for optimising processing performance. Having said that, as the combined total size of the tabular model partitions grows, query performance may decline a little, although not linearly, because the xVelocity engine uses an outstanding compression algorithm. More about Partition and Processing strategy can be found below. Further reading on Partitions can be found here: http://msdn.microsoft.com/en-us/library/hh230976.aspx.
Factors to consider when adopting Hybrid Model
There are a few factors to consider when adopting Hybrid Model. It is a clever way of attacking “real time” and “flexibility” issues, but it is not always suitable for every scenario.
Although Hybrid Mode can offer the “real-time” feature of DirectQuery, the semantic differences between the xVelocity In-Memory Analytics Engine and the SQL Server engine should be carefully considered and investigated. Because of these differences, the same query against a Hybrid tabular model may return different results in DirectQuery mode and in In-Memory mode. See more information on semantic differences here.
In Part 1, I mentioned that a DAX query issued via DirectQuery data access is translated into equivalent SQL queries. DirectQuery relies on the SQL Server Database Engine whilst In-Memory uses the xVelocity In-Memory Analytics Engine; the two engines are semantically different and may return different results.
One of the reasons for choosing a DirectQuery design is scalability, i.e. relying on optimisation performed at the SQL Server level, with minimal additional resources required for DirectQuery. Enabling Hybrid data access mode means enabling both DirectQuery mode and In-Memory mode. Since In-Memory mode requires resources, particularly memory and CPU, Hybrid Mode requires the same attention to memory and CPU allocation as In-Memory. Using Hybrid mode without carefully considering partitioning, and how much data is processed into the In-Memory Tabular Model, may defeat the purpose of switching to a DirectQuery design (exposed as a Hybrid Mode Tabular Model).
Securing data when it is accessed through both the DirectQuery mode and the In-Memory mode of a Hybrid tabular model is not trivial. Please note that Row/Dynamic level security is not offered in Hybrid mode as it follows the DirectQuery design. One of the benefits of using DirectQuery in the first place may be to take advantage of existing permission definitions in the SQL Server data source.
Most complex permissions defined in the SQL Server data source cannot be preserved when data is accessed through the In-Memory mode of a Hybrid Tabular Model. As an example of a “complex” (not so complex in this case) permission defined in the SQL Server data source: user A only has read access to transaction records dated 2012 and newer, while user B only has read access to transaction records dated 2010 and older.
In this example, In-Memory will not be able to capture the permissions of user A and user B, because:
1. The impersonation setting for In-Memory data processing of a Hybrid Tabular Model is specific to one credential (Windows account or service account); all data processed into In-Memory is based on the credential supplied.
2. Calculated columns are not available in Hybrid mode (as it follows the DirectQuery design), so it is impossible to define permissions row by row to mimic those of the SQL Server data source.
This means uniformity in complex permission definition regardless of the data access mode of a Hybrid tabular model is not possible.
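The reasoning above can be illustrated with a toy simulation. This is not an Analysis Services API; it is a minimal sketch that models the source permissions as per-user filters, assumes DirectQuery passes the current user's identity to the source, and assumes the In-Memory data was processed once under a single hypothetical service account (svc_processing) that can read everything.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

# Toy stand-in for the source table.
SOURCE_ROWS: List[Row] = [
    {"id": 1, "year": 2009},
    {"id": 2, "year": 2010},
    {"id": 3, "year": 2012},
    {"id": 4, "year": 2013},
]

# Toy stand-in for the permissions defined in the SQL Server data source.
SOURCE_PERMISSIONS: Dict[str, Callable[[Row], bool]] = {
    "userA": lambda row: row["year"] >= 2012,
    "userB": lambda row: row["year"] <= 2010,
    "svc_processing": lambda row: True,  # hypothetical processing account
}


def query_source(user: str) -> List[Row]:
    """Rows the source would return to the given user."""
    allowed = SOURCE_PERMISSIONS[user]
    return [row for row in SOURCE_ROWS if allowed(row)]


def direct_query(user: str) -> List[Row]:
    """DirectQuery: assumed to hit the source as the current user."""
    return query_source(user)


# In-Memory: data is loaded once, under the single processing credential.
IN_MEMORY_DATA: List[Row] = query_source("svc_processing")


def in_memory_query(user: str) -> List[Row]:
    """In-Memory: every user sees whatever the processing credential loaded."""
    return list(IN_MEMORY_DATA)


if __name__ == "__main__":
    for user in ("userA", "userB"):
        print(user, "DirectQuery:", [r["id"] for r in direct_query(user)])
        print(user, "In-Memory:  ", [r["id"] for r in in_memory_query(user)])
```

In this toy model the two users get different rows through DirectQuery but identical rows through In-Memory, which is the inconsistency described above.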
Partitioning Strategy and Processing Frequency
Partitioning strategy and processing of the In-Memory mode can be aligned to business requirements. For example, if the business requires only the past 12 months of data to be accessible from Excel via In-Memory, the In-Memory partition(s) can be defined so that only the past 12 months’ worth of data is processed into the Hybrid tabular model. Please note that the DirectQuery partition can overlap the In-Memory partitions, so the DirectQuery partition may contain all of the data, some of the data, or a superset/subset of the In-Memory partition(s).
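As a sketch of the 12-month example, the code below generates one mutually exclusive In-Memory partition definition per calendar month. It assumes that “past 12 months” means the current month plus the 11 before it; the table name dbo.FactTransaction, the column name TransactionDate and the partition naming scheme are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class Partition(NamedTuple):
    name: str
    start: date   # inclusive lower bound
    end: date     # exclusive upper bound
    query: str


def month_start(index: int) -> date:
    """First day of the month for a year * 12 + (month - 1) index."""
    return date(index // 12, index % 12 + 1, 1)


def monthly_partitions(today: date,
                       months: int = 12,
                       table: str = "dbo.FactTransaction",
                       date_column: str = "TransactionDate") -> List[Partition]:
    """One partition per calendar month, ending with the current month."""
    current = today.year * 12 + (today.month - 1)
    partitions = []
    for index in range(current - months + 1, current + 1):
        start, end = month_start(index), month_start(index + 1)
        # Half-open ranges [start, end) keep the partitions mutually exclusive.
        query = (
            f"SELECT * FROM {table} "
            f"WHERE {date_column} >= '{start:%Y%m%d}' "
            f"AND {date_column} < '{end:%Y%m%d}'"
        )
        name = f"{table.split('.')[-1]}_{start:%Y%m}"
        partitions.append(Partition(name, start, end, query))
    return partitions


if __name__ == "__main__":
    for partition in monthly_partitions(date(2012, 7, 25)):
        print(partition.name, partition.query)
```

With monthly partitions, a scheduled job would typically only need to reprocess the most recent partition(s), and the oldest partition can be dropped as the window moves forward.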
As outlined above, the architecture of Hybrid mode is quite clever but does not solve every real-time vs flexibility issue. The advantages and limitations of Hybrid mode can be summarised as follows.
Hybrid Mode Advantages
1. Greater options for client tools (compared to pure DirectQuery only mode), i.e. Excel and other MDX client tools can access the In-Memory partition(s)
2. When diligently partitioned, fewer resources are required for caching, processing and querying In-Memory partitions.
3. Flexibility in accessing Real-time data using the DirectQuery data access, i.e. PowerView and SSRS
Hybrid Mode Limitations
Inherited design constraints of DirectQuery:
1. Restricted DAX functions (as it needs to follow DirectQuery design constraints). For example, TOTALYTD and SAMEPERIODLASTYEAR are not available in DirectQuery and are therefore not available in Hybrid Mode either.
2. No Row Level security. Even if it is possible to implement this at the source database level, the data returned via In-Memory and via DirectQuery will be inconsistent.
3. No Calculated Columns
Inherited In-Memory drawbacks:
1. Memory requirements to store the compressed In-Memory data.
2. Processor requirements to process source data into the tabular database and memory.
Data inconsistency limitations:
1. Stale data in the In-Memory and up-to-date data in DirectQuery, which may confuse users
2. Semantic differences between results returned by the SQL Server Database Engine (DirectQuery mode) and by the xVelocity In-Memory Analytics Engine (In-Memory mode)
Hybrid mode still carries the same burden of Memory and Processor requirements as the default In-Memory mode. When designed and configured carefully, the default In-Memory mode may be able to achieve real-time access in the same manner as Hybrid mode, with less complexity. This of course would be another topic of discussion. So, stay tuned!
Hybrid Mode mainly offers both Real-Time access and Client Tool flexibility. However, it comes at a price. This Part 2 of the Hybrid Mode in Tabular BI Semantic Model suggests including only relevant data in the tabular model for both In-Memory and DirectQuery. Consistency, scalability, security, partitioning strategy and processing frequency are important factors to consider when implementing Hybrid Mode. Some limitations of the Hybrid Mode are valid as at the time of writing.
If you are implementing a Tabular Model database, I’d love to hear your thoughts. Please leave your comments about your experience with Tabular Model, or about the Hybrid Mode posts.
Edit - 25 July 2012:
Added more explanation on VertipaqPagingPolicy reference.
Added more explanation in Section 6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.