Hybrid Mode in Tabular BI Semantic Model – Part 2

This is Part 2 of the Hybrid Mode in Tabular BI Semantic Model series. It covers design tips, a few factors to consider when implementing Hybrid Mode, and a summary of its pros and cons.

Part 1 of the Hybrid Mode in Tabular BI Semantic Model is located here.

 

Partitioning for Hybrid Mode

DirectQuery only supports one partition. In Hybrid Mode, however, you can additionally define a set of mutually exclusive In-Memory partitions to serve In-Memory data access. The DirectQuery partition and the In-Memory partitions can overlap, but the Processing Option of the DirectQuery partition must be set to “Never process this partition”. Otherwise, processing fails with an “A duplicate attribute key has been found when processing” error, as shown below.

[Figure: Processing Hybrid Tabular Model with Duplicate Key Attribute Error]

More information about Partitioning and DirectQuery mode: http://msdn.microsoft.com/en-us/library/hh230965.aspx
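The requirement that In-Memory partitions be mutually exclusive can be checked before processing. Below is a minimal sketch in Python, assuming each partition is defined by a date range; the `Partition` type, the partition names and the dates are all hypothetical and only illustrate the idea.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Partition(NamedTuple):
    name: str
    start: date  # inclusive lower bound
    end: date    # exclusive upper bound


def find_overlaps(partitions: List[Partition]) -> List[Tuple[str, str]]:
    """Return pairs of partition names whose date ranges overlap."""
    ordered = sorted(partitions, key=lambda p: p.start)
    overlaps = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            if right.start >= left.end:
                break  # sorted by start, so no later partition can overlap `left`
            overlaps.append((left.name, right.name))
    return overlaps


# Hypothetical In-Memory partitions; the last one deliberately overlaps the second.
in_memory = [
    Partition("Sales 2011", date(2011, 1, 1), date(2012, 1, 1)),
    Partition("Sales 2012 H1", date(2012, 1, 1), date(2012, 7, 1)),
    Partition("Sales 2012 H2", date(2012, 6, 1), date(2013, 1, 1)),
]

print(find_overlaps(in_memory))
```

An empty list means the In-Memory partitions are mutually exclusive. The DirectQuery partition is left out of the check on purpose, since it is allowed to overlap.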

Note: the results of the DirectQuery partition and the In-Memory partitions are not combined automatically. A separate connection must be made for each data access mode, and the query results can then be combined manually or programmatically. For example, an SSRS report can use both the DirectQuery and In-Memory data access types of the same Tabular database to retrieve real-time and in-memory data. More on this in another post.
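As a rough illustration of combining the two result sets programmatically, here is a minimal Python sketch. It assumes the rows have already been fetched over two separate connections and share a key column. The `combine_results` helper, the column names and the figures are hypothetical, and letting the real-time row win on a key collision is just one possible policy.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def combine_results(in_memory_rows: List[Row], direct_query_rows: List[Row], key: str) -> List[Row]:
    """Merge two result sets; the DirectQuery (real-time) row wins where keys collide."""
    merged = {row[key]: row for row in in_memory_rows}
    merged.update({row[key]: row for row in direct_query_rows})
    return [merged[k] for k in sorted(merged)]


# Hypothetical rows, as if returned by the In-Memory and DirectQuery connections.
cached = [
    {"month": "2012-05", "sales": 100},
    {"month": "2012-06", "sales": 120},
]
real_time = [
    {"month": "2012-06", "sales": 125},  # newer figure for an overlapping month
    {"month": "2012-07", "sales": 40},
]

for row in combine_results(cached, real_time, key="month"):
    print(row)
```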

 

Design Tips:

1. Only expose a subset of the data that is required for real-time access.

Requirement example: when performing Lead Analysis, detailed analysis of the past month is required in real-time using PowerView.

Design solution: Configure the DirectQuery partition with a date filter so that it retrieves only the past month's data.

Analysts can then retrieve up-to-date data for the past month via PowerView and perform accurate analysis.
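A minimal sketch of how such a date filter could be generated is shown below in Python. The table `dbo.FactLead` and the column `LeadDate` are hypothetical names, and "past month" is assumed to mean "since the same day of the previous calendar month".

```python
import calendar
from datetime import date


def one_month_before(today: date) -> date:
    """Same day in the previous month, clamped to that month's last day."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def direct_query_partition_sql(today: date) -> str:
    """Build the partition query text (hypothetical table and column names)."""
    cutoff = one_month_before(today)
    return f"SELECT * FROM dbo.FactLead WHERE LeadDate >= '{cutoff.isoformat()}'"


print(direct_query_partition_sql(date(2012, 7, 25)))
```

A query with a literal date has to be regenerated as time moves on. Expressing the cutoff in T-SQL itself, for instance with DATEADD and GETDATE, would avoid that, provided the partition definition accepts such a query.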

 

2. Only expose a subset of data that is required for in-memory analysis

By default, the In-Memory partition is the same as the DirectQuery partition, i.e. it is an exact copy. If only a subset of the source data needs to be generally available via In-Memory (e.g. for access via Excel), then that subset is a good candidate for an In-Memory partition. Multiple partitions can be defined to process the data efficiently into the In-Memory part of the Hybrid Tabular Model.

 

3. Set DirectQuery partition to “Never process this partition”

This avoids a processing error during design and development in SSDT.

 

4. The larger the combined size of the In-Memory partitions, the more memory the server needs.

Although compression in the Tabular Model is great, the source data still needs to be processed and loaded into memory, which can be quite heavy on resources. If the data in the In-Memory partitions does not fit in memory, VertipaqPagingPolicy can be tweaked to make use of virtual memory, paging data out to the system pagefile. Marco Russo has a brilliant article on VertipaqPagingPolicy modes and Memory Properties.

 

5. Optimise at source level first, i.e. SQL Server database.

Use columnstore indexes and partitioning techniques where necessary, especially on the tables that feed the Tabular model. (This tip applies to any Tabular Model mode.)

 

6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.

Unlike partitions in the SQL Server relational database engine, which can improve performance at query time, partitions in a Tabular Model are strictly for optimising processing performance. Having said that, as the combined size of the tabular model partitions grows, query performance may decline slightly, though not linearly; remember that the xVelocity engine uses an outstanding compression algorithm. More about partitioning and processing strategy can be found below. Further reading on partitions can be found here: http://msdn.microsoft.com/en-us/library/hh230976.aspx.

 

 

Factors to consider when adopting Hybrid Model

There are a few factors to consider when adopting the Hybrid Model. It is a clever way of attacking “real time” and “flexibility” issues, but it is not suitable for every scenario.

Consistency

Although Hybrid Mode can offer the “real-time” feature of DirectQuery, the semantic differences between the xVelocity In-Memory Analytics Engine and the SQL Server engine should be carefully considered and investigated. For a Hybrid tabular model, the results returned through DirectQuery mode and In-Memory mode may differ because of these semantic differences. See more information on semantic differences here.

In Part 1, I mentioned that a DAX query issued through DirectQuery data access is translated into equivalent SQL queries. DirectQuery relies on the SQL Server Database Engine, whilst In-Memory uses the xVelocity In-Memory Analytics Engine. The two are semantically different and may return different results.

Scalability

One of the reasons for choosing a DirectQuery design is scalability, i.e. relying on optimisation performed at the SQL Server level, with minimal additional resources required for DirectQuery. Enabling Hybrid data access mode means enabling both DirectQuery mode and In-Memory mode. Since In-Memory mode requires resources, particularly memory and CPU, Hybrid Mode requires the same consideration of memory and CPU allocation as In-Memory. Using Hybrid mode without careful thought about partitioning and how much data is processed into the In-Memory Tabular Model may defeat the purpose of switching to a DirectQuery design (exposed as a Hybrid Mode Tabular Model).

Security

Securing data when it is accessed through both the DirectQuery mode and the In-Memory mode of a Hybrid tabular model is not trivial. Please note that row-level (dynamic) security is not offered in Hybrid mode, as it follows the DirectQuery design. One of the benefits of using DirectQuery in the first place may be to take advantage of existing permission definitions in the SQL Server data source.

Most complex permissions defined in the SQL Server data source cannot be preserved when data is accessed through the In-Memory mode of a Hybrid Tabular Model. As an example of a “complex” permission (not so complex in this case), suppose that in the SQL Server data source user A only has read access to transaction records dated 2012 and newer, while user B has read access to transaction records dated 2010 and older.

In this example, In-Memory cannot capture the permissions of user A and user B, because:

1. The impersonation setting for In-Memory data processing of a Hybrid Tabular Model is specific to one credential (a Windows account or service account), so all data is processed into memory based on the credential supplied.

2. Calculated columns are not available in Hybrid mode (as it follows the DirectQuery design), so it is impossible to define permissions row by row to mimic those of the SQL Server data source.

This means that complex permissions cannot be applied uniformly across the data access modes of a Hybrid tabular model.
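The user A / user B example can be mimicked with a toy model in Python. This is not how Analysis Services works internally. It only illustrates why a cache filled under one credential cannot reflect per-user permissions defined at the source. The rows, the `svc_processing` account and the helper functions are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical transaction rows; only the year matters for this illustration.
TRANSACTIONS: List[Dict[str, int]] = [
    {"id": 1, "year": 2009},
    {"id": 2, "year": 2010},
    {"id": 3, "year": 2011},
    {"id": 4, "year": 2012},
]

# Permissions as they might be defined at the SQL Server source.
SOURCE_PERMISSIONS: Dict[str, Callable[[int], bool]] = {
    "userA": lambda year: year >= 2012,
    "userB": lambda year: year <= 2010,
    "svc_processing": lambda year: True,  # hypothetical processing account
}


def read_from_source(user: str) -> List[int]:
    """Source-side access: the source filters rows per user."""
    allowed = SOURCE_PERMISSIONS[user]
    return [row["id"] for row in TRANSACTIONS if allowed(row["year"])]


# Processing runs once, under a single credential, and fills the cache.
IN_MEMORY_CACHE = read_from_source("svc_processing")


def read_from_cache(user: str) -> List[int]:
    """Cache access: the user is ignored, the cache holds what processing loaded."""
    return list(IN_MEMORY_CACHE)


for user in ("userA", "userB"):
    print(user, read_from_source(user), read_from_cache(user))
```

In this toy model both users see every row through the cache, while source-side access returns a different subset for each of them.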

Partitioning Strategy and Processing Frequency

The partitioning strategy and processing of the In-Memory mode can be aligned to business requirements. For example, if the business requires only the past 12 months of data to be accessible from Excel via In-Memory, the In-Memory partition(s) can be defined so that only the past 12 months' worth of data is processed into the Hybrid tabular model. Please note that the DirectQuery partition can overlap the In-Memory partitions, so the DirectQuery partition may contain all of the data, some of the data, or a superset or subset of the In-Memory partition(s).

[Figure: An example of Partition Design in Hybrid Mode]
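The "past 12 months" requirement could translate into one In-Memory partition per calendar month. Below is a minimal Python sketch that generates the partition names and filters. It assumes the window includes the current (partial) month, and the `Transactions` naming and `TransactionDate` column are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def rolling_month_partitions(today: date, months: int = 12) -> List[Tuple[str, str]]:
    """Return (partition name, SQL filter) pairs, oldest month first."""
    current = today.year * 12 + (today.month - 1)  # months since year 0
    partitions = []
    for offset in range(months - 1, -1, -1):
        year, month0 = divmod(current - offset, 12)
        next_year, next_month0 = divmod(current - offset + 1, 12)
        start = date(year, month0 + 1, 1)
        end = date(next_year, next_month0 + 1, 1)
        name = f"Transactions {start:%Y-%m}"
        # Half-open ranges keep the partitions mutually exclusive.
        sql_filter = (
            f"TransactionDate >= '{start.isoformat()}' "
            f"AND TransactionDate < '{end.isoformat()}'"
        )
        partitions.append((name, sql_filter))
    return partitions


for name, sql_filter in rolling_month_partitions(date(2012, 7, 25)):
    print(name, "|", sql_filter)
```

With monthly partitions, each processing cycle only needs to add the newest month and drop the oldest, rather than reprocess the whole window.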

 

Summary

As outlined above, the architecture of Hybrid mode is quite clever but does not solve every real-time vs flexibility issue. The advantages and limitations of Hybrid mode can be summarised as follows.

Hybrid Mode Advantages

1. Greater options for client tools (compared to pure DirectQuery only mode), i.e. Excel and other MDX client tools can access the In-Memory partition(s)

2. When diligently partitioned, fewer resources are required for caching, processing and querying In-Memory partitions.

3. Flexibility in accessing Real-time data using the DirectQuery data access, i.e. PowerView and SSRS

Hybrid Mode Limitations

Inherited design constraints of DirectQuery:

1. Restricted DAX functions (as it needs to follow DirectQuery design constraints). For example, TOTALYTD and SAMEPERIODLASTYEAR are not available in DirectQuery and are therefore not available in Hybrid Mode either.

2. No Row Level security. Even if it is possible to implement this at the Source database level, there will be inconsistency in the data returned between using In-Memory and DirectQuery

3. No Calculated Columns

Inherited In-Memory drawbacks:

1. Memory requirements to store the compressed In-Memory data.

2. Processor requirements to process source data into the tabular database and memory.

Data inconsistency limitations:

1. Stale data in the In-Memory and up-to-date data in DirectQuery, which may confuse users

2. Semantic differences between the SQL Server Database Engine (which serves the DirectQuery mode) and the xVelocity In-Memory Analytics Engine (which serves the In-Memory mode)

 

Other Options

Hybrid mode still carries as much of a memory and processor burden as the default In-Memory mode. When designed and configured carefully, the default In-Memory mode may be able to achieve real-time access in the same manner as Hybrid mode, with less complexity. This of course would be another topic of discussion. So, stay tuned!

 

Wrap Up

Hybrid Mode mainly offers both real-time access and client tool flexibility. However, it comes at a price. This Part 2 of the Hybrid Mode in Tabular BI Semantic Model series suggests including only the relevant data in the tabular model for both In-Memory and DirectQuery. Consistency, scalability, security, partitioning strategy and processing frequency are important factors to consider when implementing Hybrid Mode. The limitations of Hybrid Mode described here are valid as at the time of writing.

If you are implementing a Tabular Model database, I’d love to hear your thoughts. Please leave your comments about your experience with Tabular Model, or about the Hybrid Mode posts.

 

Edit - 25 July 2012: 

Added more explanation on VertipaqPagingPolicy reference.

Added more explanation in Section 6. Partitioning in Tabular Model is a means of organising the data to be loaded and processed into In-Memory.

Hybrid Mode in Tabular BI Semantic Model – Part 1

During my presentation on “DirectQuery vs Vertipaq Modes in SSAS Tabular Model” at PASS BI/DW VC and SQL Rally in Dallas this year, I briefly explained the Hybrid Mode in Tabular BI Semantic Model (BISM). I would like to discuss Hybrid Mode further in two parts. Here is the first part, which contains a basic walkthrough of Hybrid Mode in Tabular BISM.

On a side note, DirectQuery vs Vertipaq Modes in SSAS Tabular Model slide deck can be downloaded from here: http://www.mssqlgirl.com/slide-deck-directquery-vs-vertipaq-for-pass-dwbi-vc.html

 

Introduction

The default mode of SSAS Tabular Model is In-Memory (also known as Vertipaq mode), where data from various types of data sources is processed and loaded into the in-memory Tabular database. Any query executed on the Tabular database is served from the data in memory (the cache). Thanks to its state-of-the-art compression algorithm and multi-threaded query processors, the xVelocity in-memory analytics engine can provide fast access to the tabular model objects and data. In-Memory mode supports both DAX and MDX query types.

Another mode available for SSAS Tabular Model is DirectQuery. DirectQuery translates all DAX queries at run time into SQL statements, allowing real-time access to the SQL Server source database. Unlike In-Memory mode, DirectQuery only works with one SQL Server data source. The main advantages of using DirectQuery are real-time access and scalability. This comes at the price of restrictions on a number of DAX functions and the absence of the Calculated Column feature. Only client tools that issue DAX queries can access a Tabular Model in DirectQuery mode.

Hybrid Mode combines the design aspects of DirectQuery and the client tool flexibility of In-Memory. Essentially, a Tabular database model with Hybrid Mode is designed and developed with DirectQuery enabled, and is published with both the DirectQuery access mode and the In-Memory access mode. When published, the metadata is deployed and the data is processed into memory for In-Memory access. Hence, Hybrid Mode also requires the processing mechanism of In-Memory, and it supports both the In-Memory partition type and the DirectQuery partition type.

Querying in Hybrid Mode

There are two options for querying a Tabular database with Hybrid mode enabled.

The following diagram shows how Hybrid Mode can serve DAX issuing client tools (PowerView, SSRS) and MDX issuing client tools (Excel, Tableau, SSRS).

[Figure: SSAS Tabular Hybrid Mode query flow]

 

When a client tool issues a DAX query via the DirectQuery access mode, the DAX query is passed and then converted into an equivalent SQL query that accesses the source SQL Server database directly. The result returned to the DAX issuing client tool will be straight from the source SQL Server data source.

When executing an MDX query via In-Memory access mode, the query is served from the cache (the tabular database). There is no conversion and the results returned to the MDX issuing client tool will be based on the data that is in the cache.

Note: PowerView is a DAX issuing client tool that can work with both DirectQuery and In-Memory. If the primary/default Query Mode of a hybrid Tabular database is DirectQuery, the result is served straight from the source SQL Server data source. If the primary/default mode is In-Memory, the result is served from the cache. More about the two hybrid modes later.
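The routing described above can be summarised as a small decision function. The Python sketch below is a simplified model rather than the server's actual logic. The function name and the `requested` argument, which stands in for an explicit choice made by the client, are hypothetical.

```python
def resolve_access_mode(query_mode: str, query_language: str, requested: str = "Default") -> str:
    """Return which access mode ("In-Memory" or "DirectQuery") serves a query.

    query_mode:     "In-Memory with DirectQuery" or "DirectQuery with In-Memory"
    query_language: "DAX" or "MDX"
    requested:      "Default" (use the primary mode), "In-Memory" or "DirectQuery"
    """
    modes = ("In-Memory", "DirectQuery")
    primary = "In-Memory" if query_mode.startswith("In-Memory") else "DirectQuery"
    if requested == "Default":
        mode = primary
    elif requested in modes:
        mode = requested
    else:
        raise ValueError(f"Unknown access mode: {requested}")
    if query_language == "MDX" and mode == "DirectQuery":
        # Only DAX queries can be translated to SQL; MDX must be served from the cache.
        raise ValueError("MDX queries can only be served by In-Memory")
    return mode


print(resolve_access_mode("DirectQuery with In-Memory", "DAX"))               # primary mode
print(resolve_access_mode("DirectQuery with In-Memory", "MDX", "In-Memory"))  # explicit choice
```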

 

Enabling Hybrid Mode – the Basics

1. Design phase

When creating a Tabular Model solution using SSDT, ensure that DirectQueryMode value is set to On. This will ensure that the solution will conform to DirectQuery design features.

[Figure: Model.bim properties in SSDT, with DirectQueryMode enabled on the tabular model]

2. Deployment phase

Prior to deploying the solution, change the Query Mode to “In-Memory with DirectQuery” or “DirectQuery with In-Memory”. These are the two available Hybrid modes.

[Figure: Project properties of the Tabular Model in SSDT, where the Query Mode of the tabular project is changed]

 

 

Query Mode: In-Memory with DirectQuery

The In-Memory with DirectQuery option means that In-Memory is the primary (or default) connection. However, when needed and if the client tool supports it, the secondary Query Mode, i.e. DirectQuery, can be used instead.

This query mode is ideal for the following scenarios:

  1. Users are mainly using Excel to perform analysis on the tabular model.
  2. The processed data in memory will be used to serve Excel queries.
  3. The processed data in memory will be used to serve PowerView report.
  4. Real-time queries against the real-time data are only occasionally required, for example from SSRS.

Query Mode: DirectQuery with In-Memory

The DirectQuery with In-Memory option means that DirectQuery is the primary (or default) connection. However, when needed and if the client tool supports it, the secondary Query Mode, i.e. In-Memory, can be used instead.

This query mode is ideal for the following scenarios:

  1. Users are mainly using PowerView (or DAX issuing Client Tool) to perform analysis on the tabular model.
  2. Queries should return real-time data by default.
  3. Processed in-memory data only occasionally needs to be retrieved from Excel.

 

Connecting to Hybrid Mode tabular database

Connecting to a tabular database with Hybrid Mode is the same as connecting to either In-Memory or DirectQuery, provided that you would like to use the primary mode. For example, if a tabular database is published with a Query Mode of “DirectQuery with In-Memory”, the default connection via a client tool will always be made through DirectQuery, with no extra steps required. Similarly, if the Query Mode is “In-Memory with DirectQuery”, the default connection will use In-Memory.

When using the “DirectQuery with In-Memory” Query Mode, a client tool can connect to the tabular database using the In-Memory mode by specifying it in the connection. Below is an example of connecting via SQL Server Management Studio to the In-Memory part of a tabular database that has been published with “DirectQuery with In-Memory”.

[Figure: SQL Server Management Studio Additional Connection Parameters, specifying the "DirectQueryMode" parameter]

 

Below is a sample of specifying the DirectQueryMode in Excel:

[Figure: Excel connection string dialog box, specifying DirectQueryMode in the Connection String property]
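For reference, the connection string behind both screenshots can be assembled as in the Python sketch below. The server and database names are hypothetical. The values `InMemory` and `DirectQuery` are what I understand the DirectQueryMode property to accept, so please verify them against the Analysis Services connection string documentation for your version.

```python
from typing import Optional

# Assumed valid values for the DirectQueryMode connection property.
VALID_MODES = ("InMemory", "DirectQuery")


def build_connection_string(server: str, database: str,
                            direct_query_mode: Optional[str] = None) -> str:
    """Build an Analysis Services connection string, optionally overriding the access mode."""
    parts = [
        "Provider=MSOLAP",
        f"Data Source={server}",
        f"Initial Catalog={database}",
    ]
    if direct_query_mode is not None:
        if direct_query_mode not in VALID_MODES:
            raise ValueError(f"Unsupported DirectQueryMode: {direct_query_mode}")
        parts.append(f"DirectQueryMode={direct_query_mode}")
    return ";".join(parts)


# Hypothetical server and database names.
print(build_connection_string(r"localhost\TABULAR", "HybridSales"))
print(build_connection_string(r"localhost\TABULAR", "HybridSales", "InMemory"))
```

Leaving the property out keeps the primary mode of the hybrid database, which matches the default behaviour described above.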

 

Note: As at the time of writing, the DirectQueryMode connection property cannot be specified on PowerView. So, PowerView will use the primary (default) of the Hybrid Mode of the tabular database.

Wrap Up

Above is the basic walk-through of Hybrid Mode in SSAS Tabular. Hybrid Mode Tabular Databases come in two Query Modes, “DirectQuery with In-Memory” or “In-Memory with DirectQuery”. The first Query Mode in the name is the “primary” Query Mode of the tabular database. Tabular databases with Hybrid mode are designed in DirectQuery, but published with either of the two Query Modes.  Some client tools provide the option to switch to the secondary Query Mode by specifying “DirectQueryMode” connection parameter.

Stay tuned for Part 2 of this series for more information on the design tips and important factors to consider when implementing Tabular solution with Hybrid Mode.


Slide Deck: DirectQuery vs Vertipaq for PASS DW/BI VC

I presented at PASS Data Warehousing and Business intelligence Virtual Chapter on May 3rd, 2012 for the “DirectQuery vs Vertipaq mode in SSAS Tabular Model” session.

I have purposely prepared 30+ slides so that they could be used as a reference to get back to after the session. So here’s the slide deck:

DirectQuery vs Vertipaq modes in SSAS Tabular Model by Julie Koesmarno

If you have any feedback or comments, please don’t hesitate to let me know.

This presentation will be delivered at SQL Rally in Dallas on Friday, 11th May 2012. Come and join me!

Upcoming DirectQuery vs Vertipaq Presentation

How exciting is it for us SQL Professionals to have so many SQL Server events since the beginning of the year? We’ve had 12 Hours of SQL, 24 Hours of PASS, 24 Hours of PASS in Russian edition, SQL Server 2012 Virtual Launch, SQL Saturday (ANZ tour is currently running) and plenty of other PASS virtual chapter sessions.
 
I’m quite honoured that I’ve been selected to present at SQL Server User Group in Sydney, 24 Hours of PASS, and SQL Saturday #138 as well as being picked by the Community to present at SQL Rally Dallas this year.

I’d like to focus a bit more on my upcoming SQL Saturday #138 session and SQL Rally Dallas session. The title is “DirectQuery vs Vertipaq Mode in SSAS Tabular Model”. This session will take you to a second step to see what’s beyond the default option (In-Memory / Vertipaq). 

Since the time I wrote the abstract for SQL Rally, Microsoft has rebranded Vertipaq to “xVelocity in-memory analytics engine (VertiPaq)”. Some of the project settings / options have also been changed to refer to “In-Memory”, instead of Vertipaq; while some remain as Vertipaq such as in the Tabular Model Analysis Server Properties. Despite the name changes, they mean the same thing in SSAS. 

The DirectQuery vs Vertipaq Mode in SSAS Tabular Model session brings a tiny step beyond your first leap to deciding/considering In-Memory Tabular mode. It concentrates on introducing DirectQuery and how different it is to In-Memory. The demo will also show how the two modes differ in query execution, design and maintenance aspects; giving you enough information to make an informed decision on which mode to use based on your business case. 

If you are new to Tabular Model or PowerPivot and would like to know more about it, there are quite a number of great resources to get you up to speed, as listed at the end of this post. Hopefully by then, you’d be comfortable learning more in my upcoming “DirectQuery vs Vertipaq Mode in SSAS Tabular Model” session. Having said that, if you’re completely new to Tabular Model but want to know what the fuss is about, come to the session and take your first leap into learning Tabular Model with me.

I have really enjoyed using Tabular Model in my current work, and I am continuously extending my knowledge by preparing for this presentation. So I do hope that you can attend SQL Saturday #138 in Sydney or SQL Rally in Dallas and join me at the session.

 

Reading/Watching List

Welcome to Tabular Projects

http://blogs.msdn.com/b/analysisservices/archive/2011/07/13/welcome-to-tabular-projects.aspx

 

Building your first Analysis Services Tabular BI Semantic model with SQL Server 2012

Speaker: Frederik Vandeputte

http://technet.microsoft.com/en-us/edge/building-your-first-analysis-services-tabular-bi-semantic-model-with-sql-server-2012

 

Building the Perfect BI Semantic Tabular Models for Power View

Speaker: Kasper De Jonge

http://technet.microsoft.com/en-us/edge/building-the-perfect-bi-semantic-tabular-models-for-power-view

 

Developing and Managing a Business Intelligence Semantic Model (BISM) in SQL Server Code Name “Denali” Analysis Services [BIA-316-M]

Speaker: Cathy Dumas

http://www.sqlpass.org/summit/2011/Speakers/CallForSpeakers/SessionDetail.aspx?sid=1964

Note: if you have the PASS Summit 2011 DVDs, pull this session out and start watching it. Cathy was superb in this presentation and I would consider this as an energetic presentation where I “ooo… aaaa…”-ed a few times!

SQLRally Dallas 2012 Website