24 Hours of PASS – High Volume Data Processing Techniques Without Driving Your DBA Crazy

I have just delivered my presentation on “High Volume Data Processing Techniques Without Driving Your DBA Crazy!” via webinar at 24 Hours of PASS (24HOP) Spring 2012. Thank you for those who attended the session. Hope it was useful for you. If you have any questions, please don’t hestitate to contact me and I will do my best to answer them.

 

As promised, below is a list of links that I briefly touched on at the presentation.

SQL Server 2012

ORDER BY Clause (OFFSET .. FETCH NEXT …)

http://msdn.microsoft.com/en-us/library/ms188385%28v=SQL.110%29.aspx

 

Paging Function Performance in SQL Server 2012

http://www.mssqlgirl.com/paging-function-performance-in-sql-server-2012.html

 

Columnstore Indexes

http://msdn.microsoft.com/en-us/library/gg492088(v=SQL.110).aspx

 

 

SSIS

The Data Loading Performance Guide

http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

 

Minimally Logging Bulk Load Inserts into SQL Server

http://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/

 

BDD

The “Balanced Data Distributor” SSIS

http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

 

Download Center – BDD

http://www.microsoft.com/download/en/details.aspx?id=4123

 

Special thanks to 24HOP team and IBTalk for everything. I hope to present with 24HOP in the near future. The recording will be available in a few weeks time via the 24HOP website.

 

 

Filtered Index Limitation

Filtered Index feature has been introduced since SQL Server 2008. In my recent survey with some Data Analysts, Database Developers and DBAs, it seems that Filtered Index has been ignored or forgotten. So here’s some reminder on why it’s a good idea to rethink about using Filtered Index.

Filtered index characteristics:

  • an optimised nonclustered index
  • containing only the specified subset of data, specified by the filter predicate
  • referring to column(s) of the table which it belongs to

Having Filtered Index is particularly useful if only the same subset of data is being queried very frequently (or almost all the time). Filtered index will be smaller in size compared to an index with the same definition without filtered index.

Below is an illustration of the Production.WorkOrder set and its subset data where DueDate between 1 Jan 2011 and 30 Jun 2011.

Set Diagram representing Production.WorkOrder records

 

The Non-Filtered index size is much larger in comparison to the Filtered index size as shown below.  The smaller the index size, the quicker for SQL Server will fetch the data.

Non Filtered Index vs Filtered Index Size Comparison

 

A couple of basic usage examples are listed below:

1. Reports are generated only for the year to date data.

A good example will be creating an index filtering the date column from the beginning of the year. e.g.

CREATE NONCLUSTERED INDEX IX_Production_WorkOrder_DueDate ON Production.WorkOrder
(
     DueDate
)
WHERE DueDate >= '01 Jan 2011' 

Of course, at the beginning of the year next year, you may wish to change the DueDate clause to refer to “01 Jan 2012″ (instead of “01 Jan 2011″).  To avoid manually changing the filter every year, you could create a SQL Server job that dynamically alters the index so that the filtered index is referring to the new DueDate filter at the specified date of the year.

 

2. Data is processed only for Customers that have email address ID value.

If there is more data with Non Nullable EmailAddressID values than the NULL EmailAddressID, creating filtered index will reduce size and therefore improve on performance (compared to creating a non-filtered index).

CREATE NONCLUSTERED INDEX IX_Person_Person ON Person.Person
(
     EmailAddressID
)
WHERE EmailAddressID IS NOT NULL

 

More scenarios and best practice have been documented here on BOL: http://msdn.microsoft.com/en-us/library/cc280372(v=SQL.100).aspx

The filter predicate only supports simple comparison operator such as the following logical operators:

  1. =
  2. !=
  3. <
  4. <=
  5. >
  6. >=
  7. IS NULL
  8. IS NOT NULL

This means “LIKE” and “BETWEEN … AND …” comparisons cannot be used.  However, there is a workaround  to the “BETWEEN … AND …”  comparison, i.e. by replacing it with the “>=” and “<=” operators, such as shown in the snippet below.

CREATE NONCLUSTERED INDEX IX_Production_WorkOrder_DueDate ON Production.WorkOrder

(
     DueDate
)
WHERE DueDate >= '01 Jan 2011' AND DueDate <= '30 Jun 2011'
-- equivalent to DueDate BETWEEN '01 Jan 2011' AND '30 Jun 2011

 

When implementing Filtered Indexes, I would recommend running the result of turning on Statistics IO for before and after.  This will give you a better understanding of how much improvement you will gain with the Filtered Index.

An article from MSSQLTips worth reading to get to know more about Filtered Index can be found here: http://www.mssqltips.com/sqlservertip/1785/sql-server-filtered-indexes-what-they-are-how-to-use-and-performance-advantages/