1. Turn on the execution plan and statistics
The first thing you need to do is use the tools that help you determine whether one version of a query is better than another. Comparing the original query against the rewritten one is the best way to evaluate the benefit of any change.
To do this, open SQL Server Management Studio, go to the Query menu and select “Include Actual Execution Plan.” This turns on the graphical execution plan, which appears in the bottom pane after the query executes.
In the execution plan, you can hover over each operator to bring up a tooltip. The tooltip shows the Estimated Subtree Cost, which helps when comparing one query against another. It isn’t always right, as some work is not reflected in the plan, but it is a useful guide. The estimated number of rows, also shown in the tooltip, is helpful too.
Next, turn on statistics. Type the following statement:
SET STATISTICS IO ON;
This causes statistics to be output to the Messages tab in the bottom pane. The information you want here is mainly logical reads and physical reads. Logical reads are page reads from memory; physical reads are page reads from disk. This stat can be a little deceptive as it doesn’t include CPU, but in general, the fewer page reads, the less work is done, and the more performant the query will be.
To balance out the two measures above, you should also compare the actual execution times. To do this, execute the following statement:
SET STATISTICS TIME ON;
This also has issues, as blocking and contention affect the reported time. You should execute the query a few times to get a feel for how reliable the time shown is.
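Putting it together, a typical comparison session looks something like this (the query, table and date literal are just placeholders; substitute the query you are tuning):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Run the original query and the rewritten query, then compare
-- logical reads, physical reads and elapsed time in the Messages tab.
SELECT COUNT(*)
FROM MySchema.SalesOrderHeader
WHERE OrderDate >= '20080101';
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;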
2. Use Clustered Indexes
Having the clustered index on the primary key is not always the most efficient choice. A clustered index is the most performant type of index: the whole table is sorted according to it. If the table is involved in lots of joins on the primary key, that is probably the right place for it, but if you are continually filtering or grouping on other columns, consider changing the primary key index to non-clustered and putting the clustered index on those filtered or grouped columns instead.
The following statement removes an existing clustered index on the primary key and replaces it with a non-clustered index:
ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO
Then the following statement adds a new clustered index to a table.
CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderID)
GO
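With the clustered index moved onto OrderID, a range filter on that column can now be satisfied by a clustered index seek rather than a full scan; a small illustrative query (the range values are arbitrary):
-- Seeks directly to the range of OrderID values in the clustered index
SELECT SalesOrderID, OrderID
FROM MySchema.SalesOrderHeader
WHERE OrderID BETWEEN 1000 AND 2000;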
3. Use Indexed Views
Indexed Views have been around for a while. A view is like a named query, and these days you can add indexes to them. If used correctly, they can cause a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will also be automatically used if it is the best index even if you don’t actually specify the view in your query!
CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID, soh.CustomerID,
SUM(sod.Quantity * sod.UnitPrice) AS TotalSales,
COUNT_BIG(*) AS NumberOfRows -- required before a grouped view can be indexed
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesTerritoryID, soh.CustomerID
GO
Note the use of the SCHEMABINDING attribute. This prevents changes to the schema of the underlying tables while the view exists, and it is necessary if you want to add an index. (A grouped view also needs the COUNT_BIG(*) column shown above before it can be indexed.) Some people avoid indexed views for this reason, as maintenance becomes more complicated when further dependencies on the view are created. The following statement adds an index:
CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
SalesTerritoryID, CustomerID
)
GO
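If you’re not on an edition that matches indexed views automatically, you can still get the benefit by querying the view directly with the NOEXPAND hint; a small sketch using the TotalSales alias defined above (the filter value is arbitrary):
-- Forces the optimiser to use the view's index rather than expanding the view
SELECT SalesTerritoryID, CustomerID, TotalSales
FROM MySchema.SalesByCustomer WITH (NOEXPAND)
WHERE SalesTerritoryID = 5;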
4. Use Covering Indexes
Covering indexes got a big boost in SQL Server 2005 with the new INCLUDE clause. Basically, you create an index optimised for the query itself based on its joins, filters and grouping, and then add extra non-key columns that can be retrieved directly from the index for use in the SELECT list, as follows:
CREATE NONCLUSTERED INDEX TestIndex
ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)
The above statement creates a non-clustered index on the SalesOrderDetail table. If a query filters on the OrderId column, the index will be used, and if the only other columns retrieved are Quantity and UnitPrice, the query optimiser doesn’t need to touch the underlying table at all: it can answer the query from the index alone, which improves performance.
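For example, a query shaped like the following (the literal value is just an illustration) can be answered entirely from TestIndex, and should show up in the plan as an index seek with no key lookup:
-- All referenced columns are in the index: OrderId is the key,
-- Quantity and UnitPrice are included columns.
SELECT OrderId, Quantity, UnitPrice
FROM MySchema.SalesOrderDetail
WHERE OrderId = 43659;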
5. Keep your clustered index small
One thing you need to consider when deciding where to put your clustered index is how big its key will be. The clustered index key is stored in every non-clustered index on the table as the row locator, so if you have a wide clustered key on a table with a decent number of rows, the total index size can blow out significantly. Having no clustered index at all can be just as bad, because non-clustered indexes then use the row pointer instead, which is 8 bytes per row.
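As a rough illustration (these tables and columns are hypothetical, not from the examples above), compare a wide composite clustered key against a narrow surrogate key; whichever key you choose is carried in every row of every non-clustered index on the table:
-- Wide clustered key: up to 400 bytes copied into every non-clustered index row
CREATE TABLE MySchema.CustomerWide (
    Surname   nvarchar(50)  NOT NULL,
    GivenName nvarchar(50)  NOT NULL,
    Email     nvarchar(100) NOT NULL,
    CONSTRAINT PK_CustomerWide PRIMARY KEY CLUSTERED (Surname, GivenName, Email)
);

-- Narrow clustered key: 4 bytes copied into every non-clustered index row
CREATE TABLE MySchema.CustomerNarrow (
    CustomerID int IDENTITY(1,1) NOT NULL,
    Surname   nvarchar(50)  NOT NULL,
    GivenName nvarchar(50)  NOT NULL,
    Email     nvarchar(100) NOT NULL,
    CONSTRAINT PK_CustomerNarrow PRIMARY KEY CLUSTERED (CustomerID)
);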
6. Avoid cursors
A bit of a no-brainer. Cursors are less performant because every FETCH statement executed is equivalent to another single-row SELECT. The optimiser can’t optimise a CURSOR statement as a whole, only the queries within each iteration of the cursor loop, which is undesirable. Given that most CURSOR statements can be rewritten using set logic, they should generally be avoided.
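As a hedged sketch (the Status column and the update itself are made up for illustration), here is a typical row-by-row cursor and the single set-based statement that replaces it:
-- Row-by-row: one FETCH (effectively one single-row SELECT) per iteration
DECLARE @id int;
DECLARE order_cursor CURSOR FOR
    SELECT SalesOrderID FROM MySchema.SalesOrderHeader WHERE Status = 1;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE MySchema.SalesOrderHeader SET Status = 2 WHERE SalesOrderID = @id;
    FETCH NEXT FROM order_cursor INTO @id;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based equivalent: one statement the optimiser can plan as a whole
UPDATE MySchema.SalesOrderHeader
SET Status = 2
WHERE Status = 1;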
7. Archive old data
Another no-brainer, so I won’t say much. If you want to improve query performance, give the optimiser less work to do. If you can cut down the number of rows the query has to deal with, then performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don’t need your data after a certain period of time, back up your database and remove the data.
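A minimal sketch of the archiving idea (the archive table and the two-year cut-off are assumptions): a DELETE with an OUTPUT clause moves the old rows into an archive table in a single atomic statement, provided the archive table has a matching column layout.
-- Move orders older than two years into an archive table in one statement
-- (any child rows referencing these orders would need to be archived first)
DELETE FROM MySchema.SalesOrderHeader
OUTPUT DELETED.* INTO MySchema.SalesOrderHeaderArchive
WHERE OrderDate < DATEADD(year, -2, GETDATE());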
8. Partition your data correctly
These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimiser can use the partition function to read only the partitions (and therefore filegroups) that are relevant to the query. To create partitions, you need a partition function and a partition scheme.
CREATE PARTITION FUNCTION myRangePartitionFunction(int)
AS RANGE RIGHT FOR VALUES (1,100,1000)
Once the partition function is created, you can then apply the function to a partition scheme for a table.
CREATE PARTITION SCHEME myRangePartitionScheme
AS PARTITION myRangePartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)
Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:
CREATE TABLE mySchema.myPartitionTable(
col1 int,
col2 nvarchar(100)
)
ON myRangePartitionScheme(col1)
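Building on the objects above, you can check where a key lands with the $PARTITION function, and a filter on the partitioning column lets the optimiser eliminate the partitions it doesn’t need:
-- Which partition does a given key fall into? (returns 3 for this RANGE RIGHT function)
SELECT $PARTITION.myRangePartitionFunction(150) AS PartitionNumber;

-- A filter on the partitioning column allows partition elimination,
-- so only the matching partition is scanned.
SELECT col1, col2
FROM mySchema.myPartitionTable
WHERE col1 >= 100 AND col1 < 1000;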
9. Remove user-defined inline scalar functions
Scalar user-defined functions are convenient when you want to return a single value, but that convenience comes at the expense of performance. They look somewhat like stored procedures, and they can be used inline in SQL statements. The problem is that the optimiser does not expand them into the query plan, so the work they do isn’t optimised with the rest of the query. Bad news, because it turns a seek into a scan. Queries like this may appear performant in the execution plan and in the IO statistics, but when you run them they can perform really, really badly. No seriously, really bad.
Here’s an example of what I’m talking about:
CREATE FUNCTION dbo.fnGetPostalCode(
@Suburb nvarchar(100),
@State nvarchar(10))
RETURNS int
AS
BEGIN
RETURN ISNULL(
(
SELECT PostalCode
FROM dbo.PostalCode
WHERE Suburb = @Suburb
AND State = @State
), -1 );
END
GO
The following statement will only perform a clustered index scan, not a seek, and on a big table this could seriously affect performance.
SELECT s.SalesPersonID,
s.SuburbName,
s.State,
dbo.fnGetPostalCode(s.SuburbName,s.State)
AS PostalCode
FROM dbo.SalesPerson
You can have a look at the details by clicking on SQL Server Management Studio’s Query menu and selecting “Include Actual Execution Plan.”
One way to get around this is to simply inline the underlying query from the function, as follows:
SELECT s.SalesPersonID, s.SuburbName, s.State,
ISNULL( (SELECT PostalCode
FROM dbo.PostalCode
WHERE Suburb = s.SuburbName
AND State = s.State), -1)
AS PostalCode
FROM dbo.SalesPerson
Inlining the SQL statement like this performs significantly better.
10. Use APPLY
The APPLY operator was introduced for situations where you would otherwise put multiple correlated subqueries in the one statement. For example, take the following:
SELECT soh.SalesOrderID,
Quantity=(SELECT TOP 1 (Quantity)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = soh.SalesOrderID),
UnitPrice=(SELECT TOP 1 (UnitPrice)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh
This executes two correlated subqueries against the same table using the same criterion. They can now be replaced with a single APPLY:
SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
SELECT TOP (1) sod.UnitPrice, sod.Quantity
FROM Sales.SalesOrderDetail sod
WHERE sod.SalesOrderId = soh.SalesOrderId
ORDER BY sod.Quantity DESC
) as a
11. Use computed columns
Computed columns are derived from other columns in a table. By creating and then indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you need to filter on SalesPrice and you have Quantity and UnitPrice columns, doing the multiplication inline in the query forces a scan, because the product has to be computed for every single row. Create a computed column called SalesPrice, index it, and the query optimiser no longer needs to retrieve the UnitPrice and Quantity data and do the calculation: it’s already done.
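A hedged sketch of the idea, assuming the Quantity and UnitPrice columns on the MySchema.SalesOrderDetail table used in earlier examples:
-- Add a persisted computed column and index it
ALTER TABLE MySchema.SalesOrderDetail
ADD SalesPrice AS (Quantity * UnitPrice) PERSISTED;

CREATE NONCLUSTERED INDEX IdxSalesPrice
ON MySchema.SalesOrderDetail (SalesPrice);

-- This filter can now use an index seek instead of computing
-- Quantity * UnitPrice for every row.
SELECT SalesOrderID, SalesPrice
FROM MySchema.SalesOrderDetail
WHERE SalesPrice > 1000;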
12. Use the correct transaction isolation level
If there are a lot of rows in your table, multiple concurrent requests against it can cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it could be time to consider a change.
For example, READ UNCOMMITTED is equivalent to a dirty read, or the NOLOCK hint. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates are occurring that require atomicity. It is the most performant level because it ignores locking altogether, but it is rarely acceptable in a well-designed system and should be treated as a special case.
The READ_COMMITTED_SNAPSHOT database option specifies that data read by a statement will be the transactionally consistent version of the data that existed at the start of that statement (under full SNAPSHOT isolation, as it existed at the start of the transaction). Internally, SQL Server keeps a versioned copy of the data in tempdb until it is no longer needed. Except when the database is being recovered, snapshot reads do not request locks, and therefore do not block other transactions from writing data; transactions writing data likewise do not block readers.
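As a minimal sketch (the database name is a placeholder), row versioning is switched on at the database level and snapshot isolation is then requested per session:
-- Enable statement-level row versioning for the whole database
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Or allow transaction-level snapshots and opt in per session
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- Reads here see the data as it existed when the transaction started,
    -- without taking shared locks or blocking writers.
    SELECT COUNT(*) FROM MySchema.SalesOrderHeader;
COMMIT;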
There are various other isolation levels, including REPEATABLE READ and SERIALIZABLE, that you can look at to determine whether they are appropriate for your needs.