SQL DBA Insights: TSQL Tuning Tips

Introduction:

This Chapter will provide the overview of Query Optimization in SQL Server.

Objective:

After completing this chapter you will be able to:

· Identify the Query which needs to be tuned.

· Things which you need to keep in mind while optimizing a Query.

· How to optimize the Query.

Optimizing Query Performance

1. Comparison on same data types. Make sure that you are performing the comparison on same data types. Otherwise Sql server will perform Implicit Conversion to match the datatypes, for which the query will go for a table scan instead of index seek.

2. Use Correlated Subqueries rather than using Subqueries without Correlation to the Outer Query. An uncorrelated subquery is executed only once per query execution and returns only one value. Correlated subqueries include a reference to the outer query. Typically, this reference is used to filter the correlated subquery. A correlated subquery is typically good for performance when used in combination with the EXISTS operator to filter the outer query. Ex. As below /*Non Correlated query */ SELECT dept_id, dept_name FROM dept WHERE dept_id IN (SELECT dept_id FROM employee); /*Correlated query*/ SELECT a.dept_id, a.dept_name FROM dept a WHERE a.dept_id IN (SELECT b.dept_id FROM employee b WHERE a.dept_id= b.dept_id);

3. Scalar and Table-Valued User-Defined Function A Scalar User-Defined Function that returns a single value. This type of function can significantly degrade performance. The reason is this is that these functions are not expanded and optimized into the main query plan by the query optimizer, but they are rather just called from the execution plan. This also means that the cost of whatever is done inside the function is not included in the cost estimates found in graphical execution plan for the query. This same problem occurs for the output of the SET STATISTICS IO ON statement, which will contain no references to what is done inside the UDF. Table-Valued User-Defined Function is actually just a view that can accept parameters. It is optimized in the same way as a view or any select statement would be.

4. Use Derived Table Instead of using temporary tables, consider using a derived table to reduce I/O and boost your application's performance. A derived table is the result set from a SELECT statement is used as a table from which the outer SELECT statement selects its data. Consider the following query to find the second highest salary from Employees table: SELECT MIN(Salary) FROM Employees WHERE EmpID IN ( SELECT TOP 2 EmpID FROM Employees ORDER BY Salary DESC); The same query can be re-written using a derived table as shown below, and it performs twice as fast as the above query: SELECT MIN(Salary) FROM ( SELECT TOP 2 Salary FROM Employees ORDER BY Salary DESC ) AS A

5. Use Table Variable rather than Temp Table A table variable and all associated data is stored in memory. . However, if the amount of data placed into the table variable causes it to require more storage space than is available in memory, the overflow will be spooled to disk within tempdb. Consider using a table variable instead of creating a temporary table. Advantages are: - -Table variable is faster. -Requires less locking. - Requires less resource. - Table variables found in stored procedure. - result in fewer compilations (than when using temporary tables). N.B. In some cases temp table gives better performance (where we have to deal with large amount of data, temp table will perform better).

6. Use Indexed views in place of view without an index creating indexed view can greatly improve read performance of queries View is created with the clustered index is stored in the database the same way as a clustered index on a table. Once clustered index has been created for a view, you can also create non-clustered indexes for the same view.

7. Cursors You should generally avoid using cursors because of their negative effect on performance.

8. Avoid unnecessary joins To optimize queries, one of the basic strategies is to minimize the number of join clauses used. Another consideration is that outer joins typically incur more cost than inner joins because of the extra work needed to find the unmatched rows. Also try to minimize the number of joins in your query.

9. Using SELECT -Restricted columns and rows use less resources (memory, i/o etc) and less network traffic hence better performance. Use the select statements with TOP keyword or the SET ROWCOUNT statement to restrict number of rows.

-SELECT * and SELECT without WHERE will perform a table scan. A table scan will lock the table during the time-consuming scan, preventing other users from accessing it, hurting concurrency. Table scan is slower than clustered/non clustered seek.

-Another negative aspect of a table scan is that it will tend to flush out data pages from the cache with useless data, which reduces SQL Server's ability to reuse useful data in the cache, which increases disk I/O and hurts performance.

10. Try to avoid SELECT DISTINCT DISTINCT RETRIEVE rows depending on WHERE clause, then eliminate duplicate row. The DISTINCT clause creates a lot of extra work for SQL Server, and reduces the physical resources that other SQL statements have at their disposal. Because of this, only use the DISTINCT clause if it is necessary.

11. Using UNION By default UNION perform UNION DISTINCT. The UNION ALL statement is much faster than UNION, because UNION ALL does not look for duplicate rows, and UNION statement does look for duplicate rows. UNION combines output of all select statement, and then performs a SELECT DISTINCT in order to eliminate any duplicate rows. This process occurs even if there are no duplicate rows. If you know that there will never be any duplicate rows, then you should use the UNION ALL statement instead of the UNION statement.

12. Avoid using ORDER BY Avoid using ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. Sorting often occurs when any of the following T-SQL statements are executed: - GROUP BY - SELECT DISTINCT - UNION - CREATE INDEX On the other hand, there are few ways that sorting overhead can be reduced. These include: - Keep the number of rows to be sorted to a minimum. - Keep the number of columns to be sorted to the minimum. - Keep the width (physical size) of the columns to be sorted to a minimum. - Sort column with number datatypes instead of character datatypes.

13. Avoid GROUP BY with HAVING clause The HAVING clause is used to restrict the result set returned by the GROUP BY clause. Here's what happens. - The WHERE clause is used to select the appropriate rows that need to be grouped. - The GROUP BY clause divides the rows into sets of grouped rows, and then aggregates their values. - Last, the HAVING clause then eliminates undesired aggregated groups. If the WHERE clause is used to eliminate as many of the undesired rows as possible, this means the GROUP BY and the HAVING clauses will have less work to do, boosting the overall performance of the query. In many cases, you can write your select statement so, that it will contain only WHERE and GROUP BY clauses without HAVING clause. This can improve the performance of your query.

14. Don’t use IN if you have a choice to use between: The Query Optimizer converts IN clause to the OR operator when parsing your code. Because of this, keep in mind that if the referenced column in your query doesn't include an index, then the Query Optimizer will perform a table scan or clustered index scan on the table. When you have a choice of using the IN or the BETWEEN, use BETWEEN clause, as it is more efficient. When you have a choice of using the IN or the EXISTS, use EXISTS clause, as it is usually more efficient

If use of IN become necessary then following tips help to reduce performance overhead:

- Use IN with reference column having appropriate index.

- Order the list of values so that the most frequently found values are placed at the first of the list, and the less frequently found values are placed at the end of the list. This can speed performance because the IN option returns true as soon as any of the values in the list produce a match. The sooner the match is made, the faster the query completes.

- If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index.

15. Avoid NOT IN

NOT IN offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, instead try to use one of the following options, all of which offer better performance:

- Use EXISTS or NOT EXISTS

- Perform a LEFT OUTER JOIN and check for a NULL condition

- Use IN

16. Using Operators In a WHERE clause, the various operators used directly affect how fast a query will run. This is because some operators lend themselves to speed over other operators. Of course, you may not have any choice of which operator you use in your WHERE clauses, but sometimes you do.

Here are the key operators used in the WHERE clause, ordered by their performance. Those operators at the top will produce results faster than those listed at the bottom.

1. =

2. >, >=, <, <=

3. LIKE

4. <>

This lesson here is to use = as much as possible, and <> as least as possible. NOT operator also go for table and index scans.

17. Using LIKE Try to use one or more leading character in the LIKE clause. If you use a leading character in your LIKE clause, then the Query Optimizer has the ability to potentially use an index to perform the query, speeding performance and reducing the load on SQL Server. The more leading characters you can use in the LIKE clause, the more likely the Query Optimizer will find and use a suitable index. But if the leading character in a LIKE clause is a wildcard then table scan occurs.

18. Using AND SQL Server evaluates AND from left to right in the order they are written, it will not honor parenthesis. Because of this, you may want to consider one of the following when using AND:

- Of the search criterions in the WHERE clause, at least one of them should be based on a highly selective column that has an index.

- If at least one of the search criterions in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.

- If none of the column in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.

- Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.

- Put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

19. Using OR Use parenthesis with OR otherwise it may accidentally retrieve much more data than you need, which hurts performance. If any of the referenced columns in the OR clause are not indexed (or does not have a useful index), the Query Optimizer will perform a table scan or a clustered index scan. If you have a query that uses ORs and it not making the best use of indexes, consider rewriting it as a UNION ALL, and then testing performance. Only through testing can you be sure that one version of your query will be faster than another.

20. Perform Multiple UPDATE Generally, it is better to perform multiple UPDATEs on records in one fell swoop (using one query), instead of running the UPDATE statement multiple times (using multiple queries).

21. Use column list in your INSERT statement Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding a column).

22. Consider Using the TABLOCK hint along with BULK INSERT If you use the BULK INSERT to import data into SQL Server, seriously consider using the TABLOCK hint along with it. This will prevent SQL Server from running out of locks during vary large imports, and also boost performance due to the reduction of lock contention.

23. Use EXECUTE sp_executesql EXECUTE sp_executesql in dynamic query is more versatile than EXECUTE; and because sp_executesql generates execution plans that are more likely to be reused by SQL Server, sp_executesql is more efficient than EXECUTE.

SQL DBA Insights

Monday, 18 June 2018

TSQL Tuning Tips

No comments:

Post a Comment