Introduction:
This Chapter will provide the overview of Query
Optimization in SQL Server.
Objective:
After
completing this chapter you will be able to:
·
Identify the Query which needs to be tuned.
·
Things which you need to keep in mind while
optimizing a Query.
·
How to optimize the Query.
Optimizing Query Performance
1. Comparison on same
data types.
Make
sure that you are performing the comparison on same data types. Otherwise Sql
server will perform Implicit Conversion to match the datatypes, for
which the query will go for a table scan instead of index seek.
2. Use Correlated
Subqueries rather than using Subqueries without Correlation to the Outer Query.
An uncorrelated subquery is executed only
once per query execution and returns only one value. Correlated subqueries
include a reference to the outer query. Typically, this reference is used to
filter the correlated subquery. A correlated subquery is typically good for
performance when used in combination with the EXISTS operator to filter the
outer query. Ex. As below /*Non
Correlated query */ SELECT
dept_id, dept_name FROM dept WHERE dept_id IN (SELECT dept_id FROM employee); /*Correlated query*/ SELECT a.dept_id, a.dept_name FROM dept a WHERE
a.dept_id IN (SELECT b.dept_id FROM employee b WHERE a.dept_id= b.dept_id);
3. Scalar and
Table-Valued User-Defined Function
A
Scalar User-Defined Function that returns a single value. This type of function
can significantly degrade performance. The reason is this is that these
functions are not expanded and optimized into the main query plan by the query
optimizer, but they are rather just called from the execution plan. This also
means that the cost of whatever is done inside the function is not included in
the cost estimates found in graphical execution plan for the query. This same
problem occurs for the output of the SET STATISTICS IO ON statement, which will
contain no references to what is done inside the UDF. Table-Valued User-Defined
Function is actually just a view that can accept parameters. It is optimized in
the same way as a view or any select statement would be.
4. Use Derived
Table
Instead
of using temporary tables, consider using a derived table to reduce I/O and
boost your application's performance. A derived table is the result set from a
SELECT statement is used as a table from which the outer SELECT statement
selects its data. Consider the following query to find the second highest
salary from Employees table:
SELECT
MIN(Salary) FROM Employees WHERE EmpID IN ( SELECT TOP 2 EmpID FROM Employees
ORDER BY Salary DESC);
The same query can be re-written using a derived table as shown below,
and it performs twice as fast as the above query:
SELECT MIN(Salary)
FROM ( SELECT TOP 2 Salary FROM Employees ORDER BY Salary DESC ) AS A
5. Use Table Variable
rather than Temp Table
A table variable and all
associated data is stored in memory. . However, if the amount of data placed
into the table variable causes it to require more storage space than is
available in memory, the overflow will be spooled to disk within tempdb.
Consider using a table variable instead of creating a temporary table.
Advantages are: - -Table
variable is faster. -Requires less
locking. -
Requires less resource. -
Table variables found in stored procedure. - result in fewer
compilations (than when using temporary tables). N.B. In
some cases temp table gives better performance (where we have to deal with
large amount of data, temp table will perform better).
6. Use Indexed views in
place of view without an index
creating
indexed view can greatly improve read performance of queries View is created
with the clustered index is stored in the database the same way as a clustered
index on a table. Once clustered index has been created for a view, you can
also create non-clustered indexes for the same view.
7. Cursors You
should generally avoid using cursors because of their negative effect on
performance.
8. Avoid unnecessary
joins
To optimize
queries, one of the basic strategies is to minimize the number of join clauses
used. Another consideration is that outer joins typically incur more cost than
inner joins because of the extra work needed to find the unmatched rows. Also
try to minimize the number of joins in your query.
9. Using SELECT
-Restricted columns and
rows use less resources (memory, i/o etc) and less network traffic hence better
performance. Use the select statements with TOP keyword or the SET ROWCOUNT
statement to restrict number of rows.
-SELECT
* and SELECT without WHERE will perform a table scan. A table scan will lock
the table during the time-consuming scan, preventing other users from accessing
it, hurting concurrency. Table scan is slower than clustered/non clustered
seek.
-Another
negative aspect of a table scan is that it will tend to flush out data pages
from the cache with useless data, which reduces SQL Server's ability to reuse
useful data in the cache, which increases disk I/O and hurts performance.
10.
Try to avoid SELECT DISTINCT
DISTINCT
RETRIEVE rows depending on WHERE clause, then eliminate duplicate row. The
DISTINCT clause creates a lot of extra work for SQL Server, and reduces the
physical resources that other SQL statements have at their disposal. Because of
this, only use the DISTINCT clause if it is necessary.
11.
Using UNION
By
default UNION perform UNION DISTINCT. The UNION ALL statement is much faster
than UNION, because UNION ALL does not look for duplicate rows, and UNION
statement does look for duplicate rows. UNION combines output of all select
statement, and then performs a SELECT DISTINCT in order to eliminate any
duplicate rows. This process occurs even if there are no duplicate rows. If you
know that there will never be any duplicate rows, then you should use the UNION
ALL statement instead of the UNION statement.
12.
Avoid using ORDER BY Avoid
using ORDER BY in your SELECT statements unless you really need to, as it adds
a lot of extra overhead. Sorting often occurs when any of the following T-SQL
statements are executed: -
GROUP BY
- SELECT DISTINCT
-
UNION
-
CREATE INDEX
On the other hand, there are few ways that sorting overhead can be
reduced. These include: - Keep
the number of rows to be sorted to a minimum.
- Keep the number of columns to be sorted to
the minimum. -
Keep the width (physical size) of the columns to be sorted to a minimum. -
Sort column with number datatypes instead of character datatypes.
13.
Avoid GROUP BY with HAVING clause
The
HAVING clause is used to restrict the result set returned by the GROUP BY
clause. Here's what happens. - The WHERE clause is used to select the
appropriate rows that need to be grouped.
- The GROUP BY clause divides the
rows into sets of grouped rows, and then aggregates their values. - Last, the HAVING
clause then eliminates undesired aggregated groups. If the WHERE clause is used to
eliminate as many of the undesired rows as possible, this means the GROUP BY
and the HAVING clauses will have less work to do, boosting the overall
performance of the query. In many cases, you can write your select statement
so, that it will contain only WHERE and GROUP BY clauses without HAVING clause.
This can improve the performance of your query.
14.
Don’t use IN if you have a choice to use between:
The
Query Optimizer converts IN clause to the OR operator when parsing your code.
Because of this, keep in mind that if the referenced column in your query
doesn't include an index, then the Query Optimizer will perform a table scan or
clustered index scan on the table. When you have a
choice of using the IN or the BETWEEN, use BETWEEN clause, as it is more
efficient. When
you have a choice of using the IN or the EXISTS, use EXISTS clause, as it is
usually more efficient
If use of IN become necessary
then following tips help to reduce performance overhead:
-
Use IN with reference column having appropriate index.
- Order the list of values so that the most frequently found
values are placed at the first of the list, and the less frequently found
values are placed at the end of the list. This can speed performance because
the IN option returns true as soon as any of the values in the list produce a
match. The sooner the match is made, the faster the query completes.
- If you find that SQL Server uses a
TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause as part of
your WHERE clause, even when those columns are covered by an index, consider
using an index hint to force the Query Optimizer to use the index.
15. Avoid NOT IN
NOT IN offers poor
performance because the SQL Server optimizer has to use a nested table scan to
perform this activity, instead try to use one of the following options, all of
which offer better performance:
- Use EXISTS or NOT EXISTS
- Perform a LEFT OUTER JOIN and check for a NULL condition
- Use IN
16. Using Operators
In
a WHERE clause, the various operators used directly affect how fast a query will
run. This is because some operators lend themselves to speed over other
operators. Of course, you may not have any choice of which operator you use in
your WHERE clauses, but sometimes you do.
Here are the key
operators used in the WHERE clause, ordered by their performance. Those
operators at the top will produce results faster than those listed at the
bottom.
1. =
2. >, >=, <, <=
3. LIKE
4. <>
This lesson here is to
use = as much as possible, and <> as least as possible. NOT operator also
go for table and index scans.
17. Using LIKE
Try
to use one or more leading character in the LIKE clause. If you use a leading
character in your LIKE clause, then the Query Optimizer has the ability to
potentially use an index to perform the query, speeding performance and
reducing the load on SQL Server. The more leading characters you can use in the
LIKE clause, the more likely the Query Optimizer will find and use a suitable
index. But if the leading character in a LIKE clause is a wildcard then table
scan occurs.
18. Using AND
SQL Server
evaluates AND from left to right in the order they are written, it will not
honor parenthesis. Because of this, you may want to consider one of the
following when using AND:
- Of the search criterions in the WHERE clause, at least one
of them should be based on a highly selective column that has an index.
- If at least one of the search criterions in the WHERE
clause is not highly selective, consider adding indexes to all of the columns
referenced in the WHERE clause.
- If none of the column in the WHERE clause are selective
enough to use an index on their own, consider creating a covering index for
this query.
- Locate the least likely true AND expression first. This
way, if the AND expression is false, the clause will end immediately, saving
time.
- Put the least complex
AND expression first. This way, if it is false, less work will have to be done
to evaluate the expression.
19. Using OR
Use parenthesis with OR otherwise
it may accidentally retrieve much more data than you need, which hurts
performance. If any of the referenced columns in the OR clause are not indexed
(or does not have a useful index), the Query Optimizer will perform a table
scan or a clustered index scan. If you have a query that uses ORs and it not
making the best use of indexes, consider rewriting it as a UNION ALL, and then
testing performance. Only through testing can you be sure that one version of
your query will be faster than another.
20. Perform Multiple
UPDATE Generally, it is better to perform multiple UPDATEs on
records in one fell swoop (using one query), instead of running the UPDATE
statement multiple times (using multiple queries).
21. Use column list in
your INSERT statement
Always use a column list in your
INSERT statements. This helps in avoiding problems when the table structure
changes (like adding a column).
22. Consider Using the
TABLOCK hint along with BULK INSERT
If you use the BULK INSERT to import data into SQL
Server, seriously consider using the TABLOCK hint along with it. This will
prevent SQL Server from running out of locks during vary large imports, and
also boost performance due to the reduction of lock contention.
23. Use EXECUTE
sp_executesql
EXECUTE sp_executesql in
dynamic query is more versatile than EXECUTE;
and because sp_executesql generates execution plans that are more
likely to be reused by SQL Server, sp_executesql is more efficient than
EXECUTE.
No comments:
Post a Comment