Industry Standards to Follow for T-SQL Coding
Summary
This guide describes a reasonable, consistent, and effective coding style, highlighting common T-SQL best practices that help you write reliable, robust, and efficient SQL code. The following guidelines will help you maximize database performance.
General Key Points for Best Practices
1. Maintain T-SQL formatting while coding
The formatting of SQL code may not seem important, but consistent formatting makes it easier for others to scan and understand your code, so it is good practice to write structured code. An online T-SQL formatting tool can help with this.
2. Do not use SELECT * in your query
List only the required column names when writing a T-SQL query, instead of using *. This reduces disk I/O and improves performance.
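As a minimal sketch (the dbo.Employee table and its columns are hypothetical, used only for illustration):

```sql
-- Avoid: returns every column, even ones the application never uses
SELECT * FROM dbo.Employee;

-- Preferred: name only the columns you actually need
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employee;
```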
3. Use the ANSI-standard join clause
Always avoid the old-style join syntax; use ANSI joins instead, which also support FULL OUTER JOIN. See the example below.
-- Old-style join
SELECT a.Au_id, t.Title
FROM TITLES t,
     AUTHORS a,
     TITLEAUTHOR ta
WHERE a.Au_id = ta.Au_id
  AND ta.Title_id = t.Title_id
  AND t.Title LIKE '%Computer%'
-- ANSI join syntax
SELECT a.Au_id, t.Title
FROM AUTHORS a
     INNER JOIN TITLEAUTHOR ta
         ON a.Au_id = ta.Au_id
     INNER JOIN TITLES t
         ON ta.Title_id = t.Title_id
WHERE t.Title LIKE '%Computer%'
4. Do not use column numbers in the ORDER BY clause
Always use column names in an ORDER BY clause; avoid positional references.
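A short sketch of the difference (the dbo.Employee table is hypothetical):

```sql
-- Avoid: positional references break silently if the column list changes
SELECT FirstName, LastName
FROM dbo.Employee
ORDER BY 2, 1;

-- Preferred: order by explicit column names
SELECT FirstName, LastName
FROM dbo.Employee
ORDER BY LastName, FirstName;
```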
5. Do not prefix your stored procedure names with "sp_"
Whenever SQL Server finds a procedure name beginning with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, and then it tries dbo as the owner. You can save lookup time by avoiding the "sp_" prefix.
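For example (the procedure and table names here are hypothetical; usp_ is one common alternative prefix, though any non-sp_ name works):

```sql
-- Avoid: the sp_ prefix sends SQL Server to the master database first
-- CREATE PROCEDURE dbo.sp_GetOrders ...

-- Preferred: a neutral prefix such as usp_ (or no prefix at all)
CREATE PROCEDURE dbo.usp_GetOrders
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders;
END
```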
6. Use a proper naming convention
Use a consistent naming convention for triggers (TR_Emails_LogEmailChanges), indexes (IX[C][F][U][S]_<TableName>_<Column1Name>[_<Column2Name>]…[_<ColumnNName>]), primary keys (PK_[<SchemaName>_]<TableName>), foreign keys (FK_CompanyPreferences_Company, FK_PersonAddress_Code_AddressType), and so on.
7. Use SET NOCOUNT ON at the beginning of SQL batches, stored procedures, and triggers
SET NOCOUNT ON prevents DONE_IN_PROC messages from being sent to the client for each statement in a stored procedure. This setting can provide a significant performance boost because network traffic is greatly reduced.
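A minimal sketch of the pattern (the procedure, table, and column names are hypothetical):

```sql
CREATE PROCEDURE dbo.usp_UpdatePrices
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the per-statement DONE_IN_PROC messages

    UPDATE dbo.Product
    SET UnitPrice = UnitPrice * 1.10
    WHERE Discontinued = 0;
END
```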
8. Use the schema name with the object name
An object name is fully qualified when used with its schema name. Use the schema name with the stored procedure name and with all objects referenced inside the stored procedure. This helps SQL Server find the compiled plan directly instead of searching for the object in other possible schemas before deciding whether a cached plan can be used. That process of searching for and deciding on a schema takes a COMPILE lock on the stored procedure and decreases its performance. Therefore, always refer to objects by their qualified names inside a stored procedure:
SELECT * FROM dbo.MyTable -- Preferred method
-- Instead of
SELECT * FROM MyTable -- Avoid this method
-- Finally, call the stored procedure with a qualified name:
EXEC dbo.MyProc -- Preferred method
-- Instead of
EXEC MyProc -- Avoid this method
9. Do not use a leading wildcard character
Do not use wildcard characters at the beginning of a word when searching with the LIKE keyword, as this results in an index scan.
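A quick sketch, assuming a hypothetical dbo.Customer table with an index on LastName:

```sql
-- Avoid: a leading wildcard cannot use the index and forces a scan
SELECT CustomerID FROM dbo.Customer WHERE LastName LIKE '%son';

-- Preferred: a trailing wildcard can use an index seek on LastName
SELECT CustomerID FROM dbo.Customer WHERE LastName LIKE 'John%';
```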
10. Define constraints at the appropriate level
A default constraint must be defined at the column level. All other constraints must be defined at the table level. Avoid using rules on database objects; use constraints instead.
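The rule above can be sketched like this (the dbo.OrderItem table is a made-up example):

```sql
CREATE TABLE dbo.OrderItem
(
    OrderID   INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity  INT NOT NULL DEFAULT (1),  -- default constraint at the column level
    -- all other constraints at the table level
    CONSTRAINT PK_OrderItem PRIMARY KEY (OrderID, ProductID),
    CONSTRAINT CK_OrderItem_Quantity CHECK (Quantity > 0)
);
```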
11. Put DECLARE statements at the start of the code in a stored procedure for better readability.
12. Write INNER joins first and then LEFT OUTER joins
If a table is joined with both INNER and LEFT OUTER joins, always write the INNER joins first and then the LEFT OUTER joins.
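A small sketch of this ordering (the Orders, Customer, and Shipment tables are hypothetical):

```sql
SELECT o.OrderID, c.CustomerName, s.TrackingNumber
FROM dbo.Orders o
     INNER JOIN dbo.Customer c          -- INNER joins first
         ON c.CustomerID = o.CustomerID
     LEFT OUTER JOIN dbo.Shipment s     -- then LEFT OUTER joins
         ON s.OrderID = o.OrderID;
```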
13. Avoid functions in query predicates; store the result of the function in a variable and use the variable instead.
DECLARE
    @userID varchar(10) = 'ehaskell',
    @printed_only varchar(1) = 'y',
    @HiddenStatus INT = 559,
    @DATDIFF INT = 1000
DECLARE @DATDIFF_1 DATETIME = GETDATE() - @DATDIFF
SELECT COUNT(0) AS printed
FROM CompMasterQueue q
     INNER JOIN CompMaster_Hist h
         ON q.[id] = h.CompMasterQueue_ID
WHERE q.user_ID_created_by = @userID
  AND q.created_date > @DATDIFF_1
  AND 1 = CASE
              WHEN @printed_only <> 'y' AND q.P_Status_ID <> @HiddenStatus THEN 1
              WHEN @printed_only = 'y' AND q.P_Status_ID = @HiddenStatus THEN 1
              ELSE 2
          END
14. Avoid scalar subqueries and use joins instead. A set-based approach makes the process faster.
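As a sketch of the rewrite (the Orders and Customer tables are hypothetical):

```sql
-- Avoid: the scalar subquery is evaluated once per order row
SELECT o.OrderID,
       (SELECT c.CustomerName
        FROM dbo.Customer c
        WHERE c.CustomerID = o.CustomerID) AS CustomerName
FROM dbo.Orders o;

-- Preferred: a set-based join
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders o
     INNER JOIN dbo.Customer c
         ON c.CustomerID = o.CustomerID;
```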
15. Avoid wrapper stored procedure and function calls where possible; use joins instead so that the process becomes set-based.
16. Use the sp_executesql stored procedure instead of the EXECUTE statement
The sp_executesql stored procedure supports parameters, so using it instead of the EXECUTE statement improves the reusability of your code. The execution plan of a dynamic statement can be reused only if every character, including case, spaces, comments, and parameter values, is the same between two statements. For example, if we execute the batch below:
DECLARE @Query VARCHAR(100)
DECLARE @Age INT
SET @Age = 25
SET @Query = 'SELECT * FROM dbo.tblPerson WHERE Age = ' + CONVERT(VARCHAR(3), @Age)
EXEC (@Query)
If we execute the above batch again with a different @Age value, the execution plan for the SELECT statement created for @Age = 25 will not be reused. However, if we write the batch as given below:
DECLARE @Query NVARCHAR(100)
SET @Query = N'SELECT * FROM dbo.tblPerson WHERE Age = @Age'
EXECUTE sp_executesql @Query, N'@Age int', @Age = 25
EXECUTE sp_executesql @Query, N'@Age int', @Age = 30
the compiled plan of this SELECT statement will be reused for different values of the @Age parameter. Reusing the existing compiled plan results in improved performance.
17. Avoid SQL Server cursors whenever possible
A cursor uses a lot of resources for overhead processing to maintain the current record position in a recordset, which decreases performance. If records must be processed one by one in a loop, use a WHILE loop. Wherever possible, replace the cursor-based approach with a set-based approach, because the SQL Server engine is designed and optimized to perform set-based operations very fast. Note that a cursor is itself a kind of WHILE loop.
18. Choose the appropriate data type
Choose the appropriate SQL data type for your data, since this also helps improve query performance. For example, use varchar to store string data of up to 8,000 characters. For larger text data, prefer varchar(max) over the text data type, since text is deprecated and performs worse than varchar.
19. Avoid nchar and nvarchar unless needed
Avoid the nchar and nvarchar data types where possible, since both take double the storage of char and varchar. Use nchar and nvarchar only when you need to store Unicode (16-bit character) data such as Hindi or Chinese characters.
20. Avoid NULL in fixed-length fields
Avoid inserting NULL values into fixed-length (char) fields, since a NULL takes the same space as any other value in that field. If NULLs are required, use a variable-length (varchar) field, which takes less space for NULL.
21. Avoid the HAVING clause where WHERE will do
Try to avoid the HAVING clause, since it acts as a filter over rows that have already been selected. The HAVING clause is only required to further filter the result of an aggregation; don't use it for any other purpose.
SELECT subject, COUNT(subject)
FROM student_details
WHERE subject != 'Science'
  AND subject != 'Maths'
GROUP BY subject;
Instead of:
SELECT subject, COUNT(subject)
FROM student_details
GROUP BY subject
HAVING subject != 'Science' AND subject != 'Maths';
22. Use a table variable in place of a temp table
Try to use a table variable in place of a temp table, since temp tables reside in the tempdb database, and using them requires interaction with tempdb, which takes extra time. Only use a temporary table when a table variable cannot serve the purpose.
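A small sketch of a table variable holding a short-lived working set (the dbo.Orders table is hypothetical):

```sql
-- Table variable: suitable for small, short-lived working sets
DECLARE @RecentOrders TABLE
(
    OrderID   INT PRIMARY KEY,
    OrderDate DATETIME
);

INSERT INTO @RecentOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate > DATEADD(DAY, -7, GETDATE());
```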
23. Use UNION ALL in place of UNION
Use UNION ALL instead of UNION where duplicates are acceptable; it is faster because it does not sort the result set to remove duplicate values.
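For example (the Customer and Supplier tables are hypothetical):

```sql
-- UNION sorts the combined result to remove duplicate rows
SELECT City FROM dbo.Customer
UNION
SELECT City FROM dbo.Supplier;

-- UNION ALL skips the duplicate-removal step, so it is faster
SELECT City FROM dbo.Customer
UNION ALL
SELECT City FROM dbo.Supplier;
```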
24. Avoid correlated subqueries
Try to avoid correlated subqueries: because the inner query can be executed once for every row returned by the outer query, performance can degrade badly. With any subquery, performance depends on the query and the data involved, and a well-written correlated subquery can occasionally outperform a version with several joins and temporary tables; in general, though, you should replace correlated subqueries with joins.
25. Use clustered indexes appropriately
Having the clustered index on the primary key is sometimes not the most efficient choice. A clustered index is the most performant type of index, and the whole table is sorted according to it. If the table is involved in lots of joins based on the primary key, that is probably the right place for the clustered index; but if you are continually filtering or grouping on other columns, consider changing the primary key index to nonclustered and putting the clustered index on those filtered or grouped columns.
The following statements remove an existing clustered index on the primary key and replace it with a nonclustered index:
ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO
The following statement then adds a new clustered index to the table; OrderDate is used here as an example of a commonly filtered column:
CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderDate)
GO
26. Use indexed views
Indexed views have been around for a while. A view is like a named query, and you can add indexes to views. Used correctly, they can give a massive improvement in execution times, often better than a clustered index with covering columns on the original table. In SQL Server Developer Edition and Enterprise Edition, a view index will also be used automatically if it is the best index, even if you don't actually reference the view in your query!
CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID,
       soh.CustomerID,
       SUM(sod.Quantity * sod.UnitPrice) AS TotalSales,
       COUNT_BIG(*) AS RowCnt -- required for an indexed view with GROUP BY
FROM MySchema.SalesOrderHeader soh
     INNER JOIN MySchema.SalesOrderDetail sod
         ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesTerritoryID, soh.CustomerID
GO
Note the use of the SCHEMABINDING attribute. It prevents you from changing the underlying tables while the view exists, and it is required if you want to add an index. Some people avoid indexed views for this reason, as maintenance becomes more complicated when further dependencies on the view are created. The following statement adds an index:
CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
SalesTerritoryID,
CustomerID
)
GO
27. Use covering indexes
Covering indexes with included columns were introduced in SQL Server 2005. You can create an index optimised for the query itself, based on its joins, filters, and grouping, and then add further columns that can be retrieved directly from the index for use in SELECT statements, as follows:
CREATE NONCLUSTERED INDEX TestIndex
ON MySchema.SalesOrderDetail (OrderId)
INCLUDE (Quantity, UnitPrice)
The statement above creates a nonclustered index on the SalesOrderDetail table. If queries filter on the OrderId column, the index will be used, and if the only other columns retrieved are Quantity and UnitPrice, the query optimiser does not need to fetch any extra columns from the underlying table: it can answer the query from the index alone. Because the optimiser never touches the original table, performance is improved.
28. Keep your clustered index small
One thing to consider when deciding where to put the clustered index is how big its key will be, because the clustered index key is also used as the key of every nonclustered index on the table. If you have a large clustered index key on a table with a decent number of rows, the total index size can blow out significantly. A table with no clustered index at all can be just as bad, because every nonclustered index then uses the row pointer, which is 8 bytes per row.
29. Archive old data
Another no-brainer, so I won't say much: if you want to improve query performance, give the optimiser less work to do. If you can cut down the number of rows the query has to deal with, performance will improve. Creating audit triggers to move historical data into other tables is a perfectly good approach here. Alternatively, if you don't need your data after a certain period of time, back up your database and remove the old data.
30. Use APPLY
The APPLY operator was created for situations where you would otherwise put multiple inline nested queries in one statement. For example, take the following statement:
SELECT soh.SalesOrderID,
       Quantity = (SELECT TOP 1 Quantity
                   FROM Sales.SalesOrderDetail
                   WHERE SalesOrderID = soh.SalesOrderID),
       UnitPrice = (SELECT TOP 1 UnitPrice
                    FROM Sales.SalesOrderDetail
                    WHERE SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh
This performs an extra query for each column, retrieving data from another table using the same criterion. It can be replaced with the following:
SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
SELECT TOP (1) sod.UnitPrice, sod.Quantity
FROM Sales.SalesOrderDetail sod
WHERE sod.SalesOrderId = soh.SalesOrderId
ORDER BY sod.Quantity DESC
) as a
31. Use computed columns
Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you need to calculate SalesPrice from Quantity and UnitPrice columns, multiplying them inline in the SQL forces the product to be computed for every single row. Create a computed column called SalesPrice and index it, and the query optimiser no longer needs to retrieve the UnitPrice and Quantity data and do the calculation: it is already done.
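A sketch of that idea, reusing the example tables from above (the column and index names are illustrative; PERSISTED stores the computed value on disk):

```sql
-- Add a computed column for the line total and persist it
ALTER TABLE MySchema.SalesOrderDetail
ADD SalesPrice AS (Quantity * UnitPrice) PERSISTED;

-- Index it so the optimiser can seek on the precomputed value
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesPrice
ON MySchema.SalesOrderDetail (SalesPrice);
```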
32. Use the correct transaction isolation level
If there are a lot of rows in your table, multiple concurrent requests to that table can cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it may be time to consider a change.
For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK: if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts or updates are occurring that require atomicity. It is the most performant level because it ignores locking altogether, but it is generally not allowed by good design and is a special case.
READ_COMMITTED_SNAPSHOT specifies that any data read by a statement will be the transactionally consistent version of the data that existed when the statement started. Internally, SQL Server makes a versioned copy of the data, which is kept in tempdb until the transaction has completed. Except when the database is being recovered, snapshot transactions do not request locks when reading data, and therefore do not block other transactions from writing data; transactions writing data likewise do not block other transactions reading data.
There are various other isolation options, including REPEATABLE READ and SERIALIZABLE among others, which you can evaluate to determine whether they are appropriate for your needs.
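Two short sketches of the settings discussed above (the database and table names are hypothetical):

```sql
-- A reporting session that tolerates dirty reads
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Orders;

-- Enable statement-level row versioning for the whole database
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;
```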
33. Use the EXISTS and IN operators and table joins appropriately in your query
a) Usually IN has the slowest performance.
b) IN is efficient when most of the filter criteria are in the subquery.
c) EXISTS is efficient when most of the filter criteria are in the main query.
SELECT * FROM product p
WHERE EXISTS (SELECT * FROM order_items o
              WHERE o.product_id = p.product_id)
--Instead of:
SELECT * FROM product p
WHERE product_id IN (SELECT product_id FROM order_items)
34. Use EXISTS instead of DISTINCT when joining tables that have a one-to-many relationship
SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS (SELECT 'X' FROM employee e WHERE e.dept = d.dept);
--Instead of:
SELECT DISTINCT d.dept_id, d.dept
FROM dept d, employee e
WHERE e.dept = d.dept;