Monday 18 June 2018

T-SQL Best Practices

Industry standards to follow for T-SQL coding:

Summary
This document describes a reasonable, consistent and effective coding style, and highlights common T-SQL best practices that help produce reliable, robust and efficient SQL code. The guidelines below are intended to help you maximize database performance.


General Key Points for Best Practices

1.    Maintain T-SQL formatting while coding

The formatting of SQL code may not seem that important, but consistent formatting makes it easier for others to scan and understand your code. So it is good practice to keep code well structured; an online T-SQL formatter can help with this.

2.    Do not use SELECT * in your query

List only the columns you actually need when writing a T-SQL query, instead of using *. This reduces disk I/O and gives better performance.
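For example (a minimal sketch; the dbo.Employee table and its columns are illustrative, not from this article):

-- Avoid: returns every column, whether or not the caller needs it
SELECT * FROM dbo.Employee

-- Preferred: name only the columns you need
SELECT EmployeeID, FirstName, LastName
FROM   dbo.Employee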

3.    Use the ANSI standard join clause

Always avoid the old-style join syntax written in the WHERE clause; use ANSI joins instead, which also support the FULL OUTER join. The example below shows both forms.
-- old style join
SELECT a.Au_id, t.Title
FROM   TITLES t,
       AUTHORS a,
       TITLEAUTHOR ta
WHERE  a.Au_id = ta.Au_id
       AND ta.Title_id = t.Title_id
       AND t.Title LIKE '%Computer%'
 
--ANSI join syntax
SELECT a.Au_id,
       t.Title
FROM   AUTHORS a
       INNER JOIN TITLEAUTHOR ta
       ON a.Au_id = ta.Au_id
       INNER JOIN TITLES t
       ON ta.Title_id = t.Title_id
WHERE  t.Title LIKE '%Computer%'


4.    Do not use column numbers in the ORDER BY clause

Always use column names in an ORDER BY clause; avoid positional references.
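A minimal sketch (the table and columns are illustrative):

-- Avoid: the positional reference silently changes meaning if the column list changes
SELECT EmployeeID, LastName
FROM   dbo.Employee
ORDER  BY 2

-- Preferred: name the column explicitly
SELECT EmployeeID, LastName
FROM   dbo.Employee
ORDER  BY LastName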

5.    Do not prefix your stored procedure names with “sp_”

Whenever SQL Server finds a procedure name beginning with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, and then it tries dbo as the owner. Avoiding the “sp_” prefix therefore saves SQL Server this extra lookup time.

6.    Use a proper naming convention

It is suggested to use a consistent naming convention, for example for
triggers (TR_Emails_LogEmailChanges),
indexes (IX[C][F][U][S]_<TableName>_<Column1Name>[_<Column2Name>]…[_<ColumnNName>]),
primary keys (PK_[<SchemaName>_]<TableName>), and
foreign keys (FK_CompanyPreferences_Company, FK_PersonAddress_Code_AddressType).

7.    Use SET NOCOUNT ON at the beginning of SQL batches, stored procedures and triggers

SET NOCOUNT ON prevents the sending of DONE_IN_PROC messages to the client for each statement in a stored procedure. This setting can provide a significant performance boost, because network traffic is greatly reduced.
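A minimal sketch (the procedure and table names are illustrative):

CREATE PROCEDURE dbo.usp_GetEmployee
    @EmployeeID INT
AS
BEGIN
    SET NOCOUNT ON -- suppress the per-statement "rows affected" messages

    SELECT EmployeeID, FirstName, LastName
    FROM   dbo.Employee
    WHERE  EmployeeID = @EmployeeID
END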

8.    Use the schema name with the object name:
An object name is qualified when used with its schema name. The schema name should be used with the stored procedure name and with all objects referenced inside the stored procedure. This helps SQL Server find the compiled plan directly, instead of searching the other possible schemas before finally deciding to use a cached plan, if available. That process of searching for and deciding on a schema for an object takes a COMPILE lock on the stored procedure and decreases the stored procedure's performance. Therefore, always refer to objects by their qualified name in the stored procedure, like:

SELECT * FROM dbo.MyTable -- Preferred method
-- Instead of
SELECT * FROM MyTable -- Avoid this method

-- And finally, call the stored procedure with its qualified name:
EXEC dbo.MyProc -- Preferred method
-- Instead of
EXEC MyProc -- Avoid this method


9.    Do not use a leading wildcard character

Do not use a wildcard character at the beginning of the pattern when searching with the LIKE keyword, as it results in an index scan.
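For example (reusing the TITLES table from the join example above):

-- Avoid: a leading wildcard cannot use the index order, so it scans
SELECT Title FROM dbo.TITLES WHERE Title LIKE '%Computer%'

-- Preferred where the requirement allows: a trailing wildcard can seek on an index
SELECT Title FROM dbo.TITLES WHERE Title LIKE 'Computer%'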



10.  Define DEFAULT constraints at the column level and all other constraints at the table level. Avoid using rules on database objects; use constraints instead, as sketched below.
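A minimal sketch (the table and its columns are illustrative):

CREATE TABLE dbo.OrderItem
(
    OrderItemID INT   NOT NULL,
    Quantity    INT   NOT NULL DEFAULT (1),            -- default at the column level
    UnitPrice   MONEY NOT NULL,
    CONSTRAINT PK_OrderItem PRIMARY KEY (OrderItemID), -- other constraints at the table level
    CONSTRAINT CK_OrderItem_Quantity CHECK (Quantity > 0)
)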
11.  Put DECLARE statements at the start of the code in a stored procedure, for better readability.
12.  Write INNER joins first and then LEFT OUTER joins:
If a table is joined with both INNER and LEFT OUTER joins, always write the INNER joins first and then the LEFT OUTER joins, as in the sketch below.
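A minimal sketch (the tables are illustrative):

SELECT o.OrderID, c.CustomerName, s.TrackingNumber
FROM   dbo.[Order] o
       INNER JOIN dbo.Customer c      -- INNER joins first
       ON o.CustomerID = c.CustomerID
       LEFT OUTER JOIN dbo.Shipment s -- then LEFT OUTER joins
       ON o.OrderID = s.OrderID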

13.  Try not to call functions inside the query; instead, store the result of the function in a variable and use that, as in the following example.

DECLARE
    @userID VARCHAR(10) = 'ehaskell',
    @printed_only VARCHAR(1) = 'y',
    @HiddenStatus INT = 559,
    @DATDIFF INT = 1000

-- Evaluate GETDATE() - @DATDIFF once here, instead of calling the function per row in the WHERE clause
DECLARE @DATDIFF_1 DATETIME = GETDATE() - @DATDIFF

SELECT COUNT(0) AS printed
FROM   CompMasterQueue q
       INNER JOIN CompMaster_Hist h
       ON q.[id] = h.CompMasterQueue_ID
WHERE  q.user_ID_created_by = @userID
       AND q.created_date > @DATDIFF_1
       AND 1 = CASE -- THEN/ELSE values (1/1/0) reconstructed so the CASE acts as a row filter
                   WHEN @printed_only <> 'y' AND q.P_Status_ID <> @HiddenStatus THEN 1
                   WHEN @printed_only = 'y' AND q.P_Status_ID = @HiddenStatus THEN 1
                   ELSE 0
               END
14.  Avoid scalar subqueries and use joins instead; a set-based approach makes the process faster. For example:
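A sketch (the Customer and Order tables are illustrative):

-- Avoid: the scalar subquery is evaluated for each outer row
SELECT c.CustomerID,
       (SELECT COUNT(*) FROM dbo.[Order] o
        WHERE  o.CustomerID = c.CustomerID) AS OrderCount
FROM   dbo.Customer c

-- Preferred: a set-based join with aggregation
SELECT   c.CustomerID, COUNT(o.OrderID) AS OrderCount
FROM     dbo.Customer c
         LEFT OUTER JOIN dbo.[Order] o
         ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID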
15.  Avoid wrapper stored procedure/function calls where possible; instead, make use of joins so that the process remains set-based.





(16) Use the sp_executesql stored procedure instead of the EXECUTE statement:
The sp_executesql stored procedure supports parameters, so using it instead of the EXECUTE statement improves the reusability of your code. The execution plan of a dynamic statement can be reused only if every character, including case, spaces, comments and parameters, is the same between two statements. For example, if we execute the batch below:


DECLARE @Query VARCHAR(100)
DECLARE @Age INT
SET @Age = 25
SET @Query = 'SELECT * FROM dbo.tblPerson WHERE Age = ' + CONVERT(VARCHAR(3), @Age)
EXEC (@Query)

If we execute the above batch again with a different @Age value, the execution plan created for the SELECT statement with @Age = 25 will not be reused. However, if we write the batch as given below,

DECLARE @Query NVARCHAR(100)
SET @Query = N'SELECT * FROM dbo.tblPerson WHERE Age = @Age'
EXECUTE sp_executesql @Query, N'@Age INT', @Age = 25

The compiled plan of this SELECT statement will be reused for different values of the @Age parameter. Reusing the existing compiled plan results in improved performance.

(17) Try to avoid using SQL Server cursors whenever possible: A cursor uses a lot of resources for the overhead of maintaining the current record position in a recordset, and this decreases performance. If we need to process records one by one in a loop, we should use a WHILE loop; wherever possible, we should replace the cursor-based approach with a set-based approach, because the SQL Server engine is designed and optimized to perform set-based operations very fast. Note that a cursor is itself essentially a kind of WHILE loop.
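A minimal sketch of the set-based alternative (the Product table is illustrative): instead of opening a cursor over the qualifying rows and updating them one by one, a single statement does the same work.

-- Preferred: one set-based statement in place of a cursor loop
UPDATE dbo.Product
SET    Price = Price * 1.10
WHERE  CategoryID = 5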
(18) Choose the appropriate data type: Choose an appropriate SQL data type for your data, since this also helps query performance. For example, to store strings, use varchar in place of the text data type, since varchar performs better. Use the text data type only when you need to store large text data (more than 8,000 characters); anything up to 8,000 characters can be stored in varchar.

(19) Avoid nchar and nvarchar where Unicode is not needed: The nchar and nvarchar data types take twice the memory of char and varchar. Use nchar and nvarchar only when you need to store Unicode (16-bit character) data, such as Hindi or Chinese text.


(20) Avoid NULL in fixed-length fields: Avoid inserting NULL values into fixed-length (char) fields, since a NULL there takes the same space as a real value for that field. If NULLs are required, use a variable-length (varchar) field, which takes less space for NULL.

(21) Avoid the HAVING clause where a WHERE clause will do: The HAVING clause acts as a filter over already-selected rows, and it is required only when you wish to further filter the result of an aggregation. Don't use HAVING for any other purpose:

SELECT subject, count(subject)
FROM student_details
WHERE subject != 'Science'
AND subject != 'Maths'
GROUP BY subject;
Instead of:
SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject != 'Science' AND subject != 'Maths';

(22) Use a table variable in place of a temp table: For small result sets, try a table variable in place of a temp table; working with a temp table requires more interaction with the tempdb database, which is a somewhat time-consuming task. Use a temporary table only when a table variable does not serve the purpose.
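A minimal sketch (the table and columns are illustrative):

-- Table variable: a lighter-weight alternative to a #temp table for small sets
DECLARE @RecentOrders TABLE
(
    OrderID   INT PRIMARY KEY,
    OrderDate DATETIME
)

INSERT INTO @RecentOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM   dbo.[Order]
WHERE  OrderDate > DATEADD(DAY, -7, GETDATE())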

(23) Use UNION ALL in place of UNION: Prefer UNION ALL to UNION where duplicate rows are acceptable; it is faster because it does not sort the result set to remove duplicates.
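For example (the Customer and Supplier tables are illustrative):

-- Preferred when duplicates are acceptable: no sort is performed to remove them
SELECT City FROM dbo.Customer
UNION ALL
SELECT City FROM dbo.Supplier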

(24) Avoid correlated subqueries: Try to avoid correlated subqueries, since the inner query can be executed once for every row returned by the outer query, which degrades performance. With any subquery, performance depends entirely on the query and the data involved, and a well-written correlated subquery can sometimes outperform code that uses several joins and temporary tables; as a general rule, though, the correlated subquery should be replaced with a join.
(25) Use clustered indexes appropriately: The primary key is sometimes not the most efficient place for the clustered index. A clustered index is the most performant type of index, and the whole table is sorted according to it. If the table is involved in lots of joins based on the primary key, that is probably the right place for the clustered index; but if you are continually filtering or grouping on other columns, consider changing the primary key index to non-clustered and putting the clustered index on those filtered or grouped columns.
The following statements remove an existing clustered index on the primary key and replace it with a non-clustered index:
ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO
The following statement then adds a new clustered index to the table:
CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderID)
GO
(26) Use indexed views: Indexed views have been around for a while. A view is like a named query, and these days you can add indexes to views. Used correctly, they can give a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will be used automatically if it is the best index, even if you don't actually specify the view in your query!
CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID, soh.CustomerID,
   SUM(sod.Quantity * sod.UnitPrice) AS Revenue,
   COUNT_BIG(*) AS RecordCount -- required when an indexed view uses GROUP BY
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
   ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesTerritoryID, soh.CustomerID
GO
Note the use of the schema binding attribute. This prevents you from changing the underlying tables while the view exists, and it is necessary if you want to add an index. Some people avoid indexed views for this reason, as maintenance becomes more complicated once further dependencies on the view are created. The following statement adds the index:
CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
   SalesTerritoryID, CustomerID
   )
GO
(27) Use covering indexes: Covering indexes (indexes with included columns) were introduced in SQL Server 2005. You can create an index optimised for the query itself, based on its joins, filters and grouping, and then add additional columns that can be retrieved directly from the index for use in SELECT statements, as follows:
CREATE NONCLUSTERED INDEX TestIndex
   ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)
The above statement causes a non-clustered index to be created on the SalesOrderDetail table. If queries are executed on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, then the query optimiser doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the query optimiser doesn’t need to query the original table, performance is improved. 
(28) Keep your clustered index small: One thing to consider when deciding where to put your clustered index is how big its key will be. The key of the clustered index is also used as the key of every non-clustered index on the table, so if you have a large clustered key on a table with a decent number of rows, the size of the other indexes can blow out significantly. A table with no clustered index can be just as bad, because the non-clustered indexes then use the row pointer, which is 8 bytes per row.
(29) Archive old data: Another no-brainer, so I won't say much. If you want to improve query performance, give the optimiser less work to do: if you can cut down the number of rows the query has to deal with, performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don't need your data after a certain period of time, back up your database and remove the old data.
(30) Use APPLY: The APPLY operator was created for situations where you would otherwise put multiple inline nested subqueries in one statement. For example, take the following statement:
SELECT soh.SalesOrderID,
 Quantity=(SELECT TOP 1 (Quantity)
           FROM Sales.SalesOrderDetail
           WHERE  SalesOrderID = soh.SalesOrderID),
 UnitPrice=(SELECT TOP 1 (UnitPrice)
           FROM Sales.SalesOrderDetail
           WHERE  SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh
This runs an extra subquery per column, each retrieving data from the same table using the same criterion. It can be replaced with the following:
SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
   SELECT TOP (1) sod.UnitPrice, sod.Quantity
   FROM Sales.SalesOrderDetail sod
   WHERE sod.SalesOrderId = soh.SalesOrderId
  ORDER BY sod.Quantity DESC
) as a
(31) Use computed columns: Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you need a SalesPrice value and you have Quantity and UnitPrice columns, multiplying them inline in the SQL forces the calculation for every single row. Create a computed column called SalesPrice and index it, and the query optimiser no longer needs to retrieve the UnitPrice and Quantity data and do the calculation: it is already done.
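A sketch against the MySchema.SalesOrderDetail table used in the earlier examples (assuming it has Quantity and UnitPrice columns):

-- Computed column, PERSISTED so it is stored and can be indexed
ALTER TABLE MySchema.SalesOrderDetail
ADD SalesPrice AS (Quantity * UnitPrice) PERSISTED
GO
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesPrice
ON MySchema.SalesOrderDetail (SalesPrice)
GO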
(32) Use the correct transaction isolation level: If there are a lot of rows in your table, multiple concurrent requests to that table can cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it may be time to consider a change.
For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates are occurring that require atomicity. This is the most performant level because it ignores locking altogether, but it is generally not allowed by good design and is a special case.
With READ_COMMITTED_SNAPSHOT, each statement sees a transactionally consistent version of the data as it existed at the start of the statement (full SNAPSHOT isolation extends this to the start of the transaction). Internally, a versioned copy of the data is kept in tempdb until the transaction has completed. Except when the database is being recovered, snapshot transactions do not request locks when reading data and therefore do not block other transactions from writing data; transactions writing data likewise do not block other transactions reading data.
There are various other isolation options, including REPEATABLE READ and SERIALIZABLE amongst others, that you can look at to determine whether they are appropriate for your needs.
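For example (the database name MyDatabase is illustrative):

-- Enable statement-level row versioning for the whole database
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON
GO
-- Or change the isolation level for the current session only
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ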
(33) Use the EXISTS and IN operators and table joins appropriately in your query:

a) IN usually has the slowest performance.
b) IN is efficient when most of the filter criteria are in the subquery.
c) EXISTS is efficient when most of the filter criteria are in the main query.

SELECT * FROM product p
WHERE  EXISTS (SELECT * FROM order_items o
               WHERE o.product_id = p.product_id)

-- Instead of:
SELECT * FROM product p WHERE product_id IN (SELECT product_id FROM order_items)


(34) Use EXISTS instead of DISTINCT when using joins that involve tables with a one-to-many relationship:

SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS ( SELECT 'X' FROM employee e WHERE e.dept = d.dept);
--Instead of:
SELECT DISTINCT d.dept_id, d.dept
FROM dept d, employee e
WHERE e.dept = d.dept
