Site: DBAcademy
Course: Database Developer
Book: Deep Dive into SQL Optimization
In this section, we'll see how fine-tuned syntax and best practices can significantly impact query performance.
Clarity and Readability:
Write SQL statements that are not only correct but also easy to read and maintain. For instance:
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 10;
Instead of:
SELECT e_id, fname, lname
FROM employees
WHERE d_id = 10;
Appropriate Use of Aliases:
Use column and table aliases to improve readability, especially in complex queries:
SELECT e.employee_id, e.first_name, e.last_name, d.department_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id;
Consistent Formatting:
Adopt a consistent formatting style for indentation, capitalization of SQL keywords, and alias naming. This not only aids review by others but can also reduce errors when modifying queries.
Parameterization:
Use parameterized queries to avoid SQL injection and to allow query caching by the database engine.
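As a minimal sketch (T-SQL shown; the employees table and the parameter name @dept are illustrative), a parameterized query can be executed like this:
-- The parameter value travels separately from the SQL text, so it cannot
-- change the statement's structure, and the cached plan is reused across values.
EXEC sp_executesql
    N'SELECT employee_id, first_name, last_name
      FROM employees
      WHERE department_id = @dept',
    N'@dept INT',
    @dept = 10;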
Subqueries:
Subqueries can be used in SELECT, FROM, or WHERE clauses. They are useful when you need to first compute a result before performing a join or filter operation. For example:
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1700);
Use Cases and Considerations:
JOINs:
JOINs are often more performant because they allow the optimizer to use indexes and choose efficient join strategies. For example, the subquery above can be rewritten as a join:
SELECT e.employee_id, e.first_name, e.last_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id
WHERE d.location_id = 1700;
Avoid SELECT *:
Only retrieve the columns you need. This minimizes I/O and reduces memory consumption.
-- Problematic approach:
SELECT * FROM orders;
-- Optimized approach:
SELECT order_id, order_date, customer_id FROM orders;
Filtering Early:
Use filters (WHERE clauses) as early as possible in your queries to reduce the number of rows processed:
SELECT order_id, order_date
FROM orders
WHERE order_date >= '2023-01-01';
Index Usage:
Ensure that indexed columns appear in JOIN conditions and WHERE filters to take advantage of faster data access.
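For example (a sketch; the index name and the customers join are illustrative), an index on the column used in both the join and the filter lets the engine seek directly to the matching rows:
-- Composite index supporting the join on customer_id and the date filter
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
SELECT o.order_id, o.order_date
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2023-01-01';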
Using Derived Tables and Common Table Expressions (CTEs): Recursive Queries and Modular Query Design
Introduction to CTEs:
CTEs improve the readability and maintainability of SQL queries. They provide a way to define temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
Non-Recursive CTE Example:
WITH TopOrders AS (
SELECT order_id, amount
FROM orders
WHERE amount > 1000
)
SELECT * FROM TopOrders;
Recursive CTE Example:
Recursive queries are particularly useful for hierarchical data (e.g., organizational charts, bill of materials).
WITH EmployeeHierarchy AS (
SELECT employee_id, manager_id, first_name, last_name, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.manager_id, e.first_name, e.last_name, eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM EmployeeHierarchy;
Modular Query Design:
CTEs enable decomposition of complex queries into smaller, logical units that can be developed, tested, and optimized individually.
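For instance (a sketch using the illustrative orders table), a report can be built from small CTEs that each handle one step:
WITH RecentOrders AS (
    -- Step 1: restrict to the period of interest
    SELECT order_id, customer_id, amount
    FROM orders
    WHERE order_date >= '2023-01-01'
),
CustomerTotals AS (
    -- Step 2: aggregate per customer
    SELECT customer_id, SUM(amount) AS total_amount
    FROM RecentOrders
    GROUP BY customer_id
)
-- Step 3: keep only high-value customers
SELECT customer_id, total_amount
FROM CustomerTotals
WHERE total_amount > 10000;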
This part of the course teaches how to analyze query performance and recognize patterns or pitfalls that can lead to slower queries. It focuses on understanding and interpreting execution plans and optimizing indexes.
Understanding the Execution Plan:
Every SQL engine can provide a detailed plan outlining how it intends to execute a query, for example via the EXPLAIN keyword in MySQL or PostgreSQL:
EXPLAIN SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 10;
Key areas to examine include the access method (full table scan vs. index seek or scan), the join strategy, and the estimated row counts and costs.
Example:
A query performing a full table scan vs. one using an index:
Without Index:
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
With Index on customer_id:
CREATE INDEX idx_customer ON orders(customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
The revised execution plan would show an index lookup (for example, an Index Seek in SQL Server, an Index Scan in PostgreSQL, or ref access in MySQL) instead of a full table scan, greatly reducing the number of rows read.
Full Table Scans:
Recognize queries where the engine is forced to read every row. Use indexes to target specific rows:
-- Problematic if table is large:
SELECT * FROM orders WHERE order_status = 'Completed';
-- Optimized if order_status is indexed:
CREATE INDEX idx_order_status ON orders(order_status);
Missing or Inefficient Indexes:
Learn how to identify when indexes should be added or modified. Use hints provided by the execution plan to decide if composite indexes are necessary for queries with multiple predicates.
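For example (illustrative names), a query that filters on order_status and order_date may be better served by one composite index than by two single-column indexes:
-- Equality predicate first, range predicate second
CREATE INDEX idx_orders_status_date ON orders(order_status, order_date);
EXPLAIN SELECT order_id, order_date
FROM orders
WHERE order_status = 'Completed'
  AND order_date >= '2023-01-01';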
Overuse of Subqueries and Redundant Joins:
Overcomplicated query logic can produce several nested subqueries or redundant joins that could be consolidated into a single join or CTE:
-- Consolidated query using JOINs or CTEs may perform better
WITH ActiveCustomers AS (
SELECT customer_id FROM customers WHERE status = 'Active'
)
SELECT o.order_id, o.order_date
FROM orders AS o
JOIN ActiveCustomers ac ON o.customer_id = ac.customer_id
WHERE o.order_date >= '2023-01-01';
Partitioning Tables:
Break large tables into partitions based on time, region, or other criteria to improve query performance.
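As a sketch in T-SQL (partitioning syntax varies by DBMS; the object names here are illustrative), an orders table could be range-partitioned by year:
-- Partition function and scheme: one partition per year
CREATE PARTITION FUNCTION pf_order_year (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');
CREATE PARTITION SCHEME ps_order_year
AS PARTITION pf_order_year ALL TO ([PRIMARY]);
-- The table is stored on the partition scheme, partitioned by order_date
CREATE TABLE orders_partitioned
(
    order_id   INT NOT NULL,
    order_date DATE NOT NULL,
    amount     DECIMAL(10,2) NULL,
    CONSTRAINT pk_orders_partitioned PRIMARY KEY (order_id, order_date)
) ON ps_order_year (order_date);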
Regular Maintenance:
Update statistics, rebuild indexes, and perform periodic clean-ups.
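In SQL Server, for example (the dbo.orders table name is illustrative), routine maintenance can be as simple as:
-- Refresh optimizer statistics so the planner has accurate row estimates
UPDATE STATISTICS dbo.orders;
-- Rebuild all indexes on the table (REORGANIZE is a lighter-weight alternative)
ALTER INDEX ALL ON dbo.orders REBUILD;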
Caching Strategies:
Use caching for frequently executed queries or results to reduce repetitive computation.
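One database-side option in SQL Server is an indexed view that materializes a frequently requested aggregate (a sketch; dbo.orders, customer_id, and amount are illustrative, and amount is assumed NOT NULL, as indexed views require for SUM):
CREATE VIEW dbo.vw_customer_order_totals
WITH SCHEMABINDING
AS
SELECT customer_id,
       COUNT_BIG(*) AS order_count,   -- required in aggregate indexed views
       SUM(amount)  AS total_amount
FROM dbo.orders
GROUP BY customer_id;
GO
-- The unique clustered index materializes and maintains the view's result set
CREATE UNIQUE CLUSTERED INDEX ix_vw_customer_order_totals
ON dbo.vw_customer_order_totals (customer_id);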
Query Parameterization:
Avoid dynamic SQL where possible. Instead, utilize stored procedures and parameterized queries to benefit from plan caching.
This section covers advanced SQL constructs that allow developers to build sophisticated, modular, and high-performing database applications.
Stored Procedures:
Encapsulation of Business Logic:
Stored procedures allow you to encapsulate complex business logic on the server side. This centralizes the logic and can reduce network traffic by executing multiple operations in a single call.
CREATE PROCEDURE GetEmployeeDetails
@DepartmentID INT
AS
BEGIN
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = @DepartmentID;
END;
Performance and Security Benefits:
Stored procedures can benefit from cached execution plans, reduce network round trips by bundling multiple statements, and allow EXECUTE permission to be granted without exposing the underlying tables.
Triggers:
Use Cases:
Triggers execute automatically in response to specific events (INSERT, UPDATE, DELETE). They are useful for auditing changes or enforcing business rules.
CREATE TRIGGER trg_AuditEmployeeUpdate
ON employees
AFTER UPDATE
AS
BEGIN
INSERT INTO employee_audit (employee_id, change_date, changed_by)
SELECT employee_id, GETDATE(), SYSTEM_USER
FROM inserted;
END;
Best Practices:
Keep trigger logic short and set-based, avoid nested or cascading triggers where possible, and move heavy processing out of the trigger so that the triggering DML statement stays fast.
Window Functions:
Definition and Use-Cases:
Window functions (also known as analytic functions) allow you to perform calculations across a set of related rows, or a “window” of rows, without collapsing them into a single output row. Unlike aggregate functions (e.g., SUM, AVG) used with GROUP BY, window functions preserve each row and add analytical context to it.
They are especially useful in reporting, analytics, and trend analysis, where row-by-row comparisons and cumulative computations are common.
General syntax:
<function_name>() OVER (
[PARTITION BY column]
[ORDER BY column]
[ROWS BETWEEN ...]
)
PARTITION BY: Divides the result set into partitions; the window function is applied independently within each partition.
ORDER BY: Determines the order of rows within the window.
ROWS BETWEEN: Specifies a window frame relative to the current row (e.g., from 2 rows before through the current row).
Common Use-Cases and Examples:
1. Running Total
Query:
SELECT order_id, order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS RunningTotal
FROM orders;
Input:
order_id | order_date | amount |
---|---|---|
1 | 2024-01-01 | 100 |
2 | 2024-01-02 | 200 |
3 | 2024-01-03 | 150 |
Output:
order_id | order_date | amount | RunningTotal |
---|---|---|---|
1 | 2024-01-01 | 100 | 100 |
2 | 2024-01-02 | 200 | 300 |
3 | 2024-01-03 | 150 | 450 |
2. Moving Average
Query:
SELECT order_id, order_date, amount,
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS MovingAvg
FROM orders;
Input:
order_id | order_date | amount |
---|---|---|
1 | 2024-01-01 | 100 |
2 | 2024-01-02 | 200 |
3 | 2024-01-03 | 150 |
4 | 2024-01-04 | 250 |
Output:
order_id | order_date | amount | MovingAvg |
---|---|---|---|
1 | 2024-01-01 | 100 | 100.0 |
2 | 2024-01-02 | 200 | 150.0 |
3 | 2024-01-03 | 150 | 150.0 |
4 | 2024-01-04 | 250 | 200.0 |
Moving average is calculated over the current row and two before it (3 values total, or fewer if not available).
3. Ranking Within Partitions
Query:
SELECT employee_id, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS SalaryRank
FROM employees;
Input:
employee_id | department | salary |
---|---|---|
101 | Sales | 5000 |
102 | Sales | 7000 |
103 | Sales | 5000 |
201 | HR | 6000 |
202 | HR | 6500 |
Output:
employee_id | department | salary | SalaryRank |
---|---|---|---|
102 | Sales | 7000 | 1 |
101 | Sales | 5000 | 2 |
103 | Sales | 5000 | 2 |
202 | HR | 6500 | 1 |
201 | HR | 6000 | 2 |
RANK() assigns the same rank to equal salaries and skips the next rank accordingly.
4. Previous and Next Row Values (LAG and LEAD)
Query:
SELECT order_id, order_date, amount,
LAG(amount) OVER (ORDER BY order_date) AS PrevAmount,
LEAD(amount) OVER (ORDER BY order_date) AS NextAmount
FROM orders;
Input:
order_id | order_date | amount |
---|---|---|
1 | 2024-01-01 | 100 |
2 | 2024-01-02 | 200 |
3 | 2024-01-03 | 150 |
Output:
order_id | order_date | amount | PrevAmount | NextAmount |
---|---|---|---|---|
1 | 2024-01-01 | 100 | NULL | 200 |
2 | 2024-01-02 | 200 | 100 | 150 |
3 | 2024-01-03 | 150 | 200 | NULL |
Benefits:
Window functions avoid self-joins and correlated subqueries, keep row-level detail alongside aggregated values, and make running totals, rankings, and period-over-period comparisons straightforward to express.
This section covers SQL Server-specific features and advanced constructs that leverage the unique capabilities of Microsoft's database platform for enterprise applications.
SQL Server's T-SQL provides powerful CTE capabilities, including recursive CTEs for hierarchical data:
-- Recursive CTE for organizational hierarchy
WITH EmployeeHierarchy AS (
-- Anchor member: Top-level managers
SELECT employee_id, first_name, last_name, manager_id, 0 AS Level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: Direct reports
SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, eh.Level + 1
FROM employees e
INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT employee_id, REPLICATE(' ', Level) + first_name + ' ' + last_name AS hierarchy
FROM EmployeeHierarchy
ORDER BY Level, employee_id;
The MERGE statement allows for sophisticated data synchronization:
MERGE target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED AND target.last_updated < source.last_updated THEN
UPDATE SET
name = source.name,
email = source.email,
last_updated = source.last_updated
WHEN NOT MATCHED BY TARGET THEN
INSERT (id, name, email, last_updated)
VALUES (source.id, source.name, source.email, source.last_updated)
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT $action, inserted.*, deleted.*;
JSON Support:
-- Store and query JSON data
DECLARE @json NVARCHAR(MAX) = '{"name": "John", "skills": ["C#", "SQL", "Azure"]}'
SELECT
JSON_VALUE(@json, '$.name') AS Name,
JSON_QUERY(@json, '$.skills') AS Skills;
-- Query JSON in tables
SELECT customer_id,
JSON_VALUE(profile_data, '$.preferences.language') AS PreferredLanguage
FROM customers
WHERE JSON_VALUE(profile_data, '$.status') = 'active';
Spatial Data Types:
-- Working with GEOMETRY data type
DECLARE @point GEOMETRY = GEOMETRY::Point(10, 20, 0);
DECLARE @polygon GEOMETRY = GEOMETRY::STGeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
SELECT @point.STWithin(@polygon) AS IsPointInPolygon;
SQL Server provides built-in temporal functionality for tracking data changes:
-- Create temporal table
CREATE TABLE dbo.Employee
(
EmployeeID INT IDENTITY PRIMARY KEY,
Name NVARCHAR(100) NOT NULL,
Position NVARCHAR(100) NOT NULL,
Salary DECIMAL(10,2) NOT NULL,
-- System temporal columns
SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
-- Query historical data
SELECT * FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2024-01-01 12:00:00';
-- Query all changes between dates
SELECT * FROM dbo.Employee
FOR SYSTEM_TIME BETWEEN '2024-01-01' AND '2024-01-31';
Columnstore Indexes:
Ideal for analytical workloads and data warehousing:
-- Create clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.FactSales;
-- Create nonclustered columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_ProductAnalysis
ON dbo.Sales (ProductID, SaleDate, Quantity, Revenue);
-- Query with columnstore benefits
SELECT ProductID, YEAR(SaleDate) AS SaleYear, SUM(Revenue) AS TotalRevenue
FROM dbo.Sales
WHERE SaleDate >= '2023-01-01'
GROUP BY ProductID, YEAR(SaleDate);
In-Memory OLTP (Memory-Optimized Tables):
For high-performance transactional workloads:
-- Create memory-optimized table
CREATE TABLE dbo.SessionData
(
SessionID UNIQUEIDENTIFIER PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
UserID INT NOT NULL,
StartTime DATETIME2 NOT NULL,
LastActivity DATETIME2 NOT NULL,
INDEX IX_UserID NONCLUSTERED (UserID)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
-- Natively compiled stored procedure
CREATE PROCEDURE dbo.UpdateSessionActivity
@SessionID UNIQUEIDENTIFIER,
@NewActivity DATETIME2
WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'us_english'
)
UPDATE dbo.SessionData
SET LastActivity = @NewActivity
WHERE SessionID = @SessionID;
END;
Always Encrypted:
Protects sensitive data with client-side encryption:
-- Create column master key
CREATE COLUMN MASTER KEY CMK_Auto1
WITH
(
KEY_STORE_PROVIDER_NAME = 'MSSQL_CERTIFICATE_STORE',
KEY_PATH = 'CurrentUser/My/f2260f28d909d21c642a3d8e0b45a830e79a1420'
);
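-- A column encryption key referencing this master key is assumed to exist, e.g.:
-- CREATE COLUMN ENCRYPTION KEY CEK_Auto1 WITH VALUES
--     (COLUMN_MASTER_KEY = CMK_Auto1, ALGORITHM = 'RSA_OAEP', ENCRYPTED_VALUE = 0x...);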
-- Create table with encrypted columns
CREATE TABLE dbo.Patients
(
PatientID INT IDENTITY(1,1) PRIMARY KEY,
Name NVARCHAR(50) NULL,
SSN CHAR(11) COLLATE Latin1_General_BIN2 ENCRYPTED WITH
(
COLUMN_ENCRYPTION_KEY = CEK_Auto1,
ENCRYPTION_TYPE = Deterministic,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
),
BirthDate DATE ENCRYPTED WITH
(
COLUMN_ENCRYPTION_KEY = CEK_Auto1,
ENCRYPTION_TYPE = Randomized,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
)
);
Row-Level Security:
Implements fine-grained access control:
-- Create security function
CREATE FUNCTION dbo.fn_SecurityPredicateCustomers(@CustomerRegion AS NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_SecurityPredicateCustomers_result
WHERE @CustomerRegion = USER_NAME() OR USER_NAME() = 'Manager';
-- Apply security policy
CREATE SECURITY POLICY dbo.CustomerSecurityPolicy
ADD FILTER PREDICATE dbo.fn_SecurityPredicateCustomers(Region) ON dbo.Customers,
ADD BLOCK PREDICATE dbo.fn_SecurityPredicateCustomers(Region) ON dbo.Customers AFTER UPDATE
WITH (STATE = ON);
Analysis Services Integration:
Query multidimensional data directly from T-SQL:
-- Query Analysis Services cube using OPENROWSET
SELECT *
FROM OPENROWSET('MSOLAP',
'DATA SOURCE=localhost;INITIAL CATALOG=AdventureWorksDW;',
'SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
[Product].[Category].MEMBERS ON ROWS
FROM [Adventure Works]')
Machine Learning Services:
Execute R or Python scripts within SQL Server:
-- Execute R script for statistical analysis
EXEC sp_execute_external_script
@language = N'R',
@script = N'
model <- lm(SalesAmount ~ ProductCost + MarketingSpend, data = InputDataSet)
predictions <- predict(model, InputDataSet)
OutputDataSet <- data.frame(InputDataSet, Predicted = predictions)
',
@input_data_1 = N'SELECT SalesAmount, ProductCost, MarketingSpend FROM dbo.SalesData',
@input_data_1_name = N'InputDataSet',
@output_data_1_name = N'OutputDataSet'
WITH RESULT SETS ((
SalesAmount DECIMAL(10,2),
ProductCost DECIMAL(10,2),
MarketingSpend DECIMAL(10,2),
Predicted DECIMAL(10,2)
));
Graph Database Capabilities:
Model and query graph relationships:
-- Create node tables
CREATE TABLE Person (
ID INTEGER PRIMARY KEY,
Name NVARCHAR(100)
) AS NODE;
-- Create edge table
CREATE TABLE Friends AS EDGE;
-- Insert data
INSERT INTO Person VALUES (1, 'Alice'), (2, 'John'), (3, 'Bob');
INSERT INTO Friends VALUES ((SELECT $node_id FROM Person WHERE Name = 'Alice'),
(SELECT $node_id FROM Person WHERE Name = 'John'));
-- Query graph relationships
SELECT Person2.Name AS FriendOfFriend
FROM Person Person1, Friends, Person Person2, Friends AS Friends2, Person Person3
WHERE MATCH(Person1-(Friends)->Person2-(Friends2)->Person3)
AND Person1.Name = 'Alice'
AND Person3.Name <> 'Alice';
Intelligent Query Processing:
SQL Server automatically optimizes queries with intelligent query processing features such as batch mode on rowstore, adaptive joins, memory grant feedback, and scalar UDF inlining:
-- Enable intelligent query processing features
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 160;
-- Query that benefits from intelligent query processing
SELECT c.CustomerName, SUM(o.OrderValue) as TotalOrders
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= DATEADD(year, -1, GETDATE())
GROUP BY c.CustomerName
HAVING SUM(o.OrderValue) > 10000
ORDER BY TotalOrders DESC;