Deep Dive into SQL Optimization

Site:	DBAcademy
Course:	Database Developer
Book:	Deep Dive into SQL Optimization

Printed by:	Guest user
Date:	Friday, 18 July 2025, 6:01 PM

1. Advanced Query Writing
2. Performance Patterns and Common Pitfalls
3. Advanced SQL Constructs

1. Advanced Query Writing

In this section, we'll see how fine-tuned syntax and best practices can significantly impact query performance.

A. Syntax Refinements and Best Practices

Clarity and Readability:
Write SQL statements that are not only correct but also easy to read and maintain. For instance:

SELECT employee_id, first_name, last_name 
FROM employees 
WHERE department_id = 10;

Instead of:

SELECT e_id, fname, lname 
FROM employees 
WHERE d_id = 10;

Appropriate Use of Aliases:
Use column and table aliases to improve readability, especially in complex queries:

SELECT e.employee_id, e.first_name, e.last_name, d.department_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id;

Consistent Formatting:
Adopt a consistent formatting style for indentation, capitalization of SQL keywords, and alias naming. This not only aids review by others but can also reduce errors when modifying queries.
Parameterization:
Use parameterized queries to avoid SQL injection and to allow query caching by the database engine.

B. Subqueries vs. JOINs: When to Use Each

Subqueries:
Subqueries can be used in SELECT, FROM, or WHERE clauses. They are useful when you need to first compute a result before performing a join or filter operation. For example:
```
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1700);
```
Use Cases and Considerations:
- When the subquery returns a single or a small dataset.
- When logical separation improves clarity, but know that some complex subqueries might be less performant than a well-indexed JOIN.
JOINs:
JOINs are generally more performant as they allow the use of indexes and are generally optimized by the DBMS. For example, instead of using a subquery:
```
SELECT e.employee_id, e.first_name, e.last_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id
WHERE d.location_id = 1700;
```
Use Cases and Considerations:
- When merging rows from two or more tables efficiently is required.
- When relational integrity must be enforced by retrieving data from multiple tables.

C. Query Rewriting Techniques for Performance Gains

Avoid SELECT *:
Only retrieve the columns you need. This minimizes I/O and reduces memory consumption.

-- Problematic approach:
SELECT * FROM orders;

-- Optimized approach:
SELECT order_id, order_date, customer_id FROM orders;

Filtering Early:
Use filters (WHERE clauses) as early as possible in your queries to reduce the number of rows processed:
```
SELECT order_id, order_date
FROM orders
WHERE order_date >= '2023-01-01';
```
Index Usage:
Ensure that indexed columns appear in JOIN conditions and WHERE filters to take advantage of faster data access.
Using Derived Tables and Common Table Expressions (CTEs): Recursive Queries and Modular Query Design
- Introduction to CTEs:
  CTEs improve the readability and maintainability of SQL queries. They provide a way to define temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
- Non-Recursive CTE Example:
```
WITH TopOrders AS (
    SELECT order_id, amount
    FROM orders
    WHERE amount > 1000
)
SELECT * FROM TopOrders;
```
- Recursive CTE Example:
  Recursive queries are particularly useful for hierarchical data (e.g., organizational charts, bill of materials).
```
WITH EmployeeHierarchy AS (
    SELECT employee_id, manager_id, first_name, last_name, 0 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    SELECT e.employee_id, e.manager_id, e.first_name, e.last_name, eh.level + 1
    FROM employees e
    JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM EmployeeHierarchy;
```
- Modular Query Design:
  CTEs enable decomposition of complex queries into smaller, logical units that can be developed, tested, and optimized individually.

2. Performance Patterns and Common Pitfalls

This part of the course teaches how to analyze query performance and recognize patterns or pitfalls that can lead to slower queries. It focuses on understanding and interpreting execution plans and optimizing indexes.

A. Analyzing Query Execution Plans

Understanding the Execution Plan:
Every SQL engine can provide a detailed plan outlining how it intends to execute the query. For example, using the EXPLAIN keyword in MySQL or PostgreSQL:
```
EXPLAIN SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 10;
```
Key areas to examine:
- Join methods: Nested loops, hash joins, merge joins.
- Index utilization: Whether indexes are used or a full table scan is occurring.
- Cost estimates: Estimated time and resource usage.
Example:
A query performing a full table scan vs. one using an index:
- Without Index:
```
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
```
- With Index on customer_id:
```
CREATE INDEX idx_customer ON orders(customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
```
The revised execution plan would show an Index Seek, significantly reducing the scanning of unnecessary rows.

B. Identifying Inefficiencies

Full Table Scans:
Recognize queries where the engine is forced to read every row. Use indexes to target specific rows:

-- Problematic if table is large:
SELECT * FROM orders WHERE order_status = 'Completed';

-- Optimized if order_status is indexed:
CREATE INDEX idx_order_status ON orders(order_status);

Missing or Inefficient Indexes:
Learn how to identify when indexes should be added or modified. Use hints provided by the execution plan to decide if composite indexes are necessary for queries with multiple predicates.

Overuse of Subqueries and Redundant Joins:
Overcomplicated query logic can result in multiple subqueries which could be consolidated:

-- Consolidated query using JOINs or CTEs may perform better
WITH ActiveCustomers AS (
    SELECT customer_id FROM customers WHERE status = 'Active'
)
SELECT o.order_id, o.order_date
FROM orders AS o
JOIN ActiveCustomers ac ON o.customer_id = ac.customer_id
WHERE o.order_date >= '2023-01-01';

C. Best Practices for Query Optimization in Large Databases

Partitioning Tables:
Break large tables into partitions based on time, region, or other criteria to improve query performance.
Regular Maintenance:
Update statistics, rebuild indexes, and perform periodic clean-ups.
Caching Strategies:
Use caching for frequently executed queries or results to reduce repetitive computation.
Query Parameterization:
Avoid dynamic SQL where possible. Instead, utilize stored procedures and parameterized queries to benefit from plan caching.

3. Advanced SQL Constructs

This section covers advanced SQL constructs that allow developers to build sophisticated, modular, and high-performing database applications.

A. Stored Procedures: Benefits and Best Practices

Encapsulation of Business Logic:
Stored procedures allow you to encapsulate complex business logic on the server side. This centralizes the logic and can reduce network traffic by executing multiple operations in a single call.
```
CREATE PROCEDURE GetEmployeeDetails
    @DepartmentID INT
AS
BEGIN
    SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id = @DepartmentID;
END;
```
Performance and Security Benefits:
- Precompiled Execution: Stored procedures are often precompiled or cached, which speeds up their execution.
- Reduced risk of SQL injection because the input parameters are handled safely.
- Easier to maintain and update business logic as it's centralized in the procedure rather than across multiple applications.

B. Triggers: Automating Tasks and Maintaining Data Integrity

Use Cases:
Triggers automatically execute in response to specific events (INSERT, UPDATE, DELETE). They are useful for auditing changes or enforcing business rules.

CREATE TRIGGER trg_AuditEmployeeUpdate
ON employees
AFTER UPDATE
AS
BEGIN
    INSERT INTO employee_audit (employee_id, change_date, changed_by)
    SELECT employee_id, GETDATE(), SYSTEM_USER
    FROM inserted;
END;

Best Practices:
- Keep triggers lightweight. Heavy logic can slow down the DML operations.
- Avoid cascading triggers that call other triggers unless absolutely necessary.
- Document triggers well so that their side-effects are clear to developers and DBAs.

C. Window Functions: Running Totals, Moving Averages, and Ranking

Definition and Use-Cases:

Window functions (also known as analytic functions) allow you to perform calculations across a set of related rows, or a “window” of rows, without collapsing them into a single output row. Unlike aggregate functions (e.g., SUM, AVG) used with GROUP BY, window functions preserve each row and add analytical context to it.

They are especially useful in reporting, analytics, and trend analysis, where row-by-row comparisons and cumulative computations are common.
Syntax Breakdown:

<function_name>() OVER ( [PARTITION BY column] [ORDER BY column] [ROWS BETWEEN ...] )
- PARTITION BY: Divides the result set into partitions to perform the window function independently in each.
- ORDER BY: Determines the order of rows within the window.
- ROWS BETWEEN: Specifies a window frame relative to the current row (e.g., 2 rows before to current).
Common Use-Cases and Examples:

1. Running Total
Query:
```
SELECT order_id, order_date, amount,
       SUM(amount) OVER (ORDER BY order_date) AS RunningTotal
FROM orders;
```
Input:

order_id order_date amount

1 2024-01-01 100

2 2024-01-02 200

3 2024-01-03 150

Output:

order_id order_date amount RunningTotal

1 2024-01-01 100 100

2 2024-01-02 200 300

3 2024-01-03 150 450

2. Moving Average (3-Day Window)
Query:
```
SELECT order_id, order_date, amount,
       AVG(amount) OVER (
           ORDER BY order_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS MovingAvg
FROM orders;
```
Input:

order_id order_date amount

1 2024-01-01 100

2 2024-01-02 200

3 2024-01-03 150

4 2024-01-04 250

Output:

order_id order_date amount MovingAvg

1 2024-01-01 100 100.0

2 2024-01-02 200 150.0

3 2024-01-03 150 150.0

4 2024-01-04 250 200.0

Moving average is calculated over the current row and two before it (3 values total, or fewer if not available).

3. Ranking by Salary
Query:
```
SELECT employee_id, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS SalaryRank
FROM employees;
```
Input:

employee_id department salary

101 Sales 5000

102 Sales 7000

103 Sales 5000

201 HR 6000

202 HR 6500

Output:

employee_id department salary SalaryRank

102 Sales 7000 1

101 Sales 5000 2

103 Sales 5000 2

202 HR 6500 1

201 HR 6000 2

RANK() assigns the same rank to equal salaries and skips the next rank accordingly.

4. Lag and Lead (Comparing with Previous and Next Rows)
Query:
```
SELECT order_id, order_date, amount,
       LAG(amount) OVER (ORDER BY order_date) AS PrevAmount,
       LEAD(amount) OVER (ORDER BY order_date) AS NextAmount
FROM orders;
```
Input:

order_id order_date amount

1 2024-01-01 100

2 2024-01-02 200

3 2024-01-03 150

Output:

order_id order_date amount PrevAmount NextAmount

1 2024-01-01 100 NULL 200

2 2024-01-02 200 100 150

3 2024-01-03 150 200 NULL
Benefits:
- - Reduces the need for self-joins or subqueries.
  - Enhances performance by letting the DBMS optimize the window operations.

order_id	order_date	amount	MovingAvg
1	2024-01-01	100	100.0
2	2024-01-02	200	150.0
3	2024-01-03	150	150.0
4	2024-01-04	250	200.0

order_id	order_date	amount	PrevAmount	NextAmount
1	2024-01-01	100	NULL	200
2	2024-01-02	200	100	150
3	2024-01-03	150	200	NULL

employee_id	department	salary
101	Sales	5000
102	Sales	7000
103	Sales	5000
201	HR	6000
202	HR	6500

employee_id	department	salary	SalaryRank
102	Sales	7000	1
101	Sales	5000	2
103	Sales	5000	2
202	HR	6500	1
201	HR	6000	2