As commitment to our database literacy campaign, we're offering our Database Foundations course—for FREE!
Indexes are essential for enhancing query performance by allowing quick lookups of data. The choice of an index type depends on the data distribution, query patterns, and storage considerations. Below, we discuss different categories:
Clustered Indexes
-- Example: Creating a clustered index on OrderDate in SQL Server
CREATE CLUSTERED INDEX IDX_Orders_OrderDate
ON Orders (OrderDate);
Non-Clustered Indexes
-- Example: Creating a non-clustered index on LastName
CREATE NONCLUSTERED INDEX IDX_Customers_LastName
ON Customers (LastName);
Unique Indexes
-- Example: Creating a unique index on Email
CREATE UNIQUE INDEX IDX_Users_Email
ON Users (Email);
Composite Indexes
Definition: An index that includes multiple columns. They are useful when queries frequently filter on several columns simultaneously.
Considerations: The order of columns in a composite index is critical. It should reflect the most common query uses.
Example: If many queries filter by both LastName
and FirstName
, a composite index can be applied.
-- Example: Creating a composite index on LastName and FirstName
CREATE INDEX IDX_Customers_Name
ON Customers (LastName, FirstName);
Full-Text Indexes
-- Example: Creating a full-text index on a Documents table
CREATE FULLTEXT INDEX ON Documents (Content)
KEY INDEX PK_Documents;
Consider Query Patterns:
Data Distribution & Cardinality:
Maintenance Overhead:
An execution plan outlines the steps executed by the database engine to run a query. Analyzing these plans is key to determining and optimizing the effectiveness of indexing strategies.
Understanding Plan Operators:
Cost Estimation:
Example:
Database-Specific Tools:
Techniques:
Example:
-- Using EXPLAIN in MySQL
EXPLAIN SELECT * FROM Orders WHERE OrderDate > '2023-01-01';
Partitioning involves dividing a table into smaller, more manageable pieces without changing the logical structure. This technique can significantly boost performance and ease database maintenance.
Definition:
Horizontal partitioning (also known as sharding in certain contexts) distributes rows of a table across multiple physical segments (tables or servers) based on a partitioning key.
Benefits:
Design Considerations:
Example:
-- SQL Server example using partitioning function and scheme
CREATE PARTITION FUNCTION pfOrderDate(datetime)
AS RANGE RIGHT FOR VALUES ('2020-01-01', '2021-01-01', '2022-01-01');
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate
TO ([PRIMARY], [SECONDARY], [TERTIARY], [QUATERNARY]);
CREATE TABLE Orders (
OrderId INT PRIMARY KEY,
OrderDate datetime,
CustomerId INT,
...
) ON psOrderDate(OrderDate);
Definition:
Vertical partitioning divides a table by columns. Critical, frequently accessed columns are stored separately from infrequently accessed or sensitive columns.
Benefits:
Design Considerations:
Example:
-- Employee tables vertical partition example
-- Employee_Public: Contains non-sensitive, frequently accessed columns.
CREATE TABLE Employee_Public (
EmployeeId INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50)
);
-- Employee_Sensitive: Contains sensitive columns.
CREATE TABLE Employee_Sensitive (
EmployeeId INT PRIMARY KEY,
SocialSecurityNumber CHAR(11),
Salary DECIMAL(10,2),
FOREIGN KEY (EmployeeId) REFERENCES Employee_Public(EmployeeId)
);
Sharding is a partitioning method that distributes data across multiple servers. Though it’s often associated with NoSQL environments, sharding can be applied within traditional RDBMS systems to improve scalability.
Definition:
Sharding is the process of horizontally partitioning data across multiple databases or servers (called shards), each operating as an independent, self-contained database.
Benefits:
Shard Key Selection:
Common Sharding Approaches:
Example:
-- Pseudocode for a hash-based sharding algorithm
DECLARE @UserId INT = 12345;
DECLARE @ShardId INT = @UserId % 4; -- Assuming we have 4 shards
-- The above calculation helps determine which shard the user's data resides in.
Considerations in RDBMS Environment: