This section provides insight into how RDBMS architecture supports database functionality and performance.
Database Engines
Types and Implementations
- InnoDB (MySQL/MariaDB): Transaction-safe engine with ACID compliance, row-level locking, and foreign key constraints support. Features crash recovery and multi-versioning for concurrent transactions.
- MyISAM (MySQL): Non-transactional engine optimized for read-heavy workloads with table-level locking. Offers faster performance for read-only scenarios.
- PostgreSQL's Engine: Single integrated engine with extensible architecture supporting custom data types, operators, and functions. Implements MVCC (Multi-Version Concurrency Control) natively.
- Oracle Database Engine: Sophisticated engine with proprietary storage structures (tablespaces, segments, extents, blocks) and background processes for managing memory and I/O.
- SQL Server Engine: Consists of relational engine (query processing) and storage engine (data access and management).
Engine Selection Criteria
- Transactional requirements (ACID properties)
- Read vs. write workload balance
- Concurrency requirements
- Availability and recovery requirements
- Scaling needs (vertical vs. horizontal)
Storage Mechanisms
Physical Storage Structures
- Pages/Blocks: Fundamental unit of I/O (typically 4KB to 16KB)
- Extents: Collection of contiguous pages/blocks
- Segments: Logical storage units composed of extents
- Tablespaces: Container for segments across multiple files
Data Organization
- Heap Tables: Unordered collections of rows
- Index-Organized Tables: Data stored in B-tree structure ordered by primary key
- Partitioning: Horizontal division of tables based on range, list, hash, or composite schemes
- Sharding: Distributing data across multiple physical servers
Storage Format Optimization
- Row vs. Column Storage: Tradeoffs between OLTP and OLAP workloads
- Compression Techniques: Page, row, and column level compression
- Data Encoding: Binary encoding, dictionary encoding for space efficiency
- Sparse Data Handling: Null value storage optimization
Memory and Disk Utilization
Memory Components
- Buffer Pool/Cache: Keeps frequently accessed data pages in memory
- Query Cache: Stores results of frequent queries
- Sort Buffer: Dedicated memory for sorting operations
- Join Buffer: Memory allocated for join operations
- Log Buffer: Temporarily stores transaction log records
Memory Management Strategies
- LRU (Least Recently Used): Algorithm for page replacement
- Pinning: Keeping critical pages in memory
- Scan-resistant Caching: Preventing full table scans from flushing useful data
- Adaptive Hash Indexing: Dynamic in-memory indexing based on access patterns
Disk I/O Optimization
- Read-ahead: Prefetching anticipated pages
- Write Clustering: Grouping multiple writes into fewer I/O operations
- Asynchronous I/O: Non-blocking disk operations
- Direct I/O vs. Buffered I/O: Bypassing OS cache for specialized workloads
- RAID Configurations: Different redundancy and performance options (RAID 0, 1, 5, 10)
Concurrency and Transaction Management
Transaction Isolation Levels
- Read Uncommitted: Lowest isolation, allows dirty reads
- Read Committed: Prevents dirty reads, but allows non-repeatable reads
- Repeatable Read: Prevents non-repeatable reads, but allows phantom reads
- Serializable: Highest isolation, prevents all concurrency anomalies
Locking Mechanisms
- Shared (S) vs. Exclusive (X) Locks: Reading vs. writing
- Granularity Levels: Row, page, table locks
- Deadlock Detection and Resolution: Timeout or dependency cycle detection
- Lock Escalation: Converting multiple fine-grained locks to fewer coarse-grained locks
Concurrency Control Methods
- Pessimistic Concurrency Control: Locks resources before accessing
- Optimistic Concurrency Control: Checks for conflicts at commit time
- MVCC (Multi-Version Concurrency Control): Maintains multiple data versions to enhance concurrency
- Two-Phase Commit Protocol: Ensures atomic transaction across distributed systems
Transaction Logging
- Write-Ahead Logging (WAL): Recording changes before applying to data
- Checkpoint Process: Synchronizing memory state with disk
- Redo vs. Undo Logs: Forward vs. backward recovery mechanisms
- Point-in-time Recovery: Ability to restore database to specific moment
Last modified: Thursday, 10 April 2025, 3:59 PM