Hello World

Be Happy!

MySQL Indexing and B-Tree Deep Dive


This guide provides a comprehensive explanation of how MySQL indexes work, with a focus on B-tree structure and composite indexes. It's designed to help developers understand database indexing to optimize query performance.

Index Basics
MySQL indexes are data structures that improve query performance by providing fast lookup paths to data. Most MySQL indexes use B-tree data structures.

sql
CREATE INDEX idx_name ON table(column);
Primary Benefits:
  • Dramatically faster data retrieval
  • Efficient sorting operations
  • Support for uniqueness constraints
B-Tree Structure
What is a B-Tree?
Despite the name, a B-tree is NOT a binary tree. The "B" likely stands for "Balanced" or "Bayer" (the inventor), not "Binary."
B-trees allow:
  • Multiple keys per node
  • Multiple children per node
  • Self-balancing properties
B-Tree Components

                Root Node
                ["M"]
               /     \
     Internal Node    Internal Node
     ["D","H"]        ["R","V"]
    /    |    \      /    |    \
   Leaf Nodes containing actual data pointers
  • Separator Keys: Values like ["M"] or ["D","H"] that divide the data
  • Root Node: Top-level node with separator keys
  • Internal Nodes: Middle-level nodes with separator keys
  • Leaf Nodes: Bottom-level nodes with actual indexed values and row pointers
How Separator Keys Work
The separator keys (like "M") divide the data into ranges:
  • If searching for "Smith": "S" comes after "M", so go right
  • If searching for "Brown": "B" comes before "M", so go left
This division allows efficient O(log n) searches even with huge datasets.
Composite Indexes
A composite index includes multiple columns:

sql
CREATE INDEX idx_composite ON users(last_name, first_name, birth_date);
Single B-Tree Structure
Composite indexes use a single B-tree with hierarchical organization:
  • Primary sort by first column (e.g., last_name)
  • Secondary sort by second column (e.g., first_name) within each first column value
  • Tertiary sort by third column (e.g., birth_date) within each first+second column combination
Why Not Multiple B-Trees?
Using a single B-tree for composite indexes offers several advantages:
  • Storage Efficiency: Only one copy of data
  • Maintenance Overhead: Only one structure to update
  • Query Optimization: Perfect for multi-column conditions
  • Atomic Updates: Consistency with single updates
  • Range Queries: Efficient for complex conditions
The Leftmost Prefix Rule
The most critical rule for composite indexes is the "leftmost prefix rule":
A composite index can only be efficiently used if the query references a prefix of the indexed columns, starting from the leftmost column.
For an index on (last_name, first_name, birth_date):
Will Use Index Efficiently

sql
WHERE last_name = 'Smith'
WHERE last_name = 'Smith' AND first_name = 'John'
WHERE last_name = 'Smith' AND first_name = 'John' AND birth_date = '1990-01-01'
Will NOT Use Index Efficiently

sql
WHERE first_name = 'John'
WHERE birth_date = '1990-01-01'
WHERE last_name = 'Smith' AND birth_date = '1990-01-01' -- Skips middle column
Range Condition Limitations
Once you hit a range condition, subsequent columns can't be used for searching:

sql
-- Can use index for both last_name and first_name
WHERE last_name = 'Smith' AND first_name = 'John'

-- Can only use index for last_name, then filter results by first_name
WHERE last_name = 'Smith' AND first_name LIKE 'J%'
Multiple Column Filtering
When a query has conditions on multiple columns, the filtering happens in stages:
Two-Phase Filtering Process
  1. First Filtering (Index Seek)
    • Uses B-tree traversal based on the first column
    • Fast, O(log n) operation
    • Finds leaf node containing all matching first-column records
  2. Second Filtering (Index Scan)
    • Scans within the leaf node(s) for second-column matches
    • Usually fast if first column is selective
    • Performance depends on how many records match the first condition
Example:

sql
WHERE last_name = 'Smith' AND first_name = 'John'
  1. Use B-tree to find all 'Smith' records
  2. Scan those records for first_name = 'John'
Performance Considerations
Column Order Matters
  • Put high-selectivity columns (with more unique values) first
  • Put frequently used filter columns early in the index
  • Put equality conditions before range conditions
Selectivity Problems
If the first column has low selectivity (many duplicate values):
  • The second filtering step becomes O(n) in complexity
  • May need to scan thousands or millions of records
  • Can lead to poor performance
Solutions for Low Selectivity
  1. Reorder index columns for better selectivity
  2. Create additional indexes for different query patterns
  3. Ensure database statistics are up-to-date
  4. Use EXPLAIN to analyze actual execution plans
Rails Implementation
Creating indexes in Rails migrations:

ruby
class AddIndexesToUsers < ActiveRecord::Migration[7.0]
  def change
    # Single column index
    add_index :users, :email, unique: true
    
    # Composite index
    add_index :users, [:last_name, :first_name, :birth_date], 
              name: :idx_users_name_birth
              
    # Index with specific sort orders (MySQL 8.0+)
    add_index :users, [:last_name, :first_name, :birth_date], 
              name: :idx_users_mixed_order,
              order: { last_name: :asc, first_name: :desc, birth_date: :asc }
  end
end
Checking Indexes in Rails Console

ruby
# Show all indexes on a table
ActiveRecord::Base.connection.indexes(:users)

# Use EXPLAIN to see how a query uses indexes
User.where(last_name: 'Smith', first_name: 'John').explain

Conclusion
Understanding B-tree structure and the leftmost prefix rule is crucial for effective database indexing. By designing indexes that match your query patterns and considering column selectivity, you can dramatically improve application performance.
Remember:
  1. B-trees organize data hierarchically
  2. Composite indexes use a single B-tree with nested sorting
  3. The leftmost prefix rule determines index efficiency
  4. Column order significantly impacts performance
  5. Use EXPLAIN to verify index usage
#mysql (10) #index (1)
List