A relational query engine that parses SQL, builds a logical plan, applies cost-based optimizations, and executes via a Volcano-model iterator tree. Supports B-tree indexes, joins, aggregations, and transactions.
Architecture
- Parser — Hand-written recursive descent parser producing an AST
- Planner — Converts AST to a logical plan, then rewrites with push-down predicates and join reordering
- Optimizer — Cost model using table statistics (row counts, column cardinality) to pick index scans vs. full scans
- Executor — Volcano iterator model: each node implements
next(), composing into a tree - Storage — Page-based storage with a buffer pool manager, B-tree index, and MVCC for transactions
Design Decisions
- Chose Volcano model over vectorized execution for simplicity — easier to reason about operator boundaries
- MVCC over 2PL: avoids read-write conflicts without locks, at the cost of more complex GC
- Predicate push-down happens before join reordering, keeping the optimizer deterministic
- Fixed 4KB page size — trades flexibility for simpler alignment with OS page size
What I Learned
The buffer pool manager is the most consequential component — a bug there corrupts data silently. Rust's ownership model caught several use-after-free issues at compile time that would have been subtle runtime bugs in C++.
Cost-based optimization is hard to get right without good statistics. Cardinality estimation errors compound through join chains, leading to dramatically wrong plan choices.