A distributed key-value store built from scratch to understand consensus protocols and storage engine design. Supports linearizable reads, strong consistency, and automatic leader election.
Problem
Most KV stores abstract away the hard parts — consensus, log replication, and failure recovery. I wanted to understand what happens at each layer by building it.
Architecture
The system is structured in three layers:
- Consensus layer — Raft implementation handling leader election, log replication, and membership changes
- Storage layer — Write-ahead log (WAL) for durability, with an in-memory hash map as the state machine
- Transport layer — gRPC for node-to-node communication, HTTP for client-facing API
Design Decisions
- Linearizable reads require routing through the leader to avoid stale data on followers
- Log compaction via snapshotting prevents unbounded WAL growth
- Heartbeat interval tuned to balance leader election latency vs. network overhead
- Batch log entries before appending to reduce disk I/O on the critical path
What I Learned
Raft is deceptively simple on paper. The edge cases — split votes, log conflicts during network partitions, snapshot installation — took most of the debugging time. The hardest part was not the algorithm, but making the state transitions observable enough to reason about failures.
Key Numbers
- ~10k ops/sec on a 3-node cluster (single-node benchmark: ~80k ops/sec)
- P99 write latency: ~4ms (LAN)
- Leader election converges in <150ms under typical failure scenarios