Fast-FST
Overview
What started as a quick test script for the fst crate turned into a proper
performance experiment. The fst crate builds a Finite State Transducer — a compact, immutable
data structure for storing sorted strings that allows Levenshtein automaton queries in microseconds.
I'm still learning Rust. There are bugs and rough edges. But I learned more about real performance engineering in this project than in almost anything else I've worked on.
What it does
Type a query, get fuzzy matches back — instantly. The dictionary is compiled once into a .fst
binary file. From then on, every search streams through that structure without loading anything
into RAM.
Benchmarks
Measured with Criterion.rs on a local machine.
| Query | Latency |
|---|---|
| Exact match — apple | ~113 µs |
| Fuzzy match (short) — aple | ~114 µs |
| Fuzzy match (long) — interational | ~231 µs |
Storage-wise, the compiled FST comes in at 279 KB versus 977 KB for the raw text —
a 71% reduction — because the FST shares prefixes and suffixes across all entries.
Startup when dict.fst already exists (no rebuild needed) is around 3.8 µs.
Optimizations
Heap-based top-K ranking
The naive approach collects every fuzzy match, then sorts: O(N log N) time and
O(N) memory. Instead I used a std::collections::BinaryHeap capped at 10 entries, keeping
only the best results as they stream in. Complexity drops to O(N log K) with K = 10 —
effectively linear in N, with only O(K) memory.
Incremental build
Rebuilding the FST from scratch every run took ~350 ms. I added a simple mtime check — the
same idea behind make — that skips the build if dict.fst is newer than dict.txt.
Startup time went from 350 ms → ~8 µs. That's roughly a 40,000× speedup for the common case.
Zero-RAM streaming construction
Loading the dictionary into a Vec<String> before building the FST would be O(N) in memory.
Instead, I stream lines from disk directly into the FST builder. RAM usage during construction
stays constant — O(1) — regardless of dictionary size.
Limitations
Levenshtein distance 1 is fast (~200 µs). Distance 2 is noticeably slower (~1.5 ms). The automaton's state space grows exponentially with distance, and even the heap optimisation can't fully hide that. Distance ≥ 3 is currently not practical.
What I learned
- How FSTs represent sorted string sets and why they compress so well
- How to integrate Criterion benchmarks and read the output properly
- That BinaryHeap is the right tool whenever you only need the top-N of a large stream
- Incremental rebuilds are almost always worth adding — mtime checks are five lines of code