Fast-FST

Overview

What started as a quick test script for the fst crate turned into a proper performance experiment. The fst crate builds a Finite State Transducer — a compact, immutable data structure for storing sorted strings that allows Levenshtein automaton queries in microseconds.

I'm still learning Rust. There are bugs and rough edges. But I learned more about real performance engineering in this project than in almost anything else I've worked on.

What it does

Type a query, get fuzzy matches back — instantly. The dictionary is compiled once into a .fst binary file. From then on, every search streams through that structure without loading anything into RAM.

Benchmarks

Measured with Criterion.rs on a local machine.

QueryLatency
Exact match — apple~113 µs
Fuzzy match (short) — aple~114 µs
Fuzzy match (long) — interational~231 µs

Storage-wise, the compiled FST comes in at 279 KB versus 977 KB for the raw text — a 71% reduction — because the FST shares prefixes and suffixes across all entries. Cold startup (when dict.fst already exists) is around ~3.8 µs.

Optimizations

Heap-based top-K ranking

The naive approach collects every fuzzy match, then sorts. That's O(N log N) and uses O(N) memory. Instead I used std::collections::BinaryHeap capped at 10 entries, keeping only the best results as they come in. Complexity drops to O(N log 10) — effectively linear.

Incremental build

Rebuilding the FST from scratch every run took ~350 ms. I added a simple mtime check — the same idea behind make — that skips the build if dict.fst is newer than dict.txt. Startup time went from 350 ms → ~8 µs. That's roughly a 40,000× speedup for the common case.

Zero-RAM streaming construction

Loading the dictionary into a Vec<String> before building the FST would be O(N) in memory. Instead, I stream lines from disk directly into the FST builder. RAM usage during construction stays constant — O(1) — regardless of dictionary size.

Limitations

Levenshtein distance 1 is fast (~200 µs). Distance 2 is noticeably slower (~1.5 ms). The automaton's state space grows exponentially with distance, and even the heap optimisation can't fully hide that. Distance ≥ 3 is currently not practical.

What I learned

Source

github.com/HaiqalALy/fast-fst