Fast-FST
Overview
What started as a quick test script for the fst crate turned into a proper
performance experiment. The fst crate builds a Finite State Transducer — a compact, immutable
data structure for storing sorted strings that allows Levenshtein automaton queries in microseconds.
I'm still learning Rust. There are bugs and rough edges. But I learned more about real performance engineering in this project than in almost anything else I've worked on.
What it does
Type a query, get fuzzy matches back — instantly. The dictionary is compiled once into a .fst
binary file. From then on, every search streams through that structure without loading anything
into RAM.
Benchmarks
Measured with Criterion.rs on a local machine.
| Query | Latency |
|---|---|
| Exact match — apple | ~113 µs |
| Fuzzy match (short) — aple | ~114 µs |
| Fuzzy match (long) — interational | ~231 µs |
Storage-wise, the compiled FST comes in at 279 KB versus 977 KB for the raw text —
a 71% reduction — because the FST shares prefixes and suffixes across all entries.
Startup when dict.fst already exists (no rebuild needed) is around 3.8 µs.
Optimizations
Heap-based top-K ranking
The naive approach collects every fuzzy match, then sorts: O(N log N) time and
O(N) memory. Instead I used a std::collections::BinaryHeap capped at 10 entries, keeping
only the best results as they stream in. Complexity drops to O(N log K) with K = 10 —
effectively linear in N, with only O(K) memory.
Incremental build
Rebuilding the FST from scratch every run took ~350 ms. I added a simple mtime check — the
same idea behind make — that skips the build if dict.fst is newer than dict.txt.
Startup time went from 350 ms → ~8 µs. That's roughly a 40,000× speedup for the common case.
Zero-RAM streaming construction
Loading the dictionary into a Vec<String> before building the FST would be O(N) in memory.
Instead, I stream lines from disk directly into the FST builder. RAM usage during construction
stays constant — O(1) — regardless of dictionary size.
Limitations
Levenshtein distance 1 is fast (~200 µs). Distance 2 is noticeably slower (~1.5 ms). The automaton's state space grows exponentially with distance, and even the heap optimisation can't fully hide that. Distance ≥ 3 is currently not practical.
What I learned
- How FSTs represent sorted string sets and why they compress so well
- How to integrate Criterion benchmarks and read the output properly
- That BinaryHeap is the right tool whenever you only need the top-N of a large stream
- Incremental rebuilds are almost always worth adding — mtime checks are five lines of code