Memory is often the hidden tax in software design. You can get something working quickly with familiar tools, ship it, and only later discover that the real bottleneck was never CPU time — it was representation.
That’s what makes Andrew Quinn’s write-up about replacing a 3 GB SQLite database with a tiny finite state transducer binary so satisfying. The numbers are dramatic, but the deeper lesson is about choosing a data structure that matches the shape of the problem.
The project is a Finnish-English dictionary called Taskusanakirja, designed for incremental search-as-you-type. At small scale, a trie was enough. Prefix search is what tries are built for, and with some caching and match limiting, the system handled hundreds of thousands of words comfortably while keeping memory around 50–60 MB.
Then Finnish happened.
Finnish is aggressively agglutinative: a single root word can explode into huge numbers of inflected forms. Case endings, plural markers, possessives, clitics, consonant gradation, vowel harmony — the language manufactures variants at industrial scale. A naïve trie handles shared prefixes well, but it pays for every suffix separately.
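To get a sense of the scale, here is a toy sketch. It is emphatically not real Finnish morphology (it ignores consonant gradation and vowel harmony, and many of the generated combinations are ungrammatical); it only shows how a few independent suffix slots multiply:

```python
from itertools import product

# Toy suffix slots for a noun like "talo" ("house"). This is NOT real
# Finnish morphology: no gradation, no harmony, and plenty of these
# combinations are ungrammatical. The point is the growth rate per slot.
roots = ["talo"]
numbers = ["", "i"]                    # singular / plural marker
cases = ["", "n", "ssa", "sta", "lla", "lta", "lle", "na", "ksi"]
possessives = ["", "ni", "si", "nsa"]  # my / your / his-or-her
clitics = ["", "kin", "han", "pa"]     # "too", emphasis particles

forms = {r + n + c + p + k
         for r, n, c, p, k in product(roots, numbers, cases, possessives, clitics)}

# Up to 2 * 9 * 4 * 4 = 288 surface strings from a single root, and real
# Finnish has more slots and more endings per slot than this.
print(len(forms))
```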
That distinction matters more than it first appears.
Consider English autocomplete. Many words diverge quickly after a common beginning:
- auto…
- autopsy…
- autonomous…
- automobile…
A trie shines here because the overlap is mostly near the start.
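To make that concrete, here is a minimal trie with prefix search and a match limit, roughly the shape of structure the early version would have relied on (the class names and the limit are illustrative, not taken from the project):

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search_prefix(self, prefix: str, limit: int = 10) -> list[str]:
        # Walk down to the node for the prefix, then collect up to
        # `limit` completions below it (depth-first, alphabetical).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        matches: list[str] = []
        stack = [(node, prefix)]
        while stack and len(matches) < limit:
            n, acc = stack.pop()
            if n.is_word:
                matches.append(acc)
            for ch in sorted(n.children, reverse=True):
                stack.append((n.children[ch], acc + ch))
        return matches


trie = Trie()
for w in ["auto", "automobile", "autonomous", "autopsy"]:
    trie.insert(w)

# All four words share a single 4-node chain for "auto"; everything
# after the shared prefix is stored separately.
print(trie.search_prefix("auto"))
```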
Finnish pushes overlap in the opposite direction. Thousands upon thousands of words may terminate in the same grammatical endings. The structure of the language creates massive redundancy in suffixes, not prefixes.
That is where finite state transducers become interesting.
Unlike a trie, a minimal deterministic automaton can merge structurally identical suffix trees. Two words that begin differently, and therefore sit on separate branches of a trie, can collapse into the same tail representation when their remaining structure matches. For natural language corpora with highly repetitive endings, this becomes extraordinarily efficient.
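A quick way to see the effect is to build a plain trie and then merge equivalent nodes bottom-up into a DAWG, the acceptor cousin of an FST. This is a sketch of the idea only, not the construction the project or production FST libraries actually use (those typically build the minimal automaton incrementally from sorted input):

```python
class Node:
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}  # char -> Node
        self.final = False


def build_trie(words):
    root = Node()
    for w in words:
        node = root
        for ch in w:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node, registry):
    # Bottom-up merge: two nodes are equivalent when they agree on finality
    # and have identical outgoing edges to already-merged children.
    for ch in list(node.edges):
        node.edges[ch] = minimize(node.edges[ch], registry)
    signature = (node.final,
                 tuple(sorted((ch, id(child)) for ch, child in node.edges.items())))
    return registry.setdefault(signature, node)


def count_nodes(root):
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.edges.values())
    return len(seen)


# Different stems, identical grammatical endings: the Finnish-shaped case.
words = [stem + suffix
         for stem in ["talo", "katu", "ranta", "metsä"]
         for suffix in ["", "ssa", "sta", "lla", "lle", "ksi"]]

trie = build_trie(words)
print("trie nodes:", count_nodes(trie))
dawg = minimize(trie, {})
print("dawg nodes:", count_nodes(dawg))  # shared suffix trees fold into one tail each
```

Even on this toy input the node count drops substantially, because every stem's copy of the ending structure collapses into a single shared tail. Across hundreds of thousands of inflected forms, that sharing is what makes a gigabytes-to-megabytes collapse possible.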
The result in Quinn’s case was absurdly effective: a multi-gigabyte SQLite database collapsed into roughly ten megabytes.
Not by compressing harder. Not by inventing a new algorithm. By using a representation aligned with the statistical structure of the data.
That distinction is easy to miss in modern software engineering because storage is cheap and databases are everywhere. SQLite with Full Text Search was already “fast enough.” It solved the problem. Users could search instantly. The only real downside was distribution size: downloading several gigabytes for a pocket dictionary is difficult to justify, especially when the software is intended to run on older hardware.
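For contrast, the baseline approach looks roughly like this. The schema and data here are invented for illustration, and FTS5 must be compiled into the SQLite build, as it is in most modern distributions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every row's text and supports prefix
# queries out of the box: general-purpose and convenient, but it stores
# each inflected form and its index entries explicitly.
conn.execute("CREATE VIRTUAL TABLE entries USING fts5(headword, translation)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("talo", "house"), ("talossa", "in a house"), ("katu", "street")],
)
# The trailing * makes this a prefix query over the headword column.
for row in conn.execute(
    "SELECT headword, translation FROM entries WHERE entries MATCH 'headword:talo*'"
):
    print(row)
```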
And that detail about older hardware is important. Constraints create taste.
A desktop app targeting modern gaming PCs might never force this kind of optimization. But software intended for aging laptops, unreliable internet connections, and globally diverse hardware environments exposes waste very quickly. Suddenly a 300× reduction is not academic elegance — it directly changes who can use the software.
There’s also a refreshing honesty in the development path itself.
The SQLite implementation was not a mistake. It was the bridge that made the eventual solution possible. The system worked, shipped, and generated enough understanding of the workload to expose the real shape of the problem later.
That pattern repeats constantly in engineering:
- Build the obvious version.
- Learn where reality differs from theory.
- Replace the expensive abstraction with a narrower one.
Experienced engineers often arrive at highly optimized systems not because they started clever, but because they accumulated enough friction from the first implementation to know exactly what needed to disappear.
The post also highlights something easy to underestimate: data structures are compression algorithms.
We usually think about compression as a separate phase — ZIP files, entropy coding, binary packing. But many classic computer science structures are fundamentally about exploiting redundancy:
- tries compress repeated prefixes
- hash tables compress sparse lookup space into dense memory
- columnar databases compress repeated values across rows
- suffix arrays compress repeated text structure
- finite automata compress repeated transitions
The “algorithm” is often secondary. The representation is the optimization.
That is why the jump from SQLite to an FST feels so disproportionate. SQLite is general-purpose infrastructure. It supports transactions, updates, indexes, concurrency semantics, query planning, persistence guarantees. An FST supports almost none of that. It is static and specialized.
But specialization is precisely where the efficiency comes from.
General-purpose systems carry the weight of possibility. Specialized systems carry only the weight of reality.
And sometimes reality only weighs ten megabytes.