language preservation

open source

It gave my community a safety net when the last elder died. Not perfect — but something.
The AI Translating Dying Languages Back to Life

Kallawaya is not a language you can find on Google Translate. It is a ritual tongue, passed from father to son among the itinerant healers of the Bolivian highlands for at least six centuries. Fewer than 200 speakers remain. None of them are young.

For decades, linguists documented what they could. Notebooks filled up. Hard drives were made. The knowledge sat, inaccessible, in formats that required specialists to unlock.

Then a small team at a university in La Paz began feeding those archives into a fine-tuned language model.

Training on Scarcity

The challenge with endangered languages is not just vocabulary. It is context, register, and the way meaning is encoded in things that are not said. Standard large language models are trained on billions of tokens. The Kallawaya corpus contains thousands.

The team's approach was to treat the model not as a translator but as a retrieval and reconstruction tool — surfacing patterns from fragmentary evidence the way an archaeologist reads a shattered pot.

What the Model Gets Right and Wrong

The results are uneven, which the team is open about. The model is reliable on plant names, ritual phrases, and formulaic sequences. It struggles with narrative — the parts of a language that carry the most culture.

But even partial recovery is valuable. A healer's apprentice can now search an archive that would have taken years to read by hand. A community that lost its last fluent speaker can still hear something of their ancestors' voice.

A Tool, Not a Replacement

The team is clear about what AI can and cannot do here. It cannot replace a living language community. It cannot repair the colonial ruptures that put these languages in danger in the first place.

What it can do is buy time — and return something to the people it belongs to.

• • •