Every language app gives you the same feedback: ❌ Wrong. Try again.
That's not teaching. That's a slot machine with educational branding. The difference between a typo and a conceptual gap is the difference between "you fat-fingered it" and "you don't understand accusative case." A human tutor sees this instantly. Duolingo doesn't try.
And for classical languages? The apps don't even pretend to work. A Latin noun declines through a dozen forms; a single verb conjugates through more than a hundred. Ancient Greek has even more. Old Norse hasn't been spoken in centuries. The market's answer: "Just use Anki."
Interlinear knows why you're wrong—not just that you are.
Five error types. Five different interventions. The same evaluation harness patterns we use for production LLMs—applied to language learning.
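Roughly, the routing looks like this. A minimal sketch, assuming a five-way taxonomy; the type names and intervention strings below are illustrative guesses, not the shipped taxonomy.

```python
# Hypothetical error taxonomy: five root causes, five different remediations.
# Names are illustrative assumptions, not Interlinear's actual categories.
from dataclasses import dataclass
from enum import Enum, auto


class ErrorType(Enum):
    TYPO = auto()            # slip of the fingers, not a gap in knowledge
    MORPHOLOGY = auto()      # right word, wrong ending (e.g. case or tense)
    VOCABULARY = auto()      # wrong or unknown lexical item
    SYNTAX = auto()          # each word fine, the construction isn't
    COMPREHENSION = auto()   # the sentence was misread entirely


@dataclass
class Intervention:
    message: str
    follow_up_drill: str


# Each root cause routes to its own fix, the way an eval harness routes
# different failure modes to different remediations.
INTERVENTIONS: dict[ErrorType, Intervention] = {
    ErrorType.TYPO: Intervention("Almost. Check your spelling.", "retype"),
    ErrorType.MORPHOLOGY: Intervention(
        "The word is right, but the ending marks the wrong case.",
        "case_ending_drill",
    ),
    ErrorType.VOCABULARY: Intervention(
        "That word doesn't mean what you think it means here.",
        "vocab_card_review",
    ),
    ErrorType.SYNTAX: Intervention(
        "Every word is fine; the construction isn't.",
        "sentence_rebuild",
    ),
    ErrorType.COMPREHENSION: Intervention(
        "Let's re-read the sentence together first.",
        "guided_rereading",
    ),
}
```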
Then: upload any text and the system generates a full course. Comprehension exercises, translation drills, contextual dialogs, vocabulary cards. All calibrated to your demonstrated level.
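In rough terms, the output of that pipeline might be shaped like the sketch below; the class and field names are assumptions for illustration, not the actual schema.

```python
# A hypothetical data model for a generated course. Field names and level
# labels are illustrative assumptions only.
from dataclasses import dataclass, field


@dataclass
class VocabCard:
    lemma: str
    gloss: str
    example_sentence: str   # drawn from the uploaded text


@dataclass
class Exercise:
    kind: str       # "comprehension" | "translation" | "dialog"
    prompt: str
    answer: str


@dataclass
class GeneratedCourse:
    source_text: str                 # the text the learner uploaded
    learner_level: str               # calibrated from demonstrated performance
    exercises: list[Exercise] = field(default_factory=list)
    vocab_cards: list[VocabCard] = field(default_factory=list)
```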
The error correction, the course generation, the pedagogical engine—all of it feeds into conversations that actually teach. Not chatbots. Tutors.
LLMs can finally do real morphological analysis—not pattern matching. Claude understands why puellam is accusative. It can generate pedagogically sound exercises, not just random fill-in-the-blanks.
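A hedged sketch of what that analysis could look like against the Anthropic API; the prompt, JSON schema, and model choice here are illustrative, not Interlinear's implementation.

```python
# Sketch: ask the model for a structured morphological analysis of one word.
# Prompt wording, JSON keys, and the model ID are assumptions for illustration.
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = """Analyze the Latin word "puellam" as it appears in
"Marcus puellam videt." Reply with JSON only, using the keys
lemma, part_of_speech, case, number, gender, and explanation."""

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model ID, swap as needed
    max_tokens=500,
    messages=[{"role": "user", "content": PROMPT}],
)

analysis = json.loads(message.content[0].text)
# Expected shape (illustrative):
# {"lemma": "puella", "part_of_speech": "noun", "case": "accusative",
#  "number": "singular", "gender": "feminine",
#  "explanation": "Direct object of videt, hence accusative."}
print(analysis["case"], "-", analysis["explanation"])
```

Structured output like this is what lets the exercise generator target the exact feature a learner missed instead of serving another random blank.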
And ElevenLabs + lip-sync models mean the talking heads aren't a research project anymore. The pieces exist. Someone just has to assemble them with pedagogical rigor.
Frederick Sengstacke is a technical writer turned engineer who's spent years explaining hard things clearly. Former Curriculum Engineer at Trilogy Education. Understands how people actually learn—not just how to build software.
The error taxonomy isn't a feature. It's how production LLM evals work. Different root causes need different interventions. That insight—applied to education—is the entire thesis.
Academic rigor. Research preparation. The language of the sagas, the Eddas, and the Vikings—taught properly.
Eventually: a conversation with a skald.