How Real-Time AI Translation Works for Baduk Lessons

The strongest Baduk teachers in the world live in Korea, China, and Japan. Many of them — including most current and recently retired pros — do not speak conversational English. For an English-speaking student this has been the single biggest barrier to lessons with a genuinely strong teacher.

We built a translation pipeline specifically to remove that barrier inside a live 1:1 lesson, so a French student can read fluent French in their chat panel about a second after a Korean teacher explains a position in Korean. This article is an honest look at what is actually happening when that works, and when, occasionally, it doesn't.

Last updated: 2026-05-26.

What you see in the lesson

Open a lesson with a Korean teacher and turn on subtitles. As the teacher speaks, two things happen in the panel next to the Go board:

A line of text appears in the teacher's language (Korean characters in this case), updating live as they speak.
A second line appears below it in the language you set when you joined: English, Chinese, Japanese, French, or German.

The visible delay between the teacher's voice and the translated text is typically one to two seconds. Long enough that you notice; short enough that the rhythm of a lesson holds together. You can ignore the source-language line entirely if you don't read it; it is shown because some students explicitly want to see the teacher's exact words, and because — for the small fraction of utterances that the AI gets wrong — being able to glance at the source helps you flag it.

What's happening underneath

Four stages, roughly:

Capture. Your browser records short rolling chunks of the teacher's microphone audio in a standard WebM container. These chunks are streamed over a WebSocket to our server.
Speech recognition (ASR). The server forwards each audio chunk to a speech-recognition provider — currently Deepgram — configured for the teacher's spoken language. The provider returns text, both interim (best guess so far, updated as more audio arrives) and final (locked in once the speaker pauses).
Translation. Each final transcript is sent to a translation layer running on top of an LLM, with a Baduk-specific system prompt, a term glossary, and few-shot examples. The translation is generated in chunks and streamed back as soon as the first words arrive.
Display. Both the source-language transcript and the streamed translation are pushed back to your browser over the same WebSocket and rendered into the lesson chat panel.
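
The routing between those stages can be sketched in a few lines. This is a minimal illustration, not the production code: TranscriptEvent, handle_transcript, and the translate and send callbacks are hypothetical names, and the real ASR provider, LLM stream, and WebSocket are replaced by plain Python stand-ins. It assumes interim transcripts are displayed but only final transcripts are translated.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class TranscriptEvent:
    """One ASR result for the teacher's audio (hypothetical shape)."""
    text: str
    is_final: bool  # False = interim guess, True = locked in after a pause


def handle_transcript(
    event: TranscriptEvent,
    translate: Callable[[str], Iterable[str]],
    send: Callable[[dict], None],
) -> None:
    """Route one ASR event: always show the source text, translate only finals."""
    # Display stage: push the source-language line, interim or final.
    send({"type": "source", "text": event.text, "final": event.is_final})

    if not event.is_final:
        return  # interim text may still change, so it is not translated

    # Translation stage: forward each streamed piece as soon as it arrives.
    for piece in translate(event.text):
        send({"type": "translation", "delta": piece})
    send({"type": "translation_done"})


# Stand-in for the streamed LLM translation (assumption for this sketch).
def fake_translate(text: str) -> Iterable[str]:
    yield "This move "
    yield "is sente."


# Stand-in for the WebSocket: collect outgoing messages in a list.
outbox: list[dict] = []
handle_transcript(TranscriptEvent("이 수는", is_final=False), fake_translate, outbox.append)
handle_transcript(TranscriptEvent("이 수는 선수입니다", is_final=True), fake_translate, outbox.append)
```

Keeping interim text out of the translation step means the LLM only sees sentences that will not be revised, at the cost of waiting for the speaker's pause.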

The whole loop — your teacher's voice to text in your language on your screen — generally completes in about one second of network and inference latency, with the streamed translation visibly typing out for another half-second or so after that.

Why generic translation tools fail at this

You might reasonably ask: Google Translate exists. Zoom has live captions. Tencent Meeting has real-time translation. Why not just use those?

We tried. Here is what breaks, in order of severity:

No Baduk vocabulary. Generic translators do not know that sente — when its Chinese-character form 先手 appears in a Japanese or Chinese source — refers to a move that keeps the initiative, not a literal "first hand" or "previous hand". They do not know that aji (味) refers to latent potential or weakness in a position rather than "taste", that hane (ハネ) is a diagonal move that bends around an opponent's stone and not the surname it transliterates to, that joseki (定石) is a standard sequence — usually but not always in a corner — and not "fixed pattern". On the Korean side, the Hangul 선수 is the canonical example: in everyday Korean it means "athlete" or "player" (Hanja 選手), but in Baduk it means sente (Hanja 先手), and generic captioning will default to "athlete" mid-game-review because that is overwhelmingly the most common meaning outside the board. Once these terms are mangled, the rest of the explanation collapses around them.

Generic translation, not domain translation. Tencent Meeting added Korean and 14 other languages to its real-time translation in August 2024, so the older "Chinese ↔ English only" complaint no longer applies — but the translation is generic-domain. The same model that handles a sales meeting handles a Baduk lesson, and on Baduk-specific vocabulary it makes the same kinds of mistakes Zoom and Google Translate make. The feature is also gated to Enterprise and Business editions, which most individual teachers don't have. Adding a language to the picker is not the same as adding a domain to the model.

Subtitles aren't live. Pre-recorded YouTube lectures with English subtitles are a fine learning resource, but they are not a lesson. You cannot ask the teacher to clarify a move. You cannot show them your own game. The point of live translation is the live part.

Human interpreters cost as much as the lesson again. A qualified Korean–English interpreter in a teaching context runs about $60 per hour and up, which roughly doubles the cost of every lesson. We have met exactly one person in three years of running this who actually hired one. It is not a real option for ordinary students.

The Baduk-aware part

Anyone can wire ASR to a translation API and call it a product. The translation quality you get back is generic, and for a Baduk lesson generic is not usable. Several pieces of the pipeline are specific to this domain:

A Baduk-aware system prompt. Before any utterance is sent for translation, the LLM is told it is translating live commentary from a Baduk lesson — that the speaker is analyzing positions, discussing moves, shapes, strategy, and reading sequences. Without this framing, even when the model knows the words individually it tends to default to colloquial register and miss the technical meaning.

Source-language disambiguation rules. Some words mean very different things inside a Baduk lesson than they do outside. The system prompt explicitly lists these. A few from Korean alone:

백 (baek) → "White" (the player or white stones). Not "100", not "back."
흑 (heuk) → "Black" (player or black stones).
집 (jip) → "territory" — area controlled on the board. Not "house."
모양 (moyang) → "shape" — the quality of a stone formation. Not "appearance."
선수 (seonsu) → "player" or "sente" depending on context. The prompt teaches the model to pick by context: "흑선수" is "the Black player"; "여기서 선수를 쳐야 합니다" is "you need to play sente here."

Without disambiguation, a generic translator regularly turns "this move is sente" into "this move is a previous hand" or, worse, just leaves it transliterated. Lessons with that quality of subtitle are unusable.
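
One way to picture the framing and the disambiguation list together is a small function that renders both into a system prompt. This is a sketch under assumptions: the Rule type, the build_system_prompt function, and the exact prompt wording are invented for illustration, and only the Korean entries are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    term: str         # source-language word as the teacher says it
    meaning: str      # what it means inside a Baduk lesson
    not_meaning: str  # everyday reading to avoid ("" if none is listed)


# Entries copied from the Korean examples in the article.
KOREAN_RULES = [
    Rule("백", "White (the player or the white stones)", "100 / back"),
    Rule("흑", "Black (the player or the black stones)", ""),
    Rule("집", "territory", "house"),
    Rule("모양", "shape", "appearance"),
    Rule("선수", "player or sente, depending on context", ""),
]


def build_system_prompt(source_lang: str, target_lang: str, rules: list[Rule]) -> str:
    """Render the lesson framing plus one line per disambiguation rule."""
    lines = [
        "You are translating live commentary from a Baduk (Go) lesson "
        f"from {source_lang} to {target_lang}.",
        "The speaker is analyzing positions and discussing moves, shapes, "
        "strategy, and reading sequences.",
        "Inside this lesson, read these words as follows:",
    ]
    for rule in rules:
        line = f"- {rule.term}: {rule.meaning}"
        if rule.not_meaning:
            line += f" (not: {rule.not_meaning})"
        lines.append(line)
    return "\n".join(lines)


prompt = build_system_prompt("Korean", "English", KOREAN_RULES)
```

Because the rules are plain data, adding a newly discovered edge case is a one-line change rather than a model update.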

Baduk-specific glossary, paired both ways. The pipeline carries a glossary mapping core Baduk terminology (joseki, fuseki, miai, semeai, atari, dame, ko, kosumi, ponnuki, and roughly a hundred more) between every supported language pair. The translator uses this as ground truth; if a teacher says 정석 in Korean, the English chat reads "joseki", not "standard sequence" and not "fixed pattern." The same goes for aji and hane, which are kept in their canonical Romaji form rather than translated literally.
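
The article does not say how the glossary is enforced, so the following is only one plausible approach: a post-check that flags a translation when the source contains a glossary term but the output lacks its canonical rendering. The three-entry glossary slice and the missing_terms function are hypothetical.

```python
# Hypothetical glossary slice: Korean source term -> canonical English rendering.
KO_EN_GLOSSARY = {
    "정석": "joseki",
    "포석": "fuseki",
    "단수": "atari",
}


def missing_terms(source: str, translation: str, glossary: dict[str, str]) -> list[str]:
    """Return canonical terms the source calls for that the translation lacks."""
    lowered = translation.lower()
    return [
        target
        for term, target in glossary.items()
        # Substring match on the source, since Korean particles attach to nouns.
        if term in source and target.lower() not in lowered
    ]


source = "이 정석은 단수로 끝납니다"
good = "This joseki ends in atari."
bad = "This standard sequence ends in atari."

good_result = missing_terms(source, good, KO_EN_GLOSSARY)
bad_result = missing_terms(source, bad, KO_EN_GLOSSARY)
```

A non-empty result could trigger a retry or a visible flag. Plain substring matching is deliberately naive here and would need care for very short terms that occur inside unrelated words.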

Few-shot examples for register switching. Lessons aren't pure technical commentary. Teachers and students throw in brief acknowledgements ("OK", "got it", "wait a moment", "see you next time") between Go content. Because the surrounding context was Go, the model used to over-read these and render them as Go-move commentary ("this move is great"). We caught this from a French student whose "Ça marche !" landed in her Chinese teacher's panel as praise of a stone placement. The pipeline now teaches the model to recognize brief conversational interjections and translate them as ordinary chat even mid-lesson, with paired examples for each language pair we ship. It's a single prompt block, not a separate model.
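
A rough sketch of what such a prompt block might look like when generated per language pair. The INTERJECTIONS table, the interjection_block function, the example pairs, and the instruction wording are all assumptions made for illustration, not the shipped prompt.

```python
# Hypothetical paired examples, keyed by (source language, target language).
INTERJECTIONS = {
    ("fr", "zh"): [("Ça marche !", "好的！"), ("Un instant.", "请稍等。")],
    ("en", "ko"): [("Got it.", "알겠습니다."), ("See you next time.", "다음에 봬요.")],
}


def interjection_block(source_lang: str, target_lang: str) -> str:
    """Render the few-shot pairs for one language pair as a single prompt block."""
    pairs = INTERJECTIONS.get((source_lang, target_lang), [])
    if not pairs:
        return ""  # no examples shipped for this pair
    lines = [
        "Short conversational remarks are ordinary chat, not Go commentary.",
        "Translate them plainly, as in these examples:",
    ]
    lines += [f'"{said}" -> "{rendered}"' for said, rendered in pairs]
    return "\n".join(lines)


block = interjection_block("fr", "zh")
```

The block would be appended to the system prompt, so the same model handles both technical commentary and small talk.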

Recent-conversation context. Each translation request includes the last few turns of the lesson, so a pronoun like "이거" ("this") or a reference like "그 수" ("that move") translates relative to the actual move under discussion, not in a vacuum.
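
Carrying "the last few turns" maps naturally onto a bounded queue. The sketch below assumes a window of two turns and a plain-text request layout; LessonContext, its methods, and the request wording are hypothetical.

```python
from collections import deque


class LessonContext:
    """Keeps only the most recent turns of a lesson for translation requests."""

    def __init__(self, max_turns: int = 4) -> None:
        # deque with maxlen silently drops the oldest turn when full.
        self._turns = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self._turns.append((speaker, text))

    def build_request(self, utterance: str) -> str:
        """Combine recent turns (as context only) with the line to translate."""
        history = "\n".join(f"{speaker}: {text}" for speaker, text in self._turns)
        return (
            "Recent conversation (context only, do not translate):\n"
            f"{history}\n\n"
            f"Translate this utterance:\n{utterance}"
        )


ctx = LessonContext(max_turns=2)
ctx.add("teacher", "흑이 여기 두었습니다")  # falls out of the window below
ctx.add("student", "네")
ctx.add("teacher", "백이 받았습니다")
request = ctx.build_request("그 수는 좋지 않습니다")
```

A small window keeps requests short, which matters for latency, while still giving the model enough to resolve "that move" against the move just discussed.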

Honest limits

We will not tell you this is magic. A few real constraints:

Latency. One to two seconds is typical, occasionally three under load. This affects the rhythm of fast back-and-forth; for a teacher narrating an analysis at normal pace it's fine.
Audio quality matters. A teacher in a quiet room with a decent USB headset gets near-perfect transcription. A teacher on a phone in a café does not. We surface a warning if the input audio gets too noisy; we cannot fix it on the server side.
Occasional mistranslations are inevitable. Streaming LLM output is not deterministic, and ASR misreads happen. We see roughly one questionable sentence per ten minutes of teaching in clean conditions, more in noisy ones. The two-pane display (source + translation) is partly so you can spot when something looks off.

The right way to think about it: this is the difference between "a Korean lesson is not possible at all because we share no language" and "a Korean lesson is possible, with a couple of seconds of latency and the occasional weird subtitle." For most students, that gap is the difference between learning from the strongest teachers in the world and not.

Try a translated lesson with a pro-level teacher — browse our roster. Every teacher's profile shows the languages they speak; the translation works for any pair from there to yours.

Where this is heading

We are still improving the translation layer. The disambiguation list grows as we find new edge cases. For example, the Korean word 행마 (haengma) is a Baduk term meaning "stone movement" or "the way stones travel," and a generic translator unaware of the domain will surface "march" or "parade" instead, because 행마 in everyday Korean is a procession or parade. Each pair we ship adds its own few-shot examples for the most common interjections that fall outside Go content. None of these are model updates we wait on a vendor for; they are prompt-level adjustments we ship on our own cadence.

If you want context on what to do with this once it works, read our companion guide on how to learn Baduk online — it covers the full progression from beginner apps to pro-level lessons. The AI-translation piece is the part that makes the final step practical for English-speaking students for the first time.