Feature Guide
Core Features
Translate Once, Output Many
Translate the same file into multiple languages in a single run, which suits multilingual subtitles and i18n projects. For example, translate an English subtitle file into Chinese, Japanese, German, and French at once, then download all versions as a single bundle. 120+ languages are supported, with more added regularly.
Translation Cache
Translation results are saved locally in your browser. When parameters match, the tool returns the cached result and skips the API call:
- Persistent: survives refreshes and browser restarts
- High capacity: holds millions of records without bloating memory
- Toggle off: disable temporarily when debugging prompts or model settings
- One-click clear: clean up everything from the settings panel
- Hit conditions: source text + source/target language + service-specific key params must all match (LLMs key off prompt/temperature/thinking effort; Qwen-MT off domains; traditional MT only off the source text). Changing temperature, editing prompts, toggling thinking, or editing glossary terms will miss the cache.
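The hit conditions above can be pictured as a deterministic cache key. The following is a minimal sketch, not the tool's actual implementation: the function names, the in-memory dict standing in for browser storage, and the choice of SHA-256 over a JSON payload are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(source_text, source_lang, target_lang, service, params=None):
    """Build a deterministic key; changing any listed field yields a new key."""
    payload = {
        "text": source_text,
        "src": source_lang,
        "tgt": target_lang,
        "service": service,  # including the service name is an assumption
        # Service-specific parameters, e.g. prompt/temperature for an LLM.
        "params": params or {},
    }
    # sort_keys makes the serialization independent of dict insertion order.
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


cache = {}  # hypothetical stand-in for the browser's persistent store


def translate_cached(text, src, tgt, service, params, call_api):
    """Return the cached result when the key matches, else call the API once."""
    key = cache_key(text, src, tgt, service, params)
    if key not in cache:
        cache[key] = call_api(text)
    return cache[key]
```

Under this sketch, changing the temperature from 0.3 to 0.7 produces a different key, which is why such a change misses the cache.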
Glossary
Pin fixed translations for names and domain terms so they stay consistent across a whole file — or a whole series. Character names in episode subtitles and product terms in technical docs stop drifting between runs.
Where to enable: the "Glossary" card in the translation settings panel (shown only when the selected service supports it). A glossary status chip next to the API status badge on the main page shows the on/off state and term count — click it to jump to settings. Turning the switch on creates a default glossary automatically if you don't have one yet.
Managing terms:
- Each term = source word → required translation, bound to one target language — the same word can carry a different translation per target language, and only the current target language's terms apply
- Click "Edit terms" to open the editor: inline add/edit/delete, search filtering, duplicate-source warnings, automatic pagination past 50 terms
- TSV bulk import/export: one term per line as source ⇥ translation (tab-separated, pastes straight from Excel / any spreadsheet); an optional third column takes a target language code (zh, ja, …) so one file can import terms for several languages
- Multiple glossary presets (e.g. one per show / project) with create, rename, delete, all included in settings import/export
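To make the TSV layout concrete, here is a small sketch of how such a file could be parsed. The function name, the returned structure, and the rule of silently skipping malformed rows are assumptions; the tool's real importer may behave differently.

```python
def parse_glossary_tsv(tsv_text, default_lang):
    """Return {target_lang: {source: translation}} from tab-separated lines."""
    glossary = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        cols = [c.strip() for c in line.split("\t")]
        if len(cols) < 2 or not cols[0] or not cols[1]:
            continue  # skip rows without both a source and a translation
        # The optional third column overrides the current target language.
        lang = cols[2] if len(cols) >= 3 and cols[2] else default_lang
        glossary.setdefault(lang, {})[cols[0]] = cols[1]
    return glossary
```

For example, a file with the rows `Frodo⇥佛罗多` and `Ring⇥指輪⇥ja`, imported while the target language is zh, would yield one term under zh and one under ja.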
How terms are enforced (multiple layers):
- LLM services: only the terms actually present in the current text are injected into the system prompt (no token waste from shipping hundreds of terms with every request)
- Qwen-MT: terms go through the official native translation_options.terms parameter, the most reliable channel
- Violation retry: if an LLM translation ignores a required term, the offending line is retried once with a stricter instruction, and the version with fewer violations wins
- Post-hoc replacement net: on every service, source words left untranslated in the output are replaced per the glossary (word boundaries for Latin terms, substring matching for CJK, longest term first) — so terms always land
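Two of the layers above, injecting only the terms present in the text and the post-hoc replacement net, can be sketched as follows. This is an illustrative approximation: the helper names are hypothetical, and treating "Latin" as "all-ASCII" with ASCII-only word boundaries is an assumption made so that a Latin term adjacent to CJK characters still matches.

```python
import re

ASCII_WORD = "A-Za-z0-9_"


def term_pattern(source):
    """Regex for one term: bounded for ASCII terms, plain substring otherwise."""
    pattern = re.escape(source)
    if source.isascii():
        pattern = rf"(?<![{ASCII_WORD}]){pattern}(?![{ASCII_WORD}])"
    return pattern


def terms_present(text, terms):
    """Keep only the terms that occur in the text (case-insensitive)."""
    return {
        src: tgt
        for src, tgt in terms.items()
        if re.search(term_pattern(src), text, flags=re.IGNORECASE)
    }


def apply_glossary(output, terms):
    """Replace source words left in the output with their required translation."""
    if not terms:
        return output
    lookup = {src.lower(): tgt for src, tgt in terms.items()}
    # Longest term first so "New York City" wins over "New York".
    ordered = sorted(terms, key=len, reverse=True)
    combined = re.compile(
        "|".join(term_pattern(src) for src in ordered), flags=re.IGNORECASE
    )
    return combined.sub(
        lambda m: lookup.get(m.group(0).lower(), m.group(0)), output
    )
```

Doing the replacement in a single pass with one combined pattern avoids a replaced translation being matched again by a shorter term.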
Service support: all LLM services + Qwen-MT. Plain MT APIs (GTX, Google, DeepL(X), Azure, TranslateGemma) have no in-model term channel, so the glossary card is hidden for them.
Matching: case-insensitive (AI and ai both match). Glossary data lives in your browser's local storage and is never uploaded.
Long Text & Concurrency
Tuned for large documents and batch jobs:
- Concurrency control: customize request rate — max out paid APIs or throttle free ones to avoid bans
- Streaming for large files: chunk-based handling keeps the UI responsive
- Context-aware translation: subtitles and documents get sent with surrounding context so the AI understands flow
- Per-line retry: failed lines are tracked separately; the rest of the batch isn't blocked
Failed-Line Retry
LLMs occasionally drop a line, return an empty response, or break formatting. When that happens:
- Failed lines automatically fall back to the original text — never empty, so the output is always usable
- A red alert at the top of the result panel says how many lines failed
- Click Retry to reissue only those — completed content stays as-is and isn't re-billed
- A copy button lets you grab the failed source rows for manual handling elsewhere
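The fallback and retry behavior in this list can be sketched roughly as below. The function names and the rule that a blank result counts as a failure are assumptions for illustration, not the tool's actual code.

```python
def merge_results(source_lines, translated):
    """translated maps line index -> text; missing or blank entries count as failed."""
    output, failed = [], []
    for i, src in enumerate(source_lines):
        text = translated.get(i)
        if text and text.strip():
            output.append(text)
        else:
            output.append(src)  # fall back to the original text, never empty
            failed.append(i)
    return output, failed


def retry_failed(source_lines, output, failed, translate_line):
    """Reissue only the failed lines; completed lines are left untouched."""
    still_failed = []
    for i in failed:
        text = translate_line(source_lines[i])
        if text and text.strip():
            output[i] = text
        else:
            still_failed.append(i)
    return still_failed
```

Because only the indices in the failed list are sent again, the lines that already succeeded incur no further API calls.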
Multi-language batch mode: when translating into multiple targets at once, if a whole target language fails (quota exhausted, model refusal, etc.), the failing language codes get aggregated into a dedicated panel. One-click copy them back into the "target languages" field to retry.
Network blips, 429 rate-limits, and 5xx errors retry automatically. Bad API key, timeouts, context-length-exceeded, and max_tokens truncation never retry. See FAQ → Will failed translations retry? for the full list.
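The split between retryable and non-retryable failures might look like the sketch below. The `TranslateError` class, the error kind names, and the exponential backoff are hypothetical; only the classification itself (network errors, 429, and 5xx retry; bad key, timeout, context-length, and truncation do not) comes from the text above.

```python
import time

RETRYABLE_KINDS = {"network"}
NON_RETRYABLE_KINDS = {
    "bad_api_key",
    "timeout",
    "context_length_exceeded",
    "truncated",  # max_tokens truncation
}


class TranslateError(Exception):
    """Hypothetical error carrying a kind label and an optional HTTP status."""

    def __init__(self, kind, status=None):
        super().__init__(kind)
        self.kind = kind
        self.status = status


def should_retry(err):
    if err.kind in NON_RETRYABLE_KINDS:
        return False
    if err.kind in RETRYABLE_KINDS:
        return True
    # Rate limits and server-side errors are treated as transient.
    return err.status == 429 or (err.status is not None and 500 <= err.status <= 599)


def call_with_retry(fn, retry_count=3, base_delay=0.0):
    """Call fn, retrying transient failures up to retry_count times."""
    attempt = 0
    while True:
        try:
            return fn()
        except TranslateError as err:
            if attempt >= retry_count or not should_retry(err):
                raise
            time.sleep(base_delay * (2 ** attempt))  # backoff policy is an assumption
            attempt += 1
```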
Cancel Translation
Click the close button on the progress modal to abort a running batch. Already-translated lines are cached, so clicking "Translate" again resumes from where you stopped.
RTL Language Auto-Adaptation
Right-to-left languages (Arabic, Hebrew, Persian, Urdu) automatically render right-to-left in the textarea and result view — no manual configuration.
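As a rough illustration, direction could be chosen from the target language code. The function name and the use of ISO 639-1 codes with an optional region suffix are assumptions; the tool may detect direction differently.

```python
# ISO 639-1 codes for Arabic, Hebrew, Persian, and Urdu.
RTL_LANGS = {"ar", "he", "fa", "ur"}


def text_direction(lang_code):
    """Return "rtl" or "ltr" for a language code such as "ar" or "fa-IR"."""
    base = lang_code.lower().split("-")[0]
    return "rtl" if base in RTL_LANGS else "ltr"
```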
Usage Modes
Batch vs. Single-File
The tool switches modes based on what you upload:
- Batch mode (default): drop multiple files, they queue up automatically and download as a bundle when done.
- Single-file mode: upload one file or paste text — review line-by-line, edit before exporting.
Advanced settings let you lock to single-file mode if you prefer.
JSON Translate is single-file mode only.
One-Click Source/Target Swap
A ⇄ button sits between the source and target language dropdowns — click to swap them. The button greys out when the source is "Auto-detect" or when multi-language mode is on (you can't swap "auto" or against multiple targets).
Language Picker
122 languages are grouped by geography + speaker count (Common / Europe / Middle East / Central Asia / South Asia / Southeast Asia / Africa / Americas & Oceania) so you can find what you need fast. Multi-language mode adds four quick-preset buttons that merge-select common bundles:
- Global Top 10: the 10 most-spoken languages (English, Chinese, Spanish, French, Japanese, Portuguese, German, Russian, Hindi, Arabic)
- European mainstream: French, German, Italian, Spanish, Portuguese, Dutch, Polish, and other commercial European languages
- East Asian: Chinese (Simplified + Traditional), Japanese, Korean, Cantonese
- Indian subcontinent: Hindi, Bengali, Tamil, Marathi, Gujarati, and other major South Asian languages
In single-language mode the tool also remembers your last 5 picks and surfaces them in a "Recent" group at the top of the dropdown. The mobile layout collapses to a single column automatically.
API Connection Status
The badge at the top of the main page tells you the current API's status at a glance:
- Not configured / Needs config: URL or API key missing
- Configured: filled in but not yet tested
- Testing → ✓ Connected or Connection failed: test results
- Free API: free, no-config services (GTX / Edge / DeepLX)
Click the badge to jump to the API settings panel.
Presets: API Config, Prompts, and Glossaries, Separately
API configs, prompts, and glossaries are stored as three independent preset types so they combine freely:
- API presets: snapshot the current service's URL, key, model, temperature, etc. Useful for switching between local Ollama, a remote gateway, and a paid cloud endpoint.
- Prompt presets: snapshot the system + user prompts. Switch between a "strict terminology" prompt and a "creative paraphrase" prompt without touching the API config.
- Glossary presets: maintain separate term bases per show / project — see Glossary above.
All three types support add / load / rename / update / delete, and travel with settings import/export.
Post-Translation Cleanup
After translation, the tool can automatically apply simple string replacements:
- Character filtering: strip stray symbols like ♪ ♫ from subtitles
- Format cleanup: remove leftover HTML tags
This feature does plain string replacement only — no escape sequences (\n, \t etc.). Use the Text Splitter for richer transformations.
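A minimal sketch of plain string removal, assuming the configured fragments arrive as a list of literal strings (the function name is hypothetical):

```python
def clean_output(text, fragments):
    """Remove each fragment literally; no regex and no escape-sequence handling."""
    for fragment in fragments:
        if fragment:  # an empty fragment would be a no-op, so skip it
            text = text.replace(fragment, "")
    return text
```

Because the match is literal, a configured fragment consisting of a backslash followed by the letter n removes those two characters wherever they appear, but leaves real line breaks untouched.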
Advanced Settings
Settings Import/Export
One-click backup of every configuration: API credentials, model parameters, API presets, Prompt presets, glossaries. The exported JSON imports across devices, ideal for team sharing or moving to a new machine.
General Options
- Use Cache: enabled by default. Reads cached results when parameters match. Disable temporarily while debugging.
- Retry Count: maximum retries on failure. Bump it up on shaky networks or rate-limited free endpoints.
- Timeout: per-request timeout in seconds. Increase for slow models or long text. The Test Connection buttons use the same threshold, so the test is never stricter than the translation it guards.
- Max Tokens (Custom OpenAI-compatible only): optional output cap that keeps small local models from getting stuck in repetition loops; 2048–4096 is recommended for local models. Default 0 (unlimited). When a response truncates (finish_reason=length), the line is marked failed and won't retry, because the same params would truncate again. Cloud LLMs rely on the server-side default and don't expose this control.
- Remove characters after translation: auto-strip specified characters or fragments from results (e.g., ♪ in subtitles, leftover <i> tags).
- Custom export filename: standardize filenames in batch exports. Placeholders: {name} (source filename), {lang} (target language), {ext} (extension), {date}, {time}. Example: {name}_{lang}_{date}.{ext}.
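The filename placeholders could be expanded along these lines. This is a sketch only: the function name is hypothetical, and the date and time formats shown are assumptions, since the guide does not specify them.

```python
from datetime import datetime
from pathlib import Path


def export_filename(template, source_path, lang, now=None):
    """Expand {name}, {lang}, {ext}, {date}, {time} in an export template."""
    now = now or datetime.now()
    path = Path(source_path)
    values = {
        "name": path.stem,
        "lang": lang,
        "ext": path.suffix.lstrip("."),
        "date": now.strftime("%Y%m%d"),  # format is an assumption
        "time": now.strftime("%H%M%S"),  # format is an assumption
    }
    result = template
    for key, value in values.items():
        result = result.replace("{" + key + "}", value)
    return result
```

With the example template from the list and a source file named episode01.srt translated to ja, this sketch produces a name of the form episode01_ja_<date>.srt.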
API Parameter Tuning
Chunk Size
Non-LLM APIs (Google / Azure) split long text into chunks before sending. Chunk size is the per-chunk character cap. Common limits:
⚠️ Google Translate Web breaks line breaks, so chunking is disabled there.
Delay (ms)
The cooldown between chunked requests. Increase on poor networks or free APIs. For example, Azure Translate Free Tier works best at 5000 ms or higher.
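Chunk size and delay interact roughly as in the sketch below. Splitting on line boundaries and the function names are assumptions; the tool's real chunker may cut text differently.

```python
import time


def split_into_chunks(text, chunk_size):
    """Group whole lines into chunks of at most chunk_size characters."""
    chunks, current, length = [], [], 0
    for line in text.split("\n"):
        # +1 accounts for the newline that rejoins lines within a chunk.
        extra = len(line) + (1 if current else 0)
        if current and length + extra > chunk_size:
            chunks.append("\n".join(current))
            current, length = [], 0
            extra = len(line)
        current.append(line)
        length += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


def translate_chunks(text, chunk_size, delay_ms, translate):
    """Translate chunk by chunk, pausing delay_ms between requests."""
    results = []
    for i, chunk in enumerate(split_into_chunks(text, chunk_size)):
        if i > 0:
            time.sleep(delay_ms / 1000)  # cooldown between chunked requests
        results.append(translate(chunk))
    return "\n".join(results)
```

Note that in this sketch a single line longer than the cap is sent as its own oversized chunk rather than being cut mid-line.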
Concurrent Lines
The max number of lines translated in parallel. The default is already tuned per service: free APIs default to high concurrency, paid APIs to conservative values. Raise it for speed, lower it if you hit 429 rate limits; otherwise leave it alone.
Defaults per service (reference)
- GTX (Free): translates in batched chunks (~5000 chars/block), not line-by-line concurrency
- Edge (Free): 100
- Commercial MT (Google / Azure / DeepL): 20-100
- Cloud LLMs (Claude / Gemini / OpenAI / Qwen, etc.): 20
- Custom local LLM / TranslateGemma / DeepLX: 10
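A cap on parallel lines is commonly implemented with a semaphore. The sketch below uses Python's asyncio for illustration; the function names are hypothetical, and `fake_translate` merely stands in for a real API call.

```python
import asyncio


async def translate_lines(lines, translate_line, concurrent_lines=20):
    """Translate all lines with at most concurrent_lines requests in flight."""
    semaphore = asyncio.Semaphore(concurrent_lines)

    async def worker(line):
        async with semaphore:
            return await translate_line(line)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(worker(line) for line in lines))


async def fake_translate(line):
    """Stand-in for a network call; uppercases the line."""
    await asyncio.sleep(0)
    return line.upper()


if __name__ == "__main__":
    print(asyncio.run(translate_lines(["a", "b", "c"], fake_translate, 2)))
```

Lowering `concurrent_lines` reduces how many requests overlap, which is the lever to pull when a service starts returning 429 responses.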
Context Mode Concurrency
- Default: 3 (1 for some providers)
- What it does: in context-aware mode, how many "target lines" go into a single request. Each request also carries the surrounding context.
- Trade-off: larger values give higher throughput but force the model to emit more lines per response, raising the chance of formatting drift. Smaller values are steadier but use more requests. Stick with 3 for documents/subtitles, push to 5 for plain text.
Context Lines
How many surrounding lines accompany each batch sent to the model. More lines give more coherent output but heavier requests. The default is already tuned and normally needs no change. Drop it if you hit "context length exceeded"; raise it for better dialogue flow.
Defaults per service (reference)
- Cloud LLMs (Claude / Gemini / Nvidia / Azure OpenAI): 100
- Custom local LLM: 30 (models under 14B tend to drop lines in long batches, hence the smaller default)
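Context Mode Concurrency and Context Lines together determine what each request carries. The sketch below shows one plausible way to assemble the batches; the function name, the dict layout, and taking context from both before and after the target lines are assumptions.

```python
def build_context_batches(lines, targets_per_request=3, context_lines=2):
    """Group lines into requests, each with neighboring lines as context."""
    batches = []
    for start in range(0, len(lines), targets_per_request):
        end = min(start + targets_per_request, len(lines))
        batches.append(
            {
                "targets": lines[start:end],  # lines the model must translate
                "before": lines[max(0, start - context_lines):start],
                "after": lines[end:end + context_lines],
            }
        )
    return batches
```

With 7 lines, 3 targets per request, and 2 context lines, this yields 3 requests; the middle one translates lines 3 to 5 and carries lines 1, 2, and 6 as context (0-based). Raising the target count reduces the number of requests but asks the model for more lines per response.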

