1. Overview
On January 15, 2026, Google announced "TranslateGemma," an open-source translation model. Built on Gemma 3, it supports translation across 55 languages. A notable characteristic is that, unlike conventional translation models, it can deliver high-accuracy translation even when running on a local device.
For running large language models in a local environment, the author uses Ollama. Ollama is a tool and runtime designed to make it easy to run and manage large language models (LLMs) locally — one command handles model retrieval, startup, and inference. By using quantized models, practical inference is achievable even on ordinary PCs without high-end GPUs. It allows LLM usage with privacy protection and offline operation in mind, while keeping setup overhead low, making it a useful option for researchers and engineers. The author also uses Open WebUI as a frontend, configured so that LLMs are accessible through a browser-based chat interface. This setup supports switching between models in a conversational workflow, and connects to backends such as Ollama and OpenAI-compatible APIs.
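To make the "one command" claim concrete, the basic Ollama workflow looks roughly like the following. The model tag `gemma3:12b` is an illustrative example; check the Ollama model library for the actual tag names available.

```shell
# Download a model once; subsequent runs reuse the local copy.
ollama pull gemma3:12b

# Start an interactive chat session (starts the local server if needed).
ollama run gemma3:12b

# One-shot, scriptable inference: pass the prompt as an argument
# and the response is printed to stdout.
ollama run gemma3:12b "Summarize the following text: ..."

# List models installed locally.
ollama list
```

Open WebUI can then be pointed at the same Ollama instance, so the models pulled here appear in the browser-based model selector.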
The author's local hardware is a 16-inch MacBook Pro (M2 Max, 64 GB unified memory). Apple Silicon Macs use a unified memory architecture in which the CPU and GPU share memory; roughly 10 GB is used on the CPU side, and the remainder can be allocated for GPU use. In a 64 GB configuration, approximately 48–54 GB is available for GPU computation. Providing 48 GB of GPU memory in a typical Intel/AMD-based PC is not straightforward. The author also owns a gaming PC with an AMD Ryzen 9 9950X3D and NVIDIA GeForce RTX 5080, but despite the high overall system price, its GPU memory is limited to 16 GB, which is not necessarily advantageous for local LLM use.
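On recent macOS versions, the ceiling on GPU-wired memory can reportedly be inspected and raised via the `iogpu.wired_limit_mb` sysctl. This is an undocumented knob that resets on reboot, so the following is a sketch under that assumption rather than a supported procedure:

```shell
# Check the current GPU wired-memory limit.
# A value of 0 reportedly means the system default (roughly 70-75% of RAM).
sysctl iogpu.wired_limit_mb

# Temporarily allow up to ~56 GB of the 64 GB for GPU use.
# Requires admin rights; the setting reverts to the default on reboot.
sudo sysctl iogpu.wired_limit_mb=57344
```

Raising this limit too far can starve the CPU side and destabilize the system, so conservative values are advisable.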
The performance of generative AI improved dramatically throughout 2025. The early public release of ChatGPT, based on GPT-3.5, was prone to hallucinations and had a strong tendency to generate misinformation in a plausible tone; its entertainment value was more prominent than its practical utility. At present, the author primarily uses Anthropic Claude, OpenAI ChatGPT, and GitHub Copilot as cloud-based AI tools, and finds that the range of situations in which they can be trusted has grown substantially.
Meanwhile, from late 2025 into early 2026, on-premises generative AI also reached a practical level. This is likely attributable to techniques such as distillation and quantization enabling smaller models to achieve performance comparable to their larger predecessors. In fact, some models show equivalent performance with roughly half the parameter count of the previous generation. That said, smaller models still exhibit noticeably more hallucinations, with behavior reminiscent of early LLMs. Hallucinations appear easier to suppress in use cases that are constrained to a specific set of documents, and on-premises AI is particularly well suited when handling information you do not want to send externally. For use cases involving web search or queries requiring broad knowledge, cloud AI remains more appropriate, and selecting based on the task at hand is necessary.
Translation is a task that frequently involves text you do not want to share externally and that is self-contained within the document being translated — making it a strong fit for on-premises AI. The author therefore tried TranslateGemma. Multiple parameter sizes are available, but this article uses the 12B model. Translation quality was generally good, and practical performance was achieved even with local execution. However, TranslateGemma requires a strictly prescribed prompt format, and pasting the prompt and target text into Open WebUI every time is cumbersome.
To reduce this friction, the author wrote a Bash script and configured it to be invoked from Alfred via a shortcut. If text is selected, the selection is used as input; otherwise, the clipboard contents are translated. The script automatically detects the language — Japanese to English, or English to Japanese — and translates accordingly. Since it runs locally and returns results almost instantly, the experience is very smooth.
This article describes how to implement a translation script using TranslateGemma, and explains the steps to build an environment — using Alfred's Workflow feature — that can translate selected text instantly via a shortcut or keyword.