Canonical's New AI Tool Wants You to Talk to Ubuntu Instead of Type

Myna runs entirely on local hardware and is set to debut with Ubuntu 26.10.

In April, Jon Seager of Canonical laid out the company's plan for handling AI in Ubuntu. The framework split things into two groups: implicit AI, which quietly improves what you already use, and explicit AI, meaning features you'd actually summon on purpose.

Back then, Jon gave speech-to-text and text-to-speech as examples of what an implicit feature could look like. Weeks later, one piece of that puzzle has materialized in the form of Myna.

While the tool is early in the development cycle, it is set to debut with Ubuntu 26.10, due out in October.

AI-powered accessibility begins

Jean-Baptiste Lallement, Canonical's Director of Engineering for Ubuntu Desktop, posted the announcement, saying that voice dictation has become a common feature across modern platforms.

For Ubuntu 26.10, the initial version of Myna is expected to be a desktop dictation tool built around GNOME on Wayland, with a push-to-talk mechanism that controls when your microphone accepts input.

Using it means holding a hotkey, speaking, and letting go. A small activity indicator shows while it is listening, and the transcribed text lands wherever the cursor was sitting when dictation started.
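To make that hold, speak, release flow concrete, here is a minimal sketch in Python. It is not Myna's code or API: the `PushToTalkSession` class, its method names, and the idea of passing in the focused field as a string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PushToTalkSession:
    """Hypothetical model of a hold-to-dictate session (not Myna's real API)."""

    listening: bool = False
    target: Optional[str] = None  # where the cursor sat when dictation started
    heard: List[str] = field(default_factory=list)

    def press(self, focused_field: str) -> None:
        # Hotkey down: start listening and remember the focused field.
        self.listening = True
        self.target = focused_field
        self.heard.clear()

    def hear(self, words: str) -> None:
        # Speech is only accepted while the hotkey is held.
        if self.listening:
            self.heard.append(words)

    def release(self) -> Tuple[Optional[str], str]:
        # Hotkey up: stop listening and return (target, transcribed text).
        self.listening = False
        text = " ".join(self.heard)
        self.heard.clear()
        return self.target, text
```

Anything spoken before `press()` or after `release()` is ignored, which is the point of push-to-talk: the microphone only counts while the key is down.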

How will it work?

[Diagram: Myna's system architecture]
Source: Canonical

Recognition itself happens inside a sandboxed component called the Canonical Inference Snap, while a Speech Orchestrator manages the session and an Audio Adapter handles whatever the microphone picks up, denoising and chunking it before it ever reaches the model.
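As a rough illustration of what "denoising and chunking" can mean, the Python sketch below cleans up a list of audio samples and splits it into fixed-size pieces. Canonical has not published how the Audio Adapter does this, so everything here is an assumption: the function names are made up, and the simple noise gate is a stand-in for whatever denoising Myna ends up using.

```python
from typing import Iterator, List, Sequence


def noise_gate(samples: Sequence[float], threshold: float = 0.02) -> List[float]:
    # Crude stand-in for denoising: silence any sample quieter than the threshold.
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def chunk(samples: Sequence[float], size: int) -> Iterator[List[float]]:
    # Yield consecutive fixed-size chunks; the final chunk may be shorter.
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


def prepare(samples: Sequence[float], size: int = 4,
            threshold: float = 0.02) -> List[List[float]]:
    # Denoise first, then chunk, so the model only ever sees cleaned audio.
    return list(chunk(noise_gate(samples, threshold), size))
```

The ordering mirrors the description above: audio is cleaned and split before any of it reaches the recognition model.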

The snap is meant to carry speech models in three sizes (lightweight, default, and quality) along with a runtime to match whatever hardware Myna is running on, be it an NVIDIA GPU, an Intel NPU, or just a CPU.
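One way to picture that hardware matching is a simple lookup, sketched below in Python. The three model sizes come from the announcement, but the rest is assumed: the runtime labels, the preference order (GPU, then NPU, then CPU), and the function names are hypothetical, and the real snap may choose differently.

```python
from typing import Iterable

# Assumed preference order; the announcement names these targets but gives no ranking.
RUNTIME_PREFERENCE = ("nvidia-gpu", "intel-npu", "cpu")
MODEL_SIZES = ("lightweight", "default", "quality")


def pick_runtime(detected: Iterable[str]) -> str:
    # Return the most preferred runtime that the machine reports having.
    found = set(detected)
    for runtime in RUNTIME_PREFERENCE:
        if runtime in found:
            return runtime
    return "cpu"  # fall back to plain CPU inference


def pick_model(size: str = "default") -> str:
    # Accept only the three sizes mentioned in the announcement.
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return size
```

With this sketch, `pick_runtime(["cpu", "intel-npu"])` returns `"intel-npu"`, and a machine that reports nothing falls back to `"cpu"`.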

And before you yell, "my data would be sent to cloud servers!" know that speech recognition will happen locally, and an internet connection is not needed once the appropriate model is installed.

Moreover, text only appears once it is finalized, so you won't see half-formed words flicker the way some assistants show live captions. The audio data won't be sticking around either, being stored in a small in-memory buffer that gets discarded the moment the session ends.
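The "discarded the moment the session ends" idea maps neatly onto a context manager. The Python sketch below is only an analogy under stated assumptions: `dictation_buffer` is a hypothetical name, and Myna's actual buffer handling has not been published.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def dictation_buffer() -> Iterator[bytearray]:
    # Audio lives only in this in-memory buffer; nothing is written to disk.
    buf = bytearray()
    try:
        yield buf
    finally:
        # Session over (normally or via an error): drop the buffered audio.
        buf.clear()


with dictation_buffer() as audio:
    audio.extend(b"\x01\x02\x03")  # pretend these bytes are captured audio
    held_during_session = len(audio)

left_after_session = len(audio)
```

Inside the `with` block the buffer holds three bytes; once the block exits, it is empty, even if the session had ended with an exception.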

Features like dictation into password fields, wake words, continuous listening, voice assistants, voice commands, translation, speaker identification, and automatic language detection are all off the table.

The fine print

None of this is locked in yet. The GitHub repository holds nothing more than a license, a README, and a folder for the documentation and architecture specs.

And, going by how past features have landed on interim Ubuntu releases, we could see Myna show up in the daily builds of Ubuntu 26.10 in the coming weeks.

You should also know that Canonical is looking for feedback before the specs for Myna are finalized, especially from people who already rely on dictation or assistive tools on Linux.

About the author
Sourav Rudra

A nerd with a passion for open source software, custom PC builds, motorsports, and exploring the endless possibilities of this world.
