Canonical reveal Myna, a speech-to-text system for Ubuntu Linux

18 Jun 2026 at 1:12 pm UTC | Last updated: 18 Jun 2026 at 3:29 pm UTC
By Liam Squires-Hand

As part of their accessibility drive, Canonical have revealed Myna, their new in-development speech-to-text AI system for Ubuntu Linux.

In a post, Jean-Baptiste Lallement of the Canonical Desktop Team wrote that "Speech recognition has become a common feature on modern platforms, and we think it should be a first-class experience on Ubuntu Desktop as well", with Myna following a privacy-first design.

For the upcoming Ubuntu 26.10, their current aim is simply to get reliable desktop dictation working: you press a key, speak, and the text appears on-screen. How? They said it currently "uses speech recognition models running locally on your machine", with the initial release targeting Ubuntu Desktop on Wayland with GNOME. Support for other desktop environments is planned for later, as are more advanced features like voice assistants, voice commands, desktop control, translation and automatic language detection, once this basic first step is ready.

As for the privacy side of it they outlined these points:

While it is not restricted to local models, the initial implementation prioritizes speech recognition running locally on your machine.

No internet connection is required once the necessary models are installed.

The microphone is only accessed when you explicitly activate dictation.

Audio is processed in memory and discarded after use.

No audio recordings are uploaded to external services.

So far it seems they've only released the specifications and architecture documents as open source on GitHub.

Accessibility features like this are one area where AI and LLMs could actually be properly useful.

See more on the Ubuntu Discourse forum post.

Article taken from GamingOnLinux.com.

Tags: AI, Distro News, Misc, Ubuntu

10 Likes

About the author - Liam Squires-Hand

I am the owner of GamingOnLinux. After discovering Linux back in the days of Mandrake in 2003, I constantly checked on the progress of Linux until Ubuntu appeared on the scene and it helped me to really love it. You can reach me easily by emailing GamingOnLinux directly. You can follow me personally on Mastodon.

11 comments

voytrekk 18 hours ago

Seems like a positive feature that hopefully everyone on Linux can benefit from in the future. I'm curious as to how they are dealing with some of the issues with Wayland's global input captures.

4 Likes

Ehvis 17 hours ago

Supporter Plus

Using local models is good. I can't say if this is actually the most efficient way of doing it, but at least this application is useful and far less likely to generate BS. If more people realised that most non-professional tasks can actually run on free self-hosted models, then the floor might drop out from under the biggest AI nonsense quicker.

5 Likes

scaine 17 hours ago

Contributing Editor
Mega Supporter

I'm massively anti-genAI and even I can see that this is a useful, targeted use of the technology. Not that I'll likely ever use it - only CEOs seem to think that people want to talk to their devices, like Star Trek. But it's an amazing accessibility feature and local models, assuming the model training is ethical, are the way to go.

I'm sure the anti-canonical crowd will find something to moan about though. Probably point at some other obscure project and whine "why didn't they just contribute to this??? <outrage>".

10 Likes

Arehandoro 16 hours ago

Supporter

So far it seems they've only released the specifications and architecture documents as open source on [GitHub](https://github.com/canonical/myna).

As long as they release the full thing under open source, all good. Accessibility options are very important, and a must if we really want Linux to get more mainstream.

7 Likes

tmtvl 16 hours ago

Considering Talon (the current state-of-the-art STT) is proprietary and the developer/maintainer has been vocal about not wanting to support Wayland, this is something I can really get behind. It's also interesting because people can speak faster than they can type (stenography excluded, because it was specifically made for the purpose of letting typists keep up with speech).

3 Likes

AllyTheProtogen 14 hours ago

Quoting: scaine
I'm massively anti-genAI and even I can see that this is a useful, targeted use of the technology. Not that I'll likely ever use it - only CEOs seem to think that people want to talk to their devices, like Star Trek. But it's an amazing accessibility feature and local models, assuming the model training is ethical, are the way to go.

I'm sure the anti-canonical crowd will find something to moan about though. Probably point at some other obscure project and whine "why didn't they just contribute to this??? <outrage>".

Honestly, I'm not even sure if this is GenAI... this seems more akin to a finely tuned ML model, like how us filling out captchas trains road safety shit like speed cameras. If this is GenAI, then fuck them. Local or not. I always say that until AI is ethical, any possibly good use of it is completely moot. But attaching AI to this seems like it would be a waste of time compared to just regular machine learning methods that have existed for decades at this point.

Last edited by AllyTheProtogen on 18 Jun 2026 at 5:36 pm UTC

0 Likes

Purple Library Guy 13 hours ago

Quoting: tmtvl
It's also interesting because people can speak faster than they can type

The thing is, most info people put into computers nowadays isn't data entry. I remember when typing stuff into the computer so that the information, which started off not in a computer, would now be in it, was a huge deal. If you could have talked that information instead, that would have been good. But now all the information is in the computer to start with, you're just sending it between different files or different computers.

So nowadays normally, if people are typing, they're composing--and most people can type faster than they can think. And I would expect editing to be easier with keyboard than speech. So, mostly an accessibility thing.

Last edited by Purple Library Guy on 18 Jun 2026 at 6:44 pm UTC

1 Like

Ehvis 12 hours ago

Supporter Plus

Quoting: AllyTheProtogen
Honestly, I'm not even sure if this is GenAI...

You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still.

0 Likes

emphy 8 hours ago

Quoting: Ehvis
Quoting: AllyTheProtogen
Honestly, I'm not even sure if this is GenAI...
You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still.

It's the other way around: speech to text. I.e. you talk to the machine and text gets put in.

Either direction is pretty ancient technology from long before the current wave of "ai" and most certainly not "generative".

2 Likes

Salvatos 7 hours ago

Quoting: tmtvl
It's also interesting because people can speak faster than they can type (stenography excluded, because it was specifically made for the purpose of letting typists keep up with speech).

Depending on your use case, that’s not necessarily a huge argument in favor of using automatic transcription, because editing the inevitable errors takes a substantial amount of time that wouldn’t be needed if you just typed without errors, other than the occasional typo, at a slightly slower rate. A similar principle applies to stenography, by the way: stenographers (or their colleagues) spend considerable time reviewing and editing their work after the fact, because they often need to take shortcuts or find workarounds for terms not in their steno dictionary, and can’t necessarily make quick corrections in real time like you would when typing (not that it’s not technically possible, IIRC, but because of time constraints).

And like Purple Library Guy mentioned, you only save time if you have a really solid idea of what you need to write down. For composing, or in my case translating, I can continue to think while I type and long pauses mid-sentence don’t impact the output, nor do I have to redo a whole sentence if I change my mind about a word or two. For immediate, immutable stuff like live captioning, though? Phenomenal tool as long as the accuracy is acceptable.

So I would agree with PLG that it’s mostly a boon for accessibility reasons unless your use case has a significant tolerance for errors and/or you are a very slow typist for reasons other than disability. It also comes with a number of practical downsides that can make one reconsider (privacy, confidentiality or annoyance if working in a shared environment, inability to listen to audio or have people talking nearby without getting noise in the transcription, inability to improvise spellings for unrecognized words, etc.).

It’s also pretty resource-intensive. I tried Speech Note on my laptop a year or two ago and it ran so slowly that I could have typed faster even at a lower-than-average WPM. On my desktop, utilizing my gaming graphics card, it runs decently smoothly but I still feel like I need to let it catch up after every sentence (and fix at least the worst errors). From what I remember, more traditional approaches like Dragon NaturallySpeaking weren’t as reliant on processing power, but used significant (for the time) amounts of RAM. Interestingly, I remember DNS being more grammatically accurate, better at handling specific punctuation and better able to handle large volumes of input without pause, but the LLM approach is better able to guess your intent when your speech isn’t clear… for better or for worse. (I’ve gotten some incredible outputs from sneezing or coughing without muting.)

Quoting: Ehvis
Quoting: AllyTheProtogen
Honestly, I'm not even sure if this is GenAI...
You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still.

This is the opposite: speech to text (transcription), not text to speech (speech synthesis). And it has existed much longer than generative AI, but I can’t claim to fully understand the differences between neural networks, machine learning, large language models or generative AI, so I don’t know exactly which categories this falls under.

Last edited by Salvatos on 19 Jun 2026 at 12:05 am UTC

0 Likes

Harry Haller 5 hours ago

New User

Quoting: emphy
Either direction is pretty ancient technology from long before the current wave of "ai" and most certainly not "generative".

I worked at IBM in 1996 at a datacenter in CT and we used OS/2 Warp on the desktop, connecting to mainframes in virtual terminal sessions. Since I used Warp as my daily driver, I enjoyed the setup. One day - remember, this was 1996 - one of the friendlier IBMers brought me to a room in the research area and showed me the new OS/2 4... code-named “Merlin”. Not only would it take dictation, but you could issue voice commands to control the system!

I was sold. In my best Capt. Jean-Luc Picard voice, "Computer! Open... drive... C. Open... folder... Documents... Open... file... passwords.txt." Lol! He told me, "You don't have to say 'Computer'."

"But I want to!"

"It doesn't work that way," he explained patiently.

"I don't care. It feels cool."

In 1996. And they make a big deal about it now, thirty years later. But, I have to say, I'm impressed with the progress with things like Claude Code, etc. AFAIK, they weren’t doing that in 1996.

1 Like
