As part of their accessibility drive, Canonical have revealed Myna, their new in-development speech-to-text AI system for Ubuntu Linux.
In a post written by Jean-Baptiste Lallement of the Canonical Desktop Team, they say that "Speech recognition has become a common feature on modern platforms, and we think it should be a first-class experience on Ubuntu Desktop as well", and that it is being built with a privacy-first design.
For the upcoming Ubuntu 26.10, their current aim is just to get reliable desktop dictation: you press a key, speak, and the text appears on-screen. How? They said it currently "uses speech recognition models running locally on your machine", with the initial release targeting Ubuntu Desktop on Wayland with GNOME. Support for other desktop environments is to come sometime later. More advanced features like voice assistants, voice commands, desktop control, translation and automatic language detection are also planned for later, once this basic first step is ready.
As for the privacy side of it, they outlined these points:
- While it is not restricted to local models, the initial implementation prioritizes speech recognition running locally on your machine.
- No internet connection is required once the necessary models are installed.
- The microphone is only accessed when you explicitly activate dictation.
- Audio is processed in memory and discarded after use.
- No audio recordings are uploaded to external services.
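To make those points a bit more concrete, here is a rough Python sketch of what a push-to-talk flow with those properties could look like. This is purely illustrative and not taken from Canonical's documents: `DictationSession`, `fake_local_model` and every method name are hypothetical, and the "model" is a stand-in function rather than real speech recognition.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DictationSession:
    """Hypothetical push-to-talk session; NOT Myna's actual API."""

    transcribe: Callable[[bytes], str]  # stand-in for a locally running model
    _buffer: bytearray = field(default_factory=bytearray)
    active: bool = False

    def start(self) -> None:
        # A real implementation would only open the microphone here,
        # on explicit activation by the user.
        self.active = True

    def feed(self, chunk: bytes) -> None:
        # Audio arriving while dictation is inactive is dropped, never stored.
        if self.active:
            self._buffer.extend(chunk)

    def stop(self) -> str:
        # Transcribe straight from memory, then wipe the buffer so that
        # no audio is kept around after use.
        self.active = False
        try:
            return self.transcribe(bytes(self._buffer))
        finally:
            self._buffer.clear()


def fake_local_model(audio: bytes) -> str:
    # Assumption for the sketch: pretend this is an offline model.
    return f"<{len(audio)} bytes transcribed locally>"


session = DictationSession(transcribe=fake_local_model)
session.feed(b"ignored")  # dictation not active yet, so this is discarded
session.start()
session.feed(b"hello")
text = session.stop()
print(text)
```

Nothing in the sketch touches the network or the disk, which is the whole idea behind the list above: audio only exists in a buffer between activation and transcription.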
So far it seems they've only released the specifications and architecture documents as open source on GitHub.
Accessibility features like this are one area where AI and LLMs could actually be properly useful.
See more in the Ubuntu Discourse forum post.
I'm sure the anti-Canonical crowd will find something to moan about though. Probably point at some other obscure project and whine "why didn't they just contribute to this??? <outrage>".
So far it seems they've only released the specifications and architecture documents as open source on [GitHub](https://github.com/canonical/myna).

As long as they release the full thing under open source, all good. Accessibility options are very important, and a must if we really want Linux to get more mainstream.
Quoting: scaine
I'm massively anti-genAI and even I can see that this is a useful, targeted use of the technology. Not that I'll likely ever use it - only CEO's seem to think that people want to talk to their devices, like Star Trek. But it's an amazing accessibility feature and local models, assuming the model training is ethical, is the way to go.

Honestly, I'm not even sure if this is GenAI... this seems more akin to a finely tuned ML model, like how us filling out captchas trains road safety shit like speed cameras. If this is GenAI, then fuck them. Local or not. I always say that until AI is ethical, any possibly good use of it is completely moot. But attaching AI to this seems like it would be a waste of time compared to just regular machine learning methods that have existed for decades at this point.
Last edited by AllyTheProtogen on 18 Jun 2026 at 5:36 pm UTC
Quoting: tmtvl
It's also interesting because people can speak faster than they can type

The thing is, most info people put into computers nowadays isn't data entry. I remember when typing stuff into the computer so that the information, which started off not in a computer, would now be in it, was a huge deal. If you could have talked that information instead, that would have been good. But now all the information is in the computer to start with, you're just sending it between different files or different computers.
So nowadays, normally, if people are typing, they're composing, and most people can type faster than they can think. And I would expect editing to be easier with a keyboard than with speech. So, mostly an accessibility thing.
Last edited by Purple Library Guy on 18 Jun 2026 at 6:44 pm UTC
Quoting: AllyTheProtogen
Honestly, I'm not even sure if this is GenAI...

You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still.



