As part of their accessibility drive, Canonical have revealed Myna, their new in-development speech-to-text AI system for Ubuntu Linux.
In a post written by Jean Baptiste Lallement of the Canonical Desktop Team, they mention how "Speech recognition has become a common feature on modern platforms, and we think it should be a first-class experience on Ubuntu Desktop as well", and that it is being built with a privacy-first design.
For the upcoming Ubuntu 26.10 their current aim is just to get reliable desktop dictation. So you press a key, speak and the text will appear on-screen. How? They said it currently "uses speech recognition models running locally on your machine" with the initial release targeting Ubuntu Desktop on Wayland with GNOME. Other desktop environment support is to come sometime later. More advanced features like voice assistants, voice commands, desktop control, translation and automatic language detection are to come later too once this basic first step is ready.
As for the privacy side of it, they outlined these points:
- While it is not restricted to local models, the initial implementation prioritizes speech recognition running locally on your machine.
- No internet connection is required once the necessary models are installed.
- The microphone is only accessed when you explicitly activate dictation.
- Audio is processed in memory and discarded after use.
- No audio recordings are uploaded to external services.
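To make those points a bit more concrete, here is a minimal Python sketch of what a push-to-talk flow with those properties could look like. It is not Myna's code or API: `DictationSession`, `fake_recognizer` and every method name are hypothetical, and the recognizer is a stub standing in for a model running locally.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DictationSession:
    """Hypothetical push-to-talk session; an illustration, not Myna's API."""

    # Stand-in for a local speech model: raw audio bytes in, text out.
    recognize: Callable[[bytes], str]
    _active: bool = field(default=False, init=False)
    _buffer: bytearray = field(default_factory=bytearray, init=False)

    def start(self) -> None:
        # Capture only begins when the user explicitly activates dictation.
        self._active = True

    def feed(self, chunk: bytes) -> None:
        # Audio that arrives while dictation is inactive is simply dropped.
        if self._active:
            self._buffer.extend(chunk)

    def stop(self) -> str:
        # Transcribe straight from memory, then discard the audio,
        # even if the recognizer raises.
        self._active = False
        try:
            return self.recognize(bytes(self._buffer))
        finally:
            self._buffer.clear()

    @property
    def buffered_bytes(self) -> int:
        return len(self._buffer)


def fake_recognizer(audio: bytes) -> str:
    # Placeholder: a real implementation would run a local model here.
    return f"<{len(audio)} bytes transcribed>"
```

In this sketch nothing is written to disk or sent over the network: audio only exists in the in-memory buffer between `start()` and `stop()`, and the buffer is cleared as soon as the text comes back.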
So far it seems they've only released the specifications and architecture documents as open source on GitHub.
Accessibility features like this are one area where AI and LLMs could actually be properly useful.
See more on the Ubuntu Discourse forum post.
I'm sure the anti-canonical crowd will find something to moan about though. Probably point at some other obscure project and whine "why didn't they just contribute to this??? <outrage>".
"So far it seems they've only released the specifications and architecture documents as open source on [GitHub](https://github.com/canonical/myna)."
As long as they release the full thing under open source, all good. Accessibility options are very important, and a must if we really want Linux to get more mainstream.
Quoting: scaine
"I'm massively anti-genAI and even I can see that this is a useful, targeted use of the technology. Not that I'll likely ever use it - only CEOs seem to think that people want to talk to their devices, like Star Trek. But it's an amazing accessibility feature and local models, assuming the model training is ethical, are the way to go."
Honestly, I'm not even sure if this is GenAI... this seems more akin to a finely tuned ML model, like how us filling out captchas trains road safety shit like speed cameras. If this is GenAI, then fuck them. Local or not. I always say that until AI is ethical, any possibly good use of it is completely moot. But attaching AI to this seems like it would be a waste of time compared to just regular machine learning methods that have existed for decades at this point.
Last edited by AllyTheProtogen on 18 Jun 2026 at 5:36 pm UTC
Quoting: tmtvl
"It's also interesting because people can speak faster than they can type"
The thing is, most info people put into computers nowadays isn't data entry. I remember when typing stuff into the computer so that the information, which started off not in a computer, would now be in it, was a huge deal. If you could have talked that information instead, that would have been good. But now all the information is in the computer to start with, you're just sending it between different files or different computers.
So nowadays normally, if people are typing, they're composing--and most people can type faster than they can think. And I would expect editing to be easier with keyboard than speech. So, mostly an accessibility thing.
Last edited by Purple Library Guy on 18 Jun 2026 at 6:44 pm UTC
Quoting: AllyTheProtogen
"Honestly, I'm not even sure if this is GenAI..."
You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still.
Quoting: Ehvis
"Quoting: AllyTheProtogen 'Honestly, I'm not even sure if this is GenAI...' You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still."
It's the other way around: speech to text. I.e. you talk to the machine and text gets put in.
Either direction is pretty ancient technology from long before the current wave of "ai" and most certainly not "generative".
Quoting: tmtvl
"It's also interesting because people can speak faster than they can type (stenography excluded, because it was specifically made for the purpose of letting typists keep up with speech)."
Depending on your use case, that’s not necessarily a huge argument in favor of using automatic transcription, because editing the inevitable errors takes a substantial amount of time that wouldn’t be needed if you just typed without errors, other than the occasional typo, at a slightly slower rate. A similar principle applies to stenography, by the way: stenographers (or their colleagues) spend considerable time reviewing and editing their work after the fact, because they often need to take shortcuts or find workarounds for terms not in their steno dictionary, and can’t necessarily make quick corrections in real time like you would when typing (not that it’s not technically possible, IIRC, but because of time constraints).
And like Purple Library Guy mentioned, you only save time if you have a really solid idea of what you need to write down. For composing, or in my case translating, I can continue to think while I type and long pauses mid-sentence don’t impact the output, nor do I have to redo a whole sentence if I change my mind about a word or two. For immediate, immutable stuff like live captioning, though? Phenomenal tool as long as the accuracy is acceptable.
So I would agree with PLG that it’s mostly a boon for accessibility reasons unless your use case has a significant tolerance for errors and/or you are a very slow typist for reasons other than disability. It also comes with a number of practical downsides that can make one reconsider (privacy, confidentiality or annoyance if working in a shared environment, inability to listen to audio or have people talking nearby without getting noise in the transcription, inability to improvise spellings for unrecognized words, etc.).
It’s also pretty resource-intensive. I tried Speech Note on my laptop a year or two ago and it ran so slowly that I could have typed faster even at a lower-than-average WPM. On my desktop, utilizing my gaming graphics card, it runs decently smoothly but I still feel like I need to let it catch up after every sentence (and fix at least the worst errors). From what I remember, more traditional approaches like Dragon NaturallySpeaking weren’t as reliant on processing power, but used significant (for the time) amounts of RAM. Interestingly, I remember DNS being more grammatically accurate, better at handling specific punctuation and better able to handle large volumes of input without pause, but the LLM approach is better able to guess your intent when your speech isn’t clear… for better or for worse. (I’ve gotten some incredible outputs from sneezing or coughing without muting.)
Quoting: Ehvis
"Quoting: AllyTheProtogen 'Honestly, I'm not even sure if this is GenAI...' You input text, sound comes out. Seems to fall pretty tightly within the GenAI scope. Much tighter application though, but still."
This is the opposite: speech to text (transcription), not text to speech (speech synthesis). And it has existed much longer than generative AI, but I can’t claim to fully understand the differences between neural networks, machine learning, large language models or generative AI, so I don’t know exactly which categories this falls under.
Last edited by Salvatos on 19 Jun 2026 at 12:05 am UTC
Quoting: emphy
"Either direction is pretty ancient technology from long before the current wave of 'ai' and most certainly not 'generative'."
I worked at IBM in 1996 at a datacenter in CT and we used OS/2 Warp on the desktop, connecting to mainframes in virtual terminal sessions. Since I used Warp as my daily driver, I enjoyed the setup. One day - remember, this was 1996 - one of the friendlier IBMers brought me to a room in the research area and showed me the new OS/2 4... code-named “Merlin”. Not only would it take dictation, but you could issue voice commands to control the system!
I was sold. In my best Capt. Jean-Luc Picard voice, "Computer! Open... drive... C. Open... folder... Documents... Open... file... passwords.txt." Lol! He told me, "You don't have to say 'Computer'."
"But I want to!"
"It doesn't work that way," he explained patiently.
"I don't care. It feels cool."
In 1996. And they make a big deal about it now, thirty years later. But, I have to say, I'm impressed with the progress with things like Claude Code, etc. AFAIK, they weren’t doing that in 1996.



