Latest Comments by 3zekiel
An interview with Ken VanDine, Ubuntu desktop lead at Canonical
26 May 2022 at 3:10 pm UTC
Quoting: Eike
Quoting: scaine
b) I wondered if it was just Poettering-hate.
That was my impression as well. And while I can't say much about this person, in general I'm of the opinion that social competence does matter, not only the technical one, as open source should be about working together.

While social competence does matter, having a certain number of out-of-the-box, potentially slightly authoritarian (within reason) people can be useful. Those personalities tend to be much better at going against the stream and pushing more ambitious projects (such as PulseAudio). It's all a matter of balance though: the person cannot be a pure asshole either, but being socially awkward, very (too) sure of themselves and slightly authoritarian can actually be useful in some scenarios. You also cannot have only people like that - just a few, usually with people used to them acting as buffers. At least that's from my experience.
Canonical going 'all in' on gaming for Ubuntu, new Steam Snap package in testing
1 May 2022 at 1:06 pm UTC Likes: 1
Quoting: Tuxee
Quoting: 3zekiel
Why, oh why ... The flatpak'd Steam has been around for a couple of years already, and with flatpak 1.11 and up it is now fully usable, so why would they go and add their snap now? Why not just update their flatpak in the base distro and use that? SteamOS / the Steam Deck is also going all in on flatpak. So why can't they just learn to give up? They iterated over and over again in that Canonical cycle (upstart, Unity ...):
while(is_alive(canonical))
{
do_something_on_our_own(rand());
try_to_shove_it_everywhere();
see_that_everyone_else_is_using_something_else();
push_on();
give_up();
leave_an_ugly_mess_for_others_to_clean_up();
}
Maybe it would be time to break out of that loop.
You should be more precise: upstart was introduced by Canonical in 2006 - years before systemd was even a thing. At some point even Fedora used it. Snap intends to do (quite) the same thing as flatpak, but has its advantages and disadvantages. And snap is Canonical's thing, just as quite a few other technologies are Red Hat's thing (though Lennart Poettering frequently takes the blame and not his employer). Snap was introduced in 2015/2016, pretty much at exactly the same time as flatpak. The situation is NOT that there was flatpak and THEN Canonical decided to do their own thing. It has been pretty much the same situation with Mir vs. Wayland or Unity vs. GNOME Shell. (Also, I am not aware who these others are who have to clean up the mess - when they ditched Unity... there was nothing to "clean up".) Also: these decisions are obviously not rand(), they seem to address pressing problems, because otherwise there wouldn't be competing solutions emerging at pretty much the same time.
I wonder how ppl would have dealt with the deb-vs-rpm situation if social media had been a thing back in the day...
For the "do_something_on_your_own" part: I indeed do not imply that there was already a competitor as such. Indeed, upstart came first, and snap arrived more or less at the same time as flatpak. My meaning is more that they always do it on their own: there's hardly ever a community effort around it, they rarely, if ever, involve other distributions, and so on and so forth. Snap is the pinnacle of that, where the server side is even proprietary and fully centralized to them (I think they alleviated some of that, but I'm not sure at all, and it clearly wasn't used by anyone).
For Unity, indeed, it was fairly clean. For upstart, you still have to support their service model and/or skim through tutorials proposing their solutions, with compatibility left in the hands of others. Ubuntu phone is now community maintained, and so is the original Mir for those who actually still need it. I agree it might not be the absolute worst. For snap, though, I expect we'll lose a lot of apps once they close the service.
For upstart, even when systemd clearly came out as the winner, both technically and community-wise, they still tried to push it for quite some time. Mir? Same. For snap, they are doing the same now too. Flatpak is more widely adopted (in distros), is getting pushed by Valve too, and has quite a few technical advantages for its relevant purpose: better deduplication, fully open, clearer source and runtime management. (Installing a kernel via snap is not an advantage, as a kernel is by definition fully unconfined; if anything, having grey areas in terms of what's sandboxed and what is not is more of a disadvantage to me.) And it is overall gaining traction. But no, instead of contributing to the already existing solution - the Steam flatpak here - they decided to push their own stuff once again.
As for RPM vs deb, well it was a heated topic at that time already...
Canonical going 'all in' on gaming for Ubuntu, new Steam Snap package in testing
1 May 2022 at 10:15 am UTC
Why, oh why ... The flatpak'd Steam has been around for a couple of years already, and with flatpak 1.11 and up it is now fully usable, so why would they go and add their snap now? Why not just update their flatpak in the base distro and use that? SteamOS / the Steam Deck is also going all in on flatpak. So why can't they just learn to give up? They iterated over and over again in that Canonical cycle (upstart, Unity ...):
while(is_alive(canonical))
{
do_something_on_our_own(rand());
try_to_shove_it_everywhere();
see_that_everyone_else_is_using_something_else();
push_on();
give_up();
leave_an_ugly_mess_for_others_to_clean_up();
}

Maybe it would be time to break out of that loop.
box86 and box64 get Steam Play Proton working much better on Arm devices
19 Apr 2022 at 7:31 pm UTC
Quoting: elmapul
Quoting: 3zekiel
My thought is that the video is confusing power efficiency and performance. I will answer point by point and try to explain.
Nope, the guy really knows his stuff, he even made a video to talk about the trade-offs: often you exchange processing power for energy efficiency, size, and so on (he listed other examples instead of "and so on").

OK, then my misunderstanding.
Quoting: elmapul
He knows that often one tech is not better than another, it's just better at a specific thing, more often than not.

Yup, that kinda summarizes it all. In the case of ISAs, you could also say that the answer is often in the middle ... A pure RISC, except in constrained cases, isn't going to cut it very far. And a pure CISC (as in, a CPU that desperately tries to implement every last special case with super complex "zero overhead" whatever instructions) is going to be inefficient as hell - I did say that a decoder, in the case of x64, can withstand useless instructions, but as you can imagine, that's only true up to a point. Then, depending on your use case, you will also want to borrow from more esoteric approaches (DSP stuff that has a whole vector manipulation library in HW, as an example).
And at the very end, unless you completely screwed up your ISA (which is rare, considering the people who design ISAs are usually good at what they do), once you go higher in power, the backend is going to count more and eventually dominate. I'd say that in the 10-15 W+ scenario, with modern lithographies, you already won't see that much difference anymore.
I was also reacting because I have seen a lot of mix-ups, since the M1 chip came out, between what is ARM and what is Apple. The M1 chip is insane, but it has little to do with ARM in fact. The people at Apple did insane work on everything around the core, like crazy interconnects that can be exposed outside the chip so as to basically stack chips, RAM and everything you need right on the die - which brings insane advantages in terms of performance and scalability. They also extended the ARM ISA a fair bit, and cut parts here and there to make it more efficient (that makes it not very interoperable though ...), and especially better at emulation. And they are also helped by having the best available lithography out there (and an insane load of cash to pay for 600 mm² dies ...).
And I do see a lot of trashing of x64 here and there (evidently not your guy), mostly because people mix up the low-power side, at which x64 does suck, with the whole area of computing - including higher-power computing, at which x64 is suddenly much better.
ARM-wise, they actually added some very CISC-y stuff lately, in particular for matrix manipulation, but they made the choice of keeping purely 32-bit-sized instructions. I am actually curious why they did that, as it tied their hands on multiple things: register count, having to drop some instructions to free up encoding space ... Well, I guess they did have their reasons, I'm just very curious what they are. They used to have a dual mode (compressed 2-byte and full 4-byte instructions), which I personally found very useful too, but dropped it, likely due to lack of opcode width. But overall it's a good arch for embedded (I include phones in that) / specialized use cases. More excited about RISC-V though, but mostly due to its openness :)
box86 and box64 get Steam Play Proton working much better on Arm devices
19 Apr 2022 at 10:55 am UTC Likes: 3
Quoting: elmapul
I saw a video explaining it, and it was quite the opposite! ARM is better at emulating x86 than x86 is at emulating ARM! The video is in Portuguese so I'm not sure it's going to be useful here, but the explanation was something like the points below.

Hmm, I do not speak Portuguese, but I did work on this quite a lot, and it goes completely against the benchmarks I did (and hell, I did a lot of them). My thought is that the video is confusing power efficiency and performance. I will answer point by point and try to explain.
Quoting: elmapul
You can draw a square by drawing 4 lines, but you waste a lot of processing power if you have to draw an entire window with 1px of width every time you want a vertical line, and an entire window with 1px of height every time you want a horizontal line.

That's true, but in the end, what you do with memory accesses and vector computation is fairly predictable and standard, so what modern CISCs do is concentrate on packing those operations. The cases where you overwork will be rare. The result is that you end up with more compact instructions, which are more cache-efficient and potentially give more context to the backend HW optimizer - allowing it to perform better.
If we're talking old i386 instructions then yeah, that would be a valid point, but not on modern x64.
Btw, ARM has had some CISC sides for years now, be it the way it handles register save and restore, predicated instructions, or some level of offsetted loads. I have not looked at the most recent ISAs, but I would bet it got more CISC-y rather than less. In the end, when you go for performance, you hardly have any choice.
Quoting: elmapul
x86's complex instruction set is only useful when most of those instructions get used often, but that simply is not the case. Many instructions were put there to cheat on benchmarks, or because hardware patents don't last forever and Intel's priority was not being copied rather than designing an efficient chip. In fact, most x86 instructions are already "emulated" using microarchitecture or something like that in plain x86 chips. (I say x86, but I mean both x86 and x86-64, it's just laziness.)

Indeed, x86 instructions are "emulated", like on most superscalar architectures, but it is actually done to obtain better performance. I did not check, but I would guess server-grade ARM is too. When you want to achieve very high throughput, it's pretty much the only way.
What essentially happens is that an ISA is exposed (ARM/x64/POWER), made in a way that is retro-compatible with older chips and user-friendly to some level, but the CPU actually executes "micro instructions" which are made to be executable more efficiently / faster. This helps an insane lot with resource allocation too (floating-point units, integer ALUs, "real" registers). Thus it allows the CPU to execute as many instructions in parallel as it possibly can. As such, this is actually positive in terms of performance, even if it's a bit counter-intuitive.
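To make that concrete, here is a toy sketch in C of what "cracking" one architectural instruction into micro-ops looks like conceptually (the encodings, register numbering and temp register are entirely made up for illustration - real front-ends are not written like this):

#include <stdio.h>

/* Toy micro-op cracking: the architectural instruction
   "add rax, [rbx + 8]" is split into a load micro-op and an ALU micro-op,
   which the backend can then rename and schedule independently. */
typedef enum { UOP_LOAD, UOP_ADD } uop_kind;

typedef struct {
    uop_kind kind;
    int dst, src;   /* register ids, renamed onto the "real" register file */
    int offset;     /* memory offset, only used by loads */
} uop;

static int crack_add_mem(int dst, int base, int offset, uop out[2])
{
    int tmp = 42;  /* a hidden temporary register, invisible to the ISA */
    out[0] = (uop){ UOP_LOAD, tmp, base, offset };  /* tmp <- mem[base + offset] */
    out[1] = (uop){ UOP_ADD, dst, tmp, 0 };         /* dst <- dst + tmp */
    return 2;  /* one ISA instruction became two independent micro-ops */
}

int main(void)
{
    uop uops[2];
    int n = crack_add_mem(0 /* rax */, 3 /* rbx */, 8, uops);
    printf("cracked into %d micro-ops\n", n);
    return 0;
}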
You can take a look at the work that was done on the "Dynamo" JIT, which did the same in SW for older RISC CPUs, resulting in faster code even with a JIT in the middle. Nvidia, with their "Denver" ARM arch, also made a half-HW, half-SW solution doing just that.
Also, on the point of instructions not being used: well, then they only cost a few transistors here and there. Looking at what takes space in a CPU, it is NOT the decoder. Caches and register files dominate largely.
Overall, all of this does cost power and area. Duplicating pipelines, resources and co will not come for free. But the truth is, there is no real alternative. Intel and others have tried to switch to more bare architectures, in particular with "VLIW" (Very Long Instruction Word) or "EPIC" (Explicitly Parallel Instruction Computing, as Intel calls it) style ISAs/CPUs, where all the work is done at compile time instead of dynamically, but it just flat out does not work. Dynamic optimization of resources is always better on general code. Always. Such static approaches only work on a restricted set of program types.
And yes, if you are very constrained, then the pure RISC approach will actually win, but this is less and less true as lithographies get better and we can pack more and more transistors per mm².
Also, at high throughput, the prefetchers and branch prediction that mirv talked about are vital to RISC too; this is mostly due to needing deeper pipelines at high frequency, and to this blasted memory latency wall that poisons us all ...
Quoting: elmapul
I don't remember the exact explanation of why ARM was better, but it was something like x86 having instructions that vary too much to be predictable, or something like that.

That's for the HW decoder, yep: varying-size instructions are kind of harder to decode. The bad news is, even in the RISC world they exist to a point, and they are a necessary evil.
In short, it's true that a strict RISC will give you very regular instructions to decode. Say each instruction is 8 bytes wide on a 64-bit CPU: each has a 16-bit opcode at the start, the source register at bit 24, the dest at bit 32, and the immediate, if there is one, in the rest. Of course, you can decode that faster.
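To visualize why that decodes fast: with such a rigid format, decoding is just fixed shifts and masks, so a wide front-end can decode many words in parallel. A minimal sketch in C, using the hypothetical layout from the numbers above (not any real ISA):

#include <stdint.h>

/* Hypothetical strict-RISC 64-bit instruction word: opcode in the low
   16 bits, source register at bit 24, dest register at bit 32, and the
   immediate in the remaining high bits. Note that bits 16-23 simply go
   unused here: regularity is paid for in wasted encoding space. */
typedef struct { uint16_t opcode; uint8_t src, dst; uint32_t imm; } insn;

static insn decode(uint64_t word)
{
    insn i;
    i.opcode = (uint16_t)(word & 0xFFFF);
    i.src    = (uint8_t)((word >> 24) & 0xFF);
    i.dst    = (uint8_t)((word >> 32) & 0xFF);
    i.imm    = (uint32_t)(word >> 40);  /* whatever is left */
    return i;
}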
BUT, and there is a big BUT: if you just want to push a register on the stack, then you only need an opcode and a register, so you would barely use 24 bits out of those 64 you reserved. Thus you are wasting a lot of space. Also, since your instructions are very strict, saying that you want to do an offsetted memory access requires you to do
mov rx, SOME_ADDR
addi rx, SOME_IMMEDIATE
load ry, rx

where each time you will use the full 64-bit instruction,
whereas on CISC that would be
mov ry, SOME_OFFSET[SOME_ADDR]

where you have only one opcode, only one dest register, and the same two ADDR/IMMEDIATE fields as before.
On Intel, pushing a register in x64 takes only one byte (!!), where on a pure RISC like the one above it will be 8 bytes.
Considering the price, both in terms of area and power, of each bit of instruction cache, I think you see why most high-throughput arches go the superscalar / more CISC-like way. Once again, the Apple M1 actually borrows Intel/CISC-like instructions for these things.
Quoting: elmapul
The processor spends a lot of time trying to figure out the instruction instead of executing it. Anyway, I hope someone else who works in the area can figure out what I'm talking about and explain it in better/more precise words. =p

I kinda see the point, but it is only valid if you have a very tight power/transistor budget and can't afford a deep, multi-pipeline CPU backend.
As soon as you have a multi-issue CPU with deep pipelines, the decode stage becomes negligible. Not to mention that the big CPUs are able to decode whole cache lines in parallel anyway, making that price even less important.
If we are talking IoT or embedded CPUs, then yes, valid point.
To summarize,
Performance-wise: pure RISC is very efficient when you are in full control of what you execute - think very compute-intensive stuff on a very dedicated subject, where you can do an insane amount of static optimization. However, as soon as you have something more general and dynamic, superscalar / CISC (over VLIW) approaches win.
Power / area efficiency-wise: in constrained scenarios, RISC wins; as soon as you have enough area/power to go wide-issue (>= 4-wide issue) with large parallel decoders, the difference will be low.
Emulation-wise: a good JIT will see the patterns of mov / add / shift / load and translate them to single instructions on the host, keeping the instruction cache cost very low (a toy sketch below). And that is where the gain is. Conversely, going from x64 to ARM brings a big inflation in terms of code: I measured as much as 3x inflation on pure execution code when it had a lot of control flow, and about 70 to 100% on more compute-heavy code, completely neglecting the emulator's control code. Pure performance-wise, I saw a lower performance hit on the ARM-to-x64 side than on the x64-to-ARM side. That measurement is hard to validate though, as it's hard to compare smaller ARM cores to full-fledged x64 cores - and the M1 is cheating, as it borrows some HW emulation too. The inflation, on the other hand, is a good metric, as it leads to many more cache misses, prefetch costs and so on.
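To illustrate the kind of pattern fusion I mean, here is a toy peephole rule in C (a hand-written sketch with a made-up IR, just for this post - not how box64 or any real JIT is actually structured):

#include <stdio.h>
#include <string.h>

/* A guest (ARM-side) instruction in a made-up mini-IR. */
typedef struct { const char *op; int dst, src1, src2, imm; } ginsn;

/* Fuse the guest pair "lsl x9, x1, #2 ; add x8, x0, x9" - which computes
   x0 + (x1 << 2) - into a single host instruction "lea r8, [r0 + r1*4]". */
static int fuse_shift_add(const ginsn *a, const ginsn *b, char *out, size_t n)
{
    if (strcmp(a->op, "lsl") == 0 && strcmp(b->op, "add") == 0 &&
        b->src2 == a->dst && a->imm >= 1 && a->imm <= 3) {
        snprintf(out, n, "lea r%d, [r%d + r%d*%d]",
                 b->dst, b->src1, a->src1, 1 << a->imm);
        return 2;  /* fused: two guest instructions became one host one */
    }
    return 0;      /* no match: translate them one by one */
}

int main(void)
{
    ginsn lsl = { "lsl", 9, 1, 0, 2 };  /* x9 = x1 << 2 */
    ginsn add = { "add", 8, 0, 9, 0 };  /* x8 = x0 + x9 */
    char host[64];
    if (fuse_shift_add(&lsl, &add, host, sizeof host))
        printf("fused into: %s\n", host);
    return 0;
}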
Which leads me to a last note: if you use some level of HW emulation, well, who cares which ISA you use for that purpose - by definition, you implemented the problematic parts in HW.
Hope that I was clear.
box86 and box64 get Steam Play Proton working much better on Arm devices
19 Apr 2022 at 7:44 am UTC
Quoting: elmapul"It's not like arm is new in gaming. Mobile phones have been doing it for a long time, the Switch uses arm cores."It's the Vita which can run without a full emulator, the PSP is using MIPS.
speaking of it, arm processors would be much better to run emulators for portable consoles.
hell, its possible to run psp(or vita?) apps on a switch without emulators!
One problem though, at least for older portable consoles, is that they use the 32-bit ARM ISA, which has been dropped from newer cores. Also, emulating RISC over modern CISC tends to work very well due to reducing instruction cache bloat: an x64 instruction might cover 3 or more ARM instructions (think of LEA vs. a multiplication, a shift and an addition), keeping the generated code small. So it's not 100% sure that emulating ARM32 on top of ARM64 will be faster than emulating it on top of x64.
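To make that LEA point concrete, here is a small example (the generated instructions are typical compiler output from memory - worth double-checking in a disassembler - but the density gap is the point either way):

#include <stdint.h>

/* Computes the address base + idx*4 + 16 (e.g. indexing into an array of
   32-bit values past a 16-byte header). On x64 this is typically a single
   instruction:
       lea rax, [rdi + rsi*4 + 16]
   while on ARM64 it typically takes two:
       add x0, x0, x1, lsl #2
       add x0, x0, #16
   An x64-to-ARM emulator pays that inflation on every such pattern; an
   ARM-to-x64 one can often fuse the pair back into a single LEA. */
uint32_t *elem_addr(uint32_t *base, int64_t idx)
{
    return base + idx + 4;  /* pointer arithmetic: +idx*4 bytes, then +16 */
}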
As for emulating x64 over ARM, it is quite costly... The best way to do it is to go semi-hardware like Apple did with the M1 (implement a bunch of x64 instructions in HW - mostly memory related - use x64 memory ordering, etc.). Without that, I'm afraid a big overhead is mostly unavoidable, making recent games unplayable.
2022 is officially the Year of Linux Gaming
18 Apr 2022 at 1:06 pm UTC Likes: 1
It sure seems to be going smoothly. Compared to the time of the Steam Machines, we have clear momentum and hype; I am confident millions will sell.
I don't think the number of desktop Linux users is going to skyrocket, but I expect that as the Steam Deck gains popularity, and once Valve releases SteamOS for the desktop, we will see more PC-building enthusiasts go for it. I wouldn't be surprised if we rise above the 5% mark in 2023. Not a very big number, yes, but macOS managed to get more ports than we did with half that - and combined with the Steam Deck, that would begin to give us a big enough market.
AMD FidelityFX Super Resolution 2.0 announced
19 Mar 2022 at 8:19 pm UTC Likes: 1
Quoting: denyasis
"Delivers similar or better than native image quality using temporal data"
Wait, so it can make an image that's better than the original??

It depends which part we are talking about. They say it includes both upscaling AND temporal anti-aliasing (TAA). The upscaling obviously won't give you a better image than the original. TAA might, if the base game does not implement it (very, very unlikely for a game from the past 5-6 years, I'd say, but well).
So it's kind of a buzz-phrase here: not really a lie, but not really true either. At least not in the sense that most people will understand it.
Overall, temporal upscaling has existed for quite some time already, and 4A Games has been using it extensively in Metro Exodus on DLSS-less graphics cards. So once again, AMD did not really make a breakthrough in terms of algorithm, and what it will do is very predictable: it will be better than static upscaling, but it will introduce temporal artifacts, (reverse) ghosting, maybe shimmering and other fun issues in exchange. Fixing those issues is what (I suspect) the AI part of DLSS/XeSS is mostly for - that, and likely interpolating some small details. The fact that it comes as an open source toolkit, however, is very nice, and that it brings a TAA implementation with it will likely help smaller studios too. Also, once they add an AI part to fix the temporal artifacts, it will likely be just a small update for devs. Now, XeSS might end up being more interesting on that front IF (and only if) it is open source.
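For the curious, the accumulation at the heart of any TAA-style temporal upscaler is conceptually tiny. A toy per-pixel resolve in C (my own simplification to show the principle - nothing to do with AMD's actual FSR 2.0 code):

/* Toy temporal resolve: blend the reprojected history color with the
   current (jittered) frame's sample. A low alpha gives a stable, detailed
   image, but trusting stale history is exactly what produces the ghosting
   mentioned above - real implementations add neighborhood clamping and
   history rejection around this blend, which is the hard part. */
typedef struct { float r, g, b; } color;

color taa_resolve(color history, color sample, float alpha)
{
    color out = {
        history.r + (sample.r - history.r) * alpha,
        history.g + (sample.g - history.g) * alpha,
        history.b + (sample.b - history.b) * alpha,
    };
    return out;  /* alpha around 0.1 is a common starting point */
}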
The main inconvenience of the technique is that it requires more complicated plumbing in the game engine, so the integration complexity will be about the same as DLSS's.
Google announces Steam for ChromeOS Chromebooks in 'Alpha'
16 Mar 2022 at 7:44 pm UTC Likes: 1
Quoting: PublicNuisance
Will this help Steam run on other ARM distros and hardware, such as a RockPro64 on Manjaro ARM? That would be my only interest here.

Chromebooks are x86, at least the latest generations. So no, this is purely some container work by Google, on x86 for x86. It might still help Linux though, especially with students using these machines a lot.
IMHO, Steam on ARM still seems very far off to me - Apple's M-series chips excepted, perhaps, but that's a very specific case.
Humble Heroines Bundle has some quality treats
3 Mar 2022 at 8:36 am UTC Likes: 1
Quite a bargain with Scarlet Nexus, this one. Also, I just saw that we can choose how much we give to GoL with the link - maxed it out :)
- The "video game preservation service" Myrient is shutting down in March
- SpaghettiKart the Mario Kart 64 fan-made PC port gets a big upgrade
- KDE Plasma 6.6.1 rolls out with lots of fixes for KWin
- Lutris v0.5.21 and v0.5.22 arrive with Valve's Sniper runtime support and new game runners
- Open source graphics drivers Mesa 26.0.1 released with various bug fixes and a security fix
- > See more over 30 days here
- steam overlay performance monitor - issues
- Xpander - Nacon under financial troubles... no new WRC game (?)
- Xpander - Establishing root of ownership for Steam account
- Nonjuffo - Total Noob general questions about gaming and squeezing every oun…
- GustyGhost - Looking for Linux MMORPG sandbox players (Open Source–friendly …
- Jarmer - See more posts
How to setup OpenMW for modern Morrowind on Linux / SteamOS and Steam Deck
How to install Hollow Knight: Silksong mods on Linux, SteamOS and Steam Deck