Latest Comments by F.Ultra
AMD reveal Ryzen 7000 X3D processors, desktop 65W CPUs and new mobile chips
10 Jan 2023 at 12:56 pm UTC

Quoting: Shmerl
Quoting: F.UltraThe problem space here is that both cores are high performance, just in different ways. I mean, trying to determine whether your thread/application would benefit from a higher clock or a larger cache is something that takes endlessly long benchmarks over numerous runs for application developers today (to determine which CPU to recommend that the enterprise run the system on).
I wonder if AI can help with scheduler for that. It feels like prediction problem based on some moving sample input.

AMD are now adding AI chips to some of their APUs, so maybe this can even be hardware accelerated in the future.
Possibly

AMD reveal Ryzen 7000 X3D processors, desktop 65W CPUs and new mobile chips
9 Jan 2023 at 8:08 pm UTC

Quoting: Shmerl
Quoting: F.UltraHow? I cannot think of how a scheduler can know which thread needs larger L3 vs higher clock frequency.
That's my thought too, but I think this so-called "big little" issue has existed for a while (it started with ARM?) and maybe there was some work for Linux on that front before in regards to asymmetric cache?

I also wonder what will happen if the scheduler is unaware of any of that and scheduling is random. I.e. which will perform better, the 7950X or the 7950X3D? Some thorough benchmarks comparing them will surely be needed.
ARM big.LITTLE is an easy problem in this complex problem space. There the scheduler has the "simple" choice of whether a thread should run on a high-performing CPU or on a low-performing one, and it makes it by collecting some metrics (which, after all these years, are still far from perfect). Alder Lake has a very similar design, but there they added metric collection in the hardware (Intel Thread Director), yet AFAIK this is still far from perfect even in Windows 11, where compile jobs sometimes get scheduled to the E-cores and thus take 55 minutes to complete instead of 17.
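For illustration, the kind of routing decision such metric collection feeds might look like this toy sketch. The metric names and thresholds here are entirely invented for the example, not anything the kernel or Thread Director actually uses:

```python
# Toy sketch of a big.LITTLE-style core-selection heuristic.
# The inputs and cut-off values are made up for illustration only.
def pick_core(avg_runtime_ms: float, wakeups_per_sec: float) -> str:
    """Route long-running, compute-heavy threads to performance cores;
    short, bursty housekeeping threads to efficiency cores."""
    if avg_runtime_ms > 1.0 and wakeups_per_sec < 100:
        return "P-core"
    return "E-core"

# A compile job looks compute-heavy, a timer callback looks bursty.
print(pick_core(10.0, 5))    # P-core
print(pick_core(0.1, 500))   # E-core
```

Even this trivial version shows the limitation discussed above: a thread whose behaviour changes over time keeps getting reclassified, and the metrics say nothing about cache sensitivity.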

The problem space here is that both cores are high performance, just in different ways. I mean, trying to determine whether your thread/application would benefit from a higher clock or a larger cache is something that takes endlessly long benchmarks over numerous runs for application developers today (to determine which CPU to recommend that the enterprise run the system on).

My guess is that MS (at this point in time AMD has only talked to Microsoft, AFAIK) will simply (if they do anything at all) try to detect whether the application is a game and, if so, run it on the larger-cache cores while running everything else on the higher-boost cores.

To really benefit here, the app/game developers would have to benchmark this individually and set the thread affinity themselves, but given the number of combinations, combined with this probably being a niche CPU, I have a hard time seeing that being done.
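As a sketch of what that manual pinning could look like on Linux, here is a minimal Python example using `os.sched_setaffinity`. Which logical CPUs actually sit on the V-Cache CCD is an assumption here (0-7 plus SMT siblings 16-23) and varies per system, so check `lscpu` or sysfs first:

```python
import os

# Hypothetical: the logical CPUs belonging to the CCD that carries the
# extra 3D V-Cache. The numbers below are an assumption for illustration.
CACHE_CCD_CPUS = set(range(0, 8)) | set(range(16, 24))

def pin_to_cache_ccd(pid: int = 0) -> set[int]:
    """Pin the given process (0 = current) to the cache CCD.

    Only request CPUs that actually exist on this machine, so the sketch
    does not crash on systems with fewer cores.
    """
    usable = CACHE_CCD_CPUS & os.sched_getaffinity(0)
    if usable:
        os.sched_setaffinity(pid, usable)
    return usable
```

A game launcher could call `pin_to_cache_ccd()` for its render process and leave background threads on the default mask, but as noted above, knowing which threads actually want the cache still requires per-title benchmarking.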

I think the most telling thing of all is that neither AMD nor Intel have any plans whatsoever to implement either of these strategies on the server market.

Google open sourced CDC File Transfer from the ashes of Stadia
8 Jan 2023 at 3:19 pm UTC Likes: 4

Quoting: MayeulCI'm surprised, I thought rsync already used rolling hashes.

There's also casync in that space: https://github.com/systemd/casync/ (the blog post is quite nice IIRC).

If it's similar but better than rsync, I feel like these improvements should be folded into rsync.
Perhaps semantics, but rsync uses a rolling checksum (a variant of Adler-32) combined with a strong hash. Casync looks really promising, but unfortunately it needs a prepare stage, which is probably why it hasn't seen wide adoption so far; implementing that in rsync would require a total rewrite, so that will probably never happen either (plus I don't think the rsync devs want that need to prepare the files before transfer). The CDC improvements, however, should be a great contender for a new version of the rsync protocol/algorithm; it's just unfortunate that Google decided to go NIH instead of proposing this change upstream to rsync.

Anyone interested should contact Wayne at [email protected]
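The weak rolling checksum mentioned above can be sketched in a few lines. This is a simplified Adler-32-style variant in the spirit of rsync's weak checksum, not its exact implementation (rsync's real code differs in offsets and modulus details):

```python
# Simplified rsync-style weak rolling checksum (Adler-32 variant).
MOD = 65536  # 2**16, so (a, b) pack into one 32-bit value

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two 16-bit components over a whole block."""
    a = sum(block) % MOD
    # Earlier bytes are weighted more heavily, which lets us roll in O(1).
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte: drop out_byte, append in_byte, in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD  # uses the updated a
    return a, b
```

The point is that sliding the window costs a few additions instead of re-summing the whole block, which is what makes scanning every byte offset of a large file feasible.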

AMD reveal Ryzen 7000 X3D processors, desktop 65W CPUs and new mobile chips
5 Jan 2023 at 11:49 pm UTC

Quoting: dpanter
Quoting: F.Ultrathere will now be a scheduler problem that is worse than on Alder Lake
Well, a potential problem at least. If it even is a problem at launch I expect it to be addressed quickly, much like Alder Lake was.
How? I cannot think of how a scheduler can know which thread needs larger L3 vs higher clock frequency.

AMD reveal Ryzen 7000 X3D processors, desktop 65W CPUs and new mobile chips
5 Jan 2023 at 8:53 pm UTC Likes: 1

One huge problem is that on the 7900X3D and the 7950X3D the extra 3D V-Cache L3 is only connected to one of the CCDs, so there will now be a scheduler problem that is worse than on Alder Lake: it has to decide which thread to give the extra L3 and which thread to give the extra potential CPU boost frequency, which of course is not something the scheduler can know (a thread could very well switch between the two needs as well).

Linux kernel 6.1 is out now
14 Dec 2022 at 8:04 pm UTC Likes: 1

Quoting: slaapliedje
Quoting: F.Ultra
Quoting: slaapliedje
Quoting: F.Ultra
Quoting: Guest
Btrfs file system performance improvements.
Is long mounting of large HDD partitions fixed now?
What counts as large partitions and long time? I use BTRFS on several servers, each with 153TB per partition, and mounting is sub-second and has been for many years.

edit: that said one of the listed items is improved mount times on large systems:

Hi,

please pull the following updates for btrfs. There's a bunch of
performance improvements, most notably the FIEMAP speedup, the new block
group tree to speed up mount on large filesystems, more io_uring
integration, some sysfs exports and the usual fixes and core updates.

Thanks.

---

Performance:

- outstanding FIEMAP speed improvement
- algorithmic change how extents are enumerated leads to orders of
magnitude speed boost (uncached and cached)
- extent sharing check speedup (2.2x uncached, 3x cached)
- add more cancellation points, allowing to interrupt seeking in files
with large number of extents
- more efficient hole and data seeking (4x uncached, 1.3x cached)
- sample results:
256M, 32K extents: 4s -> 29ms (~150x)
512M, 64K extents: 30s -> 59ms (~550x)
1G, 128K extents: 225s -> 120ms (~1800x)

- improved inode logging, especially for directories (on dbench workload
throughput +25%, max latency -21%)

- improved buffered IO, remove redundant extent state tracking, lowering
memory consumption and avoiding rb tree traversal

- add sysfs tunable to let qgroup temporarily skip exact accounting when
deleting snapshot, leading to a speedup but requiring a rescan after
that, will be used by snapper

- support io_uring and buffered writes, until now it was just for direct
IO, with the no-wait semantics implemented in the buffered write path
it now works and leads to speed improvement in IOPS (2x), throughput
(2.2x), latency (depends, 2x to 150x)

- small performance improvements when dropping and searching for extent
maps as well as when flushing delalloc in COW mode (throughput +5MB/s)

User visible changes:

- new incompatible feature block-group-tree adding a dedicated tree for
tracking block groups, this allows a much faster load during mount and
avoids seeking unlike when it's scattered in the extent tree items
- this reduces mount time for many-terabyte sized filesystems
- conversion tool will be provided so existing filesystem can also be
updated in place
- to reduce test matrix and feature combinations requires no-holes
and free-space-tree (mkfs defaults since 5.15)

- improved reporting of super block corruption detected by scrub

- scrub also tries to repair super block and does not wait until next
commit

- discard stats and tunables are exported in sysfs
(/sys/fs/btrfs/FSID/discard)

- qgroup status is exported in sysfs (/sys/fs/btrfs/FSID/qgroups/)

- verify that super block was not modified when thawing filesystem

Fixes:

- FIEMAP fixes
- fix extent sharing status, does not depend on the cached status where
merged
- flush delalloc so compressed extents are reported correctly

- fix alignment of VMA for memory mapped files on THP

- send: fix failures when processing inodes with no links (orphan files
and directories)

- fix race between quota enable and quota rescan ioctl

- handle more corner cases for read-only compat feature verification

- fix missed extent on fsync after dropping extent maps

Core:

- lockdep annotations to validate various transactions states and state
transitions

- preliminary support for fs-verity in send

- more effective memory use in scrub for subpage where sector is smaller
than page

- block group caching progress logic has been removed, load is now
synchronous

- simplify end IO callbacks and bio handling, use chained bios instead
of own tracking

- add no-wait semantics to several functions (tree search, nocow,
flushing, buffered write)

- cleanups and refactoring

MM changes:

- export balance_dirty_pages_ratelimited_flags
I wonder how long it would take me to fill up 153TB with my Steam Library on my 2gbit fiber line...
Well, if you start at 0, manage to fully saturate that 2Gbps line of yours, pay for all the games, and can do so in a manner that keeps the line saturated, and if we assume the line is dedicated solely to downloading said games and not to other Internet usage (such as, e.g., visiting the Steam store), then it would take approx 8 days.
Haha! I mean, I likely have enough in my Steam library that I could fill up that much. Too many games are 100GB+ these days... hell, the libraries of most of the 16/32-bit era aren't even close to 100GB.
Imagine how much worse it will become once they decide to ship textures for 8K gaming...

Linux kernel 6.1 is out now
13 Dec 2022 at 11:51 pm UTC

Quoting: slaapliedje
Quoting: F.Ultra
Quoting: Guest
Btrfs file system performance improvements.
Is long mounting of large HDD partitions fixed now?
What counts as large partitions and long time? I use BTRFS on several servers, each with 153TB per partition, and mounting is sub-second and has been for many years.

edit: that said one of the listed items is improved mount times on large systems:

[btrfs 6.1 pull request notes snipped; quoted in full in the comment above]
I wonder how long it would take me to fill up 153TB with my Steam Library on my 2gbit fiber line...
Well, if you start at 0, manage to fully saturate that 2Gbps line of yours, pay for all the games, and can do so in a manner that keeps the line saturated, and if we assume the line is dedicated solely to downloading said games and not to other Internet usage (such as, e.g., visiting the Steam store), then it would take approx 8 days.
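A quick back-of-the-envelope check of that figure, assuming the 153 TB is meant as tebibytes:

```python
# Time to fill 153 TB (read as tebibytes) over a saturated 2 Gbit/s line.
TIB = 1024**4                     # bytes per tebibyte
capacity_bits = 153 * TIB * 8     # total bits to download
line_rate_bps = 2 * 10**9         # 2 Gbit/s
days = capacity_bits / line_rate_bps / 86400
print(f"{days:.1f} days")         # prints 7.8 days
```

So roughly 7.8 days of uninterrupted saturation, which rounds to the "approx 8 days" above.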

Linux kernel 6.1 is out now
13 Dec 2022 at 11:40 pm UTC

Quoting: Guest
Quoting: F.Ultra
Quoting: Guest
Btrfs file system performance improvements.
Is long mounting of large HDD partitions fixed now?
What counts as large partitions and long time? I use BTRFS on several servers, each with 153TB per partition, and mounting is sub-second and has been for many years.

edit: that said one of the listed items is improved mount times on large systems:

[btrfs 6.1 pull request notes snipped; quoted in full in the comment above]
known issue for HDD (rotational disks)
https://bbs.archlinux.org/viewtopic.php?id=272376
https://lore.kernel.org/linux-btrfs/CAMQzBqCSzr4UO1VFTjtSDPt+0ukhf6yqK=q+eLA+Tp1hiB_weA@mail.gmail.com/t/#u
https://lore.kernel.org/linux-btrfs/CAHQ7scVGPAwEGQOq3Kmn75GJzyzSQ9qrBBZrHFu+4YWQhGE0Lw@mail.gmail.com/t/#u
That guy has a 16TB disk and it takes under 1 minute to mount it.
I have a 3.2TB partition and it mounts in 17 seconds, with defragmented extents.
Well, I use rotational HDDs as well (OK, they are SAS and not SATA drives, but that shouldn't matter); in the latest machine I configured there were 36 x Seagate Exos X18 3.5" 16TB SAS HDDs, and it mounts almost instantly.

edit: not saying that it doesn't happen to people, because it obviously does, just that it does not seem to be universal. Anyway, here's hoping that the changes in 6.1 solve it for the people who are experiencing this issue.

Microsoft's acquisition of Activision Blizzard hits a bump as FTC seeks to block it
13 Dec 2022 at 11:35 pm UTC Likes: 3

Quoting: pleasereadthemanualEdit: And if you're familiar with the fable of Microsoft v Netscape, there's a monopoly that ended well for customers everywhere. Thanks to Microsoft, they standardized the idea of browsers everywhere being free-of-charge, which forced Netscape to pivot into Mozilla, which resulted in a great free software browser (that was also free-of-charge) that subsequently took market share from Internet Explorer because it was a better browser that supported open standards (some of which it had invented itself, like JavaScript) better than the competition.

If Netscape won, what would the world look like? Microsoft would have had to charge for Internet Explorer instead of bundling it with Windows 95 and future editions. Netscape wouldn't have been forced to innovate. Mozilla Firefox wouldn't exist, nor would any of Mozilla's other great free software programs.
The IE monopoly had zero to do with whether a browser should be free or not; browsers had been free to download and use for some years before Microsoft decided to bundle IE with Windows, thus using their monopoly on the OS to gain leverage in the browser market. Nor was "give everyone a free browser" the aim of Microsoft: their goal was to shift enterprises to their web server IIS instead of the Netscape web server (and in the end they both lost out to Apache anyway).

The problem was never that MS developed IE or that they released it for free; it was all about them abusing their monopoly. I fully agree that we probably ended up better off with Firefox than if Netscape Navigator had survived (although we will never know, since we don't have access to that parallel universe), but that is "despite of" and not "thanks to" the market abuse by MS.

Linux kernel 6.1 is out now
12 Dec 2022 at 11:36 pm UTC

Quoting: Guest
Btrfs file system performance improvements.
Is long mounting of large HDD partitions fixed now?
What counts as large partitions and long time? I use BTRFS on several servers, each with 153TB per partition, and mounting is sub-second and has been for many years.

edit: that said one of the listed items is improved mount times on large systems:

[btrfs 6.1 pull request notes snipped; quoted in full in the comment above]