I recently ported WebKit's libpas memory allocator[1] to Windows, which used pthreads on the Linux and Darwin ports. Depending on which pthreads features you're using, it's not that much code to shim to Windows APIs. It's around 200 LOC[2] for WebKit's usage, which is a lot smaller than pthreads-win32.
[1] https://github.com/WebKit/WebKit/pull/41945 [2] https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/li...
These VirtualAlloc calls may intermittently fail while the pagefile is growing...
Ah yeah, I see Firefox ran into that and added retries:
https://hacks.mozilla.org/2022/11/improving-firefox-stabilit...
Seems like a worthwhile change, though I'm not sure when I'll get around to it.
Never knew about the destructor feature for fiber local allocations!
Very old post, needs 2013 in the title
https://web.archive.org/web/20130407195442/https://blog.kowa...
Seems to be updated, no?
I'm a big fan of pigz. I discovered it 6 years ago when I had some massive files I needed to zip and a 48-core server I was underutilizing. It was very satisfying to open htop and watch all the cores max out.
Edit: found the screenshot https://imgur.com/a/w5fnXKS
that was a big big file indeed
Worth mentioning that this is only of interest as technical info on the porting process.
The port itself is very old and therefore very outdated.
Perhaps it's worth adding this as a note at the top of the post, maybe mentioning alternatives, such as an Actually Portable™ build of `pigz`[1] or just a Windows build of zstd[2].
[1] https://cosmo.zip/pub/cosmos/tiny/pigz
[2] https://github.com/facebook/zstd/releases/latest/
I don't think the port itself is very old. The latest version of the original pigz seems to have been released in 2023[1], and the port seems to be of pigz from around that time[2].
[1] - https://zlib.net/pigz/
[2] - https://github.com/kjk/pigz/commits/master/
I wish Premake could gain more traction. It is the comprehensible alternative to CMake etc.
Xmake[0] is as-simple-as-premake and does IIRC everything Premake does and a whole lot more.
[0] https://xmake.io/
It's 2025, just use meson
Completely useless in an airgapped environment
Repository link: https://github.com/kjk/pigz
This is clearly aimed at faster results in a single user desktop environment.
In a threaded server type app where available processor cores are already being utilized, I don't see much real advantage in this --- if any.
depends on the current load. i've worked places where we would create nightly postgres dumps via pg_dumpall, then pipe through pigz to compress. it's great if you run it when load is otherwise low and you want to squeeze every bit of performance out of the box during that quiet window.
this predates the maturation of pg_dump/pg_restore concurrency features :)
Not to overstate it: embedding the parallelism into the application drives toward the logic "the application is where we know we can do it", but embedding the parallelism into a discrete lower layer and using pipes drives toward "this is a generic UNIX model of how to process data".
The thing with "and pipe to <thing>" is that you then reduce to a serial buffer delay decoding the pipe input. I do this because often it's both logically simple and the serial->parallel deblocking delay on a pipe is low.
Which is where xargs and the prefork model come in: instead you segment/shard the process, and either don't have a re-unification burden or it's a simple serialise over the outputs.
When I know I can shard, and I don't know how to tell the application to be parallel, this is my path out.