b0a04gl 2 days ago

but saying they're useless ignores a bunch of real systems that wouldn't run without them.

in unity, you literally can't run burst-compiled jobs efficiently unless you choose the right allocator. they expose `Temp`, `Persistent`, etc. because GC isn't even an option when you're fighting for milliseconds per frame. no allocator choice = frame skips = shipped bugs.

in embedded, FreeRTOS gives you multiple heap implementations for a reason. sometimes you need fixed size pools, sometimes you care about fragmentation. malloc's out of the question. same in any real time firmware, safety critical or not.

infra world has been using arena allocators for years. folly's `Arena`, grpc's arena, even jemalloc has thread caching and slab style region reuse built in. this isn't academic. large scale systems hit allocation pressure that general purpose allocators can't tune for globally.
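concretely, the core of an arena is tiny. a minimal sketch in C (names made up, fixed capacity, 16-byte alignment, no growth):

```c
#include <stddef.h>
#include <stdlib.h>

/* minimal arena: bump-allocate out of one block, free everything in one call */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} arena_t;

static int arena_init(arena_t *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(arena_t *a, size_t n) {
    size_t at = (a->used + 15) & ~(size_t)15; /* round up to 16-byte alignment */
    if (at + n > a->cap) return NULL;         /* out of space, no fallback here */
    a->used = at + n;
    return a->base + at;
}

static void arena_free_all(arena_t *a) {
    a->used = 0; /* drop every object at once; the block gets reused */
}
```

allocate all request-scoped objects from it, then one `arena_free_all` at the end. no per-object bookkeeping, which is exactly why the infra folks like it.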

and rust's whole alloc abstraction: it's more about expressing ownership across lifetimes in memory-sensitive codebases. `bumpalo` isn't a premature optimization. it's what you reach for when you know the object graph layout and want to free it in one call. also safe by design. it's not even painful to use.

imo allocator choice is an interface decision. it shapes how the rest of your code handles memory lifetimes. once you start passing memory lifetimes as part of your type system or job model, the whole architecture shifts. this is way deeper than faster malloc

  • reactordev 2 days ago

    Allocator choice is a spice, not a staple. There’s various reasons to use one design vs another for your use case and no “one-size-fits-all”. Sometimes you want contiguous memory. Sometimes you want sorted memory. Sometimes you want fast memory, internal fragmentation be damned. Other times, you want terse memory because you’re trying to fit a struct in 32 bytes. There’s a use case for it all.

    • thechao 2 days ago

      "Staple" as in "potato"; not "paper". Although, I'm fond of both metaphors — the initial mixed one was very delightful!

wavemode 2 days ago

On first seeing this I wasn't sure what analogy the author was trying to make. After reading the article, my best guess is that they are simply trying to say that writing an allocator is easier than it seems on the surface.

Though it's not clear to me that the article does a good job of establishing that this is actually true ("mimalloc is only a few thousand lines of code" doesn't pass the smell test).

  • majormajor 2 days ago

    Yeah, re: the title, the URL/path string "allocators-are-for-monkeys-with-typewriters" (including on the page) seems more clear about that "less hard than you'd think" thing than the larger-font published headline "Allocators are Monkeys With Typewriters". And of course the quote in the article is even more specific "given enough time, even a monkey with a typewriter can write a memory allocator".

    I generally agree with the "memory management doesn't have to be as complicated as you might think" vibe, especially if you've read about some optimizations in fancy modern GCs and aren't aware of what a basic simple non-GC world can look like. That said, of course, you can indeed get into a lot of complexity beyond the textbook 101 examples. Like the mentioned threading...

    • dang 2 days ago

      We replaced the title with a more representative sentence from the article body.

romesmoke a day ago

I spent most of my PhD trying to convince people that allocators are not simple. But the not-simple part is design, specifically the strategy/policy part: minimizing fragmentation by adapting to the behavior of the program the allocator is linked into. From what I know, no product-level allocator and very few research-level ones go beyond a "one size fits all" approach.

The reason why this is so is pragmatic: fancy strategies take extra compute time that is too expensive when one's sitting on the critical path. So what follows has a good chance of not bearing practical value but (i) there are setups where memory capacity is a bottleneck and (ii) hey, it's just a PhD anyway.

So what does a fancy strategy even look like? Let's look at the simplest non-trivial version of the problem. Say we know the sizes and lifetimes of all objects in advance, and we want to pack them into as tight a contiguous address space as possible. Mathematicians call this the Dynamic Storage Allocation (DSA) problem, and it's NP-complete, i.e., no polynomial-time optimal algorithm is known for the general case. Deep learning compilers battling the AI memory wall have been returning to DSA lately (it's an old problem).

The implication of the above is that a real allocator, that is, one forced to work under uncertainty w.r.t. object sizes and lifetimes down the line, can always fail remarkably. So going for a general-purpose solution begs that you know what you're doing (one could argue that most programs out there are "well-behaved", and I would agree up until the point where this observation is turned into a universal law).

Nevertheless, it remains a fact that writing a toy allocator is both educational and soberingly not hard.

GuB-42 2 days ago

Writing a malloc/free replacement was a 1 day project we did at school in the first couple of weeks of learning C/UNIX. Many students didn't even know how to code before that.

All that to say that writing an allocator is not hard. Writing a good allocator however is another story. Not because the algorithms are complicated, but because it is a very critical part of most systems and requires extensive knowledge of real life situations.

  • WalterBright 2 days ago

    Writing a basic allocator should be a foundational part of any programming curriculum. It removes the mystery of them.

    Just like writing a lexer should be. It's amazing how many programming tasks are improved by writing a lexer, and how much awful code I see from people who have no idea how to write one.

    (For example, I wrote a lexer for a function that read a date like "Mar 7, 1997" and turned it into a time_t. The lexer greatly simplified it, as strings like "Mar" and "March" became the same token. A number from 13 to 31 would lex as a day token. The number 3 would lex as a DayOrMonth token. And so on.)
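    A sketch of that classification step in C (hypothetical names; the real code surely differed):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical token kinds for the date lexer described above */
typedef enum { TOK_MONTH, TOK_DAY, TOK_DAY_OR_MONTH, TOK_YEAR, TOK_OTHER } tok_t;

static tok_t classify(const char *s) {
    static const char *months[12] = { "jan", "feb", "mar", "apr", "may", "jun",
                                      "jul", "aug", "sep", "oct", "nov", "dec" };
    if (isalpha((unsigned char)s[0])) {
        if (strlen(s) < 3) return TOK_OTHER;
        char pre[3];
        for (int i = 0; i < 3; i++)
            pre[i] = (char)tolower((unsigned char)s[i]);
        for (int i = 0; i < 12; i++)             /* "Mar" and "March" match alike */
            if (strncmp(pre, months[i], 3) == 0) return TOK_MONTH;
        return TOK_OTHER;
    }
    long n = strtol(s, NULL, 10);
    if (n >= 32) return TOK_YEAR;           /* too large to be a day */
    if (n >= 13) return TOK_DAY;            /* can only be a day */
    if (n >= 1)  return TOK_DAY_OR_MONTH;   /* ambiguous; the parser decides */
    return TOK_OTHER;
}
```

    Once tokens carry that much meaning, the parser on top is almost trivial.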

    • o11c 21 hours ago

      And how did you lex "September 3, 1752"?

  • worthless-trash 2 days ago

    Writing an allocator that is harder to exploit is harder again.

the__alchemist 2 days ago

I have an STM32 rust project I've been working on this week. It talks to an ESP using protobuf/RPC.

I'm doing it bare-metal/no allocator, as I do most embedded projects... and it's flirting with running me out of memory! What do most (and the most popular) protobuf libs do in rust? Use an allocator. What does the ESP itself do? Use an allocator (with FreeRTOS).

Meanwhile I'm using `heapless` (Vec and String syntax over statically allocated arrays), on an MCU with 128K flash and 32K RAM... This won't end well.

  • charleslmunger 2 days ago

    If you use upb, an allocator is optional - you can provide a presized block to upb_Arena and a NULL upb_alloc. Of course, you'll still fail to parse a message with an in memory representation larger than your region.

  • javier2 2 days ago

    Think you need an allocator!

jasonthorsness 2 days ago

Applications should use more special-purpose memory allocators. Much of the complexity in memory management comes from designing for an unknown usage pattern; things can be quite simple when the allocator is specialized and the patterns are predictable.

This is difficult though in higher-level languages. Go tried and failed with arenas.

  • Gibbon1 2 days ago

    defer and a stack allocator would probably go well together.

    At the start of a scope, mark the allocator's state; defer a release so that when the scope exits, everything allocated past the mark is freed. Cheezy would be having the release function take a list of objects to keep.
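    Roughly, in C (no defer sugar; names made up):

```c
#include <stddef.h>

/* mark/release stack allocator: a scope records the offset on entry and
 * restores it on exit, freeing everything allocated in between */
enum { STACK_CAP = 4096 };
static unsigned char stack_buf[STACK_CAP];
static size_t stack_top = 0;

static size_t stack_mark(void) { return stack_top; }

static void *stack_alloc(size_t n) {
    if (stack_top + n > STACK_CAP) return NULL;
    void *p = &stack_buf[stack_top];
    stack_top += n;
    return p;
}

/* this is what the deferred call would run at end of scope */
static void stack_release(size_t mark) { stack_top = mark; }
```

    The "keep list" variant would copy the survivors down past the mark before restoring, which is already most of the way to a compacting collector.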

    • sirwhinesalot 2 days ago

      Extra cheese is if the list you pass only has the roots of a larger graph that you then "scan", "marking" every reachable block of data in the allocator and then you "compact" it, clearing out all unreachable objects.

      Hmmm... ;)

userbinator 2 days ago

Yes, but the article doesn't really show with code why it's simple.

A basic first-fit free-list allocator can be less than 100 LoC, even in x86 asm. Using the free blocks themselves to store the bookkeeping is a simple but effective optimisation too.
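For instance, a rough first-fit sketch in C (single-threaded, made-up names, header kept in front of each block so free blocks carry their own bookkeeping):

```c
#include <stdalign.h>
#include <stddef.h>

/* first-fit free list over a static heap; headers live inside the heap */
typedef struct block {
    size_t size;            /* payload bytes */
    int free;
    struct block *next;
} block_t;

enum { HEAP_CAP = 1 << 16 };
static alignas(16) unsigned char heap[HEAP_CAP];
static block_t *head = NULL;

static void *ff_malloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                      /* 8-byte align payloads */
    if (!head) {                                   /* lazy init: one big free block */
        head = (block_t *)heap;
        head->size = HEAP_CAP - sizeof(block_t);
        head->free = 1;
        head->next = NULL;
    }
    for (block_t *b = head; b; b = b->next) {
        if (!b->free || b->size < n) continue;     /* first fit wins */
        if (b->size >= n + sizeof(block_t) + 8) {  /* split if the tail is usable */
            block_t *rest = (block_t *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(block_t);
            rest->free = 1;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->free = 0;
        return b + 1;                              /* payload follows the header */
    }
    return NULL;                                   /* out of memory */
}

static void ff_free(void *p) {
    if (!p) return;
    block_t *b = (block_t *)p - 1;
    b->free = 1;
    if (b->next && b->next->free) {                /* coalesce with next block */
        b->size += sizeof(block_t) + b->next->size;
        b->next = b->next->next;
    }
}
```

No thread safety, no coalescing with the previous block, no size classes; that's where the other 9,900 lines of a production allocator go.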

notepad0x90 2 days ago

allocators can be simple, but how about kernel memory-management subsystems?

One question I have for those who know the topic at greater depth: are there simpler memory managers that don't use a fixed page size? Or does that make it a lot more complex? Imagine requesting an allocation of 1 byte and the kernel allocating a 1-byte page, then immediately requesting 20GB and getting a 20GB page. Other than memory-management metadata taking up more memory on its own, what are the downsides of dynamic page sizes?

  • o11c 20 hours ago

    Fragmentation is one major reason to use fixed size classes. After just a few allocate/free-a-bit cycles, you can very easily end up with a large amount of free space in unusably tiny chunks. (this is one reason why languages with a garbage collector tend to prefer being allowed to move and compact their objects - their whole thing is keeping the allocator as simple as possible to beat the benchmarks, so they can't use the usual solutions)

    There's also the matter of space overhead.

    A bitmap allocator is simple and has an overhead of 1 bit per chunk (allocated or not). Normally the chunk is at least 16 bytes, but if you know you're doing smaller allocations there's nothing stopping you from doing one at byte granularity.
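    A sketch of that scheme in C (16-byte chunks; the caller passes the size back on free, which is one of those API changes; names invented):

```c
#include <stddef.h>

/* bitmap allocator: one bit per 16-byte chunk, set = in use */
enum { CHUNK = 16, NCHUNKS = 1024 };
static unsigned char pool[CHUNK * NCHUNKS];
static unsigned char bitmap[NCHUNKS / 8];   /* 1 bit of overhead per chunk */

static int bit(size_t i) { return (bitmap[i / 8] >> (i % 8)) & 1; }

static void set_bits(size_t i, size_t n, int v) {
    for (size_t k = i; k < i + n; k++) {
        if (v) bitmap[k / 8] |= (unsigned char)(1u << (k % 8));
        else   bitmap[k / 8] &= (unsigned char)~(1u << (k % 8));
    }
}

static void *bm_alloc(size_t bytes) {
    size_t need = (bytes + CHUNK - 1) / CHUNK;
    size_t run = 0;
    for (size_t i = 0; i < NCHUNKS; i++) {  /* scan for a long-enough free run */
        run = bit(i) ? 0 : run + 1;
        if (run == need) {
            size_t start = i + 1 - need;
            set_bits(start, need, 1);
            return pool + start * CHUNK;
        }
    }
    return NULL;
}

static void bm_free(void *p, size_t bytes) {  /* size comes from the caller */
    size_t start = (size_t)((unsigned char *)p - pool) / CHUNK;
    set_bits(start, (bytes + CHUNK - 1) / CHUNK, 0);
}
```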

    If there exists an invalid value (for example, UTF-8 strings cannot contain the byte 0xff), you can implement allocators with zero overhead, though efficiency requires changing the API (`malloc` requires writing to all the memory to erase the tombstones, so it's more efficient for the allocator to support `strdup` and `realloc_strcat` directly).

    By bucketing you can vary the overhead based on allocation size. But also remember, a lot of the trickery you can do here comes at the cost of becoming more vulnerable to use-after-free bugs.

    Besides all the missing APIs, one major annoyance related to allocators is the fact that userland lacks support for CPU-local variables that the kernel can use for more efficient operations. `rseq` helps a little but still falls short.

  • duped 2 days ago

    That's essentially what an allocator is: an abstraction between the size and alignment requested by the user-space program and the memory-management API that interfaces with the MMU hardware (which maps virtual addresses to physical addresses, where a virtual address is a "page" number plus an offset). The MMU is what's dealing in pages and offsets, more than the kernel.

    • notepad0x90 2 days ago

      What I mean is, there is always a minimum page size you're given by the kernel, typically 4KB. If you malloc(sizeof(int)), you're not getting sizeof(int) bytes backed one-to-one; a whole 4KB page gets mapped. You can change that value to something other than 4KB, but then the entire system uses that value. I'm asking why the page size allocated can't depend on the user-space allocator's syscall parameter (a dynamic page size). Or is it that the MMU, at a hardware/microcode level, can only handle static page sizes?

      • Veserv 2 days ago

        MMU hardware has a hardware-defined minimum mapping granularity (i.e. page size). On x86-64 this is 4KB. On ARMv8/v9 the specification allows runtime configuration for any of 4KB, 16KB, or 64KB as the minimum mapping granularity on a per-process basis, though the actual chip implementation is free to only support a subset of such configurations and still be conformant.

        Implementations may also define certain sizes that are more efficient. On modern x86-64, large pages may allow 2MB, 1GB, 512GB, etc. (multiples of 512) to be more efficient. On modern ARMv8/v9 there is similar large-page support (the exact details depend on the minimum mapping granularity), and they may also support aligned, contiguous mappings which make other sizes efficient (the exact details are even more complicated than simple large pages, and highly optional).

        As such, if you want to get just "1 byte", then you create a userspace allocator that asks for memory that conforms to the limits of the MMU and then it is the job of the userspace allocator to abstract that away for you by chopping it up under its own auspices.

      • rcxdude 2 days ago

        Note that the page table is essentially a data structure which maps addresses to other addresses, but it also resides in the same physical memory. If you were to map every byte of the system to a different physical address, the tables would need significantly more memory than the system had! Further, the data structure needs to allow efficient lookup, because the page-table mapping occurs on every memory access (results are cached in the Translation Lookaside Buffer (TLB), and the speed and hit rate of this cache is often a limiting factor in the speed of the CPU). This all biases the mapping towards being quite coarse; in fact 4KB is probably already suboptimal on today's systems with gigabytes of memory.

nitrix 2 days ago

> "The C standard library provides malloc/free/resize"

It does not provide a `resize`. You're thinking of `realloc` on hosted environments.

charcircuit 2 days ago

>Seeing that mimalloc does not offer this feature

If you click on the issue you will see that mimalloc already does offer this feature: mi_manage_os_memory creates an arena, mi_heap_new_in_arena creates a heap that uses the arena, and mi_heap_malloc allocates memory from that heap.

dang 2 days ago

[stub for offtopicness]

  • keeptrying 2 days ago

    From the title I thought he meant VCs.

    • xeonmc 2 days ago

      Those would be alligators, rather.

  • davydm 6 days ago

    lol, I read this as "alligators are monkeys with typewriters" and thought it would be a well interesting article

    but it's just more blah-blah about ai :/

    • freeone3000 2 days ago

      I think you might be commenting on the wrong article — this is an implementation of a memory allocator, as in, malloc.

    • triknomeister 2 days ago

      As Andrei Alexandrescu famously said, "Allocator is to allocation what alligator is to allegation"

      https://www.youtube.com/watch?v=LIb3L4vKZ7U

      • gilgamesh3 2 days ago

        Author of the post here. This talk was one of the ones I watched when learning about allocators. I thought about covering it in a section of the post, but I didn't have much time.

      • bitwize 2 days ago

        So when people retool their application to use GC do they say "See you later, allocator!"?

        ...I'll be here all week. Try the veal!

        • layer8 2 days ago

          It’s actually without GC that you have to see the allocator again later, to free the memory.

    • pmelendez 2 days ago

      AI? This article is about memory allocation strategies... Did I miss something?