This past weekend I had maybe the best technical thread I've had in years, and it left me chewing on something I want to write down before it fades.
I'd posted a small writeup about wire-probe, a tiny L4 latency tool I built. What started as the usual "this looks AI-written" sniping turned, somewhere around the third reply, into a long and genuinely good back-and-forth with Mohit D. Patel, who is building a from-scratch operating system in Rust. We went deep: page-fault stacks, lazy FPU state, scheduler activations, cancellation semantics. And in the middle of it a single idea surfaced that I haven't been able to put down since.
Hardware is asynchronous, and almost every operating system we use is not.
The retrofit we all live with
The dominant model, the Unix and POSIX lineage that Linux and the BSDs inherit, was designed around blocking calls and untyped byte streams. You call read, and your thread stops until the data is there. That was a reasonable model for the machines of the early 1970s, and it is still a clean way to think. The problem is that real hardware has never worked that way. A disk controller, a NIC, and a DMA engine all run independently of the CPU and signal completion later. The synchronous call is a fiction the kernel maintains on top of fundamentally asynchronous machinery.
For decades we have been bolting asynchrony back onto that blocking foundation, usually badly. POSIX AIO was awkward and widely avoided. Linux's native AIO only really worked for direct I/O and quietly blocked the rest of the time. Then came io_uring, which is the best async interface Linux has ever had, and it exists in large part because everything before it was inadequate.
Here is the part worth noticing. io_uring is a completion-ring design: you submit operations into one queue, the kernel completes them, and you collect results from another. That is not a new idea. Windows NT shipped overlapped I/O and I/O completion ports back in the mid-1990s, and they are built on the same completion model, even if the mechanics differ (a kernel-managed completion queue rather than rings shared with userspace). So io_uring is not Linux inventing something. It is Linux converging in 2019, roughly twenty-five years later, on a completion model another mainstream OS has had the whole time. The async-first idea is old. What is rare is making it the foundation instead of a late addition.
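To make the shape concrete, here is a minimal sketch of the two-queue idea in plain Rust. It is a toy, not the io_uring API: ToyRing, Submission, Completion, and kernel_process are names I made up for illustration, and the "kernel" is just a method that drains one queue into the other. The id field plays the role that user_data plays in io_uring, letting the caller match a completion back to what it submitted.

```rust
use std::collections::VecDeque;

/// A request the application submits. `id` lets it match completions
/// back to submissions (hypothetical stand-in for io_uring's user_data).
#[derive(Debug, Clone, PartialEq)]
pub struct Submission {
    pub id: u64,
    pub payload: Vec<u8>,
}

/// What the simulated kernel reports once the operation has finished.
#[derive(Debug, Clone, PartialEq)]
pub struct Completion {
    pub id: u64,
    pub result: usize,
}

/// Two queues: one the application fills, one the "kernel" fills.
#[derive(Default)]
pub struct ToyRing {
    submissions: VecDeque<Submission>,
    completions: VecDeque<Completion>,
}

impl ToyRing {
    pub fn new() -> Self {
        Self::default()
    }

    /// Application side: queue the work and return immediately.
    pub fn submit(&mut self, id: u64, payload: &[u8]) {
        self.submissions.push_back(Submission {
            id,
            payload: payload.to_vec(),
        });
    }

    /// Stand-in for the kernel: drain submissions and post completions.
    /// The "operation" here just reports the payload length.
    pub fn kernel_process(&mut self) {
        while let Some(s) = self.submissions.pop_front() {
            self.completions.push_back(Completion {
                id: s.id,
                result: s.payload.len(),
            });
        }
    }

    /// Application side: collect one finished operation, if any.
    pub fn reap(&mut self) -> Option<Completion> {
        self.completions.pop_front()
    }
}
```

Notice that submit never waits for anything. Nothing is visible to reap until kernel_process has run, which is the whole point: submission and completion are separate events in time.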
Everything is the same handle
One thing from the conversation stuck with me more than the rest. A Unix file descriptor, a Windows HANDLE, and the observer capability in his design are all the same primitive: a kernel-managed reference to something you can wait on. Strip away the abstraction layers and every system converges on roughly the same shape. Submit an operation, get back a handle, then poll it, block on it, or hang a callback off it. The interesting differences are not in that shape. They are in lifetime and cancellation, which is exactly where these designs get hard.
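As a rough sketch of that shared shape, assuming nothing about how any real kernel implements it, here is a userspace Handle in Rust that supports all three ways of consuming a result. The type and its method names (poll, wait, on_complete, complete) are hypothetical, chosen for illustration. Lifetime and cancellation, the genuinely hard parts, are deliberately left out.

```rust
use std::sync::{Arc, Condvar, Mutex};

type Callback = Box<dyn FnOnce(i32) + Send>;

struct State {
    result: Option<i32>,
    callback: Option<Callback>,
}

/// A cloneable reference to one in-flight operation (illustrative only).
#[derive(Clone)]
pub struct Handle {
    shared: Arc<(Mutex<State>, Condvar)>,
}

impl Handle {
    pub fn new() -> Self {
        let state = State { result: None, callback: None };
        Handle { shared: Arc::new((Mutex::new(state), Condvar::new())) }
    }

    /// Completer side: publish the result, wake waiters, run the callback.
    pub fn complete(&self, value: i32) {
        let (lock, cvar) = &*self.shared;
        let cb = {
            let mut st = lock.lock().unwrap();
            st.result = Some(value);
            st.callback.take()
        };
        cvar.notify_all();
        // Run user code only after the lock has been released.
        if let Some(cb) = cb {
            cb(value);
        }
    }

    /// Poll: a non-blocking check for the result.
    pub fn poll(&self) -> Option<i32> {
        self.shared.0.lock().unwrap().result
    }

    /// Block: park the calling thread until the result is available.
    pub fn wait(&self) -> i32 {
        let (lock, cvar) = &*self.shared;
        let mut st = lock.lock().unwrap();
        while st.result.is_none() {
            st = cvar.wait(st).unwrap();
        }
        st.result.unwrap()
    }

    /// Callback: runs immediately if already complete, else on completion.
    pub fn on_complete(&self, cb: impl FnOnce(i32) + Send + 'static) {
        let mut st = self.shared.0.lock().unwrap();
        let ready = st.result; // Option<i32> is Copy
        match ready {
            Some(v) => {
                drop(st); // release the lock before running user code
                cb(v);
            }
            None => st.callback = Some(Box::new(cb)),
        }
    }
}
```

Even in this toy, the awkward questions show up right away: what happens to the callback if complete is never called, and who owns the result once every clone of the handle is dropped.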
That convergence is the tell. If everyone ends up at the same primitive regardless of where they started, maybe the right move is to start there on purpose rather than arrive at it by accretion.
What async-first actually looks like
His approach, and I want to be clear this is an early and openly work-in-progress project, is to make asynchrony the default rather than the exception. Events in the kernel are modeled with an observer pattern. Threads are made cheap enough that you can just spawn one for a task instead of reaching for the thread pools and tasklets that exist mainly because traditional kernel threads are too heavy to use casually. Notifications into userspace are delivered as upcalls, which sit in the same family as the scheduler-activations research from the early 90s: a kernel-originated interrupt into user code instead of a thread parked on a call.
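For readers who have not met the observer pattern outside of GUI code, here is a generic sketch of it in Rust. To be clear, this is not CharlotteOS code and does not reflect its actual API. Event, EventSource, subscribe, and notify are names I invented to show the pattern: interested parties register once, and the source pushes events to them instead of having them block on a call.

```rust
/// Hypothetical event type; the variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    DiskReadDone { bytes: usize },
    PacketArrived { len: usize },
}

/// An event source that pushes each event to every registered observer.
#[derive(Default)]
pub struct EventSource {
    observers: Vec<Box<dyn FnMut(Event)>>,
}

impl EventSource {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register interest once; no thread is parked waiting.
    pub fn subscribe(&mut self, observer: impl FnMut(Event) + 'static) {
        self.observers.push(Box::new(observer));
    }

    /// Deliver an event to all observers, in registration order.
    pub fn notify(&mut self, event: Event) {
        for obs in self.observers.iter_mut() {
            obs(event);
        }
    }
}
```

The control flow is inverted compared with a blocking read. The consumer does not ask and wait. It says what it cares about and gets called when that happens, which is the same direction of travel as an upcall.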
Underneath all of it is one claim. Since the hardware is asynchronous with respect to the CPU, the OS I/O model should be lightweight-async by default, and the blocking call should be the thing you build on top, not the thing everything else gets wrapped around.
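The layering is easy to show in miniature. In this sketch, which uses only the Rust standard library, submit_read is the async primitive and blocking_read is a one-line wrapper over it. Both function names are hypothetical, and the spawned thread is only a stand-in for a device completing the work on its own schedule.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Async primitive: start the operation and return immediately with
/// something the caller can wait on later.
pub fn submit_read(data: Vec<u8>) -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    // The spawned thread stands in for a device finishing the I/O
    // independently of the caller.
    thread::spawn(move || {
        let _ = tx.send(data);
    });
    rx
}

/// Blocking call, built on top: submit, then wait for the completion.
pub fn blocking_read(data: Vec<u8>) -> Vec<u8> {
    submit_read(data)
        .recv()
        .expect("simulated device dropped the sender")
}
```

Going in this direction is trivial: blocking is just "submit, then wait." Going the other way, recovering real asynchrony from a call that only knows how to block, is what costs you a thread per operation.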
Why I think this is a good direction
A surprising amount of operating-system machinery turns out to be a workaround for the blocking-first assumption.
Async-signal-safety, the rule that you can only call a small set of functions from a signal handler, exists because signals are asynchronous delivery bolted onto a synchronous world, so any critical section can now be interrupted at the worst possible moment. Thread pools, tasklets, and workqueues exist because spawning a real kernel thread per small task is too expensive in the traditional model. A lot of accumulated complexity is there to paper over a foundation that assumed your code would politely block and wait. If asynchrony is the default from the first commit, some of that machinery simply stops being necessary.
And the hardware trend only sharpens the point. Storage is queues now: NVMe is built around paired submission and completion queues, and the specification allows tens of thousands of them per controller. Networking pushes more of the work onto the card every year, to the point where on a modern cloud host a SmartNIC is doing things the host CPU used to. The machine underneath us has been getting more asynchronous, not less, the entire time we have been optimizing a synchronous abstraction over it.
The honest part
I would be doing the idea a disservice if I pretended it were free.
The hard problems do not disappear when you go async-first. They move. Cancellation that is actually synchronous (once the cancel returns, the operation can no longer complete behind your back), backpressure when completions outrun the consumer, and the safety of interrupting userspace at an arbitrary instruction are all still hard, and an async-first design relocates that difficulty rather than removing it. We spent a good chunk of the thread on exactly these corners, and none of them have free answers.
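Backpressure is the easiest of the three to demonstrate. The sketch below uses a bounded channel from the Rust standard library as a stand-in for a fixed-size completion queue, and posts into it with no consumer draining it. The function name post_without_consumer is made up for illustration. Rejecting on overflow is only one possible policy, not a recommendation.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Post `total` completions into a queue with room for `capacity`,
/// while no consumer drains it. Returns (accepted, rejected).
pub fn post_without_consumer(capacity: usize, total: u32) -> (u32, u32) {
    // `_rx` is kept alive so the channel stays connected, but it is
    // never read from: the consumer has fallen behind.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let (mut accepted, mut rejected) = (0, 0);
    for id in 0..total {
        match tx.try_send(id) {
            Ok(()) => accepted += 1,
            // Queue is full: the producer must drop, retry, or stall.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, rejected)
}
```

With a capacity of 4 and 10 completions, 4 are accepted and 6 are rejected. Every async design has to decide what that rejected branch does, and each choice (drop, block the producer, grow the queue) pushes the cost somewhere else.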
Greenfield has its own tax, and it is a heavy one. No ecosystem. No driver base. You carry the entire hardware-enablement burden yourself, including ACPI and AML, which is one of the worst-designed standards ever shipped and a multi-year project on its own. And you give up the decades of battle-tested workarounds that Linux and Windows accumulated, which look like cruft right up until you hit the exact broken firmware they were written for.
So this is not "Linux is obsolete." Linux gets an enormous amount right inside the constraints it has to honor. The argument is narrower and more interesting than that. The assumptions underneath the dominant model are worth re-examining, and the only way to really test the alternative is to build it clean, with no backwards-compatibility burden, on hardware that actually exists today.
Where I land
The interesting thing is not whether this one project ships. It is that the question is live again. We have spent thirty years getting very good at the retrofit. Someone starting from "the OS is async by default" and following that honestly through page faults, schedulers, and drivers is going to surface things the incremental path never will. Even the failures will be informative.
The project is CharlotteOS, by Mohit D. Patel (mdpcs3544 on GitHub), written in Rust, AGPL, and early but real: github.com/charlotte-os/charlotte-os. I am going to keep following it and contribute where I can. If you do operating-systems work, or you just like watching someone attempt the hard version of a thing, it is worth a look.
We have gotten very good at optimizing the workaround. It might be time to look harder at the assumption underneath it.