arighi
on 26 February 2024
Crafting new Linux schedulers with sched-ext, Rust and Ubuntu
In our ongoing exploration of Rust and Ubuntu, we delve into an experimental kernel project that leverages these technologies to create new schedulers for Linux.
Playing around with CPU scheduling policies has always been a dream for many kernel hackers and OS enthusiasts. However, this kind of work typically remains the domain of a few core kernel developers with many years of experience.
But what if we could have a technology that allows us to hot-swap the Linux kernel scheduler at run-time and replace it with a user-space program?
This would provide not only a safer way to test scheduling policies, but it would also open the path to provide a pool of schedulers optimized for specific workload profiles (gaming, server, low-latency, power-saving, HPC, etc.), or schedulers specifically designed for complex heterogeneous architectures (e.g., systems with an intricate topology, such as fast cores mixed with slow cores, associated with multiple NUMA nodes).
Additionally, thanks to the availability of BPF maps, we can even implement full scheduling policies almost entirely in user space. This gives us access to a large variety of libraries and services, as well as debugging and profiling tools (see for example the recent trend in Ubuntu to focus on performance and observability).
Let’s see how to turn this dream into reality on Ubuntu, using eBPF, sched-ext and Rust.
The Foundation: eBPF and sched-ext
eBPF is a Linux kernel technology that allows sandboxed programs to be injected into kernel space from user space.
These programs have access to kernel information and they can intercept kernel events and affect kernel actions.
eBPF programs can also store information in data structures called eBPF maps, which can also be accessed by regular user-space programs via system calls and direct memory access.
This can be achieved dynamically and efficiently at run-time, as eBPF programs execute actual kernel code. Additionally, this process is safe, as eBPF programs are validated by an in-kernel verifier before loading.
This verification ensures that the programs do not include out-of-bounds accesses, potential infinite loops, memory leaks, or any other safety-related risk.
sched-ext is a new scheduling class introduced in the Linux kernel that provides a mechanism to implement scheduling policies as eBPF programs.
Exploiting the sched-ext capabilities, along with eBPF and eBPF maps, we can defer scheduling decisions to standard user-space programs, implementing fully functional hot-swappable Linux schedulers, using any language, tool, library, or resource accessible within the user-space environment.
Rust takes control
A direct consequence of this flexibility is the ability to leverage Rust for implementing Linux schedulers.
Rust offers great coding flexibility, along with advantages such as memory safety, zero-cost abstractions, and a strong type system.
With proper Rust abstractions, we can enjoy the advantages of programming at a high level of abstraction, while retaining the ability to delve deep into low-level implementation details when necessary, all without incurring any noticeable performance overhead.
From theory to practice: scx_rustland
After outlining the theoretical benefits of a user-space scheduler written in Rust, the need for a practical proof of concept led to the creation of scx_rustland.
scx_rustland is a fully functional Linux scheduler included in the scx schedulers repository. The scheduler uses sched-ext / eBPF to channel scheduling events and actions from the kernel to a user-space program written in Rust.
This program runs all scheduling decisions, sending the results back to the kernel, which then dispatches tasks according to the order determined by the user-space scheduler.
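To make the division of labour more concrete, here is a minimal sketch of the kind of ordering logic a user-space scheduler could apply to the tasks it receives. It is not the scx_rustland implementation and does not use its API: the `QueuedTask` and `ToyScheduler` types, their fields and the weighted virtual runtime policy are all assumptions made for illustration, and the kernel side (sched-ext / eBPF) is left out entirely.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// A task notification as a user-space scheduler might receive it from the
/// kernel side. Hypothetical layout, not the scx_rustland message format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QueuedTask {
    pub pid: i32,
    /// Nanoseconds of CPU time the task used during its last run.
    pub used_ns: u64,
    /// Relative priority weight; 100 is the baseline.
    pub weight: u64,
}

/// Toy user-space scheduler that orders tasks by weighted virtual runtime:
/// the task that has received the least (weighted) CPU time goes first.
#[derive(Default)]
pub struct ToyScheduler {
    /// Accumulated virtual runtime per pid.
    vruntime: HashMap<i32, u64>,
    /// Min-heap of (vruntime, pid) pairs waiting to be dispatched.
    queue: BinaryHeap<Reverse<(u64, i32)>>,
}

impl ToyScheduler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a runnable task and charge it for the CPU time it just used.
    /// A higher weight makes the same CPU time count for less.
    pub fn enqueue(&mut self, task: QueuedTask) {
        let weight = task.weight.max(1); // avoid division by zero
        let delta = task.used_ns.saturating_mul(100) / weight;
        let vr = self.vruntime.entry(task.pid).or_insert(0);
        *vr = vr.saturating_add(delta);
        self.queue.push(Reverse((*vr, task.pid)));
    }

    /// Pick the next pid to hand back to the kernel for dispatching,
    /// or None if nothing is waiting.
    pub fn dispatch(&mut self) -> Option<i32> {
        self.queue.pop().map(|Reverse((_, pid))| pid)
    }
}
```

In this sketch, a caller would feed every task received from the kernel into `enqueue()` and then drain `dispatch()` to obtain the order in which tasks should run. Ties in virtual runtime are broken by the lower pid, which is an arbitrary choice made only to keep the ordering deterministic.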
How to use scx_rustland
Testing scx_rustland with Ubuntu 24.04 is actually very easy: it is just a matter of installing a few packages from ppa:arighi/sched-ext:
$ sudo add-apt-repository -y --enable-source ppa:arighi/sched-ext
$ sudo apt install -y linux-generic-wip scx
$ sudo sed -i "s/SCX_SCHEDULER=.*/SCX_SCHEDULER=scx_rustland/" /etc/default/scx
$ sudo systemctl enable scx
$ sudo reboot
WARNING: keep in mind that these packages are still experimental, so you should not use them in a production environment.
Result
Although it is a newly developed project (still in a proof-of-concept state), this scheduler exhibits promising results: despite the overhead of running in user space, it can achieve performance levels almost on par with the default Linux scheduler (EEVDF) and, with specific workloads, even outperform it.
However, the key point of this project is to prove that it is possible to implement schedulers in user space and that, thanks to Ubuntu’s solid support of various packages, tools and frameworks, we can make scheduling development and experimentation accessible to everyone.
Conclusion
Key takeaways:
- A scheduler in user space offers extreme flexibility to quickly implement and experiment with high-level abstract concepts, without introducing noticeable performance overhead.
- Rust does not make the scheduler faster: its advantages over C (the language typically used for kernel code) lie in the safety it provides, its robust programming ecosystem and its ergonomic design.
- Schedulers do not magically make everything run faster: it depends on how you distribute the available “CPU bandwidth”. Typically one scheduler is better than another for a specific workload, because it gives more bandwidth to that workload, penalizing the others.
- Linux follows the approach of “one scheduler to rule them all”, but with the advent of new architectures with complex topology and the extreme portability of Linux it becomes more and more difficult to find a single solution that works for everything.
- Having the flexibility to hot-swap schedulers at run-time opens the possibility to dynamically load schedulers optimized for the specific workload that we need: gaming, server, low-latency, power-saving, etc.
- Implementing Linux schedulers as user-space programs makes the code easier to maintain: schedulers can be distributed as regular packages (deb, snap, etc.), and bug fixes or updates can be applied at run-time without stopping the service or rebooting the system.
- Thanks to Rust and Ubuntu, developers, researchers and students can easily access this technology to experiment with scheduling policies and test them in a safe and accessible environment.
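The takeaway about “CPU bandwidth” can be illustrated with a small piece of arithmetic. The sketch below assumes a simple proportional-share model in which each task receives CPU time in proportion to a weight; the `cpu_shares` function and the weights are hypothetical and are not taken from scx_rustland or the kernel.

```rust
/// Split a fixed amount of CPU time among tasks in proportion to their
/// weights (hypothetical proportional-share model, for illustration only).
pub fn cpu_shares(total_ns: u64, weights: &[u64]) -> Vec<u64> {
    // Sum in u128 so that large weights cannot overflow.
    let sum: u128 = weights.iter().map(|&w| w as u128).sum();
    if sum == 0 {
        // No task has any weight: nobody gets CPU time.
        return vec![0; weights.len()];
    }
    weights
        .iter()
        .map(|&w| (total_ns as u128 * w as u128 / sum) as u64)
        .collect()
}
```

With 1,000,000 ns to distribute, two tasks of equal weight get 500,000 ns each. Tripling the weight of the first task gives it 750,000 ns, but only by shrinking the second task's share to 250,000 ns: the total available bandwidth has not changed, only its distribution.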
Future ideas
We are heading towards a micro-kernel design that has the potential to pave the way to certification on Linux: in the aforementioned scenario, if the user-space scheduler crashes, tasks will seamlessly transition to the default in-kernel scheduler, ensuring continuous system usability without any downtime.
This suggests that a similar approach could be used in other subsystems as well, allowing the Linux kernel to provide fully redundant and crash-safe systems.