With the first release candidate of Linux 5.10, Linus Torvalds announced a historic change at the end of October: the set_fs() mechanism, which has existed since 1991, is to be phased out with the coming kernel version, if not on all CPU architectures, then at least on some.
Calling set_fs() typically lets subsequent userspace access functions reach the protected memory area of the kernel, or revokes that ability again. However, this legacy of the old 386 has also caused serious security problems in the past. In this article, we’ll take a closer look at how set_fs() works, the dangers inherent in it, and the consequences of removing it.
30 years of set_fs()
The kernel function set_fs() has been an integral part of the Linux kernel since release 0.10, that is, since the end of 1991. At that time, Linux was still tailored to the 80386 processor and naturally used what this CPU offered.
The operating system used the FS register of the 386 to determine whether the access functions it called would operate on kernel space or on user space. On this CPU, a logical memory address is made up of a segment selector and an offset. Under normal circumstances, the segment selector in the FS register points to user space. Once set_fs() points the FS register at kernel space, userspace access functions such as get_user_byte() operate on kernel space instead. This is tantamount to a redeclaration: a kernel-space pointer can be passed directly to a function that expects a userspace pointer.
This mechanism was intended to let code in kernel space efficiently call functions that are otherwise restricted to user-space data. An example from those days is the UMSDOS file system. It maps a Unix file system onto an underlying MS-DOS file system and accesses the MS-DOS file system driver via system functions. Since both the UMSDOS and the MS-DOS file system drivers live in kernel space, UMSDOS uses set_fs() to trick those functions into accepting kernel-space buffers instead of user-space ones.
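The interplay of FS register and access function, and the save, override, restore idiom that grew out of it, can be sketched as a small user-space simulation. This is a model under stated assumptions, not kernel code: the two byte arrays, the enum segment type and the helper read_kernel_byte() are hypothetical stand-ins, and get_fs(), set_fs() and get_user_byte() merely borrow the names of the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two segments the FS register could select. */
enum segment { SEG_USER, SEG_KERNEL };

#define SEG_SIZE 16

/* Simulated memory: one small array per segment (illustration only). */
static unsigned char user_mem[SEG_SIZE]   = "user data";
static unsigned char kernel_mem[SEG_SIZE] = "kernel data";

/* Simulated FS register; under normal circumstances it selects user space. */
static enum segment fs_reg = SEG_USER;

enum segment get_fs(void) { return fs_reg; }
void set_fs(enum segment seg) { fs_reg = seg; }

/* Reads one byte at `offset` inside whichever segment FS currently selects. */
unsigned char get_user_byte(size_t offset)
{
    assert(offset < SEG_SIZE);
    return (get_fs() == SEG_KERNEL) ? kernel_mem[offset] : user_mem[offset];
}

/* Kernel-internal caller that reuses the "user" accessor on a kernel buffer. */
unsigned char read_kernel_byte(size_t offset)
{
    enum segment old_fs = get_fs();   /* remember the current setting */
    unsigned char value;

    set_fs(SEG_KERNEL);               /* point FS at kernel space */
    value = get_user_byte(offset);    /* accessor now reads kernel memory */
    set_fs(old_fs);                   /* restore, so later accesses see user space again */

    return value;
}
```

In this model, get_user_byte(0) returns 'u' by default, while read_kernel_byte(0) returns 'k' and leaves FS pointing at user space afterwards. Forgetting the final set_fs(old_fs) would leave every later access function operating on kernel memory, which hints at why the mechanism is risky.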
Variable instead of register
The names of set_fs(), get_fs() and related functions have remained the same, but the kernel no longer uses the FS register when they are called today. From version 2.2 on, the register was replaced by the platform-independent variable addr_limit, which the kernel keeps per thread and whose value marks the boundary between user and kernel space. Everything below it is user space, everything above it is kernel space. Compared to the FS register, addr_limit makes it much easier to check an address for validity: an arithmetic comparison with the variable's value is sufficient to determine whether the address lies in user space or in kernel space.
The kernel function access_ok(), for example, checks an address against addr_limit to classify it as belonging to user or kernel space and to decide whether the access is allowed. The kernel now leaves any further access protection to the processor’s memory management unit (MMU).
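The arithmetic comparison against addr_limit can be illustrated with a simplified user-space model. The sketch below is not the kernel's implementation: the addr_t type, the 32-bit address width and the USER_DS value of 0xC0000000 (the classic 3G/1G split on 32-bit x86) are assumptions chosen for illustration, and the real limits are architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: addresses are modelled as 32-bit numbers. */
typedef uint32_t addr_t;

/* Illustrative limits only; real values depend on the architecture. */
#define USER_DS   ((addr_t)0xC0000000u)  /* assumed top of user space */
#define KERNEL_DS ((addr_t)0xFFFFFFFFu)  /* effectively no limit */

/* Simulated addr_limit: everything below it counts as user space. */
static addr_t addr_limit = USER_DS;

addr_t get_fs(void) { return addr_limit; }

void set_fs(addr_t limit)
{
    assert(limit == USER_DS || limit == KERNEL_DS);
    addr_limit = limit;
}

/*
 * Returns true if the range [addr, addr + size) lies below addr_limit.
 * Written as two comparisons so that addr + size cannot overflow.
 */
bool access_ok(addr_t addr, addr_t size)
{
    return size <= addr_limit && addr <= addr_limit - size;
}
```

With the default limit, access_ok(0x08048000, 4096) succeeds, while a range at 0xC0100000, or one that straddles the boundary such as access_ok(0xBFFFFFFF, 2), is rejected. After set_fs(KERNEL_DS), the same kernel-space range passes the check, which is exactly the override described above.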