7 hours ago • Created a post • 189 points • @0mp
FreeBSD already effectively supported something like this, but in what I consider a better way.
You can call cap_enter(), which disables open(), unlink(), mkdir(), etc. entirely. You can, however, still use openat(), unlinkat(), mkdirat() with relative paths that expand to a location underneath a directory file descriptor. This achieves the same thing, except that you can now have as many chroots as you want. Not just one.
Unfortunately, the idea never caught on, because virtually no software on UNIX uses the *at() functions. Also, the non-*at() functions are still available as symbols, meaning that you can't perform simple compile-time checks to ensure that your application works properly when this form of sandboxing is enabled. It turns out that off-the-shelf software (e.g., libraries) ends up misbehaving in unpredictable ways if you disable ~50% of the POSIX API.
It's a shame, because this feature effectively requires you to treat the file system in an object-oriented, dependency-injected way. That's pretty good from a reusability/testability perspective.
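To make the directory-descriptor style above concrete, here is a minimal sketch in Python, whose `os` functions take a `dir_fd` argument that maps onto openat(), mkdirat() and unlinkat(). The helper names `write_note`, `read_note` and `demo` are made up for illustration. The sketch does not call cap_enter(), so nothing here stops a relative path containing `..` from escaping; it only shows the "pass the root in as an object" pattern with two independent roots.

```python
import os
import tempfile


def write_note(dir_fd, name, text):
    """Create or overwrite `name` relative to the directory behind dir_fd (openat)."""
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600, dir_fd=dir_fd)
    with os.fdopen(fd, "w") as f:
        f.write(text)


def read_note(dir_fd, name):
    """Read `name` relative to the directory behind dir_fd (openat)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd) as f:
        return f.read()


def demo():
    """Use two directory descriptors as two independent 'roots'."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        fd_a = os.open(a, os.O_RDONLY | os.O_DIRECTORY)
        fd_b = os.open(b, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # The same relative name resolves to a different file per root.
            write_note(fd_a, "note.txt", "from root A")
            write_note(fd_b, "note.txt", "from root B")
            os.mkdir("sub", dir_fd=fd_a)          # mkdirat()
            result = [read_note(fd_a, "note.txt"), read_note(fd_b, "note.txt")]
            os.rmdir("sub", dir_fd=fd_a)          # unlinkat() with AT_REMOVEDIR
            os.unlink("note.txt", dir_fd=fd_a)    # unlinkat()
            os.unlink("note.txt", dir_fd=fd_b)
            return result
        finally:
            os.close(fd_a)
            os.close(fd_b)


if __name__ == "__main__":
    print(demo())
```

Because the helpers only ever see a descriptor, a test can hand them a temporary directory and production code can hand them the real one, which is the testability point made above.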
For those, like me, lacking context, what are the implications of this?
The commit message does NOT indicate when this will be available to mere mortals like myself.
Can someone enlighten me if this will be part of FreeBSD 14, or if there is a chance it will become available earlier, perhaps with FreeBSD 13.1?
EDIT: The commit message does NOT indicate etc. Silly me.
On many Linux distros you can already do this with user namespaces:
    $ mkdir rootfs
    $ docker export $(docker create ubuntu:20.04) | tar -C rootfs -xf -
    $ unshare -r chroot rootfs bash
    # ls
    bin dev home ...
Very often when you use chroot you also want unprivileged mounts, in particular overlay mounts if you don't want to mutate the underlying rootfs. You can do that with mount namespaces: `unshare -rm`, but you need Linux kernel 5.13 (or a distro with a patched kernel like Ubuntu) to allow unprivileged overlayfs.
I wish Linux would do this. Patches are available: https://lwn.net/Articles/849125/
Yes, you can do this on Linux with a user namespace, but a user namespace changes the view of user accounts. You have to map every usable UID inside the namespace to a UID you control outside the namespace. At best, you can map a range of UIDs you control to "real" users (root, 1000, etc.) inside the namespace, but they won't be real users outside the namespace. If you're on a multi-user system, seeing other people's files as owned by "nobody" is confusing.
It should be enough to use NO_NEW_PRIVS mode, meaning setuid transitions are not allowed. Then it doesn't matter what user IDs you see inside the chroot.
In fact, back when Linux introduced the NO_NEW_PRIVS flag (almost a decade ago!), this was one of the motivating use cases.
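For anyone who hasn't used the flag, here is a rough sketch of how a process turns it on, written in Python via `ctypes` and the Linux-only prctl(2) call. The constant values come from `<linux/prctl.h>`; the function name `set_no_new_privs` is invented for this example. The flag cannot be cleared once set, and it is inherited by children and kept across execve(), so only run this in a process you are happy to restrict.

```python
import ctypes
import os
import sys

PR_SET_NO_NEW_PRIVS = 38  # values from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39


def set_no_new_privs():
    """Set no_new_privs on the calling process; return True if it reads back as set."""
    if not sys.platform.startswith("linux"):
        raise OSError("prctl(PR_SET_NO_NEW_PRIVS) is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # The kernel requires arg2 == 1 and the remaining arguments to be 0.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Read the flag back; the unused arguments must all be 0 here as well.
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero) == 1


if __name__ == "__main__":
    print(set_no_new_privs())
```

After this call, executing a setuid binary from the process (or any descendant) no longer grants the extra privileges, which is the property the comment above relies on.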
The BSDs have been quite innovative recently. The pledge and unveil syscalls, although achievable by other means on Linux, are very simple and effective for what they do. I don't know of a way on Linux to use a system installed in a directory without being root; even if it were possible, I'd still need root to `mount --bind` some directories, but it's definitely something I'd like to do.
I don't think containers should be needed for that.