-
Notifications
You must be signed in to change notification settings - Fork 272
Added --sys argument for mounting sysfs #699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
completions/bash/bwrap
Outdated
| --size | ||
| --symlink | ||
| --sync-fd | ||
| --sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tabs vs. spaces
bubblewrap.c
Outdated
|
|
||
| case PRIV_SEP_OP_SYS_MOUNT: | ||
| if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0) | ||
| die_with_mount_error ("Can't mount sys on %s", arg1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tabs vs. spaces
| if (ensure_dir (dest, 0755) != 0) | ||
| die_with_error ("Can't mkdir %s", op->dest); | ||
|
|
||
| if (unshare_pid || opt_pidns_fd != -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sysfs is also about unshare_net. Also check this with your mount is too revealing error. There is a difference between --unshare-user --share-net and --unshare-user --unshare-net for sysfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like something that should be addressed, particularly if --unshare-net is involved in your original use-case for wanting this feature.
(Of course, if you change --sys so that it unconditionally mounts a new sysfs instance, then there will be no need to have this check at all.)
| break; | ||
|
|
||
| case PRIV_SEP_OP_SYS_MOUNT: | ||
| if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you on a SELinux system? Because crablock will bind the host /sys/fs/selinux selinuxfs in the new sysfs. I need to check what was the exact reason for this, but it was the better choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting.
No, I am on a quite minimal, 'debootstrapped' Debian without AppArmor or SELinux.
So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?
Yes. If the host sysfs has a /sys/fs/selinux mounted, --sys should bind mount it on top of the container sysfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the code over at crablock you are referencing:
I guess I will have to spin up Fedora or something to test that.
As long as the container sysfs already contains the /sys/fs/selinux directory to serve as mountpoint, it should be smooth sailing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have adapted the code to consider /sys/fs/selinux and successfully tested this under the latest Fedora 42 (with SELinux) and Debian (without SELinux), also as unprivileged user.
This gets me:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,nodev,noexec,relatime)
Please, if you can, let me know what you think about it.
I test this with:
bwrap
--new-session
--die-with-parent
--as-pid-1
--unshare-all
--uid 0 --gid 0
--ro-bind ./rootfs /
--dev /dev
--proc /proc
--sys /sys
--hostname "alpine-test"
--clearenv
--setenv TERM "xterm-256color"
/bin/bash --login
And I generate the roofs with:
docker run --rm -v "$(pwd)/rootfs":/build alpine:3.22.1 sh -c 'apk --arch $(arch) -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/main/ -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/community/ -U --allow-untrusted --root /build --initdb add alpine-base bash iproute2 wireguard-tools net-tools htop psmisc bind-tools nano mg tcpdump netcat-openbsd iputils traceroute iptables strace util-linux'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is special about /sys/fs/selinux that it should always be bind-mounted automatically, especially if the other filesystems below /sys should not?
On my Debian system I have /sys/firmware/efi/efivars, /sys/kernel/security, /sys/fs/cgroup, /sys/fs/pstore, /sys/fs/bpf and more. Should those be bind-mounted too? If yes, why? If no, why not?
I think it might be better (and certainly easier to document) for --sys to only mount the sysfs, and if callers of bwrap (such as Flatpak or Glycin or whatever) want to bind-mount other /sys-based filesystems like /sys/fs/selinux, they can --bind them explicitly.
|
What is the advantage of mounting a completely new instance of sysfs, instead of For example Flatpak does bwrap needs |
|
@smcv The problem with bind mounting sysfs surfaces when dealing with networking utilities like iproute2, ifupdown, bridge-utils (brctl) because they heavily rely on sysfs being in sync with the actual network namespace. Otherwise, I will see the host's network interfaces and not the container's. This leads to unintended side-effects. For example, I can create a bridge but I cannot manage it because for the managing operation, sysfs is accessed... and my newly created bridge is not in there! Or on Alpine, when using OpenRC as init and ifupdown-ng for bringing up interfaces, it just completely errors out when configuring anything. |
Signed-off-by: Alexander Schreiber <schreiberstein@gmail.com>
2135b29 to
2aec80b
Compare
smcv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with bind mounting sysfs surfaces when dealing with networking utilities like iproute2, ifupdown, bridge-utils (brctl) because they heavily rely on sysfs being in sync with the actual network namespace. Otherwise, I will see the host's network interfaces and not the container's.
Great, then please say so in the commit message!
| '--size[Set size in bytes for next action argument]: :->after_size' | ||
| '--symlink[Create symlink at DEST with target SRC]:symlink target:_files:symlink to create:_files:' | ||
| '--sync-fd[Keep this fd open while sandbox is running]: :_guard "[0-9]#" "file descriptor to keep open"' | ||
| '--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps
| '--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/' | |
| '--sys[Mount new sysfs instance on DEST]:mount point for sysfs:_files -/' |
to emphasize that, like --proc, this is not the same thing as bind-mounting the existing /sys.
| " --file-label LABEL File label for temporary sandbox content\n" | ||
| " --proc DEST Mount new procfs on DEST\n" | ||
| " --dev DEST Mount new dev on DEST\n" | ||
| " --sys DEST Mount new sysfs on DEST\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " --sys DEST Mount new sysfs on DEST\n" | |
| " --sys DEST Mount new sysfs instance on DEST\n" |
| if (ensure_dir (dest, 0755) != 0) | ||
| die_with_error ("Can't mkdir %s", op->dest); | ||
|
|
||
| if (unshare_pid || opt_pidns_fd != -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like something that should be addressed, particularly if --unshare-net is involved in your original use-case for wanting this feature.
(Of course, if you change --sys so that it unconditionally mounts a new sysfs instance, then there will be no need to have this check at all.)
| else | ||
| { | ||
| /* Use system sysfs, as we share pid namespace anyway */ | ||
| privileged_op (privileged_op_socket, | ||
| PRIV_SEP_OP_BIND_MOUNT, 0, 0, 0, | ||
| "oldroot/sys", dest); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite "do what I mean". I think it would be easier to document this option clearly if it always created a new sysfs instance, and never bind-mounted the existing /sys.
(Also easier to be sure that it's correct - if the option autodetects whether it needs a new instance or can keep the current instance, then that opens the possibility that the autodetection has bugs.)
| break; | ||
|
|
||
| case PRIV_SEP_OP_SYS_MOUNT: | ||
| if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is special about /sys/fs/selinux that it should always be bind-mounted automatically, especially if the other filesystems below /sys should not?
On my Debian system I have /sys/firmware/efi/efivars, /sys/kernel/security, /sys/fs/cgroup, /sys/fs/pstore, /sys/fs/bpf and more. Should those be bind-mounted too? If yes, why? If no, why not?
I think it might be better (and certainly easier to document) for --sys to only mount the sysfs, and if callers of bwrap (such as Flatpak or Glycin or whatever) want to bind-mount other /sys-based filesystems like /sys/fs/selinux, they can --bind them explicitly.
| </varlistentry> | ||
| <varlistentry> | ||
| <term><option>--sys <arg choice="plain">DEST</arg></option></term> | ||
| <listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs to expand on what this option does, and why it isn't just --bind /sys DEST.
For example perhaps something like
| <listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem> | |
| <listitem><para>Mount a new instance of sysfs on <arg choice="plain">DEST</arg>. | |
| This is not necessarily the same as <literal>--bind /sys DEST</literal>: | |
| for example, if used with <arg>--unshare-net</arg>, | |
| the <filename>/sys/class/net</filename> in the new sysfs instance | |
| will reflect the new network namespace, | |
| whereas <literal>--bind /sys DEST</literal> would result in | |
| a <filename>/sys/class/net</filename> that describes the original network | |
| namespace.</para></listitem> |
If --sys has the side-effect of bind-mounting other filesystems (which I think it should not) then the man page should mention that.
Conversely, if --sys does not have the side-effect of bind-mounting other filesystems and instead leaves it up to the caller of bwrap, then it would be good to say something like
This does not mount any filesystems that might exist below the original
/sys, such as/sys/fs/selinuxor/sys/kernel/tracing. Those filesystems should be bind-mounted separately, using--bindor similar, if desired.
Greetings,
I needed this feature (sysfs) for my own use-case and decided to implement it.
My goal is to do networking experiments inside of a container, as well as to simply isolate / abstract different parts of my local networking setup.
Most networking utilities on Linux require /sys to exist.
Mounting it after the fact is not easily possible in the upstream version:
I first needed to bind-mount the host's "/sys/" to "/sys_host" via bubblewrap and only then was I allowed to mount sysfs inside the container (and only with "mount" from util-linux, not busybox!).
Otherwise, I received a "VFS: Mount too revealing" from the kernel (Debian Trixie) when executing:
mount -t sysfs sysfs /sys
Afterwards, it is possible to unmount /sys_host and use all networking utilities.
Is there a particular reason why this feature had not been implemented?
Maybe there is a security implication that I did not recognize, or it is simply not the scope of the project.
Thank you for the creation and maintenance of bubblewrap.
It is a fine piece of software...!
Sincerely,
Alexander Schreiber (schreiberstein)