Added --sys argument for mounting sysfs #699

schreiberstein · 2025-09-05T15:58:58Z

Greetings,

I needed this feature (sysfs) for my own use-case and decided to implement it.
My goal is to do networking experiments inside of a container, as well as to simply isolate / abstract different parts of my local networking setup.
Most networking utilities on Linux require /sys to exist.

Mounting it after the fact is not easily possible in the upstream version:
I first needed to bind-mount the host's "/sys/" to "/sys_host" via bubblewrap and only then was I allowed to mount sysfs inside the container (and only with "mount" from util-linux, not busybox!).
Otherwise, I received a "VFS: Mount too revealing" from the kernel (Debian Trixie) when executing:
mount -t sysfs sysfs /sys
Afterwards, it is possible to unmount /sys_host and use all networking utilities.

Is there a particular reason why this feature had not been implemented?
Maybe there is a security implication that I did not recognize, or it is simply not the scope of the project.

Thank you for the creation and maintenance of bubblewrap.
It is a fine piece of software...!

Sincerely,
Alexander Schreiber (schreiberstein)

rusty-snake · 2025-09-05T16:17:47Z

completions/bash/bwrap

 		--size
 		--symlink
 		--sync-fd
+                --sys


tabs vs. spaces

rusty-snake · 2025-09-05T16:18:21Z

bubblewrap.c


+    case PRIV_SEP_OP_SYS_MOUNT:
+      if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)
+	die_with_mount_error ("Can't mount sys on %s", arg1);


tabs vs. spaces

rusty-snake · 2025-09-05T16:21:02Z

bubblewrap.c

+          if (ensure_dir (dest, 0755) != 0)
+            die_with_error ("Can't mkdir %s", op->dest);
+
+          if (unshare_pid || opt_pidns_fd != -1)


sysfs is also about unshare_net. Also check this with your mount is too revealing error. There is a difference between --unshare-user --share-net and --unshare-user --unshare-net for sysfs.

This seems like something that should be addressed, particularly if --unshare-net is involved in your original use-case for wanting this feature.

(Of course, if you change --sys so that it unconditionally mounts a new sysfs instance, then there will be no need to have this check at all.)

rusty-snake · 2025-09-05T16:24:15Z

bubblewrap.c

      break;

+    case PRIV_SEP_OP_SYS_MOUNT:
+      if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)


Are you on a SELinux system? Because crablock will bind the host /sys/fs/selinux selinuxfs in the new sysfs. I need to check what was the exact reason for this, but it was the better choice.

Interesting.
No, I am on a quite minimal, 'debootstrapped' Debian without AppArmor or SELinux.
So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?

So that would mean that if /sys/fs/selinux exists, it should be mounted on top of the new /sys in the container?

Yes. If the host sysfs has a /sys/fs/selinux mounted, --sys should bind mount it on top of the container sysfs.

I think this is the code over at crablock you are referencing:

https://codeberg.org/crabjail/crablock/src/commit/1a889d36177b0ff39e37a070405fb903c5f7e783/src/modules/mount_apifs.rs#L185

I guess I will have to spin up Fedora or something to test that.
As long as the container sysfs already contains the /sys/fs/selinux directory to serve as mountpoint, it should be smooth sailing.

Alright, I have adapted the code to consider /sys/fs/selinux and successfully tested this under the latest Fedora 42 (with SELinux) and Debian (without SELinux), also as unprivileged user.

This gets me:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,nodev,noexec,relatime)

Please, if you can, let me know what you think about it.

I test this with:

bwrap
--new-session
--die-with-parent
--as-pid-1
--unshare-all
--uid 0 --gid 0
--ro-bind ./rootfs /
--dev /dev
--proc /proc
--sys /sys
--hostname "alpine-test"
--clearenv
--setenv TERM "xterm-256color"
/bin/bash --login

And I generate the roofs with:
docker run --rm -v "$(pwd)/rootfs":/build alpine:3.22.1 sh -c 'apk --arch $(arch) -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/main/ -X https://dl-cdn.alpinelinux.org/alpine/latest-stable/community/ -U --allow-untrusted --root /build --initdb add alpine-base bash iproute2 wireguard-tools net-tools htop psmisc bind-tools nano mg tcpdump netcat-openbsd iputils traceroute iptables strace util-linux'

What is special about /sys/fs/selinux that it should always be bind-mounted automatically, especially if the other filesystems below /sys should not?

On my Debian system I have /sys/firmware/efi/efivars, /sys/kernel/security, /sys/fs/cgroup, /sys/fs/pstore, /sys/fs/bpf and more. Should those be bind-mounted too? If yes, why? If no, why not?

I think it might be better (and certainly easier to document) for --sys to only mount the sysfs, and if callers of bwrap (such as Flatpak or Glycin or whatever) want to bind-mount other /sys-based filesystems like /sys/fs/selinux, they can --bind them explicitly.

smcv · 2025-09-06T16:26:18Z

What is the advantage of mounting a completely new instance of sysfs, instead of --[ro-]bind /sys /sys or a subset?

For example Flatpak does bwrap … --ro-bind /sys/block /sys/block, and the same for /sys/bus, /sys/class, /sys/dev and /sys/devices.

bwrap needs --proc to be a separate option (--bind /proc /proc is not enough) because when you unshare the pid namespace, the container's /proc is not the same as the host's /proc. But for any filesystem that is OK to be the same on the host and in the container, there is no need for a special option and you can just use the --{,ro-,-dev}bind{,-try} family of options.

schreiberstein · 2025-09-06T21:19:38Z

@smcv The problem with bind mounting sysfs surfaces when dealing with networking utilities like iproute2, ifupdown, bridge-utils (brctl) because they heavily rely on sysfs being in sync with the actual network namespace. Otherwise, I will see the host's network interfaces and not the container's. This leads to unintended side-effects. For example, I can create a bridge but I cannot manage it because for the managing operation, sysfs is accessed... and my newly created bridge is not in there! Or on Alpine, when using OpenRC as init and ifupdown-ng for bringing up interfaces, it just completely errors out when configuring anything.
So, I would be delighted to see that feature incorporated upstream.

Signed-off-by: Alexander Schreiber <schreiberstein@gmail.com>

smcv

The problem with bind mounting sysfs surfaces when dealing with networking utilities like iproute2, ifupdown, bridge-utils (brctl) because they heavily rely on sysfs being in sync with the actual network namespace. Otherwise, I will see the host's network interfaces and not the container's.

Great, then please say so in the commit message!

smcv · 2025-12-03T11:28:57Z

completions/zsh/_bwrap

    '--size[Set size in bytes for next action argument]: :->after_size'
    '--symlink[Create symlink at DEST with target SRC]:symlink target:_files:symlink to create:_files:'
    '--sync-fd[Keep this fd open while sandbox is running]: :_guard "[0-9]#" "file descriptor to keep open"'
+    '--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/'


Perhaps

Suggested change

'--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/'

'--sys[Mount new sysfs instance on DEST]:mount point for sysfs:_files -/'

to emphasize that, like --proc, this is not the same thing as bind-mounting the existing /sys.

smcv · 2025-12-03T11:29:19Z

bubblewrap.c

           "    --file-label LABEL           File label for temporary sandbox content\n"
           "    --proc DEST                  Mount new procfs on DEST\n"
           "    --dev DEST                   Mount new dev on DEST\n"
+           "    --sys DEST                   Mount new sysfs on DEST\n"


Suggested change

" --sys DEST Mount new sysfs on DEST\n"

" --sys DEST Mount new sysfs instance on DEST\n"

smcv · 2025-12-03T11:30:48Z

bubblewrap.c

+          if (ensure_dir (dest, 0755) != 0)
+            die_with_error ("Can't mkdir %s", op->dest);
+
+          if (unshare_pid || opt_pidns_fd != -1)


This seems like something that should be addressed, particularly if --unshare-net is involved in your original use-case for wanting this feature.

(Of course, if you change --sys so that it unconditionally mounts a new sysfs instance, then there will be no need to have this check at all.)

smcv · 2025-12-03T11:33:36Z

bubblewrap.c

+          else
+            {
+              /* Use system sysfs, as we share pid namespace anyway */
+              privileged_op (privileged_op_socket,
+                             PRIV_SEP_OP_BIND_MOUNT, 0, 0, 0,
+                             "oldroot/sys", dest);
+            }


This seems quite "do what I mean". I think it would be easier to document this option clearly if it always created a new sysfs instance, and never bind-mounted the existing /sys.

(Also easier to be sure that it's correct - if the option autodetects whether it needs a new instance or can keep the current instance, then that opens the possibility that the autodetection has bugs.)

smcv · 2025-12-03T11:36:44Z

bubblewrap.c

      break;

+    case PRIV_SEP_OP_SYS_MOUNT:
+      if (mount ("sysfs", arg1, "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)


What is special about /sys/fs/selinux that it should always be bind-mounted automatically, especially if the other filesystems below /sys should not?

On my Debian system I have /sys/firmware/efi/efivars, /sys/kernel/security, /sys/fs/cgroup, /sys/fs/pstore, /sys/fs/bpf and more. Should those be bind-mounted too? If yes, why? If no, why not?

I think it might be better (and certainly easier to document) for --sys to only mount the sysfs, and if callers of bwrap (such as Flatpak or Glycin or whatever) want to bind-mount other /sys-based filesystems like /sys/fs/selinux, they can --bind them explicitly.

smcv · 2025-12-03T11:43:13Z

bwrap.xml

    </varlistentry>
+    <varlistentry>
+      <term><option>--sys <arg choice="plain">DEST</arg></option></term>
+      <listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem>


This definitely needs to expand on what this option does, and why it isn't just --bind /sys DEST.

For example perhaps something like

Suggested change

<listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem>

<listitem><para>Mount a new instance of sysfs on <arg choice="plain">DEST</arg>.

This is not necessarily the same as <literal>--bind /sys DEST</literal>:

for example, if used with <arg>--unshare-net</arg>,

the <filename>/sys/class/net</filename> in the new sysfs instance

will reflect the new network namespace,

whereas <literal>--bind /sys DEST</literal> would result in

a <filename>/sys/class/net</filename> that describes the original network

namespace.</para></listitem>

If --sys has the side-effect of bind-mounting other filesystems (which I think it should not) then the man page should mention that.

Conversely, if --sys does not have the side-effect of bind-mounting other filesystems and instead leaves it up to the caller of bwrap, then it would be good to say something like

This does not mount any filesystems that might exist below the original /sys, such as /sys/fs/selinux or /sys/kernel/tracing. Those filesystems should be bind-mounted separately, using --bind or similar, if desired.

rusty-snake reviewed Sep 5, 2025

View reviewed changes

Added --sys argument for mounting sysfs

2aec80b

Signed-off-by: Alexander Schreiber <schreiberstein@gmail.com>

schreiberstein force-pushed the sysfs branch from 2135b29 to 2aec80b Compare September 6, 2025 21:52

smcv requested changes Dec 3, 2025

View reviewed changes

	'--sys[Mount new sysfs on DEST]:mount point for sysfs:_files -/'
	'--sys[Mount new sysfs instance on DEST]:mount point for sysfs:_files -/'

	" --sys DEST Mount new sysfs on DEST\n"
	" --sys DEST Mount new sysfs instance on DEST\n"

-      <listitem><para>Mount sysfs on <arg choice="plain">DEST</arg></para></listitem>
+      <listitem><para>Mount a new instance of sysfs on <arg choice="plain">DEST</arg>.
+        This is not necessarily the same as <literal>--bind /sys DEST</literal>:
+        for example, if used with <arg>--unshare-net</arg>,
+        the <filename>/sys/class/net</filename> in the new sysfs instance
+        will reflect the new network namespace,
+        whereas <literal>--bind /sys DEST</literal> would result in
+        a <filename>/sys/class/net</filename> that describes the original network
+        namespace.</para></listitem>

Added --sys argument for mounting sysfs #699

Are you sure you want to change the base?

Added --sys argument for mounting sysfs #699

Uh oh!

Conversation

schreiberstein commented Sep 5, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rusty-snake Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smcv commented Sep 6, 2025

Uh oh!

schreiberstein commented Sep 6, 2025

Uh oh!

smcv left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rusty-snake Sep 5, 2025 •

edited

Loading