Skip to content

Conversation

@sbauersfeld
Copy link
Contributor

By default, the kernel will send a GetXAttrs("security.capabilities") request to the FUSE daemon before every Write operation. It does this because the kernel may need to clear setuid/setgid bits and security capabilities when a file is written. This incurs a performance cost because of the extra round trips between kernel and user space when calling GetXAttrs.

Some filesystems do not even support security capabilities. Setting the HandleKillPriv flags allows those filesystems to avoid the extra GetXAttrs calls.

Another option is to return ENOSYS for the GetXAttr call, but that also means that the filesystem cannot support GetXAttr calls for any attribute at all.

This change allows filesystems to make use of the HandleKillPriv flags.

sbauersfeld and others added 19 commits January 21, 2026 18:03
This commit adds comprehensive integration tests for FUSE HANDLE_KILLPRIV_V2
support, which allows the kernel to delegate privilege bit clearing to the
filesystem when files are written, truncated, or have their ownership changed.

The tests verify:
- Creating files in setgid directories
- Opening setuid files for write
- Writing to setuid files
- Truncating setuid files
- Chown operations on setuid files
- Normal operations don't incorrectly set KillSuidgid flags

The test filesystem uses helper methods (AddTestFile/AddTestDir) to bypass
normal FUSE operations when setting up test scenarios with specific permission
bits. This avoids the need for a fully-functional chmod implementation.

Tests that require HANDLE_KILLPRIV_V2 support (Linux kernel >= 5.12) are
automatically skipped on older kernels with informative messages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Key fixes:
- Added DisableWritebackCaching=true (critical for write operations to reach
  filesystem immediately with KillSuidgid flags set)
- Fixed CreateFile test to use shell redirection (O_CREAT|O_TRUNC scenario)
- Renamed and fixed OpenFile test - O_TRUNC splits into OpenFile + SetInodeAttributes(size=0)
- Fixed SetInodeAttributes test expectations (kernel always sets flag, filesystem decides action)
- Cleaned up comments to be concise while explaining unexpected behaviors

All 9 tests now pass on Linux 5.15.

The key insight is that KILLPRIV_V2 requires DisableWritebackCaching=true for
write operations. With writeback caching enabled, the kernel buffers writes in
the page cache and KillSuidgid flags don't reach the filesystem until later.
Add helper functions to eliminate duplication:
- skipIfNotRoot: Centralized root check with consistent messaging
- skipIfKernelTooOld: Centralized kernel version check
- runAsNobody: Extracted su command execution pattern

Benefits:
- Reduced file from 300 to 278 lines (22 lines saved)
- Eliminated repetitive skip logic across 7 tests
- Simplified su command execution in 5 tests
- Easier to maintain and review
- All tests still pass
The kernel always sets the KillSuidgid flag for SetInodeAttributesOp,
regardless of CAP_FSETID or file mode. The filesystem is responsible
for checking OpContext.Uid and file mode to decide whether to clear
privilege bits.

This was confirmed through integration testing on Linux 5.15.
Removed comments that just repeat what the function name says:
- 'NewKillPrivFS creates a new KillPrivFS' → removed entirely
- 'skipIfNotRoot skips the test if not running as root' → removed entirely
- 'skipIfKernelTooOld skips the test...' → removed entirely
- 'runAsNobody executes a shell command...' → removed entirely

Improved comments to explain WHY rather than WHAT:
- AddTestFile/AddTestDir: Now emphasize they bypass normal FUSE operations
- kernelSupportsKillprivV2: More concise, keeps the important '5.12' context

Comments should explain unexpected behaviors and design decisions, not
restate what's already obvious from the code.
Fixed architectural flaw where all files shared a single fileData array.
Now each inode has its own data array, making the filesystem correct.

Changes:
- Changed inodeInfo to use pointer type (map[uint64]*inodeInfo)
- Added 'data []byte' field to inodeInfo struct
- Removed global 'fileData []byte' from KillPrivFS
- Updated all operations to use per-inode data (info.data)
- Simplified SetInodeAttributes by removing unnecessary re-fetch

Benefits:
- Files now have independent data storage (correct behavior)
- Cleaner code: no need to re-assign inodes after mutation
- Reduced from 403 to 397 lines (6 lines saved)
- All tests pass
Updated documentation to match Linux kernel implementation precisely:

1. Clarified privilege bit clearing rules:
   - security.capability: always cleared on chown/write/truncate
   - setuid: always cleared on chown; conditional on write/truncate
   - setgid: always cleared on chown; conditional on write/truncate
     (requires both CAP_FSETID check AND group execute permission)

2. Added CRITICAL warning about writeback caching:
   - KILLPRIV_V2 relies on WriteFileOp reaching the server
   - Writeback caching buffers writes in kernel page cache
   - WriteFileOp may not be sent immediately (or at all until flush)
   - Result: privilege bits may not be cleared when they should be
   - Recommendation: Set DisableWritebackCaching=true when using V2

3. Simplified flag behavior descriptions for clarity

Based on kernel commit message:
torvalds/linux@63f9909
Added comprehensive testing for O_TRUNC behavior with KILLPRIV_V2:

1. Updated existing test comment to clarify default behavior:
   - Without EnableAtomicTrunc: O_TRUNC splits into OpenFile + SetInodeAttributes
   - KillSuidgid flag set on SetInodeAttributes operation

2. Added new test suite (KillPrivFSAtomicTruncTest) to verify atomic trunc:
   - Enables EnableAtomicTrunc mount option
   - Verifies O_TRUNC sends single OpenFile operation
   - Confirms KillSuidgid flag set on OpenFile (not SetInodeAttributes)

This ensures both O_TRUNC modes work correctly with KILLPRIV_V2 support.

Test coverage: 10 tests now pass (added 1 new test)
Applied gofmt to align constant declarations and struct fields:
- Aligned SetattrValid constants and helper methods
- Aligned WriteFlags constants
- Aligned test struct fields in killpriv_test.go

No functional changes, only whitespace alignment.
@sbauersfeld
Copy link
Contributor Author

fyi @jacobsa, requesting to add support for HandleKillPriv and HandleKillPrivV2. The implementation is fairly straightforward but the exact behavior of kernel interactions with the fuse daemon when HandleKillPrivV2 is enabled is a bit strange. I tried to add sufficient test coverage and documentation to cover the relevant interactions. Please take a look when you have some free time!

@sbauersfeld
Copy link
Contributor Author

fyi @jacobsa, requesting to add support for HandleKillPriv and HandleKillPrivV2. The implementation is fairly straightforward but the exact behavior of kernel interactions with the fuse daemon when HandleKillPrivV2 is enabled is a bit strange. I tried to add sufficient test coverage and documentation to cover the relevant interactions. Please take a look when you have some free time!

cc @stapelberg

@sbauersfeld
Copy link
Contributor Author

fyi @jacobsa, requesting to add support for HandleKillPriv and HandleKillPrivV2. The implementation is fairly straightforward but the exact behavior of kernel interactions with the fuse daemon when HandleKillPrivV2 is enabled is a bit strange. I tried to add sufficient test coverage and documentation to cover the relevant interactions. Please take a look when you have some free time!

cc @stapelberg

cc @geertj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant