-
Notifications
You must be signed in to change notification settings - Fork 117
Support kill-priv flags in init handshake #188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit adds comprehensive integration tests for FUSE HANDLE_KILLPRIV_V2 support, which allows the kernel to delegate privilege bit clearing to the filesystem when files are written, truncated, or have their ownership changed. The tests verify: - Creating files in setgid directories - Opening setuid files for write - Writing to setuid files - Truncating setuid files - Chown operations on setuid files - Normal operations don't incorrectly set KillSuidgid flags The test filesystem uses helper methods (AddTestFile/AddTestDir) to bypass normal FUSE operations when setting up test scenarios with specific permission bits. This avoids the need for a fully-functional chmod implementation. Tests that require HANDLE_KILLPRIV_V2 support (Linux kernel >= 5.12) are automatically skipped on older kernels with informative messages. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Key fixes: - Added DisableWritebackCaching=true (critical for write operations to reach filesystem immediately with KillSuidgid flags set) - Fixed CreateFile test to use shell redirection (O_CREAT|O_TRUNC scenario) - Renamed and fixed OpenFile test - O_TRUNC splits into OpenFile + SetInodeAttributes(size=0) - Fixed SetInodeAttributes test expectations (kernel always sets flag, filesystem decides action) - Cleaned up comments to be concise while explaining unexpected behaviors All 9 tests now pass on Linux 5.15. The key insight is that KILLPRIV_V2 requires DisableWritebackCaching=true for write operations. With writeback caching enabled, the kernel buffers writes in the page cache and KillSuidgid flags don't reach the filesystem until later.
Add helper functions to eliminate duplication: - skipIfNotRoot: Centralized root check with consistent messaging - skipIfKernelTooOld: Centralized kernel version check - runAsNobody: Extracted su command execution pattern Benefits: - Reduced file from 300 to 278 lines (22 lines saved) - Eliminated repetitive skip logic across 7 tests - Simplified su command execution in 5 tests - Easier to maintain and review - All tests still pass
The kernel always sets the KillSuidgid flag for SetInodeAttributesOp, regardless of CAP_FSETID or file mode. The filesystem is responsible for checking OpContext.Uid and file mode to decide whether to clear privilege bits. This was confirmed through integration testing on Linux 5.15.
Removed comments that just repeat what the function name says: - 'NewKillPrivFS creates a new KillPrivFS' → removed entirely - 'skipIfNotRoot skips the test if not running as root' → removed entirely - 'skipIfKernelTooOld skips the test...' → removed entirely - 'runAsNobody executes a shell command...' → removed entirely Improved comments to explain WHY rather than WHAT: - AddTestFile/AddTestDir: Now emphasize they bypass normal FUSE operations - kernelSupportsKillprivV2: More concise, keeps the important '5.12' context Comments should explain unexpected behaviors and design decisions, not restate what's already obvious from the code.
Fixed architectural flaw where all files shared a single fileData array. Now each inode has its own data array, making the filesystem correct. Changes: - Changed inodeInfo to use pointer type (map[uint64]*inodeInfo) - Added 'data []byte' field to inodeInfo struct - Removed global 'fileData []byte' from KillPrivFS - Updated all operations to use per-inode data (info.data) - Simplified SetInodeAttributes by removing unnecessary re-fetch Benefits: - Files now have independent data storage (correct behavior) - Cleaner code: no need to re-assign inodes after mutation - Reduced from 403 to 397 lines (6 lines saved) - All tests pass
Updated documentation to match Linux kernel implementation precisely:
1. Clarified privilege bit clearing rules:
- security.capability: always cleared on chown/write/truncate
- setuid: always cleared on chown; conditional on write/truncate
- setgid: always cleared on chown; conditional on write/truncate
(requires both CAP_FSETID check AND group execute permission)
2. Added CRITICAL warning about writeback caching:
- KILLPRIV_V2 relies on WriteFileOp reaching the server
- Writeback caching buffers writes in kernel page cache
- WriteFileOp may not be sent immediately (or at all until flush)
- Result: privilege bits may not be cleared when they should be
- Recommendation: Set DisableWritebackCaching=true when using V2
3. Simplified flag behavior descriptions for clarity
Based on kernel commit message:
torvalds/linux@63f9909
Added comprehensive testing for O_TRUNC behavior with KILLPRIV_V2: 1. Updated existing test comment to clarify default behavior: - Without EnableAtomicTrunc: O_TRUNC splits into OpenFile + SetInodeAttributes - KillSuidgid flag set on SetInodeAttributes operation 2. Added new test suite (KillPrivFSAtomicTruncTest) to verify atomic trunc: - Enables EnableAtomicTrunc mount option - Verifies O_TRUNC sends single OpenFile operation - Confirms KillSuidgid flag set on OpenFile (not SetInodeAttributes) This ensures both O_TRUNC modes work correctly with KILLPRIV_V2 support. Test coverage: 10 tests now pass (added 1 new test)
Applied gofmt to align constant declarations and struct fields: - Aligned SetattrValid constants and helper methods - Aligned WriteFlags constants - Aligned test struct fields in killpriv_test.go No functional changes, only whitespace alignment.
|
fyi @jacobsa, requesting to add support for HandleKillPriv and HandleKillPrivV2. The implementation is fairly straightforward but the exact behavior of kernel interactions with the fuse daemon when HandleKillPrivV2 is enabled is a bit strange. I tried to add sufficient test coverage and documentation to cover the relevant interactions. Please take a look when you have some free time! |
cc @stapelberg |
cc @geertj |
By default, the kernel will send a GetXAttrs("security.capabilities") request to the FUSE daemon before every Write operation. It does this because the kernel may need to clear setuid/setgid bits and security capabilities when a file is written. This incurs a performance cost because of the extra round trips between kernel and user space when calling GetXAttrs.
Some filesystems do not even support security capabilities. Setting the HandleKillPriv flags allows those filesystems to avoid the extra GetXAttrs calls.
Another option is to return ENOSYS for the GetXAttr call, but that also means that the filesystem cannot support GetXAttr calls for any attribute at all.
This change allows filesystems to make use of the HandleKillPriv flags.