Releases: go-webgpu/goffi
v0.3.7 - ARM64 Darwin Comprehensive Support
This release adds comprehensive ARM64 darwin (Apple Silicon) support, tested on M3 Pro.
Added
- ARM64 Darwin comprehensive support (PR #9 by @ppoage)
  - Tested on Apple Silicon M3 Pro (64 ns/op benchmark)
  - Nested struct handling via placeStructRegisters()
  - Mixed int/float struct support via countStructRegUsage()
  - ensureStructLayout() for auto-computing size/alignment
  - Assembly shim (abi_capture_test.s) for ABI verification
  - Comprehensive darwin ObjC tests (747 lines)
  - Struct argument tests (537 lines)
- r2 (X1) return for 9-16 byte struct returns
  - Call8Float now returns both X0 and X1
  - Fixes struct returns between 9-16 bytes on ARM64
- uint64 bit patterns for float registers
  - Cleaner handling of mixed float32/float64 arguments
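The idea of carrying float registers as uint64 bit patterns can be sketched in plain Go. This is a minimal illustration, not goffi's internal code: the `floatReg` type and its helpers are hypothetical names, and the sketch assumes a float32 occupies the low 32 bits of the register slot.

```go
package main

import (
	"fmt"
	"math"
)

// floatReg models one floating-point argument register as a raw 64-bit pattern.
// Hypothetical type for illustration; not goffi's internal representation.
type floatReg uint64

// fromFloat32 stores the 32-bit pattern in the low half of the register slot.
func fromFloat32(f float32) floatReg { return floatReg(math.Float32bits(f)) }

// fromFloat64 stores the full 64-bit pattern.
func fromFloat64(f float64) floatReg { return floatReg(math.Float64bits(f)) }

// asFloat32 reinterprets the low 32 bits as a float32.
func (r floatReg) asFloat32() float32 { return math.Float32frombits(uint32(r)) }

// asFloat64 reinterprets all 64 bits as a float64.
func (r floatReg) asFloat64() float64 { return math.Float64frombits(uint64(r)) }

func main() {
	// A mixed float32/float64 argument list shares one slice type.
	regs := []floatReg{fromFloat32(1.5), fromFloat64(2.25)}
	fmt.Printf("%#x %#x\n", uint64(regs[0]), uint64(regs[1]))
	fmt.Println(regs[0].asFloat32(), regs[1].asFloat64())
}
```

Because every slot has the same integer type, no float32 value is silently widened to float64 on the way into a register.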
Fixed
- BenchmarkGoffiStringOutput segfault on darwin
  - Pointer argument now correctly passed as unsafe.Pointer(&strPtr)
Contributors
- @ppoage - ARM64 Darwin fixes, ObjC tests, assembly shim
Full Changelog: v0.3.6...v0.3.7
v0.3.6 - ARM64 HFA/Large Struct Return Fix
Critical Fixes for ARM64 (Apple Silicon M1/M2/M3/M4)
Fixed
- ARM64 HFA (Homogeneous Floating-point Aggregate) returns
  - NSRect (4 × float64) returned zeros on Apple Silicon
  - Root cause: assembly only saved D0-D1, HFA needs D0-D3
  - Solution: save all 4 float registers for HFA returns
- ARM64 large struct return via X8 (sret)
  - Non-HFA structs >16 bytes returned via implicit pointer in X8
  - Root cause: X8 register never loaded before function call
  - Solution: load rvalue pointer into X8 for sret calls
Added
- ReturnHFA2, ReturnHFA3, ReturnHFA4 return flag constants
- handleHFAReturn function for processing HFA struct returns
- Unit tests for ARM64 HFA classification
Technical Details
- AAPCS64: HFA structs with 1-4 same-type floats return in D0-D3
- AAPCS64: Large non-HFA structs (>16 bytes) return via hidden pointer in X8
- NSRect = CGRect = 4 × float64 = 32 bytes = HFA (returns in D0-D3)
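To make the HFA return rule concrete, here is a rough sketch of rebuilding a 4 × float64 struct from saved D registers. The names `rect`, `storeHFA`, `decodeRect`, and `sampleRegs` are hypothetical, and goffi's handleHFAReturn may be structured quite differently. The sketch only illustrates why capturing two registers is not enough for a four-member HFA.

```go
package main

import (
	"fmt"
	"math"
)

// rect mirrors the 4 x float64 layout of NSRect/CGRect described above.
type rect struct{ X, Y, W, H float64 }

// storeHFA copies the first n saved float registers (raw bit patterns of
// D0..D3) into dst. Hypothetical helper, not goffi's handleHFAReturn.
func storeHFA(dst []float64, saved [4]uint64, n int) {
	for i := 0; i < n && i < len(dst); i++ {
		dst[i] = math.Float64frombits(saved[i])
	}
}

// decodeRect builds a rect from n captured registers.
func decodeRect(saved [4]uint64, n int) rect {
	var fields [4]float64
	storeHFA(fields[:], saved, n)
	return rect{fields[0], fields[1], fields[2], fields[3]}
}

// sampleRegs returns register contents for a 800x600 rect at (10, 20).
func sampleRegs() [4]uint64 {
	return [4]uint64{
		math.Float64bits(10), math.Float64bits(20),
		math.Float64bits(800), math.Float64bits(600),
	}
}

func main() {
	fmt.Println(decodeRect(sampleRegs(), 2)) // only D0-D1 captured: size stays zero
	fmt.Println(decodeRect(sampleRegs(), 4)) // D0-D3 captured: full rectangle
}
```

In this model, capturing only two registers leaves the width and height at zero, which is consistent with the 0×0 window size listed under Impact.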
Impact
- Fixes blank window issue on macOS ARM64 (GPU window size was 0×0)
- Fixes gogpu#24
Full Changelog: v0.3.5...v0.3.6
v0.3.5 - Windows Stack Arguments Fix
Fixed
- Windows stack arguments not implemented (Critical)
  - Functions with >4 arguments caused panic: stack arguments not implemented
  - Win64 ABI: first 4 args in registers (RCX/RDX/R8/R9), args 5+ on stack
  - Solution: Use syscall.SyscallN with variadic args for unlimited argument support
  - Affected Vulkan functions: vkCreateGraphicsPipelines (6 args), vkCmdBindVertexBuffers (5 args), etc.
Changed
- Simplified Windows FFI - removed intermediate syscall wrapper
  - Removed: internal/syscall/syscall_windows_amd64.go
  - call_windows.go now calls syscall.SyscallN directly with args...
  - Cleaner code, fewer indirections
Technical Details
- syscall.SyscallN(fn, args...) supports up to 15+ arguments
- Handles both register (1-4) and stack (5+) arguments automatically
- Same approach used by purego for Windows FFI
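The register/stack split that syscall.SyscallN takes care of can be illustrated with a small portable sketch. `splitWin64Args` is a hypothetical helper written for this note. It only models integer arguments and is not how goffi or the Go runtime implement the call.

```go
package main

import "fmt"

// splitWin64Args separates integer arguments into the four register slots
// (RCX, RDX, R8, R9) and the remainder that would be passed on the stack.
// Hypothetical helper for illustration only.
func splitWin64Args(args []uintptr) (regs [4]uintptr, stack []uintptr) {
	n := copy(regs[:], args) // copies at most four values
	return regs, args[n:]
}

func main() {
	// Six arguments, the same count as vkCreateGraphicsPipelines.
	regs, stack := splitWin64Args([]uintptr{1, 2, 3, 4, 5, 6})
	fmt.Println(regs, stack)
}
```

With the pre-fix code path, anything landing in the `stack` slice triggered the panic. After the fix the caller simply passes all arguments variadically.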
Full Changelog
v0.3.4 - Windows Stack Overflow Fix
Fixed
- Windows stack overflow on Vulkan API calls (Critical)
  - callWin64 assembly used NOSPLIT, $32 - prevented Go runtime stack growth
  - Solution: Replace with syscall.SyscallN (Go runtime's asmstdcall mechanism)
  - Matches purego's proven approach for Windows FFI
Changed
- Windows FFI architecture refactored
  - Removed: internal/arch/amd64/call_windows.s
  - Added: internal/syscall/syscall_windows_amd64.go
  - Uses Go runtime's built-in stack management
Technical Details
The custom Windows assembly used the NOSPLIT directive, which prevents the Go runtime from growing the goroutine stack. When C functions (especially Vulkan/WebGPU APIs) required more stack space than the fixed 32-byte frame, the call failed with STACK_OVERFLOW (exception 0xc00000fd).
The fix uses syscall.SyscallN, which internally goes through runtime.cgocall and asmstdcall, so stack growth is managed by the Go runtime.
Full Changelog
v0.3.3 - PointerType Bug Fix
Summary
Hotfix for critical PointerType argument passing bug (#4)
PointerType was passing address instead of value in all Execute implementations.
Fixed
- PointerType dereference - Now correctly uses *(*uintptr)(avalue[idx]) instead of uintptr(avalue[idx])
- Affected files:
  - internal/arch/arm64/call_unix.go
  - internal/arch/amd64/call_unix.go
  - internal/arch/amd64/call_windows.go
- Additional fixes:
  - Added missing SInt8/UInt8/SInt16/UInt16 type handling in AMD64 Unix
  - Fixed float32 handling in Windows (was treating as float64)
Added
- Regression tests for argument passing:
  - TestPointerArgumentPassing - strlen-based test for PointerType
  - TestIntegerArgumentTypes - abs-based test for integer types
- Both tests use documented API pattern: []unsafe.Pointer{unsafe.Pointer(&arg)}
Technical Details
The API contract (ffi.go line 43) specifies that avalue contains pointers TO argument values:
[]unsafe.Pointer{unsafe.Pointer(&arg)}
For PointerType, this means we receive &ptr and must dereference to get ptr.
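The difference between the two expressions can be checked without calling any C code. The following is a small sketch under the stated contract (each avalue slot holds the address of the argument). `loadPointerArg` and `compare` are hypothetical names, not part of goffi.

```go
package main

import (
	"fmt"
	"unsafe"
)

// loadPointerArg reads a pointer-typed argument from its avalue slot.
// The slot holds &ptr, so one dereference yields ptr itself.
func loadPointerArg(slot unsafe.Pointer) uintptr {
	return *(*uintptr)(slot)
}

// compare reports whether each strategy recovers the original pointer value.
func compare() (derefOK, castOK bool) {
	buf := []byte("hello\x00")
	ptr := unsafe.Pointer(&buf[0])
	avalue := []unsafe.Pointer{unsafe.Pointer(&ptr)}

	derefOK = loadPointerArg(avalue[0]) == uintptr(ptr)
	// Pre-fix behaviour: the slot address (&ptr) is passed instead of ptr.
	castOK = uintptr(avalue[0]) == uintptr(ptr)
	return derefOK, castOK
}

func main() {
	fmt.Println(compare()) // prints "true false"
}
```

A callee such as strlen would therefore have received the address of the Go variable holding the pointer rather than the address of the string bytes.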
Upgrade
go get github.com/go-webgpu/goffi@v0.3.3
Full Changelog
goffi v0.3.2 - ARM64 HFA Classification Fix
What's Changed
Fixed
- ARM64 HFA (Homogeneous Floating-point Aggregate) classification bug
  - HFA structs (e.g., NSRect with 4 doubles) were incorrectly passed by reference
  - Now correctly passed in floating-point registers D0-D7 per AAPCS64 ABI
  - Fix: Check HFA status before struct size in classifyArgumentARM64()
  - Affects Objective-C runtime calls on Apple Silicon (M1/M2/M3/M4)
Technical Details
- AAPCS64 requires HFA detection before size-based classification
- Example: NSRect (4 × float64 = 32 bytes) is HFA → uses D0-D3, not reference
- Critical for macOS ARM64 Objective-C interop
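The ordering issue can be sketched with the reflect package. This is an illustrative approximation, not the logic of classifyArgumentARM64(): the helper names and the returned strings are made up for the example, and the HFA test covers only structs and arrays whose flattened members are 1-4 floats of one type.

```go
package main

import (
	"fmt"
	"reflect"
)

type point struct{ X, Y float64 }
type size struct{ W, H float64 }

// rect has the nested 4 x float64 layout of NSRect.
type rect struct {
	Origin point
	Size   size
}

// flatten appends the kinds of all scalar leaves of t.
func flatten(t reflect.Type, out []reflect.Kind) []reflect.Kind {
	switch t.Kind() {
	case reflect.Struct:
		for i := 0; i < t.NumField(); i++ {
			out = flatten(t.Field(i).Type, out)
		}
	case reflect.Array:
		for i := 0; i < t.Len(); i++ {
			out = flatten(t.Elem(), out)
		}
	default:
		out = append(out, t.Kind())
	}
	return out
}

// isHFA reports whether t flattens to 1-4 floats of a single type.
func isHFA(t reflect.Type) bool {
	if t.Kind() != reflect.Struct && t.Kind() != reflect.Array {
		return false
	}
	leaves := flatten(t, nil)
	if len(leaves) < 1 || len(leaves) > 4 {
		return false
	}
	first := leaves[0]
	if first != reflect.Float32 && first != reflect.Float64 {
		return false
	}
	for _, k := range leaves[1:] {
		if k != first {
			return false
		}
	}
	return true
}

// classifyValue checks HFA status first, then falls back to size.
func classifyValue(v interface{}) string {
	t := reflect.TypeOf(v)
	if isHFA(t) {
		return "float registers"
	}
	if t.Size() > 16 {
		return "by reference"
	}
	return "integer registers"
}

// classifySizeFirst reproduces the buggy order for comparison.
func classifySizeFirst(v interface{}) string {
	if reflect.TypeOf(v).Size() > 16 {
		return "by reference"
	}
	return classifyValue(v)
}

func main() {
	fmt.Println(classifyValue(rect{}))                       // float registers
	fmt.Println(classifySizeFirst(rect{}))                   // by reference
	fmt.Println(classifyValue(struct{ A, B, C int64 }{}))    // by reference
	fmt.Println(classifyValue(struct{ A int32; B float32 }{})) // integer registers
}
```

Since a four-double HFA is 32 bytes, any size check that runs first will send it down the by-reference path, which is the misclassification this release fixes.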
CI Improvements
- Remove deprecated macos-13 runner (deprecated Dec 2025)
- Add macos-latest for ARM64 Apple Silicon testing
- ARM64 coverage informational only (cross-compile verified status)
Platform Support
| Platform | Architecture | Status |
|---|---|---|
| Linux | AMD64 | ✅ Production (tested) |
| Windows | AMD64 | ✅ Production (tested) |
| macOS | ARM64 | ✅ Cross-compile verified (CI tested) |
| macOS | Intel | ✅ Production (same ABI as Linux) |
| Linux | ARM64 | 🟡 Cross-compile verified |
Installation
go get github.com/go-webgpu/goffi@v0.3.2
Full Changelog
v0.3.1 - ARM64 Build Constraints Hotfix
Hotfix
Fixes ARM64 build constraints for dynamic library loading functions.
Fixed
- dl_unix.go: Added ARM64 support
- dl_darwin.go: Added ARM64 support
- stubs/caller.go: Exclude ARM64 from stubs
Bug Fixed
undefined: ffi.LoadLibrary on ARM64 platforms (macOS Apple Silicon, Linux ARM64)
Reported by
go-webgpu project
Installation
go get github.com/go-webgpu/goffi@v0.3.1
Full Changelog: v0.3.0...v0.3.1
v0.3.0 - ARM64 Architecture Support
What's New
ARM64 architecture support (AAPCS64 ABI for Linux and macOS)
Features
- Complete ARM64 implementation with AAPCS64 calling convention
- 2000-entry callback trampolines for ARM64
- X0-X7 integer registers, D0-D7 floating-point registers
- Homogeneous Floating-point Aggregate (HFA) detection
- Cross-compile verified
Platform Support
| Platform | Architecture | Status |
|---|---|---|
| Linux | AMD64 | Fully supported |
| Windows | AMD64 | Fully supported |
| macOS | AMD64 | Fully supported |
| Linux | ARM64 | Cross-compile verified |
| macOS | ARM64 | Cross-compile verified |
Installation
go get github.com/go-webgpu/goffi@v0.3.0
Note
ARM64 support is feature-complete but awaiting real hardware testing.
Full Changelog: v0.2.1...v0.3.0
goffi v0.2.1 - Windows Callback Hotfix
Fixes
- Windows callbacks: Use syscall.NewCallback for correct Win64 ABI
- Documentation: Add SEH/C++ exception limitation (Go #12516)
Windows Callback Requirements
// Windows requires exactly one uintptr-sized return value
cb := ffi.NewCallback(func(a, b, c, d uintptr) uintptr {
	return 0
})
Supported types: int64, uint64, uintptr, pointers
NOT supported on Windows: int8/16/32, float32/64, void return
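These requirements could be checked up front with reflection before a function is handed to the callback machinery. The sketch below is only an assumption about how such a check might look: `checkWindowsCallback` and `uintptrSized` are hypothetical helpers, not part of goffi. It also assumes the supported-type list applies to both arguments and the return value.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"unsafe"
)

// ptrSize is the size of a uintptr on the current platform.
var ptrSize = unsafe.Sizeof(uintptr(0))

// uintptrSized reports whether t is one of the listed supported kinds
// and occupies exactly one pointer-sized slot.
func uintptrSized(t reflect.Type) bool {
	switch t.Kind() {
	case reflect.Int64, reflect.Uint64, reflect.Uintptr,
		reflect.Ptr, reflect.UnsafePointer:
		return t.Size() == ptrSize
	}
	return false
}

// checkWindowsCallback validates a function against the rules above.
// Hypothetical helper for illustration only.
func checkWindowsCallback(fn interface{}) error {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return errors.New("not a function")
	}
	if t.NumOut() != 1 || !uintptrSized(t.Out(0)) {
		return errors.New("need exactly one uintptr-sized return value")
	}
	for i := 0; i < t.NumIn(); i++ {
		if !uintptrSized(t.In(i)) {
			return fmt.Errorf("argument %d has unsupported type %s", i, t.In(i))
		}
	}
	return nil
}

func main() {
	fmt.Println(checkWindowsCallback(func(a, b uintptr) uintptr { return 0 }))
	fmt.Println(checkWindowsCallback(func(a int32) uintptr { return 0 }))
	fmt.Println(checkWindowsCallback(func(a uintptr) {}))
}
```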
Full Changelog: v0.2.0...v0.2.1
goffi v0.2.0 - Callback Support
This release adds callback support for C-to-Go function calls, enabling WebGPU async APIs and other callback-based C libraries.
✨ New Features
- NewCallback API - Register Go functions as C callbacks
  cb := ffi.NewCallback(func(status int, adapter uintptr) {
      // Handle async result
  })
- 2000-entry trampoline table - Pre-compiled assembly for optimal performance
- Thread-safe callback registry - Mutex-protected callback storage
- Full ABI support - System V AMD64 (Linux, macOS) and Win64 (Windows)
📦 Supported Argument Types
- Integers: int, int8, int16, int32, int64
- Unsigned: uint, uint8, uint16, uint32, uint64, uintptr
- Floats: float32, float64
- Pointers: *T, unsafe.Pointer
- Boolean: bool
🎯 Use Case: WebGPU Async Operations
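// Create callback for wgpuInstanceRequestAdapter
cb := ffi.NewCallback(func(status int, adapter uintptr, msg uintptr, ud uintptr) {
	// ud carries the address of the caller's adapterResult
	result := (*adapterResult)(unsafe.Pointer(ud))
	result.status = status
	result.adapter = adapter
	close(result.done) // signal the waiting goroutine
})
// Pass to C function; each avalue entry is a pointer to the argument value
ffi.CallFunction(&cif, wgpuRequestAdapter, nil,
	[]unsafe.Pointer{unsafe.Pointer(&instance), unsafe.Pointer(&opts),
		unsafe.Pointer(&cb), unsafe.Pointer(&userdata)})
⚠️ Known Limitations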
- Maximum 2000 callbacks per process (memory never released)
- Complex types (string, slice, map, chan, interface) not supported
- Callbacks must return at most one value
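The first limitation follows from a fixed-size table. A mutex-protected registry with a hard cap might look roughly like the sketch below. The `registry` type, its methods, and the error returned when the table is full are assumptions for illustration. The notes do not say how goffi reports a full table.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxCallbacks matches the documented trampoline table size.
const maxCallbacks = 2000

// registry is a hypothetical fixed-capacity callback table.
// Entries are never removed, mirroring the "memory never released" note.
type registry struct {
	mu    sync.Mutex
	funcs [maxCallbacks]interface{}
	count int
}

// register stores fn in the next free slot and returns its index.
func (r *registry) register(fn interface{}) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.count == maxCallbacks {
		return -1, errors.New("callback table full")
	}
	idx := r.count
	r.funcs[idx] = fn
	r.count++
	return idx, nil
}

// lookup returns the function stored at idx, or nil if the slot is unused.
func (r *registry) lookup(idx int) interface{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	if idx < 0 || idx >= r.count {
		return nil
	}
	return r.funcs[idx]
}

func main() {
	r := &registry{}
	idx, err := r.register(func() {})
	fmt.Println(idx, err, r.lookup(idx) != nil)
}
```

Each slot index would correspond to one pre-compiled trampoline, so long-lived programs are better off registering a callback once and reusing it than creating one per call.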
📊 Test Coverage
- 20 comprehensive tests covering all scenarios
- Thread safety validation
- Stack argument handling
- All supported types tested
Full Changelog: v0.1.1...v0.2.0