Releases: go-webgpu/goffi

v0.3.7 - ARM64 Darwin Comprehensive Support

03 Jan 07:18

ARM64 Darwin Comprehensive Support

This release adds comprehensive ARM64 Darwin (Apple Silicon) support, tested on an M3 Pro.

Added

  • ARM64 Darwin comprehensive support (PR #9 by @ppoage)

    • Tested on Apple Silicon M3 Pro (64 ns/op benchmark)
    • Nested struct handling via placeStructRegisters()
    • Mixed int/float struct support via countStructRegUsage()
    • ensureStructLayout() for auto-computing size/alignment
    • Assembly shim (abi_capture_test.s) for ABI verification
    • Comprehensive darwin ObjC tests (747 lines)
    • Struct argument tests (537 lines)
  • r2 (X1) return for 9-16 byte struct returns

    • Call8Float now returns both X0 and X1
    • Fixes returns of structs sized 9 to 16 bytes on ARM64
  • uint64 bit patterns for float registers

    • Cleaner handling of mixed float32/float64 arguments
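
The ensureStructLayout() entry above mentions computing size and alignment automatically. The following is a minimal sketch of the standard C layout rule such a helper would need to apply. The `field` type and the `alignUp` and `layout` functions are hypothetical names for illustration and are not taken from goffi's source.

```go
package main

import "fmt"

// field is a hypothetical description of one struct member;
// it is not goffi's type descriptor.
type field struct {
	size  uintptr // size in bytes
	align uintptr // required alignment in bytes (a power of two)
}

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// layout applies the usual C rule: each field starts at the next offset
// that satisfies its alignment, the struct's alignment is the largest
// field alignment, and the total size is padded to that alignment.
func layout(fields []field) (size, align uintptr) {
	align = 1
	for _, f := range fields {
		size = alignUp(size, f.align) + f.size
		if f.align > align {
			align = f.align
		}
	}
	return alignUp(size, align), align
}

func main() {
	// struct { int32_t a; double b; float c; } in C terms.
	size, align := layout([]field{{4, 4}, {8, 8}, {4, 4}})
	fmt.Println(size, align) // 24 8
}
```

Nested structs fit the same rule if the inner struct is laid out first and then treated as a single field with its own computed size and alignment.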

Fixed

  • BenchmarkGoffiStringOutput segfault on darwin
    • Pointer argument now correctly passed as unsafe.Pointer(&strPtr)

Contributors

  • @ppoage - ARM64 Darwin fixes, ObjC tests, assembly shim

Full Changelog: v0.3.6...v0.3.7

v0.3.6 - ARM64 HFA/Large Struct Return Fix

29 Dec 09:38
da2f8f7

Critical Fixes for ARM64 (Apple Silicon M1/M2/M3/M4)

Fixed

  • ARM64 HFA (Homogeneous Floating-point Aggregate) returns

    • NSRect (4 × float64) returned zeros on Apple Silicon
    • Root cause: assembly only saved D0-D1, HFA needs D0-D3
    • Solution: save all 4 float registers for HFA returns
  • ARM64 large struct return via X8 (sret)

    • Non-HFA structs >16 bytes returned via implicit pointer in X8
    • Root cause: X8 register never loaded before function call
    • Solution: load rvalue pointer into X8 for sret calls

Added

  • ReturnHFA2, ReturnHFA3, ReturnHFA4 return flag constants
  • handleHFAReturn function for processing HFA struct returns
  • Unit tests for ARM64 HFA classification

Technical Details

  • AAPCS64: HFA structs with 1-4 same-type floats return in D0-D3
  • AAPCS64: Large non-HFA structs (>16 bytes) return via hidden pointer in X8
  • NSRect = CGRect = 4 × float64 = 32 bytes = HFA (returns in D0-D3)
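
The return rules listed above can be sketched as a small classifier. This is an illustration under stated assumptions, not goffi's handleHFAReturn: the `kind`, `retLoc`, `isHFA`, and `classifyReturn` names are hypothetical, and the struct is assumed to be already flattened into its scalar members.

```go
package main

import "fmt"

// kind is a hypothetical scalar classification used only in this sketch.
type kind int

const (
	kindInt kind = iota
	kindFloat32
	kindFloat64
)

// retLoc names where a struct return value is placed.
type retLoc int

const (
	retFloatRegs retLoc = iota // HFA: one float register per member
	retIntRegs                 // X0, plus X1 for 9-16 bytes
	retMemory                  // caller-provided buffer, address in X8
)

// isHFA reports whether the flattened members form a Homogeneous
// Floating-point Aggregate: 1 to 4 members, all of the same float type.
func isHFA(members []kind) bool {
	if len(members) < 1 || len(members) > 4 {
		return false
	}
	first := members[0]
	if first != kindFloat32 && first != kindFloat64 {
		return false
	}
	for _, m := range members[1:] {
		if m != first {
			return false
		}
	}
	return true
}

// classifyReturn returns the location and the number of registers used.
func classifyReturn(members []kind, size uintptr) (retLoc, int) {
	switch {
	case isHFA(members):
		return retFloatRegs, len(members)
	case size <= 16:
		return retIntRegs, int((size + 7) / 8)
	default:
		return retMemory, 0
	}
}

func main() {
	// NSRect: 4 x float64, 32 bytes.
	nsRect := []kind{kindFloat64, kindFloat64, kindFloat64, kindFloat64}
	loc, regs := classifyReturn(nsRect, 32)
	fmt.Println(loc == retFloatRegs, regs) // true 4
}
```

The HFA check comes first on purpose: a 32-byte NSRect would otherwise fall into the hidden-pointer case on size alone.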

Impact

  • Fixes blank window issue on macOS ARM64 (GPU window size was 0×0)
  • Fixes gogpu#24

Full Changelog: v0.3.5...v0.3.6

v0.3.5 - Windows Stack Arguments Fix

27 Dec 11:26
934decc

Fixed

  • Windows stack arguments not implemented (Critical)
    • Functions with more than 4 arguments panicked with "stack arguments not implemented"
    • Win64 ABI: first 4 args in registers (RCX/RDX/R8/R9), args 5+ on stack
    • Solution: Use syscall.SyscallN with variadic args for unlimited argument support
    • Affected Vulkan functions: vkCreateGraphicsPipelines (6 args), vkCmdBindVertexBuffers (5 args), etc.

Changed

  • Simplified Windows FFI - removed intermediate syscall wrapper
    • Removed: internal/syscall/syscall_windows_amd64.go
    • call_windows.go now calls syscall.SyscallN directly with args...
    • Cleaner code, fewer indirections

Technical Details

  • syscall.SyscallN(fn, args...) supports 15 or more arguments
  • Handles both register (1-4) and stack (5+) arguments automatically
  • Same approach used by purego for Windows FFI
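
To make the register/stack split concrete, here is a portable sketch of where Win64 places integer and pointer arguments. On Windows, syscall.SyscallN performs this placement itself; the `splitWin64Args` helper below is hypothetical and only models the rule described above.

```go
package main

import "fmt"

// win64RegisterArgs is the number of arguments the Win64 calling
// convention passes in registers (RCX, RDX, R8, R9 for integers/pointers).
const win64RegisterArgs = 4

// splitWin64Args is a hypothetical helper that reports which arguments
// travel in registers and which spill to the stack. It does not perform
// a call; it only models the placement rule.
func splitWin64Args(args []uintptr) (regs, stack []uintptr) {
	if len(args) <= win64RegisterArgs {
		return args, nil
	}
	return args[:win64RegisterArgs], args[win64RegisterArgs:]
}

func main() {
	// A 5-argument call such as vkCmdBindVertexBuffers spills one argument.
	regs, stack := splitWin64Args([]uintptr{1, 2, 3, 4, 5})
	fmt.Println(len(regs), len(stack)) // 4 1
}
```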

Full Changelog

v0.3.4...v0.3.5

v0.3.4 - Windows Stack Overflow Fix

27 Dec 10:55
9f9c843

Fixed

  • Windows stack overflow on Vulkan API calls (Critical)
    • callWin64 assembly used NOSPLIT, $32 - prevented Go runtime stack growth
    • Solution: Replace with syscall.SyscallN (Go runtime's asmstdcall mechanism)
    • Matches purego's proven approach for Windows FFI

Changed

  • Windows FFI architecture refactored
    • Removed: internal/arch/amd64/call_windows.s
    • Added: internal/syscall/syscall_windows_amd64.go
    • Uses Go runtime's built-in stack management

Technical Details

The custom Windows assembly used the NOSPLIT directive, which prevents the Go runtime from growing the goroutine stack. When C functions (especially Vulkan/WebGPU APIs) required more stack space than the fixed 32-byte frame provided, the call failed with STACK_OVERFLOW (Exception 0xc00000fd).

The fix uses syscall.SyscallN, which internally relies on runtime.cgocall and asmstdcall, so stack management is handled by the Go runtime.

Full Changelog

v0.3.3...v0.3.4

v0.3.3 - PointerType Bug Fix

24 Dec 14:23
a935222

Summary

Hotfix for critical PointerType argument passing bug (#4)

PointerType arguments were passed as the address of the argument slot instead of the pointer value stored in it. This affected all Execute implementations.

Fixed

  • PointerType dereference - Now correctly uses *(*uintptr)(avalue[idx]) instead of uintptr(avalue[idx])
  • Affected files:
    • internal/arch/arm64/call_unix.go
    • internal/arch/amd64/call_unix.go
    • internal/arch/amd64/call_windows.go
  • Additional fixes:
    • Added missing SInt8/UInt8/SInt16/UInt16 type handling in AMD64 Unix
    • Fixed float32 handling in Windows (was treating as float64)
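
The float32 fix above matters because a float32 and a float64 holding the same number have different bit patterns. A minimal sketch using the standard math package shows the difference; the `float32Bits` and `float64Bits` helpers are hypothetical names, not goffi functions.

```go
package main

import (
	"fmt"
	"math"
)

// float32Bits returns the register image for a float32 argument:
// the IEEE 754 single-precision pattern in the low 32 bits.
func float32Bits(f float32) uint64 {
	return uint64(math.Float32bits(f))
}

// float64Bits returns the register image for a float64 argument.
func float64Bits(f float64) uint64 {
	return math.Float64bits(f)
}

func main() {
	fmt.Printf("%#x\n", float32Bits(1.5)) // 0x3fc00000
	fmt.Printf("%#x\n", float64Bits(1.5)) // 0x3ff8000000000000
}
```

A callee that expects a float reads only the low 32 bits of the register, so loading the float64 pattern of 1.5 would hand it zeros.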

Added

  • Regression tests for argument passing:
    • TestPointerArgumentPassing - strlen-based test for PointerType
    • TestIntegerArgumentTypes - abs-based test for integer types
  • Both tests use documented API pattern: []unsafe.Pointer{unsafe.Pointer(&arg)}

Technical Details

The API contract (ffi.go line 43) specifies that avalue contains pointers TO argument values:

[]unsafe.Pointer{unsafe.Pointer(&arg)}

For PointerType, this means we receive &ptr and must dereference to get ptr.

Upgrade

go get github.com/go-webgpu/goffi@v0.3.3

Full Changelog

v0.3.2...v0.3.3

goffi v0.3.2 - ARM64 HFA Classification Fix

23 Dec 16:58
3e057f3

What's Changed

Fixed

  • ARM64 HFA (Homogeneous Floating-point Aggregate) classification bug
    • HFA structs (e.g., NSRect with 4 doubles) were incorrectly passed by reference
    • Now correctly passed in floating-point registers D0-D7 per AAPCS64 ABI
    • Fix: Check HFA status before struct size in classifyArgumentARM64()
    • Affects Objective-C runtime calls on Apple Silicon (M1/M2/M3/M4)

Technical Details

  • AAPCS64 requires HFA detection before size-based classification
  • Example: NSRect (4 × float64 = 32 bytes) is HFA → uses D0-D3, not reference
  • Critical for macOS ARM64 Objective-C interop
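
The ordering issue can be reproduced in a few lines. This sketch contrasts a size-first check with an HFA-first check; `structInfo`, `classifySizeFirst`, and `classifyHFAFirst` are hypothetical names and do not mirror classifyArgumentARM64() itself.

```go
package main

import "fmt"

// argClass says how a struct argument is passed.
type argClass int

const (
	passInIntRegs argClass = iota
	passInFloatRegs
	passByReference
)

// structInfo is a hypothetical summary of a struct argument.
type structInfo struct {
	size  uintptr
	isHFA bool
}

// classifySizeFirst reproduces the ordering bug: any struct larger
// than 16 bytes is passed by reference before HFA status is checked.
func classifySizeFirst(s structInfo) argClass {
	if s.size > 16 {
		return passByReference
	}
	if s.isHFA {
		return passInFloatRegs
	}
	return passInIntRegs
}

// classifyHFAFirst checks HFA status before size.
func classifyHFAFirst(s structInfo) argClass {
	if s.isHFA {
		return passInFloatRegs
	}
	if s.size > 16 {
		return passByReference
	}
	return passInIntRegs
}

func main() {
	nsRect := structInfo{size: 32, isHFA: true} // 4 x float64
	fmt.Println(classifySizeFirst(nsRect) == passByReference) // true
	fmt.Println(classifyHFAFirst(nsRect) == passInFloatRegs)  // true
}
```

Only the order of the two checks differs between the functions, and that alone changes how NSRect is passed.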

CI Improvements

  • Remove deprecated macos-13 runner (deprecated Dec 2025)
  • Add macos-latest for ARM64 Apple Silicon testing
  • ARM64 coverage informational only (cross-compile verified status)

Platform Support

Platform | Architecture | Status
Linux | AMD64 | ✅ Production (tested)
Windows | AMD64 | ✅ Production (tested)
macOS | ARM64 | ✅ Cross-compile verified (CI tested)
macOS | Intel | ✅ Production (same ABI as Linux)
Linux | ARM64 | 🟡 Cross-compile verified

Installation

go get github.com/go-webgpu/goffi@v0.3.2

Full Changelog

v0.3.1...v0.3.2

v0.3.1 - ARM64 Build Constraints Hotfix

28 Nov 08:16

Hotfix

Fixes ARM64 build constraints for dynamic library loading functions.

Fixed

  • dl_unix.go: Added ARM64 support
  • dl_darwin.go: Added ARM64 support
  • stubs/caller.go: Exclude ARM64 from stubs

Bug Fixed

undefined: ffi.LoadLibrary on ARM64 platforms (macOS Apple Silicon, Linux ARM64)

Reported by

go-webgpu project

Installation

go get github.com/go-webgpu/goffi@v0.3.1

Full Changelog: v0.3.0...v0.3.1

v0.3.0 - ARM64 Architecture Support

28 Nov 07:59

What's New

ARM64 architecture support (AAPCS64 ABI for Linux and macOS)

Features

  • Complete ARM64 implementation with AAPCS64 calling convention
  • 2000-entry callback trampolines for ARM64
  • X0-X7 integer registers, D0-D7 floating-point registers
  • Homogeneous Floating-point Aggregate (HFA) detection
  • Cross-compile verified

Platform Support

Platform | Architecture | Status
Linux | AMD64 | Fully supported
Windows | AMD64 | Fully supported
macOS | AMD64 | Fully supported
Linux | ARM64 | Cross-compile verified
macOS | ARM64 | Cross-compile verified

Installation

go get github.com/go-webgpu/goffi@v0.3.0

Note

ARM64 support is feature-complete but awaiting real hardware testing.

Full Changelog: v0.2.1...v0.3.0

goffi v0.2.1 - Windows Callback Hotfix

27 Nov 07:53

Fixes

  • Windows callbacks: Use syscall.NewCallback for correct Win64 ABI
  • Documentation: Add SEH/C++ exception limitation (Go #12516)

Windows Callback Requirements

// Windows requires exactly one uintptr-sized return value
cb := ffi.NewCallback(func(a, b, c, d uintptr) uintptr {
    return 0
})

Supported types: int64, uint64, uintptr, pointers

NOT supported on Windows: int8/16/32, float32/64, void return


Full Changelog: v0.2.0...v0.2.1

goffi v0.2.0 - Callback Support

27 Nov 06:04

🎉 goffi v0.2.0 - Callback Support

This release adds callback support for C-to-Go function calls, enabling WebGPU async APIs and other callback-based C libraries.

✨ New Features

  • NewCallback API - Register Go functions as C callbacks
    cb := ffi.NewCallback(func(status int, adapter uintptr) {
        // Handle async result
    })
  • 2000-entry trampoline table - Pre-compiled assembly for optimal performance
  • Thread-safe callback registry - Mutex-protected callback storage
  • Full ABI support - System V AMD64 (Linux, macOS) and Win64 (Windows)
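
The fixed-size trampoline table and mutex-protected registry described above suggest a structure along these lines. This is a sketch only, assuming one registry slot per trampoline; the `registry` type and its methods are hypothetical and not goffi's internals.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxCallbacks matches the 2000-entry trampoline table.
const maxCallbacks = 2000

var errRegistryFull = errors.New("callback registry full")

// registry is a hypothetical fixed-capacity table. Slot i would pair
// with trampoline i; slots are never released.
type registry struct {
	mu    sync.Mutex
	funcs [maxCallbacks]any
	count int
}

// register stores fn in the next free slot and returns its index.
func (r *registry) register(fn any) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.count == maxCallbacks {
		return -1, errRegistryFull
	}
	idx := r.count
	r.funcs[idx] = fn
	r.count++
	return idx, nil
}

// lookup returns the function stored at idx, if any.
func (r *registry) lookup(idx int) (any, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if idx < 0 || idx >= r.count {
		return nil, false
	}
	return r.funcs[idx], true
}

func main() {
	var r registry
	var wg sync.WaitGroup
	for i := 0; i < maxCallbacks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.register(func() {})
		}()
	}
	wg.Wait()
	_, err := r.register(func() {})
	fmt.Println(r.count, err) // 2000 callback registry full
}
```

Because slots are only ever appended, an index handed to C code stays valid for the life of the process, which is consistent with the "memory never released" limitation noted in this release.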

📦 Supported Argument Types

  • Integers: int, int8, int16, int32, int64
  • Unsigned: uint, uint8, uint16, uint32, uint64, uintptr
  • Floats: float32, float64
  • Pointers: *T, unsafe.Pointer
  • Boolean: bool

🎯 Use Case: WebGPU Async Operations

// Create callback for wgpuInstanceRequestAdapter
cb := ffi.NewCallback(func(status int, adapter uintptr, msg uintptr, ud uintptr) {
    result := (*adapterResult)(unsafe.Pointer(ud))
    result.status = status
    result.adapter = adapter
    close(result.done)
})

// Pass to C function
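// Each avalue entry is a pointer TO the argument, converted to unsafe.Pointer
ffi.CallFunction(&cif, wgpuRequestAdapter, nil,
    []unsafe.Pointer{
        unsafe.Pointer(&instance), unsafe.Pointer(&opts),
        unsafe.Pointer(&cb), unsafe.Pointer(&userdata),
    })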

⚠️ Known Limitations

  • Maximum 2000 callbacks per process (memory never released)
  • Complex types (string, slice, map, chan, interface) not supported
  • Callbacks must return at most one value

📊 Test Coverage

  • 20 comprehensive tests covering all scenarios
  • Thread safety validation
  • Stack argument handling
  • All supported types tested

Full Changelog: v0.1.1...v0.2.0