-
Notifications
You must be signed in to change notification settings - Fork 1
ARM64 Darwin Fixes #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hey @ppoage, thanks for the comprehensive work on ARM64! Quick note: we merged v0.3.6 on Dec 29 with similar HFA/sret fixes, so there's a struct layout conflict now: v0.3.6 layout:
Your layout:
What we'd like to adopt from your PR:
Issues:
Options:
Preferred: Option 1 — please rebase on main, adapt to our struct layout, and force-push. This keeps the fix properly attributed to you. Note: This is preliminary feedback — I'm still out of town without my laptop. Will do a deeper review when I'm back. Will review wgpu/naga/gogpu PRs next — the type-safe ObjC messaging looks critical and great! |
…ests Changes - Return path: read X1 for 9–16 byte struct returns; interpret D0–D3 as raw bits for float/HFA. - Arg path: AAPCS64 mixed struct packing into GPR/FPR; HFA structs routed to FP regs; by‑ref fallback for >16 bytes. - Classification: nested HFA detection; shared struct‑reg counting for mixed int/float. - Example: darwin uses libSystem puts with correct return type and arg storage. - Ignore Go build cache (.gocache/). Fixes - 9–16 byte struct returns now capture X0+X1. - Correct float/HFA return decoding and mixed struct arg packing on ARM64. Tests - Add nested HFA cases and struct‑arg packing tests. - Add Darwin ObjC/CoreGraphics FFI coverage. - Refresh callback/coverage/benchmark/ffi tests. All tests pass on Darwin. Current benchmark stats: M3 Pro BenchmarkGoffiOverhead: 58 ns/op BenchmarkGoffiIntArgs: 67 ns/op BenchmarkGoffiStringOutput: crashes BenchmarkGoffiMultipleArgs: 73 ns/op BenchmarkDirectGo: 9 ns/op BenchmarkPrepareCallInterface: 26 ns/op BenchmarkLoadLibrary: 27 us/op BenchmarkGetSymbol: 498 ns/op BenchmarkPrepCIF: 7 ns/op BenchmarkNewCallback: 12 ns/op BenchmarkCallbackInvoke: 146 ns/op BenchmarkCallbackFloat: 148 ns/op
|
Ok, rebased. I have some extra changes in this rebase, but most should be necessary. Once the initial fix was made with the objc cffi (HFA/etc), I had to fix nested struct handling because that was the root cause for why we couldn't send or receive window size. And then I was seeing some weird things between floats/ints so I added mixed support. FYI BenchmarkGoffiStringOutput segfaults on mac. It looks like it calls with the string as a pointer. Didn't want to modify your benchmark, but it does run if that's fixed. Edit: Also when trying to add textures to metal, discovered oversized structs (size >16) aren't being handled properly. I fixed that and added a test for coverage (which now passes). That will be coming in a future commit |
…ent classification for ARM64. Added asm shim to test function call through ABI interface.
|
Ok, I fixed the oversized struct handling. Also added an assembly shim for use in testing to verify the whole chain through the ffi call doesn't mangle/drop something. FYI, the current ffi doesn't accept the stack based extra arguments. I'm seeing if I can add the texture pipeline for metal, but the current ABI implementation isn't complete and I don't know what approach you prefer to take here. |
|
Thanks for the excellent rebase! I've done a deep review and the PR looks great. Key improvements I verified:
One small fix needed for the benchmark: In // args contains pointers to argument storage, not the values themselves
_ = CallFunction(cif, sym, unsafe.Pointer(&result), []unsafe.Pointer{unsafe.Pointer(&strPtr)})Once that's fixed, I'll approve and merge. Great work on the ARM64 darwin support! Re: stack-based extra arguments - that's a known limitation, we can address it in a future PR. |
benchmark results: M3 Pro BenchmarkGoffiStringOutput: 64 ns/op
|
Fix for the benchmark pushed! Note: I haven't run any of the tests for other OS's, I assume they pass as ARM64 was the one getting changes |
kolkov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick fix! Benchmark fix looks correct.
Verified:
- strPtr now passed as
unsafe.Pointer(&strPtr)- correct API usage - 64 ns/op on M3 Pro - excellent performance
Approving. Will merge once CI passes.
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
To be edited~
Note: I added some checks and panics as we were experiencing very hard to pin down issues with segfaults, silent failures that caused later corruptions, and race conditions. At one point, running with -race actually stopped the segfault.