feat(arrow/memory): experimenting with addcleanup #322
base: main
Conversation
lidavidm left a comment
So the idea is that via AddCleanup we can avoid explicit retain/release calls? I suppose if it makes basic usage of the library simpler, then it makes sense. I think we should caution that for "library" code, it should not be used and retain/release should be used instead.
Is there a runtime cost to attaching the cleanup callback? An application that does intend to use retain/release may not want to pay the price if there is one.
FWIW, Java may be of interest here. Java had community libraries like Netty that effectively relied on explicit retain/release callbacks to avoid GC overhead and access native memory. The modern FFM API offers an API that is instead based around explicit "arenas", and one of the arenas is managed by the garbage collector (= no manual memory management), while others require you to close the arena (but not individual allocations). (Incidentally, arrow-java has "the worst of both worlds": both allocators and individual allocations must be closed.) That may be too restrictive of a paradigm here, though.
Would make sense to put this into the documentation. Alternatively, we could use a build tag to control whether it uses AddCleanup or not (turning whichever isn't in use into no-ops)?
Yes, there's a runtime cost to attaching the cleanup callback (albeit, AFAIK, it's very small). In theory, the AddCleanup solution would be cheaper overall than the repeated calls to Retain and Release. One important caveat to keep in mind (which technically still exists even with the current retain/release paradigm) is that users should never maintain a reference to the …
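For illustration, here is a minimal sketch of the pattern under discussion using Go 1.24's runtime.AddCleanup; trackedBuffer, newTrackedBuffer, and the free callback are hypothetical names for this example, not this PR's actual Buffer changes:

```go
package main

import (
	"fmt"
	"runtime"
)

// trackedBuffer stands in for a Buffer whose data came from a custom Allocator.
type trackedBuffer struct {
	data []byte
}

// newTrackedBuffer ties freeing of the allocation to GC reachability of the wrapper.
func newTrackedBuffer(n int, free func([]byte)) *trackedBuffer {
	b := &trackedBuffer{data: make([]byte, n)}
	// Important: neither the cleanup function nor its argument may reference b
	// itself, otherwise b stays reachable and the cleanup never runs. Only the
	// raw data (a separate allocation) and the free callback are passed here.
	runtime.AddCleanup(b, free, b.data)
	return b
}

func main() {
	buf := newTrackedBuffer(64, func(data []byte) {
		fmt.Printf("freed %d bytes\n", len(data))
	})
	_ = buf // once buf is unreachable, the cleanup runs after a future GC cycle
}
```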
|
@chriscasola Given this came out of your comment on #269, would you be willing to take a look at this and give your thoughts on mine and @lidavidm's comments above?
|
@zeroshade sorry for the super late reply. Your implementation looks good to me. I do question whether we should add even more complexity to the documented API and tell users that "sometimes" they don't need Retain/Release. Can we make …
For what it's worth, Retain and Release are slower in my profiling than the native GC. We have a database using Arrow for part of the API, and patching out the refcounting from arrow-go cut CPU usage significantly.
|
Is there anything I can do to help get this ready / merged? This is a pretty nice performance win for us, and I'm happy to help with both testing and development work :)
|
I'll go through this again this week and mark it ready for review after I rebase it. This was only an initial run-through, so there'll probably be more refinement necessary.
|
I'm still concerned about not being able to achieve deterministic memory management, though. I have to worry enough about memory behavior with the current APIs (and there are already some things that are nondeterministic) and I'm worried this will exacerbate it.
Force-pushed from 3695a54 to 5c1a547
|
@lidavidm @pixelherodev what would you both think about a build flag that would make the retain/release calls no-ops and solely use AddCleanup?
I think that's reasonable; I guess we'd need to duplicate some testing, but that's OK. I assume the Go compiler will inline/dead-code-eliminate the calls properly so that the overhead doesn't increase.
Having the option to turn off retain/release sounds good to me.
We've used that internally for similar behavioral switches; IIRC we checked and there's no asm generated when it's disabled. (Edit to clarify: "we" refers to my employer, not to Arrow :)

One other thought, though: if we're already using memory.DefaultAllocator, the Go allocator, Free is a no-op. We shouldn't even need the cleanups, or the release/retain. IMO it almost makes more sense to move the AddCleanup logic into memory.Allocator, and not invoke it at all for the Go allocator. That, combined with Retain/Release being disabled, would let users of the default allocator avoid paying overhead for special cases - does that design make sense?
I'd like to address this a bit more, though: what are the cases where this is a problem? In my testing, the cost of Retain/Release is often strictly higher than a GC interruption would be anyway, and that's with pretty heavy memory allocations (which I'm still working on cutting down :)
Bonus: Cleanups are 16 bytes; this avoids an additional 16 bytes for each Buffer for the Go allocator, which will never use them.
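As a rough sketch of that allocator-aware idea (not this PR's implementation, and registerCleanup is a made-up helper): only attach a cleanup when freeing actually matters, and skip it entirely for GoAllocator-backed memory, assuming the arrow-go v18 module path.

```go
package example

import (
	"runtime"

	"github.com/apache/arrow-go/v18/arrow/memory"
)

// registerCleanup hands data back to the allocator once owner becomes unreachable.
// For GoAllocator-backed memory, Free is a no-op, so we skip the cleanup (and its
// per-object cost) entirely; the GC reclaims that memory on its own.
func registerCleanup(owner *memory.Buffer, alloc memory.Allocator, data []byte) {
	if _, ok := alloc.(*memory.GoAllocator); ok {
		return
	}
	runtime.AddCleanup(owner, func(d []byte) { alloc.Free(d) }, data)
}
```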
|
Copying from #269:
Point 2 is a non-issue IMO. The non-determinism of point 1 is also, I think, a bit overstated? AddCleanup is allowed to batch cleanups, but the documentation says this is only likely to happen with objects that are smaller than sixteen bytes and that do not contain pointers, neither of which applies to most of the types currently using Retain/Release. Practically speaking, the cleanups will most likely run within 1 or 2 GC cycles. Are there cases where this is insufficient? How important is it that memory be freed as quickly as possible?
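As a rough illustration of that timing expectation (a hypothetical test, not from this PR): drop the last reference, force a couple of GC cycles, and the cleanup should fire shortly afterwards on the runtime's cleanup goroutine.

```go
package example

import (
	"runtime"
	"testing"
	"time"
)

func TestCleanupRunsWithinAFewGCCycles(t *testing.T) {
	freed := make(chan struct{})

	// Large enough (64 bytes) that the tiny-object batching mentioned above
	// does not apply.
	obj := &struct{ payload [64]byte }{}
	runtime.AddCleanup(obj, func(struct{}) { close(freed) }, struct{}{})
	obj = nil // drop the only reference

	// Cleanups run asynchronously after a GC observes the object is unreachable.
	runtime.GC()
	runtime.GC()

	select {
	case <-freed:
	case <-time.After(time.Second):
		t.Fatal("cleanup did not run within a couple of GC cycles")
	}
}
```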
|
I (not speaking from a maintainer perspective but a personal one) have users who are concerned about memory usage, and are not working with Go directly (instead Go components are part of a larger system). I could insert explicit GC calls all over the application, but at that point I'd lean towards dropping Go entirely (other reasons factor into this, though). This sounds like a win for pure Go users, so I'm in favor of it. I was hoping we could still keep the current guarantees for non-Go users, but it sounds like that will be difficult.
That's the perspective I was looking for here :)
In terms of quantity? Maximum heap usage? I'm assuming you're aware of GOGC and GOMEMLIMIT? I'm wondering if we can find another solution for such users without needing thousands of lines of code like this to support it...
|
Ah, the other issue is that the memory is allocated via things like mallocator, or Go is given handles to external allocations - so the GC isn't necessarily aware of the (full extent of the) memory used. (And with external handles it's nice to be able to say, release this handle right now instead of hoping the GC gets it.)
That's mostly just about timing, yeah? Having the memory be freed faster? If the goal is to bound memory usage of the Go components, which aren't as controlled as the others, would it not be sufficient to set a soft memory limit and use the Go allocator instead?
|
Not permissible with cgo.
Ahhhh, gotcha. That's tricky then... is this for Go calling into C, or vice versa? Is there a defined operational timespan? Trying to think if something like an arena allocator would be sufficient, to at least reuse the same memory...
|
It sounds like the best option that doesn't break anything for anyone, while still getting at least most of the benefits, is to add a build flag to switch from retain/release to AddCleanup, and then to move the cleanup logic into the Allocators, so that anyone using the GoAllocator has no additional runtime overhead for memory management? We can always try to revisit later and figure out a solution for non-Go use cases.
|
If it's helpful, I can probably build on top of this PR with those changes?
|
That would be super helpful, thanks! Sorry for not finding the time yet to focus on this.
No worries, completely understandable!
How to best disable retain/release is another interesting question. We could move every single copy of the Retain/Release implementation into its own file behind the build tag, but that's kinda gross IMO, even if we just have one per package. Might be best to have a single implementation of retain/release and refcounting, and then just embed that as a substructure wherever needed - which, actually, is probably a good change to make on its own anyways, so I'll start by sending in a PR with that, I think.

Edit: ah, except the problem is that each one needs to be able to free all of its refcounted subfields, too, not just to hand itself back to the allocator. I'll think this over and see if I can find a way to clean this up...
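One possible shape for that idea, sketched here with made-up names (the "arrow_norc" tag, refCount, and the onZero hook are illustrative only, not arrow-go code): a single embedded refCount type whose last Release calls back into the embedding type to free its refcounted subfields.

```go
// file: refcount.go (default build)
//go:build !arrow_norc

package memory

import "sync/atomic"

// refCount is embedded by any type that needs Retain/Release. The embedding
// type initializes n to 1 in its constructor and sets onZero so the last
// Release can free its refcounted subfields as well.
type refCount struct {
	n      atomic.Int64
	onZero func()
}

func (r *refCount) Retain() { r.n.Add(1) }

func (r *refCount) Release() {
	if r.n.Add(-1) == 0 && r.onZero != nil {
		r.onZero()
	}
}
```

A companion file guarded by `//go:build arrow_norc` would declare the same type with empty Retain/Release bodies, which the compiler can inline and eliminate, leaving cleanup to AddCleanup/the GC.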
|
There's currently 121 different implementations of Release, and 56 of Retain. A lot of them are just this shell:

```go
func (t *Type) Release() {
	debug.Assert(t.refCount.Load() > 0, "too many releases")
	if t.refCount.Add(-1) == 0 {
		if t.someSubfield != nil {
			t.someSubfield.Release()
			t.someSubfield = nil
		}
	}
}
```

[Not to mention that that debug assertion is still doing an extra atomic per release; the Load() can't be optimized out. That said, on x64 at least it's hardly a big deal.]

Realistically, these are just wrappers of the subfields. A lot of these are allocated on the Go heap, referencing Allocator memory as a subfield. What we're basically saying is "the subfield's memory is valid for as long as the field itself is." ...except that the field will actually outlive the subfield, so we nil out the reference to the allocated memory, because the field itself is on the Go heap. These shells could all become

```go
func (t *Type) Release() {
	t.someSubfield.Release()
}
```

without issue IMO, and the same change made to the Retain. The Retain/Release is really tracking how many additional references to the subfield are present. [The main issue is that on Go=>Allocator memory boundaries, we really do want to nil out the pointer immediately to prevent use-after-frees; on Go->Go boundaries, the worst case is a panic, not a memory error.]

What we really have is some allocations that need to be managed explicitly, and some Go-heap-allocated memory which references it in a dependency tree. e.g. ipc/MessageReader references ipc/Message, which holds two actual Buffers that need to be freed.

```go
type Allocation struct {
	allocator *mem.Allocator
	data      []byte
}

type Refcount struct {
	dependencies []*Refcount
	memories     []*Allocation
}
```

We could then e.g. have Buffer embed an Allocation at the root so that it can be used as one, instead of storing the two pointers directly; then, it'd embed a Refcount which points at that allocation - and any child buffer would have a Refcount with a dependency of the parent. Then, instead of each type implementing the release logic manually, it would set up the dependencies and …
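To make that concrete, here is a hedged sketch of how such a Refcount might release things. The counter, the method bodies, and the local Allocator interface are assumptions layered on top of the structs proposed above (the real proposal would use arrow's memory.Allocator), not code from this PR:

```go
package example

import "sync/atomic"

// Allocator is a stand-in for the arrow memory allocator interface.
type Allocator interface{ Free(b []byte) }

// Allocation is a piece of explicitly managed memory plus the allocator that owns it.
type Allocation struct {
	allocator Allocator
	data      []byte
}

// Refcount tracks how many additional references exist to a node in the
// dependency tree; whoever constructs a node initializes count to 1.
type Refcount struct {
	count        atomic.Int64
	dependencies []*Refcount
	memories     []*Allocation
}

func (r *Refcount) Retain() { r.count.Add(1) }

func (r *Refcount) Release() {
	if r.count.Add(-1) != 0 {
		return
	}
	// Last reference: free owned allocations, then release anything this node
	// depended on (e.g. a child buffer releasing its parent).
	for _, m := range r.memories {
		m.allocator.Free(m.data)
		m.data = nil // nil out immediately to surface use-after-free bugs
	}
	for _, dep := range r.dependencies {
		dep.Release()
	}
}
```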
|
Opened a proof-of-concept to add a build flag to disable refcounting: #578. I'm wondering: for those of us using just the Go allocator, is AddCleanup even necessary? If retain/release is intended to be kept as-is for other users, should we just have RR by default, and a norc flag for Go users to turn it off?
In that scenario, it wouldn't be. But the problem is that there's no good way to know whether or not we need the AddCleanup without pushing it as a responsibility onto the allocator, which would defeat the purpose.
At least for our first pass at this, I think that's the right way to go, and we can start introducing AddCleanup behind a potential flag. If it becomes worthwhile, we might look at making the breaking change and doing a new major version release.
In that case, mind reviewing the PoC for adding a flag to opt out of refcounting? I'd like the approach reviewed before I finish writing the code :)
|
Will do, will try to review today.
Rationale for this change
Experimenting with using AddCleanup for automatic cleanup of the Buffers without requiring Retain and Release everywhere, even for custom Allocators.

What changes are included in this PR?
Changes to the underlying Buffer object to leverage AddCleanup when using go1.24 or higher.
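For readers unfamiliar with version gating, a rough sketch of what "when using go1.24 or higher" can look like; the file name and the attachCleanup helper are illustrative, not the PR's exact layout:

```go
// file: buffer_cleanup.go
//go:build go1.24

package memory

import "runtime"

// attachCleanup frees data via the free callback once owner becomes unreachable.
// A companion file guarded by //go:build !go1.24 would provide a fallback (or a
// no-op) so the package still builds on older toolchains.
func attachCleanup(owner *Buffer, free func([]byte), data []byte) runtime.Cleanup {
	return runtime.AddCleanup(owner, free, data)
}
```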
Are these changes tested?
Yes, though more tests need to be added to ensure more complex scenarios don't fail.
Are there any user-facing changes?