feat(arrow/memory): experimenting with addcleanup #322
base: main
Conversation
lidavidm left a comment
So the idea is that via AddCleanup we can avoid explicit retain/release calls? I suppose if it makes basic usage of the library simpler, then it makes sense. I think we should caution that for "library" code, it should not be used and retain/release should be used instead.
Is there a runtime cost to attaching the cleanup callback? An application that does intend to use retain/release may not want to pay the price if there is one.
FWIW, Java may be of interest here. Java had community libraries like Netty that effectively relied on explicit retain/release callbacks to avoid GC overhead and access native memory. The modern FFM API offers an API that is instead based around explicit "arenas", and one of the arenas is managed by the garbage collector (= no manual memory management), while others require you to close the arena (but not individual allocations). (Incidentally, arrow-java has "the worst of both worlds": both allocators and individual allocations must be closed.) That may be too restrictive of a paradigm here, though.
Would make sense to put this into the documentation. Alternatively, we could use a build tag to control whether it uses AddCleanup or not (turning whichever isn't in use into no-ops)?
Yes, there's a runtime cost to attaching the cleanup callback (albeit, AFAIK, it's very small). In theory, the AddCleanup solution would be cheaper overall than the repeated calls to Retain and Release. One important caveat to keep in mind (which technically still exists even with the current retain/release paradigm) is that users should never maintain a reference to the …
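For illustration, here is a minimal sketch of the pattern under discussion using Go 1.24's runtime.AddCleanup; trackedBuffer, newTrackedBuffer, and the free callback are hypothetical names for this example, not this PR's actual Buffer changes:

```go
package main

import (
	"fmt"
	"runtime"
)

// trackedBuffer stands in for a Buffer whose data came from a custom Allocator.
type trackedBuffer struct {
	data []byte
}

// newTrackedBuffer ties freeing of the allocation to GC reachability of the wrapper.
func newTrackedBuffer(n int, free func([]byte)) *trackedBuffer {
	b := &trackedBuffer{data: make([]byte, n)}
	// Important: neither the cleanup function nor its argument may reference b
	// itself, otherwise b stays reachable and the cleanup never runs. Only the
	// raw data (a separate allocation) and the free callback are passed here.
	runtime.AddCleanup(b, free, b.data)
	return b
}

func main() {
	buf := newTrackedBuffer(64, func(data []byte) {
		fmt.Printf("freed %d bytes\n", len(data))
	})
	_ = buf // once buf is unreachable, the cleanup runs after a future GC cycle
}
```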
|
@chriscasola Given this came out of your comment on #269, would you be willing to take a look at this and give your thoughts on mine and @lidavidm's comments above?
|
@zeroshade sorry for the super late reply. Your implementation looks good to me. I do question whether we should add even more complexity to the documented API and tell users that "sometimes" they don't need Retain/Release. Can we make …
For what it's worth, Retain and Release are slower in my profiling than the native GC. We have a database using Arrow for part of the API, and patching out the refcounting from arrow-go cut CPU usage significantly.
|
Is there anything I can do to help get this ready / merged? This is a pretty nice performance win for us, and I'm happy to help with both testing and development work :)
|
I'll go through this again this week and mark it ready for review after I rebase it. This was only an initial run-through, so there'll probably be more refinement necessary.
|
I'm still concerned about not being able to achieve deterministic memory management, though. I have to worry enough about memory behavior with the current APIs (and there are already some things that are nondeterministic) and I'm worried this will exacerbate it.
Force-pushed from 3695a54 to 5c1a547
|
@lidavidm @pixelherodev what would you both think about a build flag that would make the retain/release calls no-ops and solely use AddCleanup?
I think that's reasonable; I guess we'd need to duplicate some testing, but that's OK. I assume the Go compiler will inline/dead-code-eliminate the calls properly so that the overhead doesn't increase.
Having the option to turn off retain/release sounds good to me.
We've used that internally for similar behavioral switches; IIRC we checked and there's no asm generated when it's disabled. (Edit to clarify: "we" refers to my employer, not to Arrow :)

One other thought, though: if we're already using memory.DefaultAllocator, the Go allocator, Free is a no-op. We shouldn't even need the cleanups, or the release/retain. IMO it almost makes more sense to move the AddCleanup logic into memory.Allocator, and not invoke it at all for the Go allocator. That, combined with Retain/Release being disabled, would let users of the default allocator avoid paying overhead for special cases - does that design make sense?
I'd like to address this a bit more, though: what are the cases where this is a problem? In my testing, the cost of Retain/Release is often strictly higher than a GC interruption would be anyway, and that's with pretty heavy memory allocations (which I'm still working on cutting down :)
Bonus: Cleanups are 16 bytes; this avoids an additional 16 bytes for each Buffer for the Go allocator, which will never use them.
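As a rough sketch of that allocator-aware idea (not this PR's implementation, and registerCleanup is a made-up helper): only attach a cleanup when freeing actually matters, and skip it entirely for GoAllocator-backed memory, assuming the arrow-go v18 module path.

```go
package example

import (
	"runtime"

	"github.com/apache/arrow-go/v18/arrow/memory"
)

// registerCleanup hands data back to the allocator once owner becomes unreachable.
// For GoAllocator-backed memory, Free is a no-op, so we skip the cleanup (and its
// per-object cost) entirely; the GC reclaims that memory on its own.
func registerCleanup(owner *memory.Buffer, alloc memory.Allocator, data []byte) {
	if _, ok := alloc.(*memory.GoAllocator); ok {
		return
	}
	runtime.AddCleanup(owner, func(d []byte) { alloc.Free(d) }, data)
}
```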
|
Copying from #269:
Point 2 is a non-issue IMO. The non-determinism of point 1 is also, I think, a bit overstated? AddCleanup is allowed to batch cleanups, but the documentation says this is only likely to happen with objects that are smaller than sixteen bytes and that do not contain pointers, neither of which applies to most of the types currently using Retain/Release. Practically speaking, the cleanups will most likely run within 1 or 2 GC cycles. Are there cases where this is insufficient? How important is it that memory be freed as quickly as possible?
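As a rough illustration of that timing expectation (a hypothetical test, not from this PR): drop the last reference, force a couple of GC cycles, and the cleanup should fire shortly afterwards on the runtime's cleanup goroutine.

```go
package example

import (
	"runtime"
	"testing"
	"time"
)

func TestCleanupRunsWithinAFewGCCycles(t *testing.T) {
	freed := make(chan struct{})

	// Large enough (64 bytes) that the tiny-object batching mentioned above
	// does not apply.
	obj := &struct{ payload [64]byte }{}
	runtime.AddCleanup(obj, func(struct{}) { close(freed) }, struct{}{})
	obj = nil // drop the only reference

	// Cleanups run asynchronously after a GC observes the object is unreachable.
	runtime.GC()
	runtime.GC()

	select {
	case <-freed:
	case <-time.After(time.Second):
		t.Fatal("cleanup did not run within a couple of GC cycles")
	}
}
```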
|
I (not speaking from a maintainer perspective but a personal one) have users who are concerned about memory usage, and are not working with Go directly (instead Go components are part of a larger system). I could insert explicit GC calls all over the application, but at that point I'd lean towards dropping Go entirely (other reasons factor into this, though). This sounds like a win for pure Go users, so I'm in favor of it. I was hoping we could still keep the current guarantees for non-Go users, but it sounds like that will be difficult.
That's the perspective I was looking for here :)
In terms of quantity? Maximum heap usage? I'm assuming you're aware of GOGC and GOMEMLIMIT? I'm wondering if we can find another solution for such users without needing thousands of lines of code like this to support it...
|
Ah, the other issue is that the memory is allocated via things like mallocator, or Go is given handles to external allocations - so the GC isn't necessarily aware of the (full extent of the) memory used. (And with external handles it's nice to be able to say, release this handle right now instead of hoping the GC gets it.)
That's mostly just about timing, yeah? Having the memory be freed faster? If the goal is to bound memory usage of the Go components, which aren't as controlled as the others, would it not be sufficient to set a soft memory limit and use the Go allocator instead?
|
Not permissible with cgo.
Ahhhh, gotcha. That's tricky then... is this for Go calling into C, or vice versa? Is there a defined operational timespan? Trying to think if something like an arena allocator would be sufficient, to at least reuse the same memory...
|
It sounds like the best option that doesn't break anything for anyone, while still getting at least most of the benefits, is to add a build flag to switch from retain/release to AddCleanup, and then to move the cleanup logic into the Allocators, so that anyone using the GoAllocator has no additional runtime overhead for memory management? We can always try to revisit later and figure out a solution for non-Go use cases.
|
If it's helpful, I can probably build on top of this PR with those changes?
|
That would be super helpful, thanks! Sorry for not finding the time yet to focus on this.
No worries, completely understandable!
How to best disable retain/release is another interesting question. We could move every single copy of the Retain/Release implementation into its own file behind the build tag, but that's kinda gross IMO, even if we just have one per package. Might be best to have a single implementation of retain/release and refcounting, and then just embed that as a substructure wherever needed - which, actually, is probably a good change to make on its own anyways, so I'll start by sending in a PR with that, I think.

Edit: ah, except the problem is that each one needs to be able to free all of its refcounted subfields, too, not just to hand itself back to the allocator. I'll think this over and see if I can find a way to clean this up...
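One possible shape for that idea, sketched here with made-up names (the "arrow_norc" tag, refCount, and the onZero hook are illustrative only, not arrow-go code): a single embedded refCount type whose last Release calls back into the embedding type to free its refcounted subfields.

```go
// file: refcount.go (default build)
//go:build !arrow_norc

package memory

import "sync/atomic"

// refCount is embedded by any type that needs Retain/Release. The embedding
// type initializes n to 1 in its constructor and sets onZero so the last
// Release can free its refcounted subfields as well.
type refCount struct {
	n      atomic.Int64
	onZero func()
}

func (r *refCount) Retain() { r.n.Add(1) }

func (r *refCount) Release() {
	if r.n.Add(-1) == 0 && r.onZero != nil {
		r.onZero()
	}
}
```

A companion file guarded by `//go:build arrow_norc` would declare the same type with empty Retain/Release bodies, which the compiler can inline and eliminate, leaving cleanup to AddCleanup/the GC.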
|
There's currently 121 different implementations of Release, and 56 of Retain. A lot of them are just this shell:

```go
func (t *Type) Release() {
	debug.Assert(t.refCount.Load() > 0, "too many releases")
	if t.refCount.Add(-1) == 0 {
		if t.someSubfield != nil {
			t.someSubfield.Release()
			t.someSubfield = nil
		}
	}
}
```

[Not to mention that that debug assertion is still doing an extra atomic per release; the Load() can't be optimized out. That said, on x64 at least it's hardly a big deal.]

Realistically, these are just wrappers of the subfields. A lot of these are allocated on the Go heap, referencing Allocator memory as a subfield. What we're basically saying is "the subfield's memory is valid for as long as the field itself is." ...except that the field will actually outlive the subfield, so we nil out the reference to the allocated memory, because the field itself is on the Go heap. These shells could all become

```go
func (t *Type) Release() {
	t.someSubfield.Release()
}
```

without issue IMO, and the same change made to the Retain. The Retain/Release is really tracking how many additional references to the subfield are present. [The main issue is that on Go=>Allocator memory boundaries, we really do want to nil out the pointer immediately to prevent use-after-frees; on Go->Go boundaries, the worst case is a panic, not a memory error.]

What we really have is some allocations that need to be managed explicitly, and some Go-heap-allocated memory which references it in a dependency tree. e.g. ipc/MessageReader references ipc/Message, which holds two actual Buffers that need to be freed.

```go
type Allocation struct {
	allocator *mem.Allocator
	data      []byte
}

type Refcount struct {
	dependencies []*Refcount
	memories     []*Allocation
}
```

We could then e.g. have Buffer embed an Allocation at the root so that it can be used as one, instead of storing the two pointers directly; then, it'd embed a Refcount which points at that allocation - and any child buffer would have a Refcount with a dependency of the parent. Then, instead of each type implementing the release logic manually, it would set up the dependencies and …
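To make that concrete, here is a hedged sketch of how such a Refcount might release things. The counter, the method bodies, and the local Allocator interface are assumptions layered on top of the structs proposed above (the real proposal would use arrow's memory.Allocator), not code from this PR:

```go
package example

import "sync/atomic"

// Allocator is a stand-in for the arrow memory allocator interface.
type Allocator interface{ Free(b []byte) }

// Allocation is a piece of explicitly managed memory plus the allocator that owns it.
type Allocation struct {
	allocator Allocator
	data      []byte
}

// Refcount tracks how many additional references exist to a node in the
// dependency tree; whoever constructs a node initializes count to 1.
type Refcount struct {
	count        atomic.Int64
	dependencies []*Refcount
	memories     []*Allocation
}

func (r *Refcount) Retain() { r.count.Add(1) }

func (r *Refcount) Release() {
	if r.count.Add(-1) != 0 {
		return
	}
	// Last reference: free owned allocations, then release anything this node
	// depended on (e.g. a child buffer releasing its parent).
	for _, m := range r.memories {
		m.allocator.Free(m.data)
		m.data = nil // nil out immediately to surface use-after-free bugs
	}
	for _, dep := range r.dependencies {
		dep.Release()
	}
}
```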
|
Opened a proof-of-concept to add a build flag to disable refcounting: #578. I'm wondering: for those of us using just the Go allocator, is AddCleanup even necessary? If retain/release is intended to be kept as-is for other users, should we just have RR by default, and a norc flag for Go users to turn it off?
In that scenario, it wouldn't be. But the problem is that there's no good way to know whether or not we need the AddCleanup without pushing it as a responsibility onto the allocator, which would defeat the purpose.
At least for our first pass at this, I think that's the right way to go, and we can start introducing AddCleanup behind a potential flag. If it becomes worthwhile, we might look at making the breaking change and doing a new major version release.
In that case, mind reviewing the PoC for adding a flag to opt out of refcounting? I'd like the approach reviewed before I finish writing the code :)
|
Will do, will try to review today.
Rationale for this change
Experimenting with using AddCleanup for automatic cleanup of the Buffers without requiring Retain and Release everywhere, even for custom Allocators.

What changes are included in this PR?
Changes to the underlying Buffer object to leverage AddCleanup when using go1.24 or higher.
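For readers unfamiliar with version gating, a rough sketch of what "when using go1.24 or higher" can look like; the file name and the attachCleanup helper are illustrative, not the PR's exact layout:

```go
// file: buffer_cleanup.go
//go:build go1.24

package memory

import "runtime"

// attachCleanup frees data via the free callback once owner becomes unreachable.
// A companion file guarded by //go:build !go1.24 would provide a fallback (or a
// no-op) so the package still builds on older toolchains.
func attachCleanup(owner *Buffer, free func([]byte), data []byte) runtime.Cleanup {
	return runtime.AddCleanup(owner, free, data)
}
```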
Are these changes tested?
Yes, though more tests need to be added to ensure more complex scenarios don't fail.
Are there any user-facing changes?