62 commits
729654c
Add new argument to `gil_safe_call_once_and_store::call_once_and_stor…
XuehaiPan Dec 13, 2025
d2b7605
Add per-interpreter storage for `gil_safe_call_once_and_store`
XuehaiPan Dec 13, 2025
e741760
Make `~gil_safe_call_once_and_store` a no-op
XuehaiPan Dec 14, 2025
5d1d678
Fix C++11 compatibility
XuehaiPan Dec 14, 2025
0bac82d
Improve thread-safety and add default finalizer
XuehaiPan Dec 14, 2025
2a4b118
Merge branch 'master' into subinterp-call-once-and-store
XuehaiPan Dec 14, 2025
be97110
Try fix thread-safety
XuehaiPan Dec 14, 2025
3e77ce9
Try fix thread-safety
XuehaiPan Dec 14, 2025
d5b8813
Add a warning comment
XuehaiPan Dec 15, 2025
f6d0f88
Simplify `PYBIND11_INTERNALS_VERSION >= 12`
XuehaiPan Dec 15, 2025
7d8339e
Try fix thread-safety
XuehaiPan Dec 15, 2025
1920f43
Try fix thread-safety
XuehaiPan Dec 15, 2025
900bed6
Merge branch 'master' into subinterp-call-once-and-store
XuehaiPan Dec 15, 2025
a6754ba
Revert get_pp()
XuehaiPan Dec 16, 2025
1aed3ab
Update comments
XuehaiPan Dec 16, 2025
b61e902
Move call-once storage out of internals
XuehaiPan Dec 17, 2025
b72cd41
Revert internal version bump
XuehaiPan Dec 17, 2025
ac02a32
Cleanup outdated comments
XuehaiPan Dec 17, 2025
ddb6dd4
Move atomic_bool alias into pybind11::detail namespace
rwgk Dec 20, 2025
3fb52df
Add explicit #include <unordered_map> for subinterpreter support
rwgk Dec 20, 2025
32deca4
Remove extraneous semicolon after destructor definition
rwgk Dec 20, 2025
a4d4d73
Add comment explaining unused finalize parameter
rwgk Dec 20, 2025
7cb30ce
Add comment explaining error_scope usage
rwgk Dec 20, 2025
7d34139
Improve exception safety in get_or_create_call_once_storage_map()
rwgk Dec 20, 2025
78e3945
Add timeout-minutes: 3 to cpptest workflow steps
rwgk Dec 20, 2025
1014ee4
Add progress reporter for test_with_catch Catch2 runner
rwgk Dec 20, 2025
21d0dc5
clang-format auto-fix (overlooked before)
rwgk Dec 20, 2025
e1b1b1b
Disable "Move Subinterpreter" test on free-threaded Python 3.14+
rwgk Dec 21, 2025
89cae6d
style: pre-commit fixes
pre-commit-ci[bot] Dec 21, 2025
a090637
Add test for gil_safe_call_once_and_store per-interpreter isolation
rwgk Dec 21, 2025
cb5e7d7
Add STARTING/DONE timestamps to test_with_catch output
rwgk Dec 21, 2025
0f8f32a
Disable stdout buffering in test_with_catch
rwgk Dec 21, 2025
a3abdee
EXPERIMENT: Re-enable hanging test to verify CI log buffering fix
rwgk Dec 21, 2025
d6f2a7f
Revert "Disable stdout buffering in test_with_catch"
rwgk Dec 21, 2025
9b70460
Use USES_TERMINAL for cpptest to show output immediately
rwgk Dec 21, 2025
8951004
Fix clang-tidy performance-avoid-endl warning
rwgk Dec 21, 2025
c4cbe73
Add SIGTERM handler to show when test is killed by timeout
rwgk Dec 21, 2025
f330a79
Fix typo: atleast -> at_least
rwgk Dec 21, 2025
6c1ccb9
Fix GCC warn_unused_result error for write() in signal handler
rwgk Dec 21, 2025
3c01ff3
Add USES_TERMINAL to other C++ test targets
rwgk Dec 21, 2025
9e9843d
Revert "EXPERIMENT: Re-enable hanging test to verify CI log buffering…
rwgk Dec 21, 2025
e7c2648
Update comment to reference PR #5940 for Move Subinterpreter fix
rwgk Dec 21, 2025
58c08ac
Add alias `interpid_t = std::int64_t`
XuehaiPan Dec 21, 2025
305a293
Add isolation and gc test for `gil_safe_call_once_and_store`
XuehaiPan Dec 21, 2025
f6bba0f
Add thread local cache for gil_safe_call_once_and_store
XuehaiPan Dec 21, 2025
66e4697
Revert "Add thread local cache for gil_safe_call_once_and_store"
XuehaiPan Dec 21, 2025
d0819cc
Revert changes according to code review
XuehaiPan Dec 21, 2025
5ce00e5
Relocate multiple-interpreters tests
XuehaiPan Dec 21, 2025
97b50fe
Add more tests for multiple interpreters
XuehaiPan Dec 21, 2025
8819ec4
Remove copy constructor
XuehaiPan Dec 21, 2025
e84e9c1
Merge remote-tracking branch 'upstream/master' into subinterp-call-on…
XuehaiPan Dec 22, 2025
d9daef5
Apply suggestions from code review
XuehaiPan Dec 22, 2025
9a3328b
Refactor to use per-storage capsule instead
XuehaiPan Dec 22, 2025
b68faf0
Merge remote-tracking branch 'upstream/master' into subinterp-call-on…
XuehaiPan Dec 22, 2025
bc20601
Update comments
XuehaiPan Dec 22, 2025
b39c049
Update singleton tests
XuehaiPan Dec 22, 2025
9ef71ec
Use interpreter id type for `get_num_interpreters_seen()`
XuehaiPan Dec 22, 2025
98370f2
Suppress unused variable warning
XuehaiPan Dec 22, 2025
534235e
HACKING
XuehaiPan Dec 22, 2025
d038714
Revert "HACKING"
XuehaiPan Dec 22, 2025
3a2c34a
Try fix concurrency
XuehaiPan Dec 22, 2025
99a095d
Test even harder
XuehaiPan Dec 22, 2025
include/pybind11/detail/internals.h (16 changes: 7 additions & 9 deletions)
@@ -420,8 +420,8 @@ inline PyThreadState *get_thread_state_unchecked() {
 
 /// We use this counter to figure out if there are or have been multiple subinterpreters active at
 /// any point. This must never decrease while any interpreter may be running in any thread!
-inline std::atomic<int> &get_num_interpreters_seen() {
-    static std::atomic<int> counter(0);
+inline std::atomic<int64_t> &get_num_interpreters_seen() {
+    static std::atomic<int64_t> counter(0);
     return counter;
 }
 
@@ -564,7 +564,7 @@ class internals_pp_manager {
     /// acquire the GIL. Will never return nullptr.
     std::unique_ptr<InternalsType> *get_pp() {
 #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
-        if (get_num_interpreters_seen() > 1) {
+        if (get_num_interpreters_seen() > 1 || last_istate_tls() == nullptr) {
             // Whenever the interpreter changes on the current thread we need to invalidate the
             // internals_pp so that it can be pulled from the interpreter's state dict. That is
             // slow, so we use the current PyThreadState to check if it is necessary.
@@ -590,11 +590,8 @@ class internals_pp_manager {
     /// Drop all the references we're currently holding.
     void unref() {
 #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
-        if (get_num_interpreters_seen() > 1) {
-            last_istate_tls() = nullptr;
-            internals_p_tls() = nullptr;
-            return;
-        }
+        last_istate_tls() = nullptr;
+        internals_p_tls() = nullptr;
 #endif
         internals_singleton_pp_ = nullptr;
     }
@@ -606,7 +603,6 @@ class internals_pp_manager {
             // this could be called without an active interpreter, just use what was cached
             if (!tstate || tstate->interp == last_istate_tls()) {
                 auto tpp = internals_p_tls();
-
                 delete tpp;
             }
             unref();
@@ -660,6 +656,8 @@ class internals_pp_manager {
 
     char const *holder_id_ = nullptr;
     on_fetch_function *on_fetch_ = nullptr;
+    // Pointer-to-pointer to the singleton internals for the first seen interpreter (may not be the
+    // main interpreter)
     std::unique_ptr<InternalsType> *internals_singleton_pp_;
 };
 
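The internals.h hunks widen `get_num_interpreters_seen()` to `int64_t` and make `get_pp()` take the slow path whenever more than one interpreter has been seen or nothing is cached for the thread. Because the counter must never decrease, comparing it against 1 is a safe guard for a cached fast path. The sketch below models only that idea in standard C++; `fake_interpreter`, `interpreters_seen()`, and `cached_lookup` are hypothetical stand-ins, not pybind11 API, and setting the counter by hand replaces real interpreter creation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an interpreter that owns one per-interpreter value.
struct fake_interpreter {
    int value;
};

// Hypothetical monotonic counter, analogous in spirit to get_num_interpreters_seen():
// once it exceeds 1 it never returns to 1, so "== 1" cannot become true again.
inline std::atomic<std::int64_t> &interpreters_seen() {
    static std::atomic<std::int64_t> counter(0);
    return counter;
}

// Hypothetical cache: reuses the first lookup only while a single interpreter
// has ever been seen; otherwise it always consults the interpreter it is given.
struct cached_lookup {
    int *cached_ptr = nullptr;
    int slow_lookups = 0;

    int &get(fake_interpreter &interp) {
        if (cached_ptr != nullptr && interpreters_seen().load() == 1) {
            return *cached_ptr; // fast path: only one interpreter can exist
        }
        ++slow_lookups; // slow path: per-interpreter lookup
        cached_ptr = &interp.value;
        return *cached_ptr;
    }
};
```

Once a second interpreter has been counted, every call goes through the slow path, which is the trade-off `is_last_storage_valid()` makes in the gil_safe_call_once.h diff.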
include/pybind11/gil_safe_call_once.h (264 changes: 254 additions & 10 deletions)
@@ -3,17 +3,31 @@
 #pragma once
 
 #include "detail/common.h"
+#include "detail/internals.h"
 #include "gil.h"
 
 #include <cassert>
 #include <mutex>
 
-#ifdef Py_GIL_DISABLED
+#if defined(Py_GIL_DISABLED) || defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT)
 #    include <atomic>
 #endif
+#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
+#    include <cstdint>
+#    include <memory>
+#    include <string>
+#endif
 
 PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
 
+PYBIND11_NAMESPACE_BEGIN(detail)
+#if defined(Py_GIL_DISABLED) || defined(PYBIND11_HAS_SUBINTERPRETER_SUPPORT)
+using atomic_bool = std::atomic_bool;
+#else
+using atomic_bool = bool;
+#endif
+PYBIND11_NAMESPACE_END(detail)
+
 // Use the `gil_safe_call_once_and_store` class below instead of the naive
 //
 //   static auto imported_obj = py::module_::import("module_name"); // BAD, DO NOT USE!
@@ -48,12 +62,23 @@ PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
 // functions, which is usually the case.
 //
 // For in-depth background, see docs/advanced/deadlock.md
+#ifndef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
+// Subinterpreter support is disabled.
+// In this case, we can store the result globally, because there is only a single interpreter.
+//
+// The life span of the stored result is the entire process lifetime. It is leaked on process
+// termination to avoid destructor calls after the Python interpreter was finalized.
 template <typename T>
 class gil_safe_call_once_and_store {
 public:
     // PRECONDITION: The GIL must be held when `call_once_and_store_result()` is called.
+    //
+    // NOTE: The second parameter (finalize callback) is intentionally unused when subinterpreter
+    // support is disabled. In that case, storage is process-global and intentionally leaked to
+    // avoid calling destructors after the Python interpreter has been finalized.
     template <typename Callable>
-    gil_safe_call_once_and_store &call_once_and_store_result(Callable &&fn) {
+    gil_safe_call_once_and_store &call_once_and_store_result(Callable &&fn,
+                                                             void (*)(T &) /*unused*/ = nullptr) {
         if (!is_initialized_) { // This read is guarded by the GIL.
             // Multiple threads may enter here, because the GIL is released in the next line and
             // CPython API calls in the `fn()` call below may release and reacquire the GIL.
@@ -74,29 +99,248 @@ class gil_safe_call_once_and_store {
     T &get_stored() {
         assert(is_initialized_);
         PYBIND11_WARNING_PUSH
-#if !defined(__clang__) && defined(__GNUC__) && __GNUC__ < 5
+#    if !defined(__clang__) && defined(__GNUC__) && __GNUC__ < 5
         // Needed for gcc 4.8.5
         PYBIND11_WARNING_DISABLE_GCC("-Wstrict-aliasing")
-#endif
+#    endif
         return *reinterpret_cast<T *>(storage_);
         PYBIND11_WARNING_POP
     }
 
     constexpr gil_safe_call_once_and_store() = default;
+    // The instance is a global static, so its destructor runs when the process
+    // is terminating. Therefore, do nothing here because the Python interpreter
+    // may have been finalized already.
     PYBIND11_DTOR_CONSTEXPR ~gil_safe_call_once_and_store() = default;
 
+    // Disable copy and move operations.
+    gil_safe_call_once_and_store(const gil_safe_call_once_and_store &) = delete;
+    gil_safe_call_once_and_store(gil_safe_call_once_and_store &&) = delete;
+    gil_safe_call_once_and_store &operator=(const gil_safe_call_once_and_store &) = delete;
+    gil_safe_call_once_and_store &operator=(gil_safe_call_once_and_store &&) = delete;
+
 private:
+    // The global static storage (per-process) when subinterpreter support is disabled.
     alignas(T) char storage_[sizeof(T)] = {};
     std::once_flag once_flag_;
-#ifdef Py_GIL_DISABLED
-    std::atomic_bool
-#else
-    bool
-#endif
-        is_initialized_{false};
+
     // The `is_initialized_`-`storage_` pair is very similar to `std::optional`,
     // but the latter does not have the triviality properties of former,
     // therefore `std::optional` is not a viable alternative here.
+    detail::atomic_bool is_initialized_{false};
 };
+#else
+// Subinterpreter support is enabled.
+// In this case, we should store the result per-interpreter instead of globally, because each
+// subinterpreter has its own separate state. The cached result may not be shareable across
+// interpreters (e.g., imported modules and their members).
+
+PYBIND11_NAMESPACE_BEGIN(detail)
+
+template <typename T>
+struct call_once_storage {
+    alignas(T) char storage[sizeof(T)] = {};
+    std::once_flag once_flag;
+    void (*finalize)(T &) = nullptr;
+    std::atomic_bool is_initialized{false};
+
+    call_once_storage() = default;
+    ~call_once_storage() {
+        if (is_initialized) {
+            if (finalize != nullptr) {
+                finalize(*reinterpret_cast<T *>(storage));
+            } else {
+                reinterpret_cast<T *>(storage)->~T();
+            }
+        }
+    }
+    call_once_storage(const call_once_storage &) = delete;
+    call_once_storage(call_once_storage &&) = delete;
+    call_once_storage &operator=(const call_once_storage &) = delete;
+    call_once_storage &operator=(call_once_storage &&) = delete;
+};
+
+PYBIND11_NAMESPACE_END(detail)
+
+// Prefix for storage keys in the interpreter state dict.
+#    define PYBIND11_CALL_ONCE_STORAGE_KEY_PREFIX PYBIND11_INTERNALS_ID "_call_once_storage__"
+
+// The life span of the stored result is the entire interpreter lifetime. An additional
+// `finalize_fn` can be provided to clean up the stored result when the interpreter is destroyed.
+template <typename T>
+class gil_safe_call_once_and_store {
+public:
+    // PRECONDITION: The GIL must be held when `call_once_and_store_result()` is called.
+    template <typename Callable>
+    gil_safe_call_once_and_store &call_once_and_store_result(Callable &&fn,
+                                                             void (*finalize_fn)(T &) = nullptr) {
+        if (!is_last_storage_valid()) {
+            // Multiple threads may enter here, because the GIL is released in the next line and
+            // CPython API calls in the `fn()` call below may release and reacquire the GIL.
+            gil_scoped_release gil_rel; // Needed to establish lock ordering.
+            // There can be multiple threads going through here.
+            storage_type *value = nullptr;
+            {
+                gil_scoped_acquire gil_acq;
+                // Only one thread will enter here at a time.

Collaborator

This doesn't seem true if we're talking about a free-threading build of Python? In general, the assumption in this code that gil_scoped_acquire enforces mutual exclusion is not valid on free-threaded Python.

I think get_or_create_call_once_storage_map() needs to be reworked to do something like a test-and-test-and-set, i.e.: if the dict entry doesn't exist, then create the map and use PyDict_SetDefault[Ref]; if setdefault returns a different map, use that and free the one you just created. You will also need a lock to synchronize updates to the map. It might be easier to use a Python dict (which comes with a built-in lock) instead of a std::unordered_map, with pointer-cast-to-integer keys and capsule values. The capsule destructor would finalize a single element instead of all of them.

Contributor Author

I will switch to a per-storage capsule approach rather than one capsule holding a collection of storages.

Collaborator

The comment here still says "only one thread will enter at a time", which is not true.
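The test-and-test-and-set pattern suggested in this thread can be modelled in standard C++: a cheap lookup first, and only on a miss build a candidate and publish it with an insert-if-absent operation; a caller that loses the race adopts the winner's object and its own candidate is destroyed. In this sketch `locked_dict`, `get_or_create`, and the `std::mutex` (standing in for the dict's built-in lock the reviewer mentions) are hypothetical, not CPython or pybind11 API.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an interpreter state dict whose operations are
// serialized by an internal lock.
class locked_dict {
public:
    std::shared_ptr<int> get(const std::string &key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        return it == map_.end() ? std::shared_ptr<int>() : it->second;
    }

    // setdefault semantics: keep an existing entry, otherwise insert `value`;
    // in both cases return whatever is stored under `key` afterwards.
    std::shared_ptr<int> setdefault(const std::string &key, std::shared_ptr<int> value) {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.emplace(key, std::move(value)).first->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<int>> map_;
};

// Test (cheap lookup), then test-and-set (setdefault) only on a miss.
std::shared_ptr<int> get_or_create(locked_dict &dict, const std::string &key, int seed) {
    if (auto existing = dict.get(key)) {
        return existing;
    }
    auto candidate = std::make_shared<int>(seed);
    // If another caller published first, its object is returned and ours is destroyed.
    return dict.setdefault(key, std::move(candidate));
}
```

The key property is that every caller ends up with the single published object, no matter which one created it.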

+                value = get_or_create_storage_in_state_dict();
+            }
+            assert(value != nullptr);
+            std::call_once(value->once_flag, [&] {
+                // Only one thread will ever enter here.
+                gil_scoped_acquire gil_acq;
+                // fn may release, but will reacquire, the GIL.
+                ::new (value->storage) T(fn());
+                value->finalize = finalize_fn;
+                value->is_initialized = true;
+                last_storage_ptr_ = reinterpret_cast<T *>(value->storage);
+                is_initialized_by_at_least_one_interpreter_ = true;
+            });
+            // All threads will observe `is_initialized_by_at_least_one_interpreter_` as true here.
+        }
+        // Intentionally not returning `T &` to ensure the calling code is self-documenting.
+        return *this;
+    }
+
+    // This must only be called after `call_once_and_store_result()` was called.
+    T &get_stored() {
+        T *result = last_storage_ptr_;
+        if (!is_last_storage_valid()) {
+            gil_scoped_acquire gil_acq;
+            auto *value = get_or_create_storage_in_state_dict();
+            result = last_storage_ptr_ = reinterpret_cast<T *>(value->storage);
+        }
+        assert(result != nullptr);
+        return *result;
+    }
+
+    gil_safe_call_once_and_store() = default;
+    // The instance is a global static, so its destructor runs when the process
+    // is terminating. Therefore, do nothing here because the Python interpreter
+    // may have been finalized already.
+    PYBIND11_DTOR_CONSTEXPR ~gil_safe_call_once_and_store() = default;
+
+    // Disable copy and move operations because the memory address is used as key.
+    gil_safe_call_once_and_store(const gil_safe_call_once_and_store &) = delete;
+    gil_safe_call_once_and_store(gil_safe_call_once_and_store &&) = delete;
+    gil_safe_call_once_and_store &operator=(const gil_safe_call_once_and_store &) = delete;
+    gil_safe_call_once_and_store &operator=(gil_safe_call_once_and_store &&) = delete;
+
+private:
+    using storage_type = detail::call_once_storage<T>;
+
+    // Indicator of fast path for single-interpreter case.
+    bool is_last_storage_valid() const {
+        return is_initialized_by_at_least_one_interpreter_
+               && detail::get_num_interpreters_seen() == 1;
+    }
+
+    // Get the unique key for this storage instance in the interpreter's state dict.
+    // The return type should not be `py::str` because PyObject is interpreter-dependent.
+    std::string get_storage_key() const {
+        // The instance is expected to be global static, so using its address as unique identifier.
+        // The typical usage is like:
+        //
+        //   PYBIND11_CONSTINIT static gil_safe_call_once_and_store<T> storage;
+        //
+        return PYBIND11_CALL_ONCE_STORAGE_KEY_PREFIX
+               + std::to_string(reinterpret_cast<std::uintptr_t>(this));
+    }
+
+    // Get or create per-storage capsule in the current interpreter's state dict.
+    // Use test-and-set pattern with `PyDict_SetDefault` for thread-safe concurrent access.
+    storage_type *get_or_create_storage_in_state_dict() {
+        error_scope err_scope; // preserve any existing Python error states
+
+        auto state_dict = reinterpret_borrow<dict>(detail::get_python_state_dict());
+        const std::string key = get_storage_key();
+        PyObject *capsule_obj = nullptr;
+
+        // First, try to get existing storage (fast path).
+        {
+            capsule_obj = detail::dict_getitemstring(state_dict.ptr(), key.c_str());
+            if (capsule_obj != nullptr) {
+                // Storage already exists, get the storage pointer from the existing capsule.
+                void *raw_ptr = PyCapsule_GetPointer(capsule_obj, /*name=*/nullptr);
+                if (!raw_ptr) {
+                    raise_from(PyExc_SystemError,
+                               "pybind11::gil_safe_call_once_and_store::"
+                               "get_or_create_storage_in_state_dict() FAILED "
+                               "(get existing)");
+                    throw error_already_set();
+                }
+                return static_cast<storage_type *>(raw_ptr);
+            }
+            if (PyErr_Occurred()) {
+                throw error_already_set();
+            }
+        }
+
+        // Storage doesn't exist yet, create a new one.
+        // Use unique_ptr for exception safety: if capsule creation throws,
+        // the storage is automatically deleted.
+        auto storage_ptr = std::unique_ptr<storage_type>(new storage_type{});
+        // Create capsule with destructor to clean up when the interpreter shuts down.
+        auto new_capsule = capsule(
+            storage_ptr.get(), [](void *ptr) -> void { delete static_cast<storage_type *>(ptr); });
+
+        // Use `PyDict_SetDefault` for atomic test-and-set:
+        // - If key doesn't exist, inserts our capsule and returns it.
+        // - If key exists (another thread inserted first), returns the existing value.
+        // This is thread-safe because `PyDict_SetDefault` will hold a lock on the dict.
+        //
+        // NOTE: Here we use `dict_setdefaultstring` instead of `dict_setdefaultstringref` because
+        // the capsule is kept alive until interpreter shutdown, so we do not need to handle incref
+        // and decref here.
+        capsule_obj
+            = detail::dict_setdefaultstring(state_dict.ptr(), key.c_str(), new_capsule.ptr());
+        if (capsule_obj == nullptr) {
+            throw error_already_set();
+        }
+
+        // Check whether we inserted our new capsule or another thread did.
+        if (capsule_obj == new_capsule.ptr()) {
+            // We successfully inserted our new capsule, release ownership from unique_ptr.
+            return storage_ptr.release();
+        }
+        // Another thread already inserted a capsule, use theirs and discard ours.
+        {
+            // Disable the destructor of our unused capsule to prevent double-free:
+            // unique_ptr will clean up the storage on function exit, and the capsule should not.
+            if (PyCapsule_SetDestructor(new_capsule.ptr(), nullptr) != 0) {

Collaborator

I think it would be simpler (since it can't fail) to release the unique_ptr here, effectively passing ownership to the new capsule.
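The reviewer's suggestion can be illustrated outside CPython: once a capsule with a deleter exists, releasing the `unique_ptr` hands it sole ownership, so a capsule that loses the setdefault race can simply be dropped and its deleter frees the storage exactly once, with no need to clear its destructor first. In this sketch `mock_capsule`, `storage`, and `make_owning_capsule` are hypothetical stand-ins for `py::capsule` and `call_once_storage<T>`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Python capsule: it owns a raw pointer and calls
// `destructor` on it when the capsule itself is destroyed.
class mock_capsule {
public:
    mock_capsule(void *ptr, void (*destructor)(void *)) : ptr_(ptr), destructor_(destructor) {}
    ~mock_capsule() {
        if (destructor_ != nullptr) {
            destructor_(ptr_);
        }
    }
    mock_capsule(const mock_capsule &) = delete;
    mock_capsule &operator=(const mock_capsule &) = delete;
    void *get() const { return ptr_; }

private:
    void *ptr_;
    void (*destructor_)(void *);
};

// Hypothetical storage type that counts live instances.
struct storage {
    static int live;
    storage() { ++live; }
    ~storage() { --live; }
};
int storage::live = 0;

// The unique_ptr protects the allocation only until the capsule exists;
// after that, ownership is handed to the capsule and never switched back.
std::unique_ptr<mock_capsule> make_owning_capsule() {
    std::unique_ptr<storage> storage_ptr(new storage{});
    std::unique_ptr<mock_capsule> cap(
        new mock_capsule(storage_ptr.get(), [](void *p) { delete static_cast<storage *>(p); }));
    static_cast<void>(storage_ptr.release()); // cannot fail: the capsule is the sole owner now
    return cap;
}
```

Dropping the losing capsule (`loser.reset()` in a usage like the one tested below) is then the whole cleanup path.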

+                raise_from(PyExc_SystemError,
+                           "pybind11::gil_safe_call_once_and_store::"
+                           "get_or_create_storage_in_state_dict() FAILED "
+                           "(clear destructor of unused capsule)");
+                throw error_already_set();
+            }
+            // Get the storage pointer from the existing capsule.

Collaborator

This duplicates the block of code starting at line 264; can you rearrange so it only has to be written once? Something like:

  • try to get capsule from dict
  • if unsuccessful, try to add capsule to dict, or obtain the concurrently-added one
  • regardless of which of the above paths is taken, wind up with a reference to the right capsule object; pull the pointer from it and return

I think that would simplify the control flow here a good deal. You could also treat a newly created capsule as taking ownership of the storage as soon as it has been successfully created, so you don't need to switch between returning the unique_ptr.release() on one path vs the capsule pointer on another.
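The control flow outlined in the bullets can be sketched in standard C++ with a single exit path that extracts the pointer in one place. Here a "capsule" is modelled as `std::shared_ptr<void>` (which carries its own deleter, so it owns the storage from the moment it is created) and the state dict as a plain map with locking omitted; `state_dict_t`, `storage_key`, `get_or_create_storage`, and the key prefix are hypothetical. `storage_key` only mirrors the address-based idea of `get_storage_key()`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins: shared_ptr<void> plays the capsule, a map plays the
// interpreter state dict (no locking in this single-threaded sketch).
using capsule_t = std::shared_ptr<void>;
using state_dict_t = std::unordered_map<std::string, capsule_t>;

struct storage_type {
    int payload = 0;
};

// Address-based key, in the spirit of get_storage_key(); the prefix is made up.
std::string storage_key(const void *owner) {
    return "demo_call_once_storage__" + std::to_string(reinterpret_cast<std::uintptr_t>(owner));
}

storage_type *get_or_create_storage(state_dict_t &dict, const std::string &key) {
    capsule_t cap;
    auto it = dict.find(key);
    if (it != dict.end()) {
        cap = it->second; // path 1: an existing capsule
    } else {
        // path 2: the new capsule owns the storage immediately...
        capsule_t fresh(new storage_type{}, [](void *p) { delete static_cast<storage_type *>(p); });
        // ...and emplace behaves like setdefault: an existing entry wins.
        cap = dict.emplace(key, std::move(fresh)).first->second;
    }
    // Single place that extracts and validates the pointer for both paths.
    auto *raw = static_cast<storage_type *>(cap.get());
    if (raw == nullptr) {
        throw std::runtime_error("capsule holds a null pointer");
    }
    return raw;
}
```

Because the capsule owns the storage as soon as it exists, there is no separate `release()` on one path and capsule lookup on the other.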

+            void *raw_ptr = PyCapsule_GetPointer(capsule_obj, /*name=*/nullptr);
+            if (!raw_ptr) {
+                raise_from(PyExc_SystemError,
+                           "pybind11::gil_safe_call_once_and_store::"
+                           "get_or_create_storage_in_state_dict() FAILED "
+                           "(get after setdefault)");
+                throw error_already_set();
+            }
+            return static_cast<storage_type *>(raw_ptr);
+        }
+    }
+
+    // No storage needed when subinterpreter support is enabled.
+    // The actual storage is stored in the per-interpreter state dict via
+    // `get_or_create_storage_in_state_dict()`.
+
+    // Fast local cache to avoid repeated lookups when there are no multiple interpreters.
+    // This is only valid if there is a single interpreter. Otherwise, it is not used.
+    // WARNING: We cannot use thread local cache similar to `internals_pp_manager::internals_p_tls`
+    // because the thread local storage cannot be explicitly invalidated when interpreters
+    // are destroyed (unlike `internals_pp_manager` which has explicit hooks for that).

Collaborator

If internals_pp_manager can do it, why can't we?

Should the map be a member of local_internals so we can benefit from the caching and finalization logic that already exists there? We can add things to local internals without bumping the internals version.

Contributor Author

XuehaiPan, Dec 22, 2025

I think there are still some bugs in internals_pp_manager. internals and local_internals are not reliable for now. There is still some work needed to ensure they work correctly in the multiple-interpreters case. So I added this warning comment, and maybe we can improve this in the future. The first priority for this PR is to make things work right.

For example, in the comment #5933 (comment) and the CI failure https://github.com/pybind/pybind11/actions/runs/20414341520?pr=5933, py::native_enum<...> in import mod complains that the enum type is already registered. However, the registration map is stored in internals and local_internals, which should be per-interpreter.

    if (detail::get_local_type_info(typeid(EnumType)) != nullptr
        || detail::get_global_type_info(typeid(EnumType)) != nullptr) {
        pybind11_fail(
            "pybind11::native_enum<...>(\"" + enum_name_encoded
            + "\") is already registered as a `pybind11::enum_` or `pybind11::class_`!");
    }
    if (detail::global_internals_native_enum_type_map_contains(enum_type_index)) {
        pybind11_fail("pybind11::native_enum<...>(\"" + enum_name_encoded
                      + "\") is already registered!");
    }

This indicates there are bugs in internals_pp_manager. The bugs are not related to gil_safe_call_once_and_store.

I suspect that the unref and destroy hooks for internals_pp_manager only work for interpreters created by the pybind11::subinterpreter API. The TLS storage will be malformed when the interpreters are created/destroyed from the Python side. For example, when interpreters are managed through concurrent.interpreters or concurrent.futures.InterpreterPoolExecutor, internals_pp_manager is not aware that an interpreter has been destroyed from the Python side.

cc @b-pass

Collaborator

> I suspect that the unref and destroy hooks for internals_pp_manager only work for interpreters created by the pybind11::subinterpreter API. The TLS storage will be malformed when the interpreters are created/destroyed from the Python side.

That does seem likely. Can we fix it for everyone, instead of working around it for this specific case while leaving a problem in other areas? I believe everything would work out if we do per-interpreter cleanup from the internals capsule destructor, since clearing the interpreter state dict is one of the last actions on interpreter shutdown.
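The reviewer's proposal, doing per-interpreter cleanup from a capsule destructor because clearing the interpreter state dict happens near the end of shutdown, can be modelled in standard C++. In this sketch an entry stored in a fake interpreter's state dict invalidates a process-wide cached pointer when it is destroyed, so the cache can never dangle after the interpreter goes away. All names (`pointer_cache`, `global_cache`, `interp_entry`, `fake_interpreter`, `lookup`) are hypothetical; this is not how pybind11 implements `internals_pp_manager`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical process-wide cache, similar in role to a cached internals pointer.
struct pointer_cache {
    const void *interp = nullptr; // interpreter the cached pointer belongs to
    int *ptr = nullptr;
};

inline pointer_cache &global_cache() {
    static pointer_cache cache;
    return cache;
}

// Hypothetical state-dict entry; its destructor plays the capsule destructor
// that runs when the interpreter's state dict is cleared at shutdown.
struct interp_entry {
    const void *interp;
    int value;
    interp_entry(const void *owner, int v) : interp(owner), value(v) {}
    ~interp_entry() {
        if (global_cache().interp == interp) {
            global_cache() = pointer_cache{}; // invalidate before the memory is freed
        }
    }
};

struct fake_interpreter {
    std::unordered_map<std::string, std::unique_ptr<interp_entry>> state_dict;
};

int *lookup(fake_interpreter &interp, int initial) {
    pointer_cache &cache = global_cache();
    if (cache.interp == &interp && cache.ptr != nullptr) {
        return cache.ptr; // fast path
    }
    std::unique_ptr<interp_entry> &slot = interp.state_dict["entry"];
    if (!slot) {
        slot.reset(new interp_entry(&interp, initial));
    }
    cache.interp = &interp;
    cache.ptr = &slot->value;
    return cache.ptr;
}
```

The point of tying invalidation to destruction of the per-interpreter entry is that it also covers interpreters created and destroyed from the Python side, which is the case the thread is concerned about.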

+    T *last_storage_ptr_ = nullptr;
+    // This flag is true if the value has been initialized by any interpreter (may not be the
+    // current one).
+    detail::atomic_bool is_initialized_by_at_least_one_interpreter_{false};
+};
+#endif
 
 PYBIND11_NAMESPACE_END(PYBIND11_NAMESPACE)
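The per-interpreter slot `detail::call_once_storage<T>` in the diff combines aligned raw storage, a `std::once_flag`, an optional finalizer, and an initialization flag. The following is a simplified model of that lifecycle in standard C++, with the GIL release/reacquire steps the real class needs for lock ordering deliberately omitted. `std::call_once` guarantees the factory runs once, and the destructor either calls the finalizer or, if none was given, `~T()`, mirroring `~call_once_storage()` (so a finalizer is responsible for destroying the object). `once_slot`, `tracked`, `finalize_calls`, and `finalize_tracked` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical, GIL-free model of a call-once storage slot.
template <typename T>
struct once_slot {
    alignas(T) unsigned char storage[sizeof(T)];
    std::once_flag once_flag;
    void (*finalize)(T &) = nullptr;
    std::atomic<bool> is_initialized{false};

    once_slot() = default;
    once_slot(const once_slot &) = delete;
    once_slot &operator=(const once_slot &) = delete;

    // Constructs T from fn() exactly once; later calls ignore fn and finalize_fn.
    template <typename Fn>
    T &get_or_init(Fn &&fn, void (*finalize_fn)(T &) = nullptr) {
        std::call_once(once_flag, [&] {
            ::new (static_cast<void *>(storage)) T(fn());
            finalize = finalize_fn;
            is_initialized = true;
        });
        return *reinterpret_cast<T *>(storage);
    }

    ~once_slot() {
        if (is_initialized) {
            T &value = *reinterpret_cast<T *>(storage);
            if (finalize != nullptr) {
                finalize(value); // finalizer replaces ~T(), as in ~call_once_storage()
            } else {
                value.~T();
            }
        }
    }
};

// Hypothetical payload and finalizer used to observe the teardown.
struct tracked {
    int v;
};

inline int &finalize_calls() {
    static int calls = 0;
    return calls;
}

inline void finalize_tracked(tracked &t) {
    ++finalize_calls();
    t.~tracked(); // the finalizer owns destruction when it is provided
}
```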