Conversation

@ohadravid
Contributor

@ohadravid ohadravid commented Nov 10, 2025

Summary

Switch the thread local destructors implementation on Windows to use the Fiber Local Storage APIs, which provide native support for setting a callback to be called on thread termination, replacing the current tls_callback symbol-based implementation.

Except for some spellchecking, no LLMs were used to produce code / comments / text in this PR.

Current Implementation

On Windows, in order to support thread locals with destructors,
the standard library uses a special tls_callback symbol that is used to call the destructors::run() hook on thread termination.

This has two downsides:

  1. It is not well documented, and seems to cause some problems 1 2 3.
  2. It disallows some synchronization operations, as mentioned in LocalKey's documentation.

As an example of point 2, this code, which uses JoinHandle::join in a thread local Drop impl, will deadlock on stable:

Join-on-Drop Deadlock Example
use std::thread::JoinHandle;

struct JoinOnDrop(Option<JoinHandle<()>>);

impl Drop for JoinOnDrop {
    fn drop(&mut self) {
        self.0.take().unwrap().join().unwrap();
    }
}

thread_local! {
    static HANDLE: JoinOnDrop = {
        let thread = std::thread::spawn(|| {
            println!("Starting...");
            // std::thread::sleep(Duration::from_secs(3));
            println!("Done");
        });

        JoinOnDrop(Some(thread))
    };
}


fn main() {
    let thread = std::thread::spawn(|| {
        HANDLE.with(|_| {
            println!("Some other thread");
        })
    });

    thread.join().unwrap();

    println!("Done");
}

Proposed Change

We can use the Fls{Alloc,Set,Get,Free} functions (see https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989)
to implement the dtor callback needed for thread locals that have a Drop implementation.

We allocate a single key, and use its destructor callback to run all the registered destructors when a thread is shutting down.

With this implementation, the above code sample will not deadlock (but it still might not be a good idea to do this!).
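To make the single-key design concrete, here is a minimal portable sketch of the idea, not the PR's actual code: a per-thread list of type-erased destructors, plus one function standing in for the key's callback. The names register, run_all, DTORS and demo_run_order are hypothetical, and the thread-exit callback is simulated by calling run_all by hand instead of going through the Fls APIs.

```rust
use std::cell::RefCell;

/// A type-erased destructor, standing in for the (pointer, fn) pairs a real
/// implementation would store.
type Dtor = Box<dyn FnOnce()>;

thread_local! {
    // Hypothetical per-thread registry; the name is illustrative only.
    static DTORS: RefCell<Vec<Dtor>> = RefCell::new(Vec::new());
}

/// Registers a destructor to run when the current thread shuts down.
pub fn register(dtor: Dtor) {
    DTORS.with(|list| list.borrow_mut().push(dtor));
}

/// Body of the single per-key callback: drain the list, releasing the borrow
/// before each call so that a destructor may itself register more work.
pub fn run_all() {
    loop {
        let next = DTORS.with(|list| list.borrow_mut().pop());
        match next {
            Some(dtor) => dtor(),
            None => break,
        }
    }
}

/// Simulates one thread's lifetime and returns the order in which its
/// destructors ran.
pub fn demo_run_order() -> Vec<&'static str> {
    use std::sync::{Arc, Mutex};

    let log = Arc::new(Mutex::new(Vec::new()));
    let thread_log = Arc::clone(&log);
    std::thread::spawn(move || {
        let (a, b) = (Arc::clone(&thread_log), Arc::clone(&thread_log));
        register(Box::new(move || {
            a.lock().unwrap().push("first registered");
        }));
        register(Box::new(move || {
            b.lock().unwrap().push("second registered");
        }));
        // Stand-in for the OS invoking the key's callback at thread exit.
        run_all();
    })
    .join()
    .unwrap();

    let order = log.lock().unwrap().clone();
    order
}
```

In this sketch the list is drained from the back, so demo_run_order() reports the second destructor before the first. Popping one entry at a time (rather than iterating over a borrowed list) is a deliberate choice so that a destructor can call register again without a double borrow.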

Safety and Compatibility

Destructors will only run once: we use the common thread_local + atomic pattern to only set the Fls marker value once. The destructor callback is only called when that value is non-zero, so we are guaranteed that it will only be called once.
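The "set once, fire only on non-zero" argument can be checked with a toy model. This is an illustrative sketch under simplifying assumptions, not the real implementation: Fiber, Thread, enable, exit and demo_hook_runs are hypothetical names, each simulated fiber holds its own copy of the value stored under the key, and exit models every fiber being torn down.

```rust
/// One simulated fiber: it holds its own copy of the value stored under the key.
#[derive(Default)]
pub struct Fiber {
    marker: usize,
}

/// One simulated OS thread hosting several fibers.
#[derive(Default)]
pub struct Thread {
    fibers: Vec<Fiber>,
    /// Stand-in for the per-thread "already registered" flag.
    registered: bool,
    /// How many times the destructor hook has fired.
    pub hook_runs: u32,
}

impl Thread {
    pub fn with_fibers(count: usize) -> Self {
        Thread {
            fibers: (0..count).map(|_| Fiber::default()).collect(),
            ..Default::default()
        }
    }

    /// Sets the marker in the calling fiber, but only on the first call per thread.
    pub fn enable(&mut self, current_fiber: usize) {
        if !self.registered {
            self.registered = true;
            self.fibers[current_fiber].marker = 1;
        }
    }

    /// Tears down every fiber; the callback fires only for non-zero markers.
    pub fn exit(&mut self) {
        for fiber in self.fibers.drain(..) {
            if fiber.marker != 0 {
                self.hook_runs += 1;
            }
        }
    }
}

/// Calls `enable` from three different fibers, then tears the thread down.
pub fn demo_hook_runs() -> u32 {
    let mut thread = Thread::with_fibers(3);
    thread.enable(1);
    thread.enable(0);
    thread.enable(2);
    thread.exit();
    thread.hook_runs
}
```

Only the first enable call stores a non-zero marker, so in this model the hook fires exactly once regardless of how many fibers later call enable.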

Destructors will only run at thread exit: we verify that we are not running in a fiber during the destructors callback. This means that using fibers (which is very rare) will result in thread locals being leaked, unless the fiber is converted back to a thread using ConvertFiberToThread before thread termination. This is not ideal, but it should be OK because destructors are not guaranteed to run; it does need to be documented.

  • To be documented (replaces the current note in the docs about synchronization, and should also be noted in the rt module).
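The "skip when running in a fiber" rule can be sketched as a small decision function. This is a hedged illustration, not the code in the PR: on_callback, Outcome and demo_fiber_check are hypothetical names, and the is_fiber parameter stands in for whatever check the real callback performs (for example the result of IsThreadAFiber).

```rust
/// A type-erased pending destructor.
type Dtor = Box<dyn FnOnce()>;

/// What the callback decided to do.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Ran this many destructors.
    RanDestructors(usize),
    /// Still inside a fiber: nothing was run, the thread locals are leaked.
    SkippedInFiber,
}

/// Model of the destructors callback with the fiber check applied first.
pub fn on_callback(is_fiber: bool, pending: &mut Vec<Dtor>) -> Outcome {
    if is_fiber {
        // Leave `pending` untouched; leaking is preferred over running
        // destructors while the OS thread may keep going.
        return Outcome::SkippedInFiber;
    }
    let mut ran = 0;
    while let Some(dtor) = pending.pop() {
        dtor();
        ran += 1;
    }
    Outcome::RanDestructors(ran)
}

fn two_noop_dtors() -> Vec<Dtor> {
    let mut list: Vec<Dtor> = Vec::new();
    list.push(Box::new(|| {}));
    list.push(Box::new(|| {}));
    list
}

/// Runs the callback once in "fiber" mode and once in "thread" mode and
/// reports each outcome together with how many destructors remain.
pub fn demo_fiber_check() -> (Outcome, usize, Outcome, usize) {
    let mut in_fiber = two_noop_dtors();
    let fiber_outcome = on_callback(true, &mut in_fiber);

    let mut in_thread = two_noop_dtors();
    let thread_outcome = on_callback(false, &mut in_thread);

    (fiber_outcome, in_fiber.len(), thread_outcome, in_thread.len())
}
```

The trade-off shown here is the one described above: the fiber case leaks both entries, while the plain-thread case runs them all.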

It might be possible for the user to use something like the current tls_callback to observe already-freed thread locals, which is something that can also happen in the current implementation.

Destructors will only run on the correct thread: Fibers cannot be moved between threads.
Destructors will only run on the correct thread: they are registered to a thread_local list, so fiber movement between threads does not matter.

Users cannot observe different locals because they are using fibers: because we only use an Fls local marker to trigger the destructors callback, we don't change anything about how users interact with "normal" thread locals and fiber locals.

Other Notes

The implementation is based on the key::racy and guard::apple code, because we need a LazyKey-like racy static and an enable function.
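For readers unfamiliar with the racy lazy-key pattern, here is a minimal sketch of it using simulated allocation. It is an assumption-laden illustration rather than std's code: fake_alloc and fake_free are hypothetical stand-ins for the OS key functions, and KEY, key, lazy_init and demo_racy_key are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counters for simulated allocations and frees, so the demo can be checked.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static FREED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for allocating an OS key; always returns a non-zero id.
fn fake_alloc() -> usize {
    ALLOCATED.fetch_add(1, Ordering::Relaxed) + 1
}

/// Hypothetical stand-in for freeing an OS key.
fn fake_free(_key: usize) {
    FREED.fetch_add(1, Ordering::Relaxed);
}

/// The process-wide key; 0 means "not allocated yet".
static KEY: AtomicUsize = AtomicUsize::new(0);

/// Returns the shared key, allocating it on first use.
pub fn key() -> usize {
    match KEY.load(Ordering::Acquire) {
        0 => lazy_init(),
        k => k,
    }
}

fn lazy_init() -> usize {
    // Several threads may get here at once; each allocates a candidate key.
    let new = fake_alloc();
    match KEY.compare_exchange(0, new, Ordering::Release, Ordering::Acquire) {
        // This thread won the race: its key becomes the shared one.
        Ok(_) => new,
        // Another thread won: release the candidate and use the winner's key.
        Err(winner) => {
            fake_free(new);
            winner
        }
    }
}

/// Races eight threads on `key()` and reports whether they all agree, plus
/// how many simulated keys are still live afterwards.
pub fn demo_racy_key() -> (bool, usize) {
    let handles: Vec<_> = (0..8).map(|_| std::thread::spawn(key)).collect();
    let keys: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let all_equal = keys.iter().all(|&k| k == keys[0]);
    let live = ALLOCATED.load(Ordering::Relaxed) - FREED.load(Ordering::Relaxed);
    (all_equal, live)
}
```

The design avoids any lock: a losing thread pays for one wasted allocation, but every caller ends up with the same key and exactly one key stays allocated.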

While TLS slots are limited to 1088,
FLS slots are currently limited to 4000
per process.

Miri

Because Miri is aware of the thread local implementation, I also implemented these functions and support for them in the interpreter here:

https://github.com/rust-lang/miri/compare/master...ohadravid:miri:windows-fls-support?expand=1

I guess that this will need to be merged before this PR (if this is accepted) - let me know and I'll open that PR as well.

Targets without target_thread_local

In *-gnu Windows targets, the target_thread_local feature is unavailable.

We could also change the "key" (non-target_thread_local) Windows impl at
library\std\src\sys\thread_local\key\windows.rs
to be based on the Fls functions. I can add it to this PR, or as a separate PR, if you think this is preferable.

Also, I used a Cell in a #[thread_local] to store the resulting key, like the other implementations.
This works, but I'm not sure if this is 100% OK given that we have these targets as well.

@rustbot rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 10, 2025
@rustbot
Collaborator

rustbot commented Nov 10, 2025

r? @ChrisDenton

rustbot has assigned @ChrisDenton.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@ohadravid
Contributor Author

Hi @ChrisDenton, any chance you can take a look at this? Tests are only failing because of missing support in Miri (which I have implemented in a branch)

Thanks!

@bjorn3
Member

bjorn3 commented Dec 25, 2025

Wouldn't this conflict with the use of fibers by user code? If you switch to a fiber, access a tls variable for the first time on a thread, switch back and destroy the fiber, the tls variable would get incorrectly deinitialized. And if you move a fiber to another thread and exit the original thread, the tls variable would get deallocated while the fiber still has a reference to it that will cause a use-after-free when destroying the fiber.

@ohadravid
Contributor Author

ohadravid commented Dec 25, 2025

Wouldn’t this conflict with the use of fibers by user code?

No - The way the new code works is that std will, at runtime initialization, call enable, which sets a single fiber-local to invoke the special std destructor when that thread exits. Because enable is called at that point (and not when the first TLS variable is accessed), there should be no UAF: the user can start a fiber (== convert the current thread into a fiber), but by then enable has already been called and the TLS variables will be alive for the lifetime of the thread.

Edit: fibers can't be moved between threads, so the text below is not really relevant. I also updated the code to match my next comment about the order of fiber/thread exit.


However, I now think you are technically right @bjorn3 , but only if the user starts the executable outside of Rust and performs runtime initialization (or otherwise triggers enable()) while already running in a non-main-fiber. Following your example, they would need to: (1) create a non-Rust thread, (2) convert that thread to a fiber, (3) create a new fiber, (4) cause enable() to be called in that fiber, and (5) move that fiber to a new thread and exit the original thread, leading to a UAF.

I couldn’t find whether there is even a documented way to do this (since rt::init is private). Maybe init_current via std::thread::current(), but that isn’t exposed via FFI anyway. Is this documented anywhere else?

In theory, I could add a check when setting the destructor that GetCurrentFiber returns null (though it’s a macro, not a function, so not easy), and avoid registering the callback in that case.

However, it seems better to document that this usage is unsupported if that's something that's not currently guaranteed to work.

@bjorn3
Member

bjorn3 commented Dec 25, 2025

Initially, I thought not. The way the new code works is that std will, at runtime initialization, call enable, which sets a single fiber-local to invoke the special std destructor when that thread exits. Because enable is called at that point (and not when the first TLS variable is accessed), there should be no UAF: the user can start a fiber (== convert the current thread into a fiber), but by then enable has already been called and the TLS variables will be alive for the lifetime of the thread.

That still doesn't account for multiple fibers running on the same thread or fibers migrating between threads, right? The TLS variables have unique storage per thread, while the fiber-local destructor runs once per fiber on whichever thread destroys the fiber in the end as I understand it.

However, I now think you are technically right @bjorn3 , but only if the user starts the executable outside of Rust and performs runtime initialization (or otherwise triggers enable()) while already running in a non-main-fiber. Following your example, they would need to: (1) create a non-Rust thread, (2) convert that thread to a fiber, (3) create a new fiber, (4) cause enable() to be called in that fiber, and (5) move that fiber to a new thread and exit the original thread, leading to a UAF.

enable would have to be called as soon as the first Rust TLS variable is accessed, right? Otherwise cdylibs would never run the destructor for TLS variables.

@ohadravid
Contributor Author

ohadravid commented Dec 25, 2025

the fiber-local destructor runs once per fiber

I will add a comment in the code so it's clearer, but no - because we only execute the unsafe { set(key, ptr::without_provenance(1)) }; line once, in some fiber (== usually just the thread), and the dtor callback is only called when the key's value is not zero, the dtors will only run when the thread exits, no matter what fibers are executed later, which matches the existing behavior.

My understanding is that there isn't a safe way to exit a fiber without terminating the thread anyway (to use fibers, a thread must always start by calling ConvertThreadToFiber and DeleteFiber says "If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another fiber, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed.")

enable would have to be called as soon as the first Rust TLS variable is accessed, right? Otherwise cdylibs would never run the destructor for TLS variables.

Hm, seems like library/std/src/sys/thread_local/destructors/list.rs::register does that on first access, yes. There's also __rust_std_internal_init in library/std/src/sys/thread_local/native/mod.rs which I think suggests that this happens post-rt::init anyway? I'm not sure.

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from 6502684 to 0327bec Compare December 26, 2025 12:34
@rustbot
Collaborator

rustbot commented Dec 26, 2025

The Miri subtree was changed

cc @rust-lang/miri

@ChrisDenton
Member

ChrisDenton commented Jan 8, 2026

I did see this before the holidays but didn't have time to investigate. Last time I considered this I had concerns because rust does not manage threads, except those it spawns itself (and even then only to a degree). Which means an FLS destructor may run before the OS thread finishes whereas TLS is expected to be valid for the duration of the OS thread. Maybe those concerns are unfounded or mitigated, but I'd want to be very sure before switching to it.

I'll ping the windows group in case anyone has reasons we should/shouldn't do this.

@rustbot ping windows

@rustbot
Collaborator

rustbot commented Jan 8, 2026

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @albertlarsan68 @arlosi @ChrisDenton @danielframpton @dpaoliello @gdr-at-ms @kennykerr @luqmana @nico-abram @retep998 @sivadeilra @wesleywiser

Contributor

@dpaoliello dpaoliello left a comment

Overall, I'm in favor of this approach: it replaces undocumented features that have caused issues with a documented feature.

@dpaoliello
Contributor

My understanding is that there isn't a safe way to exit a fiber without terminating the thread anyway (to use fibers, a thread must always start by calling ConvertThreadToFiber and DeleteFiber says "If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another fiber, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed.")

Is it possible to delete a fiber that isn't running and will never run again?

@dpaoliello
Contributor

Last time I considered this I had concerns because rust does not manage threads, except those it spawns itself (and even then only to a degree). Which means an FLS destructor may run before the OS thread finishes whereas TLS is expected to be valid for the duration of the OS thread.

Assuming that Fiber destruction is linked to thread destruction (or that we can somehow only run this code when the last fiber is destroyed), then is the gap between the fibers being destroyed and the thread being destroyed observable? I think it only matters to code running in DLL Detach, which shouldn't be messing with TLS stuff in Rust anyway...

@ChrisDenton
Member

We have no way of ensuring that FlsSetValue is called on the main fiber. So if DeleteFiber is called on the fiber that used FlsSetValue then the FLS callback will be run but the OS thread could continue indefinitely (running other fibers or switching back to a non-fiber thread).

A very quick sketch
use std::ffi::c_void;
use windows::Win32::System::Threading::{
    ConvertThreadToFiber, CreateFiber, DeleteFiber, FlsAlloc, FlsSetValue, SwitchToFiber, ConvertFiberToThread,
};

fn main() {
    unsafe {
        let main = ConvertThreadToFiber(None);
        let fiber = CreateFiber(0, Some(fiber_start), Some(main));
        println!("switching to another fiber");
        SwitchToFiber(fiber);
        DeleteFiber(fiber); // Invokes the FLS callback.
        println!("end of main fiber");
    }
}

unsafe extern "system" fn fls_callback(_param: *const c_void) {
    println!("fls dealloc");
}

extern "system" fn fiber_start(main: *mut c_void) {
    println!("fiber started");
    unsafe {
        let index = FlsAlloc(Some(fls_callback));
        let _ = FlsSetValue(index, Some(1234 as _));
        SwitchToFiber(main);
    };
}
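
The ordering problem in the sketch above can also be followed without any Windows APIs. The following is a toy timeline model under stated assumptions, not a faithful emulation of fibers: Event and timeline are hypothetical names, there are exactly two simulated fibers (main and worker), and the hook is assumed to fire whenever a fiber holding a non-zero value is deleted or the thread exits from it.

```rust
/// Events recorded by the simulated run, in order.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Event {
    HookRan,
    ThreadStillRunning,
    ThreadExited,
}

/// Simulates a thread with a main fiber (index 0) and a worker fiber (index 1).
/// The marker value is stored either in the main fiber or in the worker.
pub fn timeline(marker_set_in_main_fiber: bool) -> Vec<Event> {
    let mut markers = [0usize; 2];
    let owner = if marker_set_in_main_fiber { 0 } else { 1 };
    markers[owner] = 1234;

    let mut events = Vec::new();

    // The main fiber deletes the worker fiber; the callback fires if the
    // worker held a non-zero value.
    if markers[1] != 0 {
        events.push(Event::HookRan);
        markers[1] = 0;
    }

    // The OS thread keeps executing on the main fiber after the deletion.
    events.push(Event::ThreadStillRunning);

    // Finally the thread exits from the main fiber.
    if markers[0] != 0 {
        events.push(Event::HookRan);
        markers[0] = 0;
    }
    events.push(Event::ThreadExited);

    events
}
```

With the marker set in the worker fiber, the model records HookRan before ThreadStillRunning, which is the premature run being described; with the marker in the main fiber, the hook only runs right before ThreadExited.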

@ohadravid
Contributor Author

ohadravid commented Jan 10, 2026

Is it possible to delete a fiber that isn't running and will never run again?

I totally missed that! Thanks @ChrisDenton for the sketch 🙏

We have no way of ensuring that FlsSetValue is called on the main fiber.

I think we can use IsThreadAFiber to check if we are running in a fiber or a thread, and not run the callback when running in a fiber (leading to a leak).

That's a bit of a hack (which might defeat the purpose of replacing the current tls_callback hack), but I think it solves this problem because you cannot DeleteFiber(main) from a different fiber.

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch 3 times, most recently from ab9eddd to 456fa3b Compare January 11, 2026 10:15
@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch 2 times, most recently from 25cf05e to e559e78 Compare January 11, 2026 12:44
Member

@RalfJung RalfJung left a comment

Thanks a lot! The Miri code largely LGTM, though I have some nits.

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from 0179ba3 to e14adff Compare January 23, 2026 15:25
@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from e14adff to 4a4d0dd Compare January 23, 2026 16:16
Member

@RalfJung RalfJung left a comment

r=me on the Miri parts (modulo the last assertion nit). Thanks a lot!

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from 4a4d0dd to 0c3496f Compare January 23, 2026 16:40
@ohadravid
Contributor Author

ohadravid commented Jan 23, 2026

@RalfJung sorry but I added another small change because I missed something 🙈 : turns out Windows zeros the key's value after the dtor is run (also in Wine: https://github.com/wine-mirror/wine/blob/wine-11.0/dlls/ntdll/thread.c#L715).
I added a test & fixed that by keeping another state variable with the last key we ran (pushed in a separate commit). I also verified that windows-tls.rs really produces the expected output in a real Windows env.
Keys without dtors aren't zeroed.
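
The zeroing behavior described in this comment can be summarized with a small slot-table model. This is only a sketch of the observation, with hypothetical names (Slot, run_exit_callbacks, demo_zeroing); it does not reproduce the Miri change or the extra "last key we ran" state mentioned above.

```rust
/// One simulated key slot for the exiting thread.
pub struct Slot {
    pub value: usize,
    pub has_dtor: bool,
}

/// Models thread exit: for each key that has a destructor and a non-zero
/// value, the destructor is called with that value and the slot is then
/// cleared. Returns the values the destructors were called with.
pub fn run_exit_callbacks(slots: &mut [Slot]) -> Vec<usize> {
    let mut seen = Vec::new();
    for slot in slots.iter_mut() {
        if slot.has_dtor && slot.value != 0 {
            let value = slot.value;
            seen.push(value); // stand-in for invoking the destructor
            slot.value = 0; // zeroed afterwards, as observed on Windows
        }
    }
    seen
}

/// Runs the model over three slots and returns what the destructors saw
/// together with the slot values left behind.
pub fn demo_zeroing() -> (Vec<usize>, Vec<usize>) {
    let mut slots = vec![
        Slot { value: 7, has_dtor: true },
        Slot { value: 9, has_dtor: false },
        Slot { value: 0, has_dtor: true },
    ];
    let seen = run_exit_callbacks(&mut slots);
    let remaining = slots.iter().map(|s| s.value).collect();
    (seen, remaining)
}
```

In this model only the first slot triggers a destructor and is zeroed; the slot without a destructor keeps its value, matching the note that keys without dtors aren't zeroed.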

@ChrisDenton
Member

ChrisDenton commented Jan 23, 2026

I was thinking that there isn't really a need to set up a special thread_local with Drop, because std::thread::current() will enable the guard explicitly which will register the dtor callback (the new test would also pass when calling Convert back into a thread even if the thread will set the thread_local in the fiber. I'll add this case as well to the test). We can call it explicitly to be extra safe, but it is called as part of the rt thread-init flow anyway.

It's not documented that calling current() will enable the guard, but we could document it 😅

I don't know about wasmtime specifically but I'm not particularly keen on that in general. We can't guarantee that rust's main is run. For example, if it's compiled as a DLL or the entry point of another language is used (e.g. C/C++). I'm also not sure we should stably guarantee that thread::current will always initialise TLS destructors. It's always possible we may optimise the main thread further and I'm not too keen on restricting future possibilities.

@alexcrichton
Member

For Wasmtime we do expect entrypoints outside of Rust (e.g. using the C API of Wasmtime), so we can't rely on Rust's runtime initialization. With this change I'd probably go the route of thread-local-with-dtor-that-does-nothing and hit that whenever we fiber switch to ensure it's initialized for the current thread. The main worry for me is that we allow arbitrary Rust code (the embedder of Wasmtime) to run within a Windows fiber, and that's the risk of breakage here. It sounds like we can mitigate that with thread-local-with-dtor-that-does-nothing, however.

Otherwise though, right, we don't let anything abnormally kill the fiber -- or at least not baked into Wasmtime. Fibers always exit "cleanly" from the perspective of the fiber itself. Put another way, if host code panics or wasm traps, we always catch that within the context of the fiber, exit the fiber, then do whatever's necessary when we're back on a thread.

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from 90d305b to cf421a4 Compare January 24, 2026 08:54
@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from cf421a4 to c72a0fb Compare January 24, 2026 13:12
@ohadravid
Contributor Author

ohadravid commented Jan 24, 2026

@alexcrichton python ./ci/run-tests.py passes (without changes) with a compiler built using this branch 🎉 (I'm running on aarch64 and a few debug:: tests aren't passing, but that's an env problem as it also happens to me on stable).

Should I open a PR to add the TLS+Drop access somewhere before the function you mentioned?

Suggested addition to `resume`
let parent_fiber = if is_fiber {
    wasmtime_fiber_get_current()
} else {
    // Newer Rust versions use fiber local storage to register an internal hook that 
    // calls thread locals' destructors on thread exit.
    // This has a limitation: the hook only runs in a regular thread (not in a fiber).
    // We convert back into a thread once execution returns to this function,
    // but we must also ensure that the hook is registered before converting into a fiber.
    // Otherwise, a different fiber could be the first to register the hook,
    // causing the hook to be called (and skipped) prematurely when that fiber is deleted.
    struct Guard;

    impl Drop for Guard {
        fn drop(&mut self) {}
    }
    assert!(needs_drop::<Guard>());
    thread_local!(static GUARD: Guard = Guard);
    GUARD.with(|_g| {});
    ConvertThreadToFiber(ptr::null_mut())
};

Should I also add another test like resume_separate_thread that uses a thread_local with a dtor in a thread created with CreateThread and make sure it works as expected?

@alexcrichton
Member

Thanks for testing! And yeah no worries about the debug tests, they're a bit finicky with precise versions of installed tools anyway. By no means feel obligated to send a PR to Wasmtime, but if you're willing it'd be much appreciated! The change you propose is what I was thinking as well, so looks good to me 👍

@ohadravid ohadravid force-pushed the windows-thread-local-dtors-using-fls branch from c72a0fb to 66af8be Compare January 27, 2026 11:02
@rustbot
Collaborator

rustbot commented Jan 27, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. O-windows Operating system: Windows proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

10 participants