

Jenya705 (Contributor) commented Nov 30, 2025

Objective

Enables accessing component slices from tables directly via queries.

Fixes: #21861

Solution

One new trait:

  • ContiguousQueryData allows fetching all values from a table at once. The implementation for &T returns a slice of the components in the set table; the implementation for &mut T returns a mutable slice of the components in the set table, together with a struct whose methods set the update ticks (matching the behavior of the regular fetch implementation).

New methods contiguous_iter, contiguous_iter_mut and similar on Query and QueryState make it possible to iterate using this trait.

The QueryData derive macro was updated to support contiguous items when the contiguous(target) attribute is added (target can be all, mutable or immutable; see the custom_query_param example).
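To make the description of ContiguousQueryData concrete, here is a dependency-free sketch of the idea, assuming a table is just a set of equally long column vectors plus a per-row "changed" tick column. The names `Table`, `ContiguousTicks` and `fetch_contiguous_mut` are hypothetical; they only mirror the shape described above (a mutable slice plus a struct that sets update ticks) and are not Bevy's actual types.

```rust
/// Hypothetical stand-in for Bevy's `Tick`: a plain run counter.
type Tick = u32;

/// Hypothetical table: equally long columns (one row per entity) plus the
/// "changed" tick of every `position` row.
struct Table {
    velocity: Vec<[f32; 3]>,
    position: Vec<[f32; 3]>,
    position_changed: Vec<Tick>,
}

/// Hypothetical tick handle returned next to the mutable slice. It only
/// writes ticks when asked, so the data loop itself stays tick-free.
struct ContiguousTicks<'w> {
    changed: &'w mut [Tick],
    this_run: Tick,
}

impl ContiguousTicks<'_> {
    /// Marks every row of the table as changed in the current run.
    fn mark_all_as_updated(&mut self) {
        self.changed.fill(self.this_run);
    }
}

impl Table {
    /// Contiguous fetch for a `(&Velocity, &mut Position)`-shaped query:
    /// whole columns are handed out at once instead of one row at a time.
    fn fetch_contiguous_mut(
        &mut self,
        this_run: Tick,
    ) -> (&[[f32; 3]], (&mut [[f32; 3]], ContiguousTicks<'_>)) {
        (
            self.velocity.as_slice(),
            (
                self.position.as_mut_slice(),
                ContiguousTicks {
                    changed: self.position_changed.as_mut_slice(),
                    this_run,
                },
            ),
        )
    }
}
```

Usage mirrors the example in this PR: zip the two slices, update the data, then call `mark_all_as_updated` once for the whole table.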

Testing

  • sparse_set_contiguous_query test verifies that you can't use next_contiguous with sparse set components
  • test_contiguous_query_data test verifies that returned values are valid
  • base_contiguous benchmark (file is named iter_simple_contiguous.rs)
  • base_no_detection benchmark (file is named iter_simple_no_detection.rs)
  • base_no_detection_contiguous benchmark (file is named iter_simple_no_detection_contiguous.rs)
  • base_contiguous_avx2 benchmark (file is named iter_simple_contiguous_avx2.rs)

Showcase

Examples contiguous_query, custom_query_param

Example

use bevy::prelude::*;

#[derive(Component)]
struct Velocity(Vec3);

#[derive(Component)]
struct Position(Vec3);

fn main() {
    let mut world = World::new();
    let mut query = world.query::<(&Velocity, &mut Position)>();
    let iter = query.contiguous_iter_mut(&mut world).unwrap();
    // velocity's type is &[Velocity]
    // position's type is &mut [Position]
    // ticks' type is ContiguousComponentTicks
    for (velocity, (position, mut ticks)) in iter {
        for (v, p) in velocity.iter().zip(position.iter_mut()) {
            p.0 += v.0;
        }
        // sets the change ticks for the whole slice
        ticks.mark_all_as_updated();
    }
}

Benchmarks

Code for the base benchmark:

use bevy_ecs::prelude::*;
use glam::*;

#[derive(Component, Copy, Clone)]
struct Transform(Mat4);

#[derive(Component, Copy, Clone)]
struct Position(Vec3);

#[derive(Component, Copy, Clone)]
struct Rotation(Vec3);

#[derive(Component, Copy, Clone)]
struct Velocity(Vec3);

pub struct Benchmark<'w>(World, QueryState<(&'w Velocity, &'w mut Position)>);

impl<'w> Benchmark<'w> {
    pub fn new() -> Self {
        let mut world = World::new();

        world.spawn_batch(core::iter::repeat_n(
            (
                Transform(Mat4::from_scale(Vec3::ONE)),
                Position(Vec3::X),
                Rotation(Vec3::X),
                Velocity(Vec3::X),
            ),
            10_000,
        ));

        let query = world.query::<(&Velocity, &mut Position)>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        for (velocity, mut position) in self.1.iter_mut(&mut self.0) {
            position.0 += velocity.0;
        }
    }
}

Each benchmark iterates over 10,000 entities stored in a single table, adding the Vec3 of each entity's Velocity component to the Vec3 of its Position component.

| Name | Time | Time (AVX2) | Description |
|---|---|---|---|
| base | 5.5828 µs | 5.5122 µs | Iteration over components |
| base_contiguous | 4.8825 µs | 1.8665 µs | Iteration over contiguous chunks |
| base_contiguous_avx2 | 2.0740 µs | 1.8665 µs | Iteration over contiguous chunks with enforced AVX2 optimizations |
| base_no_detection | 4.8065 µs | 4.7723 µs | Iteration over components while bypassing change detection through bypass_change_detection() method |
| base_no_detection_contiguous | 4.3979 µs | 1.5797 µs | Iteration over components without registering update ticks |

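Using the contiguous iterator makes the loop somewhat faster on its own (5.58 µs to 4.88 µs, about 12.5% less time), and with AVX2 the compiler can vectorize it further, bringing it down to 1.87 µs, roughly 3× faster than base (5.51 µs).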
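The base_contiguous_avx2 row relies on AVX2 being enabled for the hot loop. This thread does not show how iter_simple_contiguous_avx2.rs enforces it; one common technique, assumed here rather than taken from the PR, is to compile a plain slice loop inside a `#[target_feature(enable = "avx2")]` function and pick it at runtime with `is_x86_feature_detected!`. Components are modelled as `[f32; 3]` instead of `Vec3` to keep the sketch dependency-free, and the function names are hypothetical.

```rust
/// Plain slice kernel: the kind of loop a contiguous query hands to the user.
#[inline(always)]
fn add_velocity(positions: &mut [[f32; 3]], velocities: &[[f32; 3]]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        p[0] += v[0];
        p[1] += v[1];
        p[2] += v[2];
    }
}

/// The same kernel compiled with AVX2 enabled, so LLVM may use 256-bit vectors.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_velocity_avx2(positions: &mut [[f32; 3]], velocities: &[[f32; 3]]) {
    add_velocity(positions, velocities);
}

/// Uses the AVX2 version when the CPU supports it, otherwise the plain one.
fn add_velocity_dispatch(positions: &mut [[f32; 3]], velocities: &[[f32; 3]]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked to support AVX2.
            unsafe { add_velocity_avx2(positions, velocities) };
            return;
        }
    }
    add_velocity(positions, velocities);
}
```

Runtime dispatch keeps the binary portable; the alternative is building everything with `-C target-feature=+avx2`, which only runs on AVX2 CPUs.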

Jenya705 marked this pull request as draft November 30, 2025 15:04
Jondolf added the C-Feature, A-ECS, C-Performance, S-Needs-Review, D-Complex and D-Unsafe labels Nov 30, 2025
hymm (Contributor) commented Dec 1, 2025

Jenya705 (Contributor Author) commented Dec 1, 2025

This PR only enables slices from tables to be returned directly when applicable. It doesn't implement any batching, and it doesn't guarantee any alignment beyond Rust's own (though these slices can still be used to apply SIMD).

  • Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?

This PR doesn't deal with alignment, but as far as I understand you can always take sub-slices that meet your alignment requirements. And referring back to issue #21861, the code gets vectorized even without any specific alignment.

No, the returned slices do not have any alignment guarantees beyond Rust's own.

chengts95 commented

The solution looks like a promising fix for issue #21861.

If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that.
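The "peeled prologue" approach can be built directly on top of a contiguous slice: handle the few leading elements that come before the first suitably aligned address with scalar code, then run an aligned SIMD loop on the rest. The sketch below only performs the split, using the standard `pointer::align_offset` and `slice::split_at`; `split_aligned` is a hypothetical helper, and 32 bytes (one AVX register) is just an example alignment.

```rust
/// Splits `slice` into `(prologue, body)` so that `body` starts at an address
/// that is a multiple of `align` bytes. `align` must be a power of two.
fn split_aligned<T>(slice: &[T], align: usize) -> (&[T], &[T]) {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    // Number of elements to skip until the pointer is `align`-aligned.
    // `align_offset` returns `usize::MAX` when that is impossible, in which
    // case clamping puts every element in the prologue.
    let offset = slice.as_ptr().align_offset(align);
    slice.split_at(offset.min(slice.len()))
}
```

The prologue (and any tail shorter than one vector) is processed with ordinary scalar code, while the body can use aligned loads.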

hymm (Contributor) left a comment

This is not a full review, but I'm on board with the general approach in this PR. Overall this is fairly straightforward. I imagine we'll eventually want some SIMD-aligned storage, but in the meantime users can probably align their components manually.
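Aligning components manually could look like the sketch below; this is an assumption about what such a workaround might be, not code from the PR. Padding a 3-float vector to four floats and raising the type's alignment with `#[repr(C, align(16))]` means every element of a `&[T]` starts on a 16-byte boundary, because any properly typed allocation of `T` (a `Vec<T>`, for instance) respects `align_of::<T>()`. `AlignedVelocity` is a hypothetical type.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical SIMD-friendly component: three used lanes plus one padding
/// lane, aligned so each element fills exactly one 128-bit register.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
struct AlignedVelocity([f32; 4]);

// Compile-time checks that the layout is what the comment above promises.
const _: () = assert!(size_of::<AlignedVelocity>() == 16);
const _: () = assert!(align_of::<AlignedVelocity>() == 16);

/// Returns true if every element of the slice starts on a 16-byte boundary.
fn all_elements_16_byte_aligned(slice: &[AlignedVelocity]) -> bool {
    slice
        .iter()
        .all(|v| (v as *const AlignedVelocity as usize) % 16 == 0)
}
```

The trade-off is memory: the padding lane costs 4 extra bytes per entity in exchange for aligned loads.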

github-actions bot commented Dec 3, 2025

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

Jenya705 marked this pull request as ready for review December 3, 2025 20:02
ecoskey (Contributor) commented Jan 22, 2026

The problem with sparse components is that iterating over them corresponds more to iterating over pointers to the actual data (i.e., &[&T]), whereas for table components it is simply &[T], which allows optimizations to occur. I don't see how it would be useful to implement contiguous iteration over sparse components, because we would have to look for entities whose components lie next to each other in memory, and the extra checks that requires would defeat the whole point. (Or I didn't understand what you meant by sparse-set component iteration.) (P.S.: Having now looked at #22500, I understand what you mean, and it shouldn't be hard to rewrite my code to support your chunk-based approach, i.e. to return the chunks that were found to match the query.)

I'm sort of talking about #22500, and sort of not. Let me try to better articulate my thoughts here...

It seems like right now, the main point of this PR is to enable a specific optimization for table iteration, which is absolutely valuable! And you're right that it doesn't really apply to sparse set components. Through that lens, it also makes sense why you've opted to name it "contiguous access" rather than "storage/archetype iteration".

I've been looking at this PR a slightly different way. To me, the valuable core idea here is removing the inner loop of query iteration, and handing that off to the user to allow them to do whatever they want. On the one hand, that enables accessing raw slices of components, but I think there's value beyond that!

For example, if a user had a setup where they need to process per-archetype data in a query, such as for some kind of cache, they might not care if some of the terms hand back a sparse set rather than a dense column. In that case, framing this feature as "iterating over archetypes" makes a lot of sense, I think! We could still allow contiguous access where available, but give ourselves more flexibility in how the feature is used. With that framing, I think you could even implement something like #22500 in userspace! In that sense, this PR works toward the same goal of giving users better control over query iteration, though I don't think it actually needs to (or should) iterate over chunks as suggested by @chescock.

Now, none of these points are blockers for this PR, but maybe worth considering for future direction. I think there's a lot of potential here 🙂

The current implementation uses slices because they are a "first-class citizen" in Rust, with a lot of functionality already implemented for them. It might be better to make some kind of wrapper over ReadFetch and WriteFetch for components, but functionally it would be the same from the user's perspective: you would iterate over it as you would over a slice, and if you want to update the changed ticks you would have to do so separately from the actual mutation of the data, because otherwise the loop won't (or shouldn't be expected to) be auto-vectorized. I am going to look into it nevertheless.

We should definitely still have a way to access slices directly. My point here was more about replacing the (&[T], &ContiguousComponentTicks) item with a wrapper struct around Read/WriteFetch that could then have data_slice(), changed_ticks_slice() and similar methods.
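The point made in this thread about updating ticks separately from the data mutation can be illustrated with plain slices. Both functions below produce identical data and ticks: `update_interleaved` writes a tick on every iteration (roughly what per-entity `Mut<T>` access does when dereferenced mutably), while `update_split` first runs a pure arithmetic loop and then fills the tick column with a single `fill` call. Whether LLVM actually vectorizes either loop depends on the target and optimization level, so treat this as an illustration of the shape discussed above, not a guarantee. The `Tick = u32` alias and both function names are hypothetical.

```rust
type Tick = u32;

/// Data and tick written in the same loop, one entity at a time.
/// All slices are assumed to have one entry per entity (same length).
fn update_interleaved(pos: &mut [f32], vel: &[f32], changed: &mut [Tick], this_run: Tick) {
    for ((p, v), t) in pos.iter_mut().zip(vel).zip(changed.iter_mut()) {
        *p += *v;
        *t = this_run;
    }
}

/// Pure data loop first, then one bulk tick update for the whole table.
fn update_split(pos: &mut [f32], vel: &[f32], changed: &mut [Tick], this_run: Tick) {
    debug_assert_eq!(pos.len(), vel.len());
    debug_assert_eq!(pos.len(), changed.len());
    for (p, v) in pos.iter_mut().zip(vel) {
        *p += *v;
    }
    changed.fill(this_run);
}
```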

Jenya705 requested review from chescock and ecoskey January 22, 2026 21:01
hymm (Contributor) left a comment

The recent changes addressed my reservations about this PR. Just some nits left.

) -> Self::Contiguous<'w, 's> {
fetch.components.extract(
|table| {
// SAFETY: set_table was previously called
Contributor

Suggested change
// SAFETY: set_table was previously called
// SAFETY: Caller ensures `set_table` was previously called

last_run: Tick,
) -> Self {
Self {
// SAFETY: The invariants are upheld by the caller.
Contributor

You should list the safety invariants for each unsafe block here, since they may correspond to different invariants of from_slice_ptrs. We want to be able to trace how each invariant is passed through.
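As an illustration of the per-invariant style requested here (not the PR's actual `from_slice_ptrs`, whose invariants aren't shown in this thread), the sketch below wraps the standard `std::slice::from_raw_parts`, numbers each requirement in its `# Safety` section, and has every `SAFETY` comment name the invariant it discharges. `column_slice` and `first_half` are hypothetical helpers.

```rust
/// Builds a slice over a column of `len` values starting at `ptr`.
///
/// # Safety
///
/// 1. `ptr` must be non-null, aligned for `T`, and valid for reads of
///    `len * size_of::<T>()` bytes within a single allocation.
/// 2. Those `len` values must be initialized.
/// 3. The memory must not be mutated while the returned slice is alive.
/// 4. `len * size_of::<T>()` must not exceed `isize::MAX`.
unsafe fn column_slice<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    // SAFETY:
    // - non-null, aligned, in-bounds reads: caller invariant 1.
    // - initialized values: caller invariant 2.
    // - no mutation during 'a: caller invariant 3.
    // - total size fits in isize: caller invariant 4.
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

/// Safe wrapper showing how one call site discharges each invariant.
fn first_half(values: &[u64]) -> &[u64] {
    let len = values.len() / 2;
    // SAFETY:
    // 1. `values.as_ptr()` is non-null and aligned, and the first `len`
    //    elements lie inside the same allocation as `values`.
    // 2. Every element of `values` is initialized.
    // 3. The result borrows `values` immutably for its whole lifetime,
    //    so the memory cannot be mutated meanwhile.
    // 4. `len <= values.len()`, and `values` already satisfies the limit.
    unsafe { column_slice(values.as_ptr(), len) }
}
```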

@@ -0,0 +1,55 @@
//! Demonstrates how contiguous queries work
Contributor

I think this needs some description of why you would use a contiguous query and what they are.


`Query` and `QueryState` have new methods `contiguous_iter`, `contiguous_iter_mut` and `contiguous_iter_inner`, which allow querying contiguously (i.e., table by table). For this to work, the query data must implement `ContiguousQueryData` and the query filter must implement `ArchetypeFilter`. A contiguous iterator jumps over whole tables, returning the corresponding data for each table. Notable implementors of `ContiguousQueryData` are `&T` and `&mut T`, returning `&[T]` and `ContiguousMut<T>` respectively; the latter lets you get a mutable slice of components as well as the corresponding ticks. Notable implementors of `ArchetypeFilter` are `With<T>` and `Without<T>`, and notable types **not implementing** it are `Changed<T>` and `Added<T>`.

This is for example useful, when an operation must be applied on a big amount of entities lying in the same tables, which allows for the compiler to auto-vectorize the code, thus speeding it up.
Contributor

Suggested change
This is for example useful, when an operation must be applied on a big amount of entities lying in the same tables, which allows for the compiler to auto-vectorize the code, thus speeding it up.
For example, this is useful when an operation must be applied on a large amount of entities lying in the same tables, which allows for the compiler to auto-vectorize the code, thus speeding it up.
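The release-note text says `With<T>`/`Without<T>` implement `ArchetypeFilter` while `Changed<T>`/`Added<T>` do not. A small model suggests why: the former can be decided once per table from its component set, so a whole table slice is either in or out, whereas the latter depend on each row's tick and would split a table into pieces. `TableInfo`, `with_matches_table` and `changed_rows` are hypothetical and only model this distinction; they are not Bevy's filter traits, and the plain `>` comparison ignores tick wrap-around.

```rust
use std::collections::HashSet;

type Tick = u32;

/// Hypothetical table summary: which component types it stores, plus the
/// per-row "changed" ticks of one component of interest.
struct TableInfo {
    components: HashSet<&'static str>,
    changed_ticks: Vec<Tick>,
}

/// `With<T>`-style check: one answer for the whole table, so the table's
/// columns can be handed out as complete slices.
fn with_matches_table(table: &TableInfo, component: &str) -> bool {
    table.components.contains(component)
}

/// `Changed<T>`-style check: the answer differs per row, so the result is
/// a list of row indices rather than a single yes/no for the table.
fn changed_rows(table: &TableInfo, last_run: Tick) -> Vec<usize> {
    table
        .changed_ticks
        .iter()
        .enumerate()
        .filter(|(_, tick)| **tick > last_run)
        .map(|(row, _)| row)
        .collect()
}
```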

chescock (Contributor) left a comment

This looks good! My comments are mostly just thoughts on more polish for the Contiguous(Ref|Mut) types, and shouldn't block this PR.

pub(crate) changed: &'w [Tick],
#[expect(
unused,
reason = "ZST in release mode, for the back-portability with ComponentTicksRef"
Contributor

This unused lint is surprising. I wouldn't have expected this to be different from the other slices. ... Oh, I see, there are accessor methods for data_slice() and added_ticks_slice(), but there's no changed_by_slice()!

Do you want to add changed_by_slice() to ContiguousRef and ContiguousMut, and then remove the expect?

It might even make sense to add methods like pub fn get(&self, index: usize) -> Ref<'_, T> that take an index and pack everything into a Ref or Mut, but maybe anyone who wants those would use iter instead of contiguous_iter anyway.
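A rough sketch of the `get(index)` idea, assuming a read-only wrapper that stores the data slice alongside its tick slices: the accessor packs one row back into a per-entity view, similar in spirit to `Ref<'_, T>`. `ContiguousRefSketch` and `RowRef` are hypothetical, the real `Ref` also carries `changed_by` information and the system's run ticks (omitted here), and returning `Option` instead of panicking is a design choice of this sketch, not part of the suggestion.

```rust
type Tick = u32;

/// Hypothetical read-only contiguous view over one table column.
struct ContiguousRefSketch<'w, T> {
    data: &'w [T],
    added: &'w [Tick],
    changed: &'w [Tick],
}

/// Hypothetical per-row view, analogous in spirit to `Ref<'w, T>`.
#[derive(Debug, PartialEq)]
struct RowRef<'w, T> {
    value: &'w T,
    added: Tick,
    changed: Tick,
}

impl<'w, T> ContiguousRefSketch<'w, T> {
    /// Packs row `index` into a `RowRef`, or returns `None` if out of bounds.
    fn get(&self, index: usize) -> Option<RowRef<'w, T>> {
        Some(RowRef {
            value: self.data.get(index)?,
            added: *self.added.get(index)?,
            changed: *self.changed.get(index)?,
        })
    }
}
```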

Contributor Author

I had missed that the DetectChanges trait has a method returning changed_by values; I've added methods that return the changed_by values.

/// # Safety
///
/// - The result of [`ContiguousQueryData::fetch_contiguous`] must represent the same result as if
/// [`QueryData::fetch`] was executed for each entity of the set table
Contributor

Does this actually need to be a safety requirement? It's clearly a bug not to do this, but I don't see how it would cause UB if it returned some other unrelated data.

If we do keep this as a safety requirement, then the safety comments on the impls might need some changes. The one on &mut T in particular seems out-of-date.

Contributor Author

Made it a safe trait


/// Data type returned by [`ContiguousQueryData::fetch_contiguous`](crate::query::ContiguousQueryData::fetch_contiguous) for [`Ref<T>`].
#[derive(Clone)]
pub struct ContiguousRef<'w, T> {
Contributor

Is there a pub way to construct these? I bet someone will ask for one to be added later, but I think it's reasonable to leave it out until someone does.

Contributor Author

See ContiguousRef::new

impl<'w, T> ContiguousRef<'w, T> {
/// Returns the data slice.
#[inline]
pub fn data_slice(&self) -> &[T] {
Contributor

You can extend the lifetimes returned from ContiguousRef, since & references are Copy.

Suggested change
pub fn data_slice(&self) -> &[T] {
pub fn data_slice(&self) -> &'w [T] {

Contributor Author

Added a ContiguousRef::into_inner method, which returns &'w [T].
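The reason `into_inner` (or a `&'w [T]` return type on `data_slice`) works is that a shared slice reference is `Copy`, so a wrapper can hand out its inner `&'w [T]` without tying the result to the borrow of the wrapper itself. A minimal sketch with a hypothetical `SliceView` type shows the practical difference: the `'w` result stays usable after the wrapper is gone.

```rust
/// Hypothetical wrapper holding a shared slice borrowed for `'w`.
struct SliceView<'w, T> {
    data: &'w [T],
}

impl<'w, T> SliceView<'w, T> {
    /// Result borrows from `self`, so it cannot outlive the wrapper.
    fn data_slice(&self) -> &[T] {
        self.data
    }

    /// Result has the full `'w` lifetime, because `&'w [T]` is `Copy`.
    fn into_inner(self) -> &'w [T] {
        self.data
    }
}

/// Returns a slice that outlives the temporary view it came from.
fn longest_borrow<'w, T>(data: &'w [T]) -> &'w [T] {
    let view = SliceView { data };
    // `view.data_slice()` could not be returned here, since it borrows the
    // local `view`; `into_inner` can.
    view.into_inner()
}
```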


/// Data type returned by [`ContiguousQueryData::fetch_contiguous`](crate::query::ContiguousQueryData::fetch_contiguous) for [`Ref<T>`].
#[derive(Clone)]
pub struct ContiguousRef<'w, T> {
Contributor

Would it make sense to impl Deref<Target = [T]> for ContiguousRef and ContiguousMut?

Jenya705 (Contributor Author), Jan 23, 2026

Implemented Deref<Target = [T]>, DerefMut<Target = [T]>, AsRef<[T]>, AsMut<[T]> and IntoIterator<Item = &T|&mut T> for ContiguousRef and ContiguousMut. The traits that return mutable references also update the change ticks automatically; I also added bypass_change_detection.
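A rough model of the behavior described here, assuming a wrapper that holds a mutable data slice and its tick slice: `Deref` only reads, `DerefMut` marks every row as changed before handing out `&mut [T]`, and `bypass_change_detection` returns the mutable slice without touching ticks. `ContiguousMutSketch` is hypothetical, and whether the PR marks every row on mutable access or tracks changes differently is not shown in this thread.

```rust
use std::ops::{Deref, DerefMut};

type Tick = u32;

/// Hypothetical mutable contiguous view over one table column.
struct ContiguousMutSketch<'w, T> {
    data: &'w mut [T],
    changed: &'w mut [Tick],
    this_run: Tick,
}

impl<T> ContiguousMutSketch<'_, T> {
    /// Mutable access that leaves the change ticks untouched.
    fn bypass_change_detection(&mut self) -> &mut [T] {
        &mut *self.data
    }
}

impl<T> Deref for ContiguousMutSketch<'_, T> {
    type Target = [T];

    /// Read-only access never touches ticks.
    fn deref(&self) -> &[T] {
        &*self.data
    }
}

impl<T> DerefMut for ContiguousMutSketch<'_, T> {
    /// Mutable access marks every row as changed in the current run.
    fn deref_mut(&mut self) -> &mut [T] {
        self.changed.fill(self.this_run);
        &mut *self.data
    }
}
```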


Labels

  • A-ECS: Entities, components, systems, and events
  • C-Feature: A new feature, making something new possible
  • C-Performance: A change motivated by improving speed, memory usage or compile times
  • D-Complex: Quite challenging from either a design or technical perspective. Ask for help!
  • D-Unsafe: Touches with unsafe code in some way
  • M-Release-Note: Work that should be called out in the blog due to impact
  • S-Needs-Review: Needs reviewer attention (from anyone!) to move forward


Development

Successfully merging this pull request may close these issues.

Raw table iteration to improve query iteration speed by bypassing change ticks

7 participants