Skip to content

Conversation

@rinon
Copy link
Collaborator

@rinon rinon commented Apr 18, 2024

No description provided.

kkysen added 2 commits April 17, 2024 04:48
When backtraces are disabled (due to their large performance overhead),
we can still capture the `Location::caller()`, which is quite cheap.
This is very useful for rare panics in CI, which can be hard to reproduce.
… add `#[track_caller]`s when `debug_assertions`.
@rinon rinon requested a review from fbossen April 18, 2024 00:27
Copy link
Collaborator Author

@rinon rinon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails due to out of bounds slice access in the third backup_lpf call (https://github.com/memorysafety/rav1d/pull/981/files#diff-e9e188b522fe10286c7075c8068b07949fff51301681598c10c42927aa884061R339). I think this length must be wrong. @fbossen, can you look at this? Looks like you added the slice originally and I don't really understand that PR (#728).

…ounds error messages (#980)

When backtraces are disabled (due to their large performance overhead),
we can still capture the `Location::caller()`, which is quite cheap.
This is very useful for rare panics in CI, which can be hard to
reproduce.

This also makes the out-of-bounds error messages more informative (they
say which thing exactly is out of bounds and what the bound was, like
`std`).
@rinon rinon force-pushed the sjc/copy_lpf_bug branch from 5786cb6 to a707716 Compare April 18, 2024 06:25
@rinon rinon changed the base branch from main to sjc/init_lowest_pixel_mem April 18, 2024 06:25
@rinon rinon force-pushed the sjc/copy_lpf_bug branch from a707716 to 2993f6b Compare April 18, 2024 06:27
kkysen added 22 commits April 18, 2024 00:17
…elds (#978)

I found these by running the argon tests in debug mode (not just
`debug_assertions`), as these `libstd` assertions are only run in actual
debug mode.
…_dav1d`, as this is the only callsite and we'll need to store stuff on the stack.
…ntMutArcSlice<refmvs_temporal_block>>`s.
…vs_pool` now that `mvs`s have been `Arc`ified.
This fixes a bug from #971 where I was overslicing. It wasn't caught by
the normal tests, only the argon tests that only ran on `main` once I
merged.
`fn splat_mv` is already a method, and `fn rav1d_refmvs_save_tmvs` was
already like a method, but as a free `fn`. So this just makes them all
methods on `Rav1dRefmvsDspContext`, which helps things for `fn
load_tmvs` as I make it and the `mvs` fields safe.
…tMutArcSlice<refmvs_temporal_block>>`s (#984)

* Fixes `{,ref_}mvs{,_ref}` fields of #713.
* Fixes `refmvs{,_pool}` fields of #641.
This was already `Arc`ified in a previous PR, so this removes the now
unused pool.
kkysen and others added 23 commits April 19, 2024 12:55
…e `&'static`s (#986)

* Fixes `dsp` field of #713.
* Fixes all `fn *dsp_init`s of #862.
* Improves DSP initialization code size of #809.

This
* Makes all `fn *dsp_init*`s safe.
* Changes all `fn *dsp_init*`s into `fn *DSPContext::new`s that directly
initialize the type without `unsafe`ly zero initializing it first. This
is done by directly initializing in `fn default`, which is for the
fallback `fn`s, and then updating the `fn` ptrs in the `fn
init_{x86,arm}`s. These `fn`s are also all `const` now, as that was
trivial.
* Adds `wrap_fn_ptr!` and `enum_map!`s to the array fields of
`Rav1dMCDSPContext`, since I needed a default value to initialize the
arrays directly if I want it the indices to be named (rather than a
normal array, whose order is easy to mix up). `wrap_fn_ptr!` provides
that default value, which is then optimized out.
* Replaces the `Rav1dContext::dsp` array, which is manually lazily
initialized, with `fn Rav1dDSPContext::get`, which uses `OnceLock` for
lazy initialization.
* Makes the `Rav1dFrameData::dsp` raw ptr into a `&'static` now that we
lazily initialize it with `OnceLock`.

This also dramatically reduces the code size of DSP initialization (in
KiB):

| Target                          | Before | After |
| ------------------------------- | ------ | ----- |
|      `x86_64-unknown-linux-gnu` | 192.5  | 95.6  |
|        `i686-unknown-linux-gnu` | 106.5  | 39.6  |
|     `aarch64-unknown-linux-gnu` |  88.2  | 15.3  |
| `armv7-unknown-linux-gnueabihf` | 103.8  | 17.0  |

This is measured before using `cargo bloat --filter 'dsp_init'` and
after using `cargo bloat --filter 'DSPContext::new'`.
@rinon rinon requested a review from kkysen April 19, 2024 23:46
@rinon rinon force-pushed the sjc/copy_lpf_bug branch from 2993f6b to 217cd21 Compare April 19, 2024 23:46
@rinon
Copy link
Collaborator Author

rinon commented Apr 20, 2024

Oops, this got merged as part of #950. Closing this, but if you could have a look at 217cd21 still I'd appreciate it.

@rinon rinon closed this Apr 20, 2024
@kkysen
Copy link
Collaborator

kkysen commented Apr 20, 2024

Oops, this got merged as part of #950. Closing this, but if you could have a look at 217cd21 still I'd appreciate it.

What's the difference between src_y_stride and just y_stride? Besides not knowing that, everything else in the commit LGTM.

@rinon
Copy link
Collaborator Author

rinon commented Apr 20, 2024

y_stride is the output stride (from lr_stride[..]), src_y_stride is the stride for cdef_line_buf. In the original C code we use src_stride[..] as the strides for cdef_line_buf, rather than lr_stride[..] The two strides can be different due to (I think) superscaling?

@rinon rinon deleted the sjc/copy_lpf_bug branch July 2, 2024 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants