Use real data for the overlaps at the start / end of every iblock iteration #4
Hey Cees,
Decided to port this one back while I was doing the cuFFT one -- it reuses data to pad the FFT buffers with real samples where zeros were used previously.
The setup means fewer samples are processed on the first iteration so that the end of the first buffer can be filled; after that, it simply recycles the data already in the buffer from the current iteration to prepare cp1p/cp2p for the next one.
So the overall data structure looks like this:
t=N: overlap | processed data | overlap
t=0: <overlap_0 = 0> | <noverlap = reflected data><data_0> | <overlap_1 = data>
t=1: <data_0 overlap> | <overlap_1><data_1> | <overlap_2>
t=2: <data_1 overlap> | <overlap_2><data_2> | <overlap_3>
... etc.
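The recycling scheme above can be sketched on the CPU. This is an illustrative NumPy version, not the actual GPU code: the buffer layout `[front pad | payload | back pad]`, the function name, and the sizes are all hypothetical, but it shows how the first iteration zero-fills the front pad while every later iteration reuses the previous buffer's tail as real-data padding.

```python
import numpy as np

def run_blocks(signal, nblock, noverlap):
    """Hypothetical CPU sketch of the buffer layout above.

    Each buffer holds [front pad | nblock payload | back pad], with each
    pad noverlap samples wide. On the first iteration the front pad is
    zero-filled and nblock + noverlap fresh samples are consumed so the
    end of the buffer can be filled; on every later iteration the last
    2 * noverlap samples of the previous buffer are recycled into the
    start of the next one, so both pads contain real data.
    """
    buflen = nblock + 2 * noverlap
    buf = np.zeros(buflen, dtype=signal.dtype)
    pos = 0
    buffers = []
    first = True
    while True:
        if first:
            # Zero front pad; consume nblock + noverlap fresh samples.
            need = nblock + noverlap
            if pos + need > len(signal):
                break
            buf[noverlap:] = signal[pos:pos + need]
            first = False
        else:
            # Recycle the previous buffer's tail as the new head
            # (this plays the role padd_next_iteration has on the GPU).
            buf[:2 * noverlap] = buf[-2 * noverlap:]
            need = nblock  # only nblock fresh samples from here on
            if pos + need > len(signal):
                break
            buf[2 * noverlap:] = signal[pos:pos + need]
        pos += need
        buffers.append(buf.copy())
    return buffers
```

With this layout, consecutive buffers always share their `2 * noverlap` boundary samples, so the overlap regions hold real signal instead of zeros everywhere except the very first front pad.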
I'm making a note of it here as it took me a couple of tries to get the indexing right: on the first iteration, I discard / offset the output by 2 * noverlap samples, as we effectively lose noverlap samples on each end of the data.
At the start, we lose noverlap samples because the starting point in the array is offset due to there being insufficient data; at the end, performing the overlap costs another noverlap samples.
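Restating that arithmetic as a tiny (hypothetical) helper, since the indexing was the tricky part:

```python
def first_iteration_output_offset(noverlap):
    """Hypothetical helper restating the note above: noverlap samples
    are lost at the start (the read pointer is offset past the missing
    history) and another noverlap at the end (consumed by the overlap),
    so the first iteration's output is discarded / offset by
    2 * noverlap samples in total."""
    lost_at_start = noverlap  # offset starting point: no earlier data
    lost_at_end = noverlap    # overlap at the buffer's tail
    return lost_at_start + lost_at_end
```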
Overall the implementation is stable judging by my outputs, but I suspect the process could be made more efficient by tweaking the block/grid sizes for padd_next_iteration (since it only needs to iterate over the first 2 * noverlap samples) and for the new unpack_and_padd (as it can skip the first 2 * noverlap samples). With my layout, though, it's hard to judge what kind of performance effect that will have on your setup (I reduce nforward from 100 to 8 and increase nsub to 488).
Cheers