@David-McKenna
Contributor

Hey Cees,

Decided to port this one back while I was doing the cuFFT one -- it re-uses data in order to pad the FFT data with real data where previously 0s were used.

The setup for this results in fewer samples being processed on the first iteration so that the end of the first buffer can be filled. After that, it just recycles the data already in the buffer from the current iteration to prepare cp1p/cp2p for the next iteration.

So the overall data structure looks like this:

t=N: overlap | processed data | overlap
t=0: <overlap_0 = 0> | <noverlap = reflected data><data_0> | <overlap_1 = data>
t=1: <data_0 overlap> | <overlap_1><data_1> | <overlap_2>
t=2: <data_1 overlap> | <overlap_2>
... etc.
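
To make the recycling step concrete, here is a minimal host-side sketch in plain Python. It is a simplified model, not the code in this MR: the function names are hypothetical, the first buffer simply gets a zero head (the reflected-data handling above is left out), and the "processing" is the identity, so the valid region of each buffer can be checked directly against the input stream.

```python
def first_buffer(stream, nbin, noverlap):
    """Build the first buffer: a zero head of noverlap samples, then real data.

    Assumption for this sketch: no earlier data exists, so the head is zeros.
    Returns the buffer and the read position in the stream.
    """
    nread = nbin - noverlap
    return [0.0] * noverlap + list(stream[:nread]), nread


def next_buffer(prev, stream, pos, nbin, noverlap):
    """Recycle the last 2 * noverlap samples of prev as the start of the
    next buffer, then append nbin - 2 * noverlap new samples."""
    nread = nbin - 2 * noverlap
    head = prev[-2 * noverlap:]
    return head + list(stream[pos:pos + nread]), pos + nread


def valid_region(buf, noverlap):
    """Samples between the leading and trailing overlap regions."""
    return buf[noverlap:len(buf) - noverlap]


stream = list(range(20))
nbin, noverlap = 8, 2

buf0, pos = first_buffer(stream, nbin, noverlap)
buf1, pos = next_buffer(buf0, stream, pos, nbin, noverlap)
buf2, pos = next_buffer(buf1, stream, pos, nbin, noverlap)

# Valid regions of consecutive buffers line up without gaps or repeats.
print(valid_region(buf0, noverlap))  # [0, 1, 2, 3]
print(valid_region(buf1, noverlap))  # [4, 5, 6, 7]
print(valid_region(buf2, noverlap))  # [8, 9, 10, 11]
```

In this model only the first 2 * noverlap samples of each new buffer come from the previous one; everything after that is freshly read.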

I'm noting this here because it took me a couple of tries to get the indexing right: on the first iteration, I discard / offset the output by 2 * noverlap samples, as we are effectively losing noverlap samples at each end of the data.

At the start, we lose noverlap samples because the starting point in the array is offset due to there being insufficient data; at the end, performing the overlap causes another loss of noverlap samples.

Overall the implementation is stable judging by my outputs. I suspect the process could be made more efficient by tweaking the block/grid sizes for padd_next_iteration (since it only needs to iterate over the first 2 * noverlap samples) and the new unpack_and_padd (as it can skip the first 2 * noverlap samples). With my layout it's hard to judge what kind of performance effect that will have on your setup, though (I reduce nforward from 100 to 8 and increase nsub to 488).
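
As a rough illustration of the sizing idea, the sketch below computes one-dimensional grid sizes with the usual ceiling division. The numbers (nbin, noverlap, threads per block) are hypothetical and chosen only for illustration, and the real kernels are launched over more than one dimension, so treat this as arithmetic for the sample axis only.

```python
def grid_size(n_items, threads_per_block):
    """Number of blocks needed so that blocks * threads covers n_items."""
    return (n_items + threads_per_block - 1) // threads_per_block


# Hypothetical sizes, not taken from cdmt.
nbin, noverlap, threads = 65536, 2048, 256

full_grid = grid_size(nbin, threads)                 # every sample in the buffer
head_grid = grid_size(2 * noverlap, threads)         # only the recycled head
tail_grid = grid_size(nbin - 2 * noverlap, threads)  # everything after the head

print(full_grid, head_grid, tail_grid)  # 256 16 240
```

With these made-up numbers, a kernel that only touches the recycled head would need 16 blocks along the sample axis instead of 256; a kernel that skips the head would need an index offset of 2 * noverlap in addition to the smaller grid.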

Cheers

This is achieved by initially processing less data on the first iteration to fill up the buffer, then on future iterations re-using data for overlaps.

The general layout of a given block is <overlap_0><data0...N><overlap_1>, followed by the next iteration, <dataN><overlap_1><data0...N><overlap_2>, etc.
@David-McKenna
Contributor Author

Given the activity on #1: I don't 100% remember whether I fixed it, but in the back of my mind I think there was an indexing error in this MR that I never fixed in that branch after fixing it in my main one.

As it happens, I'm tweaking the IE613 version of cdmt at the moment, so I have it all pulled down. I'll run a diff and see if I can spot the error and submit an updated MR with the fix in place if that was the case. Not sure if it'll be tonight, but I can probably do it tomorrow evening AU time.

@cbassa
Owner

cbassa commented Sep 23, 2024

No worries. I'm trying to make some changes to the code to write out 32-bit floats and be able to select a time range (your #2), but even running it with the latest CUDA version only yields zeros, so I was hopeful your first MR might fix it. It doesn't.

@David-McKenna
Contributor Author

That functionality is in the cdmt_udp.cu file in my fork, as I always wrote out floats and then digifil'd the data for resampling, but there are a lot of other major changes in there too. Roughly following the redig flag might help out.

https://github.com/David-McKenna/cdmt/blob/master/cdmt_udp.cu
