Remove fragmenting from Giraffe #4765
base: master
This is what I managed to get out of Anthropic Claude on the subject of removing fragmenting and coalescing things to go straight from zip code trees to chaining. I had to make a couple of changes to get it to pass the Giraffe tests. I read through the code and it looks plausible, but it's possible that the funnel logic is wrong or that the less apt parameter defaults were the ones kept. This needs to be evaluated for mapping and calling accuracy against the version that still has the fragmenting code but defaults to bypassing it.
If we're getting rid of chaining, the vg/src/algorithms/chain_items.cpp Lines 329 to 332 in aa8171c
But if we simply chain all seeds directly, then every seed will correspond to an anchor border, since every seed will be its own anchor.
@faithokamoto We can't get rid of the abstraction of having an vg/src/minimizer_mapper_from_chains.cpp Lines 1386 to 1391 in 0c86c8f
So I think even with the removal of fragmenting, we still have to deal with having seeds in play that are not
I checked this on calling with It looks like this removes a few calling errors. I also evaluated speed previously and we don't get too much slower. So I think this is ready.
This should also be tested on R10.
Changelog Entry
To be copied to the draft changelog by merger:
Description
To avoid metaphysical angst about why recombination penalties at fragmenting make things worse instead of better, this PR removes fragmenting entirely (on top of some commits merely bypassing it).
Bypassing fragmenting seems to decrease speed substantially on simulated HiFi reads, increase accuracy somewhat on simulated HiFi reads, and decrease speed somewhat on real HiFi reads. (I haven't gotten R10 results yet because my whole-node timing jobs are still in the queue.)
This code has been almost all synthesized by Anthropic Claude, using almost all of its patience (aka token limit for the day). I reviewed it and it appears to have done what I wanted to do and glommed the two step functions together (even though it did this by writing a new one and then deleting the old ones), but this still needs to be tested for mapping and calling accuracy effects (vs. d1625a9).
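For reviewers who want a mental model of "chaining seeds directly", here is a minimal toy sketch. It is not the code in this PR and not vg's actual chaining: `ToySeed`, `chain_seeds_directly`, the linear reference coordinates, and the scoring parameters are all hypothetical and chosen only for illustration. It treats every seed as its own single-seed anchor and runs one chaining pass over them, with no intermediate fragment step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy seed: a match between a read offset and an offset in a
// linear reference. Real Giraffe seeds live on a graph; this is a simplification.
struct ToySeed {
    long read_pos;
    long ref_pos;
};

// Chain seeds in a single pass, with each seed acting as its own anchor.
// Returns indices into `seeds` for the best-scoring chain, in read order.
// The scoring (fixed per-seed score, linear indel penalty) is an assumption.
std::vector<std::size_t> chain_seeds_directly(const std::vector<ToySeed>& seeds,
                                              long seed_score = 10,
                                              long gap_penalty_per_base = 1) {
    assert(seed_score > 0);

    // Visit seeds in read order so that predecessors are always scored first.
    std::vector<std::size_t> order(seeds.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        if (seeds[a].read_pos != seeds[b].read_pos) {
            return seeds[a].read_pos < seeds[b].read_pos;
        }
        return seeds[a].ref_pos < seeds[b].ref_pos;
    });

    const std::size_t NONE = seeds.size();  // sentinel for "no predecessor"
    std::vector<long> best(seeds.size(), seed_score);
    std::vector<std::size_t> prev(seeds.size(), NONE);
    std::size_t best_end = NONE;

    for (std::size_t i = 0; i < order.size(); ++i) {
        const ToySeed& here = seeds[order[i]];
        for (std::size_t j = 0; j < i; ++j) {
            const ToySeed& from = seeds[order[j]];
            long read_gap = here.read_pos - from.read_pos;
            long ref_gap = here.ref_pos - from.ref_pos;
            // Only allow transitions that move forward in both read and reference.
            if (read_gap <= 0 || ref_gap <= 0) continue;
            long indel = read_gap > ref_gap ? read_gap - ref_gap : ref_gap - read_gap;
            long candidate = best[j] + seed_score - gap_penalty_per_base * indel;
            if (candidate > best[i]) {
                best[i] = candidate;
                prev[i] = j;
            }
        }
        if (best_end == NONE || best[i] > best[best_end]) best_end = i;
    }

    // Trace back from the best-scoring end and restore read order.
    std::vector<std::size_t> chain;
    for (std::size_t at = best_end; at != NONE; at = prev[at]) {
        chain.push_back(order[at]);
    }
    std::reverse(chain.begin(), chain.end());
    return chain;
}
```

In this toy version, a seed that is far off the diagonal simply gets left out of the chain by the gap penalty, rather than being filtered out by an earlier fragmenting stage.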