Closed
Description
pairtools has changed substantially since the last release (April 2019!).
The recent dedup and parse updates add new functionality, so it is important to document the changes and release them as a new version.
Note: this is an umbrella post linking multiple issues; feel free to update and improve it!
PR with updates: #117
Post merge:
- update the Sphinx docs to incorporate the walkthroughs
Fixes by modules:
pairtools dedup
- finalize detection of optical duplicates: detecting optical duplicates #106 and optical dedup stats #59; also related to: add non-additive stat summaries to dedup/stats: complexity estimation, cis/total, ... #54
- chunked dedup by @Phlya
- make dedup report the parent readID of each duplicate, by @Phlya and @agalitsyna
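A minimal sketch of how a duplicate could be classified as optical, assuming Illumina (Casava 1.8+) read names of the form `instrument:run:flowcell:lane:tile:x:y` with the comment part already stripped, and assuming a Picard-style pixel-distance threshold. `parse_tile_xy` and `is_optical_duplicate` are hypothetical helpers, not pairtools functions, and the 100-pixel default is only an illustrative value.

```python
import math


def parse_tile_xy(read_id):
    """Extract (lane, tile, x, y) from an Illumina read name.

    Assumes the Casava 1.8+ layout instrument:run:flowcell:lane:tile:x:y,
    so fields 4-7 (indices 3..6) are lane, tile, x and y. Any extra trailing
    field (e.g. an appended UMI) is ignored.
    """
    fields = read_id.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {read_id!r}")
    lane, tile, x, y = fields[3:7]
    return lane, tile, int(x), int(y)


def is_optical_duplicate(read_id_a, read_id_b, max_pixel_distance=100):
    """True if two duplicate pairs come from nearby clusters on the same lane and tile."""
    lane_a, tile_a, x_a, y_a = parse_tile_xy(read_id_a)
    lane_b, tile_b, x_b, y_b = parse_tile_xy(read_id_b)
    if (lane_a, tile_a) != (lane_b, tile_b):
        return False
    # Euclidean distance between cluster centres, in flowcell pixel units
    return math.hypot(x_a - x_b, y_a - y_b) <= max_pixel_distance
```

In a dedup run, `read_id_a` would be the parent (the retained pair of a duplicate cluster, whose readID dedup reports) and `read_id_b` one of its duplicates.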
pairtools stats/scaling
- split dedup stats and regular stats
- output chromosome sizes in the stats output: add chromsizes into pairtools stats output #83
- pairtools stats: YAML output? format pairtools stats output as YAML #111 and should we store stats as YAML (or json) #79
- a pairtools scaling tool that takes chromosome sizes into account: improve distance binning for FR,RR,FR,RF pairs "scalings" in stats output #81, calculate Pc(s) in stats #56?
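Taking chromosome sizes into account means dividing the pair count in each distance bin by the number of position pairs that could possibly be separated by that distance: a chromosome of length L has L - s position pairs at separation s. A minimal sketch of that normalization, assuming plain (chrom, pos1, pos2) tuples for cis pairs and caller-supplied bin edges; `n_position_pairs` and `scaling` are hypothetical names, and a real tool would also split the counts by strand orientation as #81 describes.

```python
import bisect
from collections import Counter


def n_position_pairs(chrom_len, lo, hi):
    """Count position pairs (i, j) on one chromosome with separation j - i in [lo, hi).

    There are (chrom_len - s) pairs at separation s; summing over
    s = lo .. min(hi, chrom_len) - 1 gives this closed form.
    """
    hi = min(hi, chrom_len)
    if hi <= lo:
        return 0
    n = hi - lo
    return n * chrom_len - n * (lo + hi - 1) // 2


def scaling(cis_pairs, chromsizes, edges):
    """Contact frequency P(s) per distance bin, normalized by chromosome sizes.

    cis_pairs: iterable of (chrom, pos1, pos2) for same-chromosome pairs.
    chromsizes: dict chrom -> length; pairs on other chromosomes are skipped.
    edges: increasing bin edges (e.g. log-spaced); bin i is [edges[i], edges[i+1]).
    Returns a list of (lo, hi, n_pairs, p_s) tuples.
    """
    counts = Counter()
    for chrom, pos1, pos2 in cis_pairs:
        if chrom not in chromsizes:
            continue
        i = bisect.bisect_right(edges, abs(pos2 - pos1)) - 1
        if 0 <= i < len(edges) - 1:
            counts[i] += 1
    result = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        area = sum(n_position_pairs(length, lo, hi) for length in chromsizes.values())
        result.append((lo, hi, counts[i], counts[i] / area if area else float("nan")))
    return result
```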
pairtools parse
- parse complex walks engine and tools: Parse2 update (#99) #109
- stdin and stdout reporting defaults: make auto_open default to stdin/stdout when path evaluates to False #48
- flipping issue: standalone flip fails when "reduced.chrom.sizes" provided, should go lexicographical #91
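The lexicographic fallback proposed for #91 can be sketched as follows, assuming that chromosomes listed in the chrom.sizes file keep their file order and that any chromosome missing from a reduced chrom.sizes sorts lexicographically after them. `side_key` and `flip_pair` are hypothetical helpers working on a dict of the standard .pairs columns; a real implementation would also swap every other side-specific field (for example per-side MAPQ or the pair_type letters).

```python
def side_key(chrom, pos, chrom_order):
    """Sort key for one side of a pair.

    chrom_order: dict chrom -> rank, built from the chrom.sizes file order.
    Listed chromosomes come first in that order; unlisted ones follow in
    lexicographic order, so a reduced chrom.sizes never causes a failure.
    """
    if chrom in chrom_order:
        return (0, chrom_order[chrom], "", pos)
    return (1, 0, chrom, pos)


def flip_pair(pair, chrom_order):
    """Return a copy of pair with side 1 <= side 2 (upper-triangle convention)."""
    k1 = side_key(pair["chrom1"], pair["pos1"], chrom_order)
    k2 = side_key(pair["chrom2"], pair["pos2"], chrom_order)
    flipped = dict(pair)
    if k1 > k2:
        for field in ("chrom", "pos", "strand"):
            flipped[field + "1"], flipped[field + "2"] = pair[field + "2"], pair[field + "1"]
    return flipped
```

Usage: `chrom_order = {c: i for i, c in enumerate(["chr2", "chr1"])}` reproduces the order of a chrom.sizes file that lists chr2 before chr1.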
pairtools phase
- make phase work with both the pip and GitHub versions of bwa: pairtools phase critical update #114
pairtools restrict
- Handle empty pairs with "!" chromosomes: restrict can't handle '!' chromosome #76
- Problem with restriction sites header/first rfrag: Assignment of fragment in restrict passed the first fragment. #73
- Suggestions by @golobor: add restriction tool #16
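One way to avoid both the first-fragment assignment problem (#73) and the crash on the "!" chromosome (#76) is to store explicit fragment boundaries that include the chromosome start and end, and to look positions up with a binary search. A minimal sketch under those assumptions; `build_boundaries` and `assign_rfrag` are hypothetical names, fragments are treated as half-open intervals `[start, end)`, and `(-1, -1, -1)` is an arbitrary placeholder for unmapped sides.

```python
import bisect


def build_boundaries(sites, chrom_len):
    """Sorted fragment boundaries: chromosome start, every cut site, chromosome end."""
    return sorted(set([0, *sites, chrom_len]))


def assign_rfrag(chrom, pos, boundaries_by_chrom):
    """Return (fragment_index, start, end) for one side of a pair.

    "!" (unmapped) and unknown chromosomes get a placeholder instead of an error.
    """
    if chrom == "!" or chrom not in boundaries_by_chrom:
        return (-1, -1, -1)
    b = boundaries_by_chrom[chrom]
    idx = bisect.bisect_right(b, pos) - 1
    # Clamp so positions before the first site land in fragment 0 and
    # positions at the chromosome end land in the last fragment.
    idx = min(max(idx, 0), len(b) - 2)
    return (idx, b[idx], b[idx + 1])
```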
pairtools merge
- do not require sorting? in merge, add an option to concatenate pairsam files instead of merging sorted files #23
- headers handling: in merge, add an option to take a header from the first file and ignore other headers #18
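The two merge modes discussed in #23 and #18 can be sketched with the standard library: `heapq.merge` performs a k-way merge of already sorted inputs, while `itertools.chain` simply concatenates them, and in both cases only the first file's header is kept. This is a hypothetical in-memory sketch (`merge_pairs`, `split_header` and `sort_key` are illustrative names); it assumes the standard 7-column layout and a chrom1, chrom2, pos1, pos2 sort order, whereas the real merge works on file streams.

```python
import heapq
import itertools


def sort_key(line):
    """Assumed .pairs sort order: chrom1, chrom2, pos1, pos2."""
    f = line.rstrip("\n").split("\t")
    return (f[1], f[3], int(f[2]), int(f[4]))


def split_header(lines):
    """Separate '#'-prefixed header lines from body lines."""
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if not line.startswith("#")]
    return header, body


def merge_pairs(files, concatenate=False):
    """Merge in-memory .pairs files, keeping only the first file's header.

    concatenate=False: k-way merge of sorted bodies (output stays sorted).
    concatenate=True: append bodies in input order; inputs need not be sorted.
    """
    if not files:
        raise ValueError("merge_pairs needs at least one input file")
    headers, bodies = zip(*(split_header(lines) for lines in files))
    if concatenate:
        body = list(itertools.chain.from_iterable(bodies))
    else:
        body = list(heapq.merge(*bodies, key=sort_key))
    return list(headers[0]) + body
```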
General improvements:
Headers maintenance
- allow adding a header to a headerless file: CLI to add header #119, or a broader addition of the header module, draft: Header CLI #121
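A minimal sketch of what adding a header to a headerless file involves, assuming the .pairs convention of a `## pairs format v1.0` first line, one `#chromsize:` line per chromosome and a closing `#columns:` line. `make_header` and `add_header` are hypothetical helpers, and the default column names follow pairtools' chrom1/chrom2 naming as an assumption.

```python
DEFAULT_COLUMNS = ("readID", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2")


def make_header(chromsizes, columns=DEFAULT_COLUMNS):
    """Build a minimal .pairs header.

    chromsizes: dict chrom -> length, written in insertion order.
    """
    header = ["## pairs format v1.0"]
    header += [f"#chromsize: {chrom} {length}" for chrom, length in chromsizes.items()]
    header.append("#columns: " + " ".join(columns))
    return header


def add_header(lines, header):
    """Prepend a header to a headerless .pairs body; refuse if one is already present."""
    if lines and lines[0].startswith("## pairs format"):
        raise ValueError("input already has a .pairs header")
    return list(header) + list(lines)
```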
Code maintenance
- move pairlib into the pairtools lib sandbox
- separate cli and lib
- Remove OrderedDict: ImportError: cannot import name 'Mapping' from 'collections' #113
- Clean up deprecation warnings, e.g. VisibleDeprecationWarning #71
- Fix input errors that are raised without explanation, e.g. merge without arguments #61
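The maintenance items above mostly come down to three Python idioms: importing `Mapping` from `collections.abc` (the `collections.Mapping` alias was removed in Python 3.10, which causes the ImportError in #113), relying on plain dicts, which preserve insertion order since Python 3.7, instead of `OrderedDict`, and failing early with a readable message on bad input. A minimal sketch; the function names and the stats keys are hypothetical examples, not pairtools code.

```python
from collections.abc import Mapping  # collections.Mapping no longer exists in Python 3.10+


def is_stats_mapping(obj):
    """Type check that keeps working on Python 3.10 and later."""
    return isinstance(obj, Mapping)


def make_stats():
    """Plain dicts keep insertion order (guaranteed since Python 3.7), so no OrderedDict."""
    stats = {}
    stats["total"] = 0
    stats["total_unmapped"] = 0
    stats["total_mapped"] = 0
    return stats


def require_inputs(paths, command="merge"):
    """Fail early with an explanation instead of an opaque traceback (cf. #61)."""
    if not paths:
        raise ValueError(f"pairtools {command}: no input files given; pass at least one .pairs file")
    return list(paths)
```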
Specific proposals:
Docs improvements
- pairtools walkthrough
- phasing walkthrough
- parse docs update
Tests proposals
- add tests for dedup (@Phlya): tests for dedup #5
- add tests for stats and merge: tests for dedup #5
Enhancements
- add summaries: Add summaries #105
- support bwa-mem2, which is 2-3 times faster than the usual bwa mem: speedup Hi-C analysis with bwa mem2 #118
- a single I/O utility instead of repetitive code in each module
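A single I/O helper could also carry the stdin/stdout default proposed in #48. A minimal sketch using only the standard library, assuming text-mode access (`"r"`, `"w"` or `"a"`) and gzip detection by the `.gz` extension; `open_pairs` is a hypothetical name, and pairtools' own helpers may instead call external (de)compressors.

```python
import gzip
import sys


def open_pairs(path, mode="r"):
    """Open a text stream for .pairs I/O.

    A falsy path or "-" maps to stdin (read modes) or stdout (write modes);
    paths ending in ".gz" are opened through the gzip module in text mode.
    """
    if not path or path == "-":
        return sys.stdin if "r" in mode else sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, mode.replace("t", "") + "t")
    return open(path, mode)
```

Each module would then call `open_pairs(args.input)` (argument name hypothetical) instead of repeating its own open-and-decompress logic.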
Post-release
- let the user define the rule for choosing the "best representative" in each cluster, in particular by MAPQ?: modify the pairtools-dedup to select the best MAPQ alignment as output of deduplication #95
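A user-defined "best representative" rule can be expressed as a scoring function applied to each duplicate cluster. A minimal sketch, assuming each pair is a dict with `readID`, `mapq1` and `mapq2`; `pick_representative` is a hypothetical helper, and the default rule (keep the pair whose worse side has the highest MAPQ) is only one possible choice. The remaining pairs are annotated with the parent readID, in line with the dedup reporting described above.

```python
def min_mapq(pair):
    """Default score: the MAPQ of the worse-aligned side."""
    return min(pair["mapq1"], pair["mapq2"])


def pick_representative(cluster, key=min_mapq):
    """Choose the pair kept from a cluster of duplicates.

    Ties keep the earliest pair, because max() returns the first maximal item.
    Returns (kept_pair, duplicates); each duplicate gets a parent_readID field.
    """
    if not cluster:
        raise ValueError("empty duplicate cluster")
    best_idx = max(range(len(cluster)), key=lambda i: key(cluster[i]))
    best = cluster[best_idx]
    dups = [dict(pair, parent_readID=best["readID"])
            for i, pair in enumerate(cluster) if i != best_idx]
    return best, dups
```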
Declined for this release
- bam annotation? add extra pair-related annotation to bams, which would allow to restore .pairs from annotated bams #67
- report mapq in the stats: collect mapq stats in the pairs-stats if possible #80 (or extend to any specified additional fields?)
- support Python 3.10: not possible due to a conda problem with glibc
- single-cell walkthrough: too detailed
- a more extensive description of the pair type standards, maybe a walkthrough (see question: total_single_sided_mapped contains Pair types #112, also Fragment-level analysis #68, type of UU (unique-unique) is close to each other #104)
- Add tests for compression-decompression: add tests for compression/decompression #51
- Add tests for example_pipeline @golobor : add a test for the example_pipeline #35
Resolved with no implementation
- duplicate the data processing history (currently stored in @PG fields) in #command fields of the .pairs header; declined for now: duplicate the data processing history (currently stored in @PG fields) in #command fields of the .pairs header #70
- suggestion to set the default temporary folder to ./ instead of $TMPDIR; declined for now: PairTools sort failure on merged data #84
- sort is parallel, but someone reported that it is not for their case, no reproducible example: pairtools sort not parallelizing #72
- pairtools subsampling already exists; it is not clear what modifications are needed: pairsamtools subsampling [new tool, enhancement] #66
- a unified way of changing the separator; it is not clear why it is needed or what the use cases are: in all pairsamtools, have a unified way to provide/override column indices and the separator #50
zqbake