Column with annotated mismatched in parse/parse2 #134

agalitsyna · 2022-07-01T03:18:00Z

This PR introduces several substantial changes that improve usability of pairtools.

1. Mismatches reporting

I used the "MD" field of the SAM file and added an option to extract mismatches from the alignments. They are reported as an additional column, "mismatches", by both parse and parse2. With the help of Anton's code on scsHi-C and the pysam engine for parsing mismatches, this turned out to be rather simple.
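To illustrate the idea (this is not the code added in this PR, which relies on pysam), here is a minimal pure-Python sketch of walking an MD tag to list mismatches. The function name `mismatches_from_md` is hypothetical. It assumes the alignment has no insertions or soft clips (so read offsets can be tracked from the MD tag alone), that positions are 0-based, and that the reported phred is the base quality at the mismatched read position.

```python
import re

# An MD tag is a series of match lengths, mismatched reference bases,
# and deletions written as ^BASES (e.g. "2A3" or "2^AC1T0").
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def mismatches_from_md(md, read_seq, read_quals, ref_start=0):
    """Return mismatches as 'ref:mut:phred:ref_pos:read_pos', comma-separated.

    Assumption: no insertions or soft clips in the alignment, so the read
    position advances only on matches and mismatches.
    """
    mismatches = []
    ref_pos = ref_start
    read_pos = 0
    for match_len, deletion, ref_base in MD_TOKEN.findall(md):
        if match_len:
            # A run of matching bases advances both coordinates.
            ref_pos += int(match_len)
            read_pos += int(match_len)
        elif deletion:
            # Deleted reference bases are absent from the read.
            ref_pos += len(deletion)
        else:
            # A single mismatched base: record it, then advance both.
            mismatches.append(
                f"{ref_base}:{read_seq[read_pos]}:{read_quals[read_pos]}"
                f":{ref_pos}:{read_pos}"
            )
            ref_pos += 1
            read_pos += 1
    return ",".join(mismatches)
```

For real data, insertions and clipping from the CIGAR string have to be taken into account, which is what pysam handles for us.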

For now, the user can request that mismatches be stored as a separate column of the .pairs file in a comprehensive format: "{ref_letter}:{mutated_letter}:{phred}:{ref_position}:{read_position}" (all mutations are listed, separated by commas).

This column can, in principle, be converted into three important types of data: 1. the number of converted pairs per alignment/pair (needed for scsHi-C); 2. nucleotide variants in your Hi-C genome; 3. mutated positions in the read (might be useful for Methyl-Hi-C and related methods).
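As a rough sketch of such a downstream conversion (not part of pairtools), the column could be parsed and summarized like this. The names `Mismatch`, `parse_mismatches` and `count_substitutions` are hypothetical, and treating an empty string as "no mismatches" is an assumption about how an empty value is written.

```python
from collections import Counter
from typing import NamedTuple


class Mismatch(NamedTuple):
    ref_letter: str
    mut_letter: str
    phred: int
    ref_position: int
    read_position: int


def parse_mismatches(field):
    """Parse 'ref:mut:phred:ref_pos:read_pos' entries separated by commas."""
    if not field:
        # Assumption: an empty value means the alignment has no mismatches.
        return []
    parsed = []
    for entry in field.split(","):
        ref, mut, phred, ref_pos, read_pos = entry.split(":")
        parsed.append(Mismatch(ref, mut, int(phred), int(ref_pos), int(read_pos)))
    return parsed


def count_substitutions(field, min_phred=0):
    """Count substitution types (e.g. 'T>C'), skipping low-quality bases."""
    return Counter(
        f"{m.ref_letter}>{m.mut_letter}"
        for m in parse_mismatches(field)
        if m.phred >= min_phred
    )
```

For example, `count_substitutions("A:G:20:102:2,T:C:35:110:10", min_phred=30)` keeps only the `T>C` mismatch, which is the kind of per-alignment conversion count mentioned in point 1.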

Example output:

This feature, although not producing any specific analysis, is potentially very powerful. The column with mutations can be used in downstream analysis as is, although we may want to design more specific functions for pairtools in the future.

You may see that the code to support this feature is tiny and easy to maintain.

2. Docs improvements

There was no description of the additional columns produced by various pairtools modules. I added a summary table of extra columns to the format docs.
Added more cross-references between docs and tutorials.

3. Python 3.10 support by tests

Tests now work with Python 3.10. Previously they failed because both pysam and bioframe did not support some packages from conda's Python 3.10. A workaround is to install them separately through pip, which does not have these requirements.

4. parse2

Flipping is off by default for parse2; we've added an explanation of this decision.

golobor · 2022-07-04T08:47:51Z

Wow, this is really great!!! This feature is indeed essential for many protocols - some even organize haplotype-resolved Hi-C based on such a mapping approach. And the amount of new code is surprisingly tiny. Super nice!!
A couple of questions:
(a) if I understand correctly, currently parse always executes get_mismatches_c? This may potentially be a bit costly, right? One alternative would be to only calculate it when users specify --add-cols mismatches.
(b) is this feature available both in parse and parse2?..

agalitsyna · 2022-07-04T10:01:33Z

Thanks!
(a) Good catch, it makes sense to run it only if the additional column with mismatches is requested.
(b) Yes, it's available for both, although for parse2 only mutations from the left-sided alignment will be reported in the case of readthrough (see these lines). But it works for SAM tags and other alignment properties reported for complex walks. We have not decided on any voting scheme for readthroughs, and I'm not sure it needs to be addressed in more detail for now.
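A minimal sketch of what I mean in (a), with hypothetical names throughout (`build_extra_columns`, a dict-like `alignment`, and a `compute_mismatches` callable are illustrations, not the actual pairtools internals): the expensive call happens only when the column was requested.

```python
def build_extra_columns(alignment, add_columns, compute_mismatches):
    """Collect values for the requested extra columns.

    `compute_mismatches` is called only when "mismatches" is among the
    requested columns, so the default path pays no extra cost.
    """
    values = {}
    for col in add_columns:
        if col == "mismatches":
            # Potentially costly: run only on demand.
            values[col] = compute_mismatches(alignment)
        else:
            # Other columns are assumed to be precomputed on the alignment.
            values[col] = alignment.get(col, "")
    return values
```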

What do you think of the mismatches format? The current one is rather lengthy but seems to be comprehensive: one mismatch is "{ref_letter}:{mut_letter}:{phred}:{ref_position}:{read_position}", and multiple mismatches are reported as a comma-separated list.

agalitsyna · 2022-07-11T23:07:33Z

I will merge it for now because it would be great to start seeing the docs updates. If there are suggestions on how to improve mutation reporting, it would be great to have them submitted separately!

agalitsyna added 8 commits June 30, 2022 23:02

Column with annotated mismatched in parse/parse2

593da0d

extra columns update in the docs; EXTRA_COLUMNS moved to pairsam_format.

513d561

draft docs on extra columns update

ec15514

draft docs on extra columns update

ddf6c32

docs update, python3.10 version update of channels

57a118e

docs update, python3.10 version update of channels

583d6ef

docs update, python3.10 version update of channels

54958c6

docs update, python3.10 version update of channels

532dde1

agalitsyna requested review from Phlya and golobor July 3, 2022 19:13

README updates: links updated; Slack link invotation posted; new tool…

1b18a5c

…s described in the main page

calculate mutations on demand

498ce78

agalitsyna added 2 commits July 7, 2022 14:11

flipping is off for parse2 by default. Important notes on flipping re…

521425b

…commendations.

test flip added

a1d0738

agalitsyna merged commit 1ad161f into master Jul 11, 2022

agalitsyna deleted the detect_mutations branch June 16, 2025 19:32

agalitsyna restored the detect_mutations branch June 16, 2025 19:32

agalitsyna deleted the detect_mutations branch June 16, 2025 19:35
