Description
Describe the possible issue
Issue first described here: pathoplexus/pathoplexus#790 (comment).
Previously, the L and S segments could be grouped, resulting in the PPX entry https://pathoplexus.org/seq/PP_000QFTH.3. However, after GenBank added the isolate and strain names to the data ingested by NCBI Virus, ingest broke up the grouping because the strain names are not the same.
Evidence of the problem
I believe this is actually a typo, as all other relevant fields match. For example, the GenBank records https://www.ncbi.nlm.nih.gov/nuccore/KT384397.1 and https://www.ncbi.nlm.nih.gov/nuccore/KT384388.1 both contain:
AUTHORS Yadav,P.D., Shete,A.M. and Mourya,D.T.
TITLE First report of nosocomial outbreak of Crimean-Congo hemorrhagic
fever, Rajasthan State, India
JOURNAL Unpublished
...
AUTHORS Yadav,P.D., Shete,A.M. and Mourya,D.T.
TITLE Direct Submission
JOURNAL Submitted (11-AUG-2015) Maximum Containment Laboratory, National
Institute of Virology, Maximum Containment Laboratory, Pune,
Maharashtra India, India
The strain names also differ by only a single character: /strain="NIV130776" vs /strain="NIV1310776"
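As a rough illustration of how such near-identical strain names could be flagged for manual review, here is a minimal sketch using a plain Levenshtein edit distance. The function names `edit_distance` and `likely_typo` and the threshold of 1 are assumptions made for this example. They are not part of the existing ingest pipeline.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def likely_typo(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: names differ, but by at most max_distance edits."""
    return a != b and edit_distance(a, b) <= max_distance


# The two strain names from the GenBank records above
print(edit_distance("NIV130776", "NIV1310776"))
print(likely_typo("NIV130776", "NIV1310776"))
```

A check like this would only suggest candidates. Whether two records really belong to the same isolate still needs to be confirmed against the other metadata, as done above.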
Suggested change
Manually curate the ingested strain names so that they match, allowing the now-revoked sequence to be unrevoked.
Full list of affected sequences
PP_000QFTH.3 will need to be unrevoked and its isolate name curated; sequences PP_004G4NH and PP_004G4PF will need to be revoked.