Conversation

@rfdougherty
Fixed the data harvesting scripts so that the drug data can be updated. Also refactored the fuzzy-matching feature to use fuzzyset2 and resolved several bugs in the code.

match_data["match_similarity"] = similarity
match_data["match_variant"] = fuzzy_matched_variant
match_data["matching_string"] = cand
lookup_name = match_data.get("name", m)

yes!
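
The reviewed lines above record three fields for each fuzzy match: the similarity score, the variant that was matched, and the candidate string that triggered the match. A minimal sketch of that bookkeeping follows. It is an illustration only, not the PR's implementation: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzyset2, and the function names `best_fuzzy_match` and `annotate_match`, the `variants` list, and the `threshold` value of 0.8 are all hypothetical.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(cand, variants, threshold=0.8):
    """Return (variant, similarity) for the closest variant, or None.

    Hypothetical helper: difflib stands in for fuzzyset2 here.
    """
    best_variant, best_score = None, 0.0
    for variant in variants:
        # ratio() is a similarity in [0, 1]; compare case-insensitively.
        score = SequenceMatcher(None, cand.lower(), variant.lower()).ratio()
        if score > best_score:
            best_variant, best_score = variant, score
    if best_variant is None or best_score < threshold:
        return None
    return best_variant, best_score


def annotate_match(match_data, cand, variants, threshold=0.8):
    """Copy match_data and attach the three fields seen in the reviewed lines."""
    result = best_fuzzy_match(cand, variants, threshold)
    if result is None:
        return None
    fuzzy_matched_variant, similarity = result
    annotated = dict(match_data)  # leave the caller's dict untouched
    annotated["match_similarity"] = similarity
    annotated["match_variant"] = fuzzy_matched_variant
    annotated["matching_string"] = cand
    return annotated


if __name__ == "__main__":
    # Assumed example data: a misspelled candidate and two known variants.
    print(annotate_match({"name": "Paracetamol"}, "paracetamoll",
                         ["paracetamol", "ibuprofen"]))
```

Returning a copy rather than mutating `match_data` in place is a design choice of this sketch; the reviewed code appears to write into the dictionary directly.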

@woodthom2
Member

Hi @rfdougherty! Thanks so much for this pull request. I really appreciate the time you have put into it and your willingness to contribute. Please forgive my late reply.

I just have a quick request. A lot of files have changed (17 files), which is the majority of the files in the project, so it's a bit hard for me to review. I can see at a glance that some things have been removed, such as the call to curl if the user is on Windows, and I am not sure whether this is intentional or part of the PR.

Would it be possible to split it up into atomic PRs, please? If you are fixing multiple issues, could you send them as separate PRs, ideally each one modifying only one or two files, and also remove anything from the PR that doesn't need to be there? Then I can review more easily. If not, I will take the time to review and try to merge as soon as I can; perhaps I will merge the files individually.

I would like to get it merged as the changes look really valuable, especially if you have improved the data ingestion!

We could always connect on a quick video call to go through the changes if that works? I'm free in the week beginning 29 December.

@rfdougherty
Author

rfdougherty commented Dec 23, 2025 via email

@woodthom2
Member

woodthom2 commented Dec 24, 2025 via email

@woodthom2
Member

woodthom2 commented Dec 24, 2025 via email

@rfdougherty
Author

rfdougherty commented Dec 26, 2025 via email

@rfdougherty
Author

I'm splitting this into two PRs. The first is ready for review: #25. I'll close this PR now.

@rfdougherty rfdougherty closed this Jan 3, 2026

3 participants