Update environment pins; fix demo notebook; and sync to Hugging Face #6
alegresor
left a comment
Please do not put line breaks in the README. There should be an option in your editor to toggle word wrap in order to view long lines. Line breaks are hard to maintain when text is edited.
Pull Request Overview
This PR updates environment dependencies and adds infrastructure for uploading the LDData repository to Hugging Face Datasets Hub. The main changes support transitioning from a standalone GitHub repository to a publicly accessible dataset on Hugging Face, making low-discrepancy point set parameters more discoverable and easier to use in QMC research.
Key changes:
- Updates `qmcpy` to version 2.0 and `qmctoolscl` to version 1.1.5 to fix broken pip installations
- Adds comprehensive upload tooling (`upload.py`, `git_lfs_upload.sh`) and GitHub Actions workflow for automated synchronization to Hugging Face
- Reorganizes documentation: transforms `README.md` into a Hugging Face dataset card and moves technical specifications to `LD_DATA.md`
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `env.yml` | Updates qmcpy and qmctoolscl versions, adds huggingface_hub dependency for upload functionality |
| `upload.py` | New Python script for uploading repository to Hugging Face Datasets Hub with retry logic and fallback mechanisms |
| `scripts/git_lfs_upload.sh` | New bash script for git-based uploads using git-lfs for large files |
| `README.md` | Transformed into a Hugging Face dataset card with usage examples, citations, and dataset structure documentation |
| `LD_DATA.md` | New file containing the original technical specification for low-discrepancy data formats (moved from old README) |
| `LICENSE.txt` | Adds Apache 2.0 license file |
| `.gitignore` | Adds patterns for Python cache files, VS Code settings, and script directories |
| `.github/workflows/sync-to-huggingface.yml` | New GitHub Actions workflow for automated synchronization to Hugging Face on push |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@alegresor @copilot @zitterbewegung CI tests pass now.
@sou-cheng-choi I've opened a new pull request, #8, to work on those changes. Once the pull request is ready, I'll request review from you.
We should not create a wrapper on top of `huggingface_hub` if we are only going to update or create a dataset. See https://huggingface.co/docs/datasets/en/upload_dataset
Dataset location is at https://huggingface.co/datasets/QMCSoftware/LDData/tree/main
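For illustration, here is a minimal sketch of what calling `huggingface_hub` directly, without a wrapper layer, might look like. It is not the contents of `upload.py` from this PR. The repo id is taken from the dataset URL above; the helper names `build_upload_kwargs` and `sync_dataset`, the `HF_TOKEN` environment variable, the ignore patterns, and the commit message are all assumptions made for the example.

```python
import os


def build_upload_kwargs(folder_path=".", repo_id="QMCSoftware/LDData", token=None):
    """Collect keyword arguments for HfApi.upload_folder (hypothetical helper)."""
    return {
        "folder_path": folder_path,
        "repo_id": repo_id,
        "repo_type": "dataset",  # upload to the Datasets Hub, not a model repo
        "token": token,
        # Illustrative patterns only; adjust to the repository's real layout.
        "ignore_patterns": ["__pycache__/*", ".vscode/*"],
        "commit_message": "Sync from GitHub",
    }


def sync_dataset(folder_path="."):
    """Upload a local folder to the dataset repo in a single commit."""
    # Imported lazily so build_upload_kwargs works without the package installed.
    from huggingface_hub import HfApi

    token = os.environ.get("HF_TOKEN")  # assumed name for the access token
    HfApi().upload_folder(**build_upload_kwargs(folder_path, token=token))
```

Calling `sync_dataset(".")` from the repository root with a write token in the environment would push the working tree to the dataset repo; retry logic and git-lfs handling are deliberately left out of this sketch.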
alegresor
left a comment
Thank you for working on this @sou-cheng-choi, it looks like you have done a lot!
May I suggest you break this into much smaller PRs that would be easier to review and faster to get merged? I would suggest the following:
1. A PR which adds the LICENSE
2. A PR which removes `env.yml` in favor of `pyproject.toml`
3. A PR which adds your enhancements to the `README.md` and adds the `LD_Data.md`
4. One which adds the HuggingFace action

For adding the HF action, I must admit I do not understand what many of your files are doing. The repo ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository gives a MWE of how to sync data from a repo into HF. Based on their MWE, I would expect the suggested PR 4 would only add a single file to `.github/workflows/` which automatically uploads the dataset to HF whenever something is pushed to a branch.
I will close this PR and break it into multiple PRs following your suggestions.
Reopening so that I don't forget to open the sub-PRs.
`pip install` in `env.yml` by updating the `qmctoolscl` pip pin to version 1.1.5 and `qmcpy` to 2.0.