Add support for Yoda repositories #100

stsnel · 2025-04-01T15:47:28Z

Yoda is a research data management solution developed by Utrecht University and used by multiple institutes around the world. It enables researchers and their partners to securely deposit, share, publish and preserve large amounts of research data during all stages of a research project.

This adds support for the Yoda repositories of Utrecht University (UU) and Vrije Universiteit Amsterdam (VU).

This PR addresses issue #5

stsnel · 2025-05-20T16:19:45Z

Just made a small edit to also add the WUR Yoda repository, which published its first data package recently.

J535D165 · 2025-05-20T18:45:24Z

Thanks a lot, @stsnel. I will review it tomorrow!

J535D165

Sorry it took so long to review @stsnel. I'm slowly catching up with my GitHub notifications now.

I love the PR. I hope to see a REST API for Yoda in the future, but having support for Yoda in DataHugger before that is amazing. I have a couple of small pieces of feedback for you. I hope to merge soon.

J535D165 · 2025-06-26T08:02:46Z

datahugger/services.py

+        if not hasattr(self, "_files"):
+            self._requests_cache_file = tempfile.NamedTemporaryFile(delete=False)
+            requests_cache.install_cache(self._requests_cache_file.name)
+            self._files = self._harvest_files()
+            self._cleanup_requests_cache()
+        return self._files


I wonder why you cache this. It makes sense, but is there a reason to do this for Yoda specifically? Or should we implement this feature for all services in a generic way?

stsnel · 2025-07-01

The original intent of this part of the code was to cache DNS query responses. In the case of Yoda, we need to request every file in the data set individually (rather than, say, requesting a single zip file that contains all the files). This can result in significant overhead for name resolution. Apart from that, flaky DNS servers can result in failures to harvest all files.

However, this solution ultimately involves monkey patching the requests module (or one of the lower-level modules), which can potentially interfere with other software that depends on datahugger. The implementation also didn't help (that much) with improving performance.

After reconsidering, I have removed this part of the code.
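To make the trade-off above concrete, here is a minimal sketch of what process-wide caching of name lookups looks like. It is not the code from this PR: the helper name `cached_getaddrinfo` is hypothetical, and it uses only the standard library rather than `requests_cache`. The point it illustrates is that the patch replaces `socket.getaddrinfo` for the whole process, so any other code resolving names while the block is active is affected too.

```python
import contextlib
import functools
import socket


@contextlib.contextmanager
def cached_getaddrinfo(maxsize=256):
    """Temporarily replace socket.getaddrinfo with a memoizing wrapper.

    Hypothetical helper for illustration only. The replacement is
    process-wide while the block is active, so unrelated code that
    resolves host names in the same process also goes through the cache.
    """
    original = socket.getaddrinfo
    cached = functools.lru_cache(maxsize=maxsize)(original)
    socket.getaddrinfo = cached
    try:
        yield cached
    finally:
        # Always restore the original resolver, even if harvesting fails.
        socket.getaddrinfo = original
```

Even when the patch is confined to a `with` block like this, other threads in the same process still see the replaced function for the duration of the block, which is the kind of interference described above.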


J535D165 · 2025-06-26T08:17:33Z

datahugger/services.py

+        folders_to_process = [contents_url]
+        files_to_download = []
+
+        while True:


I wonder whether the while loop is needed here.

stsnel · 2025-07-01

The purpose of the loop is to iterate through any subcollections (subdirectories) of the data package if needed. I've adjusted the loop condition and added a comment to make this clearer.
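As a rough illustration of this kind of traversal (not the actual code in `datahugger/services.py`), the sketch below walks a collection tree with a work queue whose emptiness serves as the loop condition. The function name `collect_files` and the `list_collection` callable are assumptions made for the example; the variable names `folders_to_process` and `files_to_download` follow the diff above.

```python
from collections import deque


def collect_files(root_url, list_collection):
    """Return every file URL under root_url, including subcollections.

    list_collection is a hypothetical callable that takes a collection URL
    and returns (file_urls, subcollection_urls) for that one level.
    """
    folders_to_process = deque([root_url])
    files_to_download = []
    seen = {root_url}

    # Keep going until no unvisited collection is left in the queue.
    while folders_to_process:
        folder = folders_to_process.popleft()
        file_urls, subcollections = list_collection(folder)
        files_to_download.extend(file_urls)
        for sub in subcollections:
            if sub not in seen:  # guard against revisiting a collection
                seen.add(sub)
                folders_to_process.append(sub)

    return files_to_download


# Usage with an in-memory stand-in for the repository listing.
tree = {
    "pkg/": (["pkg/readme.txt"], ["pkg/data/"]),
    "pkg/data/": (["pkg/data/a.csv", "pkg/data/b.csv"], []),
}
print(collect_files("pkg/", tree.__getitem__))
# prints ['pkg/readme.txt', 'pkg/data/a.csv', 'pkg/data/b.csv']
```

Using `while folders_to_process:` instead of `while True:` makes the termination condition visible at the top of the loop, which is the kind of readability the review question was after.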

J535D165 · 2025-06-26T08:18:53Z

docs/images/logos.png

J535D165 · 2025-06-26T08:22:33Z

By the way, I'm fixing some of the broken tests, so don't worry about them.

stsnel · 2025-06-27T07:07:31Z

Thank you for the feedback 👍 ! I expect to be able to process the feedback and respond within a few days.

Yoda is a research data management solution developed by Utrecht University and used by multiple institutes around the world. It enables researchers and their partners to securely deposit, share, publish and preserve large amounts of research data during all stages of a research project. This adds support for the Yoda repositories of Utrecht University (UU), Vrije Universiteit Amsterdam (VU), as well as Wageningen University & Research (WUR).


stsnel force-pushed the add-yoda-support branch from d7d2b65 to 9437016 Compare May 20, 2025 16:18

J535D165 requested changes Jun 26, 2025


stsnel force-pushed the add-yoda-support branch 2 times, most recently from 3705bb7 to 165fb6c Compare July 1, 2025 20:45

stsnel force-pushed the add-yoda-support branch from 031dd5f to bafb85b Compare July 1, 2025 21:04

[pre-commit.ci] auto fixes from pre-commit.com hooks

9238d01

for more information, see https://pre-commit.ci
