Conversation

gbrls commented Oct 26, 2023

  • Added infer as a dependency so that the archive type (zip, gzip) is detected from the file contents instead of the file extension (see the sketch below).
  • Added an option to extract the archive to a specific directory.
  • The raw archive is deleted after it's extracted. (This may cause problems: in my testing, deleted archives get downloaded again.)

- added infer as a dependency to check for zip and gz archives.
- added an option to extract to a configurable directory.
- after an archive is extracted, the compressed file is deleted.
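
For illustration, a minimal sketch of what extension-independent detection with the infer crate can look like (ArchiveKind and detect_archive are made-up names for this sketch, not code from this PR):

use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

enum ArchiveKind {
    Zip,
    Gzip,
    Unknown,
}

fn detect_archive(path: &Path) -> io::Result<ArchiveKind> {
    // infer matches on magic bytes, so a small prefix of the file is enough.
    let mut buf = [0u8; 16];
    let n = File::open(path)?.read(&mut buf)?;
    let kind = if infer::archive::is_zip(&buf[..n]) {
        ArchiveKind::Zip
    } else if infer::archive::is_gz(&buf[..n]) {
        ArchiveKind::Gzip
    } else {
        ArchiveKind::Unknown
    };
    Ok(kind)
}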
epwalsh (Owner) commented Oct 26, 2023

The raw archive is deleted after it's extracted. (This may cause problems: in my testing, deleted archives get downloaded again.)

Yeah, this probably breaks some assumptions, but I think we can fix that. If we keep track of the extraction path in the metadata, we could avoid downloading again when the extraction path exists.
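
Roughly this shape, as a sketch only (the Meta fields and the helper below are hypothetical, not the crate's actual types):

use std::path::PathBuf;

// Hypothetical metadata shape for this sketch; the real Meta struct may differ.
struct Meta {
    resource: String,
    extraction_path: Option<PathBuf>,
}

// Skip the download when a previous extraction is still on disk.
fn needs_download(meta: &Meta) -> bool {
    match &meta.extraction_path {
        Some(dir) if dir.is_dir() => false,
        _ => true,
    }
}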

gbrls (Author) commented Oct 27, 2023

The redownload issue was fixed using the approach you wrote above! The extraction and deletion of the original archive seem to be working without breaking anything else now. (There might be some corner cases when combining the extraction_dir and subdir options.)

I found a bug in my implementation: it's still redownloading the full archive.

epwalsh (Owner) left a comment

I know you're still working on this, but I have a couple of general comments:

let mut existing_meta: Vec<Meta> = vec![];
let glob_string = format!(
-    "{}.*.meta",
+    "{}*.meta",
gbrls (Author) commented

I don't know if this was intentional, but the previous glob pattern wasn't matching the .meta files. Changing it to this fixed the issue, and the files stopped being downloaded again.
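
For illustration only (assuming the metadata file sits directly at <cached path>.meta, which may not be exactly how the crate lays things out): in a glob, * matches zero or more characters, so the new pattern also matches that file, while the old one requires something between two dots and never does:

use glob::Pattern;

fn main() {
    let cached = "/cache/abc123.ef4567"; // hypothetical cached file path
    let meta_file = format!("{}.meta", cached);

    let old = Pattern::new(&format!("{}.*.meta", cached)).unwrap();
    let new = Pattern::new(&format!("{}*.meta", cached)).unwrap();

    assert!(!old.matches(&meta_file)); // the old pattern misses the metadata file
    assert!(new.matches(&meta_file));  // the new pattern finds it
}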

epwalsh (Owner) commented

Huh, that looks like a bug. Good catch!

A contributor commented

This is super important to fix for offline mode; I hope this PR can land soon.

epwalsh (Owner) left a comment

I want to be careful not to break existing workflows. I think removing the compressed cached file for remote resources is fine (when extract=true), but when the resource is a local path we shouldn't remove that file. I think this would be a quick fix: just add an argument to extract_archive that specifies whether to keep or delete that file, then set it accordingly in Cache::cached_path_with_options() (a rough sketch follows at the end of this comment).

Also, it looks like there are a couple of CI failures that should be easy to fix. Let me know if you have trouble there.

Other than that looks good to me!
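
A rough sketch of that shape (CompressedFile is an invented name here, and the real extract_archive signature may differ):

use std::fs;
use std::io;
use std::path::Path;

// Whether the compressed file should survive extraction.
enum CompressedFile {
    Keep,   // resource was a local path: leave the user's file alone
    Delete, // resource was downloaded: safe to remove the cached archive
}

fn extract_archive(archive: &Path, target_dir: &Path, policy: CompressedFile) -> io::Result<()> {
    // Decompression itself is elided here; only the cleanup policy is sketched.
    fs::create_dir_all(target_dir)?;
    if matches!(policy, CompressedFile::Delete) {
        fs::remove_file(archive)?;
    }
    Ok(())
}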


gbrls (Author) commented Oct 27, 2023

Makes total sense to me! I'll be back later with those changes. Thank you for the reviews and guidance.

YtvwlD commented Mar 17, 2024

This is related to #68, right? What's missing for this to work?

I was trying to download a .tar.zstd archive (from a URL with no file extension) with this crate, and it would be great if this worked. :)
