@rodrigoberriel
Contributor

While #392 was supposed to fix #294, it did not. When uploading a file using fs.put, kwargs are still not forwarded, because these calls go through _put_file, which does not rely on _pipe, unlike the write_* calls.

This PR is very simple: it also passes kwargs to bc.upload_blob for the _put_file calls.
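
To illustrate the gap, here is a minimal sketch using stand-in classes. FakeBlobClient and SketchFS are hypothetical names made up for this example, not adlfs code; they only mimic the two upload paths described above, where the write-style path already forwards keyword arguments and the put-style path drops them until the fix is applied.

```python
class FakeBlobClient:
    """Hypothetical stand-in for the SDK blob client; records the extra kwargs."""

    def __init__(self):
        self.calls = []

    def upload_blob(self, data, overwrite=False, **kwargs):
        # Record only the extra keyword arguments that reached the "SDK".
        self.calls.append(kwargs)


class SketchFS:
    """Hypothetical filesystem with two upload paths (assumption, not adlfs)."""

    def __init__(self, client, forward_in_put):
        self.client = client
        self.forward_in_put = forward_in_put

    def pipe_file(self, path, data, **kwargs):
        # write_*-style path: kwargs already reach upload_blob.
        self.client.upload_blob(data, overwrite=True, **kwargs)

    def put_file(self, lpath, rpath, **kwargs):
        # put-style path: kwargs are dropped unless forwarding is enabled.
        extra = kwargs if self.forward_in_put else {}
        self.client.upload_blob(b"file contents", overwrite=True, **extra)


before = FakeBlobClient()
SketchFS(before, forward_in_put=False).put_file("a.json", "c/a.json", tags={"k": "v"})

after = FakeBlobClient()
SketchFS(after, forward_in_put=True).put_file("a.json", "c/a.json", tags={"k": "v"})
```

In this sketch, before.calls shows the tags never arriving, while after.calls shows them passed through unchanged.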

@rodrigoberriel
Contributor Author

rodrigoberriel commented Apr 25, 2024

@TomAugspurger would you mind considering merging this for one of the next releases?

Without it, anyone who wants to use put and, for instance, set tags and a content type for the blob has to issue two additional requests (set_blob_tags and set_http_headers), which adds latency and incurs additional logging costs. I don't know if there's a workaround, but we currently have to rely on an internal fork only because of this simple change.
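
A rough sketch of the difference in round trips, using a hypothetical CountingBlobClient that only borrows the method names mentioned above (upload_blob, set_blob_tags, set_http_headers). Counting one request per call is a simplification made for this example; a real upload may issue more than one request.

```python
class CountingBlobClient:
    """Hypothetical stand-in that logs one 'request' per method call."""

    def __init__(self):
        self.requests = []

    def upload_blob(self, data, **kwargs):
        self.requests.append("upload_blob")

    def set_blob_tags(self, tags):
        self.requests.append("set_blob_tags")

    def set_http_headers(self, content_settings):
        self.requests.append("set_http_headers")


def put_without_forwarding(client, data, tags, content_settings):
    # Upload first, then patch tags and headers with two follow-up calls.
    client.upload_blob(data)
    client.set_blob_tags(tags)
    client.set_http_headers(content_settings)


def put_with_forwarding(client, data, tags, content_settings):
    # Everything travels with the single upload call.
    client.upload_blob(data, tags=tags, content_settings=content_settings)


slow, fast = CountingBlobClient(), CountingBlobClient()
settings = {"content_type": "application/json"}  # placeholder value for the sketch
put_without_forwarding(slow, b"{}", {"tag1": "val1"}, settings)
put_with_forwarding(fast, b"{}", {"tag1": "val1"}, settings)
```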

If the change is too small or you need anything else, please let me know.

@karosc

karosc commented Nov 3, 2025

This would be really great for me as well. I'd love to be able to put objects with tags. s3fs does something like this and also provides a distinct interface for getting and setting tags on objects.

Is there any interest in getting this PR merged? How can we move this forward?

@karosc

karosc commented Nov 4, 2025

It also feels like there is some precedent with #392 getting merged. Why can we set tags using fs.write_bytes, but not with fs.put?

@martindurant
Member

@kyleknap , ok with you?

I think the point of maybe implementing the attr methods is also fair, although I don't know if you can simply change the metadata in azure (I assume you can).

@kyleknap
Collaborator

@martindurant I'm ok with proxying the kwargs to the upload_blob() method. My only hesitation is that it might be more robust to have a designated upload_blob_kwargs parameter that takes a dictionary of the additional kwargs, e.g.:

fs.upload_file(
   'local-file',
   'blob-name',
   upload_blob_kwargs={
       "tags": {
           "tag1": "val1"
       }
   }
)

This would ensure that any future keyword arguments we expose won't collide with Azure SDK keyword arguments, but it does make the call more verbose. That being said, given that write_bytes() already accepts SDK keyword arguments as top-level keyword arguments and the usage pattern is similar to s3fs, I do think it makes sense to stay consistent here. Thoughts?
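
The collision concern can be sketched with plain functions. Everything here is hypothetical: the function names, the callback parameter, and the choice of timeout as the colliding keyword are assumptions picked for illustration, not the adlfs API.

```python
def put_top_level(lpath, rpath, callback=None, **kwargs):
    """Top-level style: every unrecognised keyword is forwarded to the SDK."""
    return {"own": {"callback": callback}, "sdk": kwargs}


def put_top_level_v2(lpath, rpath, callback=None, timeout=None, **kwargs):
    """A later release adds its own `timeout`, which now shadows the SDK one."""
    return {"own": {"callback": callback, "timeout": timeout}, "sdk": kwargs}


def put_namespaced(lpath, rpath, callback=None, upload_blob_kwargs=None):
    """Namespaced style: SDK keywords live in their own dictionary."""
    return {"own": {"callback": callback}, "sdk": dict(upload_blob_kwargs or {})}


v1 = put_top_level("f", "c/f", timeout=30)
v2 = put_top_level_v2("f", "c/f", timeout=30)
ns = put_namespaced("f", "c/f", upload_blob_kwargs={"timeout": 30})
```

With the top-level style, the same call silently changes meaning between v1 and v2; the namespaced style avoids that at the cost of a longer call site, which is the trade-off described above.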

> I think the point of maybe implementing the attr methods is also fair, although I don't know if you can simply change the metadata in azure (I assume you can).

Yeah, tags can be set through separate API calls with the Azure SDK. I'm open to exposing that given s3fs offers it already, but I'd prefer we keep this PR to just proxying the keyword arguments; we can revisit exposing get_tags and put_tags methods if still needed.

Otherwise, for this PR, it would be great if we could get a test added for both put_file() and put() to make sure the keyword arguments are passed to the SDK client call (e.g., setting tags), so that we definitively confirm that we've addressed #294. @rodrigoberriel let us know if you have the bandwidth to update the PR. Otherwise, we can push some tests up and get it merged.

@rodrigoberriel
Contributor Author

Sure. Today is kinda busy, but I can get it done by the weekend.

@rodrigoberriel
Contributor Author

@kyleknap Today wasn't that busy after all =] Azurite still doesn't fully support tags, so I had to mock the call to upload_blob. I tried to keep the style consistent with the other tests. Please let me know if you'd like any changes.
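
For readers unfamiliar with the approach, a mocked test of this kind might look roughly like the sketch below. It is not the test from the PR: BlobClient and put_file here are hypothetical stand-ins defined inside the example so that it runs without Azure or Azurite, and only unittest.mock from the standard library is used.

```python
from unittest import mock


class BlobClient:
    """Hypothetical stand-in for the SDK client class being patched."""

    def upload_blob(self, data, **kwargs):
        raise RuntimeError("would hit the network")


def put_file(client, data, **kwargs):
    """Hypothetical code under test: forwards kwargs to upload_blob."""
    client.upload_blob(data, overwrite=True, **kwargs)


def test_put_file_forwards_tags():
    # Replace upload_blob with a mock so no real request is made.
    with mock.patch.object(BlobClient, "upload_blob") as mocked:
        put_file(BlobClient(), b"{}", tags={"tag1": "val1"})
    # The mock keeps its call record after the patch is undone.
    mocked.assert_called_once_with(b"{}", overwrite=True, tags={"tag1": "val1"})


test_put_file_forwards_tags()
```

The assertion checks only that the keyword arguments reached the client call, which is the behaviour in question; it says nothing about how the service stores the tags.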

@kyleknap
Collaborator

@rodrigoberriel thanks! That's a bummer on the tag support. Does Azurite support content settings/content type? I'm also fine with not including tags as part of the test case; we can just assert that the content type matches what we uploaded. If so, that can help us avoid needing mocks/patching and simplify the test case a bit more. I don't have a strong preference on this, so let us know.

Otherwise, I plan to pull down the PR today to try it out and get a full review in.

@rodrigoberriel
Contributor Author

@kyleknap it does support it. I've updated the test.

Collaborator

@kyleknap kyleknap left a comment

@rodrigoberriel This looks good to me! I also confirmed that it was working correctly with tags when trying it out. A couple of updates we probably should add as well before merging:

  • Let's rebase the PR onto main (merging main into this branch works too), mainly to pull in a CI fix we made earlier this year that gets the tests passing with the latest version of the Azure SDK.
  • Once updated with main, let's add a changelog entry for it in this section. I think we can keep it to a sentence to call out that we proxy these keyword arguments now to the Azure Blob SDK client to support content settings, tags, etc.

Afterwards, @martindurant, I think we should be set to enable CI for the pull request to make sure it passes, and get the PR merged, assuming you are good with the change too!

@martindurant
Member

I enabled CI now, and can do so again after further changes.

@rodrigoberriel rodrigoberriel force-pushed the hotfix/inject-kwargs-put branch from 16417c6 to 2ee1015 Compare November 12, 2025 20:50
@rodrigoberriel
Contributor Author

Rebased, squashed, and updated the changelog.

Comment on lines 1465 to 1468
assert all(
    (k in blob_info["content_settings"] and blob_info["content_settings"][k] == v)
    for k, v in content_settings.items()
)
Collaborator

For fixing the CI failures, we can go ahead and simplify the assertions to just:

assert blob_info["content_settings"]["content_type"] == "application/json"
assert blob_info["content_settings"]["content_encoding"] == "UTF-8"

It looks like older versions of the Azure Blob SDK do not handle the in check well for content_settings.
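
The exact cause in the older SDK versions isn't spelled out here. As an assumption only, one way an in check can fail in Python while direct lookups still work is when an object defines __getitem__ but neither __contains__ nor __iter__. OldStyleSettings below is a hypothetical class written for this sketch, not the SDK's ContentSettings.

```python
class OldStyleSettings:
    """Hypothetical object: supports obj[key] but has no __contains__ or __iter__."""

    def __init__(self, **values):
        self._values = values

    def __getitem__(self, key):
        return self._values[key]


settings = OldStyleSettings(
    content_type="application/json", content_encoding="UTF-8"
)

# Direct lookups work, which is why the simplified assertions are safe.
direct_ok = settings["content_type"] == "application/json"

# Without __contains__/__iter__, `in` falls back to the legacy sequence
# protocol and calls settings[0], which raises KeyError for a mapping.
try:
    "content_type" in settings
    membership_error = None
except KeyError as exc:
    membership_error = exc
```

Indexing directly, as in the two assertions above, sidesteps the membership test entirely.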

@rodrigoberriel rodrigoberriel force-pushed the hotfix/inject-kwargs-put branch from 2ee1015 to d42a634 Compare November 12, 2025 21:27
@rodrigoberriel rodrigoberriel force-pushed the hotfix/inject-kwargs-put branch from d42a634 to 80b589a Compare November 12, 2025 21:59
Collaborator

@kyleknap kyleknap left a comment

Looks good to me! @martindurant we should be good to re-enable CI and get it merged.

@martindurant martindurant merged commit 8b2fe07 into fsspec:main Nov 12, 2025
8 checks passed
Development

Successfully merging this pull request may close these issues.

Set blob Content-Type in AzureBlobFileSystem.put()

4 participants