Description
Thank you very much for your work and your tremendous contributions to the community.
After reviewing the data samples provided on Hugging Face, I noticed that many samples [1] [2] [3] still exhibit caption confounding. The paper states that this problem was resolved using ChatGPT, but in practice the fix appears to be only partially effective. How should we address this? Are we looking at an outdated version of the data, or are additional post-processing steps required?
Additionally, the paper reports the number of samples in each data category (Fig. 4a). However, the released samples do not contain a “category” field. How were these statistics computed? Is there a quick way to extract data for a specific category, for example all images and corresponding captions in the radiology category?
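As a stopgap until a category field is available, a keyword match on the captions might roughly approximate a category. The sketch below assumes each sample is a dict with a `caption` field (an assumption; the actual schema may differ), and the keyword list and the helper `filter_by_keywords` are purely hypothetical, not taken from the paper, so the result would only approximate the categories in Fig. 4a:

```python
import re
from typing import Iterable

# Hypothetical keyword list for a "radiology" category; NOT the paper's definition.
RADIOLOGY_KEYWORDS = ("x-ray", "ct scan", "mri", "radiograph", "ultrasound")


def filter_by_keywords(samples: Iterable[dict], keywords: Iterable[str]) -> list[dict]:
    """Return samples whose 'caption' contains any keyword (case-insensitive, whole-word)."""
    # One alternation regex; \b word boundaries stop e.g. "mri" matching inside "smriti".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    # Samples with no caption are treated as an empty string and never match.
    return [s for s in samples if pattern.search(s.get("caption", ""))]


if __name__ == "__main__":
    demo = [
        {"image": "a.png", "caption": "Chest X-ray showing a small opacity"},
        {"image": "b.png", "caption": "H&E stained tissue section"},
    ]
    print([s["image"] for s in filter_by_keywords(demo, RADIOLOGY_KEYWORDS)])
```

Keyword matching will both miss samples and pick up false positives, so an official category field (or the script used to produce Fig. 4a) would still be much more reliable.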
Thank you again for the great work. I'm looking forward to your reply.