Request for Data and Code Release - 5+ Months After Publication #1

@younaman

Hi authors,
I came across your excellent paper "On the (In)Security of LLM App Stores" published at IEEE S&P 2025. The research addresses important security concerns in the LLM app ecosystem.
In the paper, you mentioned:

"We will make our data and tools publicly available upon acceptance."
The officially accepted paper links to this GitHub repository.

However, it's been over 6 months since the paper was published, and this repository currently only contains a README file.
Could you please provide an update on:

- When will the ToxicDict (31,783 toxic words) be released?
- Will the collected dataset of 786,036 LLM apps be made available?
- Are there plans to release the detection tools and automated framework?
- What are the main blockers preventing the code/data release?

I'm particularly interested in reproducing the consistency analysis and malicious behavior detection methods for my own research on LLM application security. If there are privacy/legal concerns preventing full data release, could you consider releasing:

- Anonymized metadata
- Detection tool implementations
- Evaluation scripts
- ToxicDict (which should be less sensitive)

Thank you for your time. I look forward to the release!
