Codebase for "PRIME: Language Model Personalization with Cognitive Memory and Thought Processes"
The CMV dataset is available in the data/ directory. The original raw data was crawled from the Academic Torrents website.
The folder structure of the CMV dataset is as follows:
data/
├── history_data.json # Contains historical engagements from 2013 to 2022. Each engagement is a linearized conversation between the post author and one or more commenters.
├── evaluation_data.json # Contains 133 user queries (2023-2024) used for evaluation.
└── target_authors_collection.json # Contains a set of 41 *active* target authors studied in this work. Refer to the paper for more details (Section 3 and Appendix C).
Note: history_data.json contains conversations from more authors than those listed in target_authors_collection.json. Use target_authors_collection.json to keep only the conversations of the target authors.
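The filtering step described in the note could look roughly like the sketch below. The JSON schema is not documented here, so this assumes that history_data.json loads as a list of records, that each record stores its post author under an "author" key, and that target_authors_collection.json loads as a list of author names. The helper names (load_json, filter_target_conversations) and the author_key parameter are hypothetical; adjust them to match the actual files.

```python
import json
from pathlib import Path


def load_json(path):
    """Load a JSON file from disk (e.g. data/history_data.json)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def filter_target_conversations(history, target_authors, author_key="author"):
    """Keep only the records whose author is one of the target authors.

    Assumption: `history` is a list of dicts and each dict stores the post
    author under `author_key`. Records without that key are dropped.
    """
    targets = set(target_authors)  # set gives O(1) membership checks
    return [rec for rec in history if rec.get(author_key) in targets]


# Toy in-memory example; the field names here are illustrative only.
toy_history = [
    {"author": "user_a", "conversation": "..."},
    {"author": "user_b", "conversation": "..."},
]
toy_kept = filter_target_conversations(toy_history, ["user_a"])
print(f"kept {len(toy_kept)} of {len(toy_history)} conversations")
```

With the real files, the same function would be called on load_json("data/history_data.json") and load_json("data/target_authors_collection.json"), provided the assumed schema holds.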
We are still refactoring and cleaning the codebase. Please stay tuned for more updates.