Hello there! I have a question about the analysis step at line 42 of generate_umap.py. At this step the code performs a train/test split of the data before running UMAP. I noticed two things about this function.
- The value of n passed to the function is 2500. From my understanding of the train/test split, this will select 2500 usable cells for the training sample and an additional 2500 usable cells for the testing sample. Does this value have significance for the UMAP calculation? Why was this number chosen over another? Are there decisions or pitfalls to consider when choosing this value of n?
- I noticed you used a 50/50 train/test split. Could you share how you chose these split values? Is there something specific to your data that led you to this split? Again, are there decisions or pitfalls to consider when choosing the split percentage in these sorts of experiments?
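To make the question concrete, here is a minimal sketch of how I understand the split. It is my own illustration and not the code in generate_umap.py: the helper name `split_cells`, the fixed seed, and the pool of 10,000 cell IDs are all assumptions made up for the example.

```python
import random


def split_cells(cell_ids, n, seed=0):
    """Draw 2*n cells without replacement and split them 50/50.

    Hypothetical helper: it reflects my reading of the split,
    not the actual implementation in generate_umap.py.
    """
    cell_ids = list(cell_ids)
    if 2 * n > len(cell_ids):
        raise ValueError("need at least 2*n cells for a 50/50 split of size n")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sampled = rng.sample(cell_ids, 2 * n)  # sampling without replacement
    # First half becomes the training set, second half the test set.
    return sampled[:n], sampled[n:]


# Assumed pool of 10,000 cell IDs, with n = 2500 as in the script.
train, test = split_cells(range(10_000), n=2500)
```

If this reading is right, n controls the size of each half directly, so the split ratio stays 50/50 regardless of how many cells are available in total.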
Thank you for your time and for sharing your thoughts.