Parameter n in Train/Test split step

Hello there! I have a question about the analysis step at [line 42](https://github.com/BarasLab/SpatialUMAP/blob/65710390512e8512eb585bc03939a724b3b73c5c/generate_umap.py#L42C1-L42C1) of _generate_umap.py_. At this step the code performs a train/test split of the data before running UMAP. I noticed two things about this function:

1. The value of n passed to the function is 2500. As I understand the train/test split, this selects 2500 usable cells for the training sample and an additional 2500 usable cells for the testing sample. Does this value have any significance for the UMAP calculation? Why was this number chosen over another? Are there decisions or pitfalls to consider when choosing n?
2. I noticed you used a 50/50 train/test split. Could you share how you arrived at this ratio? Is there something specific to your data that led you to choose it? Again, are there decisions or pitfalls to consider when choosing the split percentage for this sort of experiment?
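
To make my reading of the split concrete, here is a minimal sketch of what I assume it does. This is not the repository's code: the helper name `split_cells`, the `seed` argument, and the made-up pool of 10,000 usable cells are all hypothetical, and the actual function may select cells differently.

```python
import numpy as np


def split_cells(usable_idx, n, seed=0):
    """Hypothetical helper: draw n training and n testing cells with no overlap."""
    rng = np.random.default_rng(seed)
    if len(usable_idx) < 2 * n:
        raise ValueError("need at least 2 * n usable cells")
    # Sample 2 * n distinct cells, then cut the sample in half (a 50/50 split).
    picked = rng.choice(usable_idx, size=2 * n, replace=False)
    return picked[:n], picked[n:]


# Made-up pool of usable cell indices, for illustration only.
usable_idx = np.arange(10_000)
train_idx, test_idx = split_cells(usable_idx, n=2500)
```

If this matches the intent, then n fixes the absolute number of cells in each half, while the 50/50 ratio simply follows from drawing the same n for both.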

Thank you for your time and for sharing your thoughts.
