Parameter n in Train/Test split step #2

@djsmith17

Hello there! I have a question about the analysis step at line 42 of generate_umap.py. At this step, the code performs a train/test split of the data before running UMAP. I noticed two things about this function.

  1. The value of n passed to the function is 2500. As I understand the train/test split, this selects 2500 usable cells for the training sample and an additional 2500 usable cells for the testing sample. Does this value have any significance for the UMAP calculation? Why was this number chosen over another? Are there decisions or pitfalls to consider when choosing n?
  2. The train/test split is 50/50. Could you share how you chose these proportions? Is there something specific to your data that led to this split? Again, are there decisions or pitfalls to consider when choosing the split percentage for these sorts of experiments?
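
For concreteness, the reading of the split described in point 1 can be sketched as below. This is only an illustration of that interpretation, not the code in generate_umap.py: the function name `split_train_test`, the fixed seed, and the total of 10,000 available cells are hypothetical placeholders.

```python
import numpy as np


def split_train_test(n_cells_available, n, seed=0):
    """Hypothetical helper: draw n training and n test indices with no overlap."""
    if 2 * n > n_cells_available:
        raise ValueError("need at least 2 * n cells to fill both samples")
    rng = np.random.default_rng(seed)
    # Shuffle all cell indices once, then take two disjoint slices of length n.
    shuffled = rng.permutation(n_cells_available)
    train_idx = shuffled[:n]
    test_idx = shuffled[n:2 * n]
    return train_idx, test_idx


# Assumed numbers: 10,000 usable cells in total, n = 2500 per sample (a 50/50 split).
train_idx, test_idx = split_train_test(n_cells_available=10_000, n=2500)
```

Under this reading, n fixes the size of each sample, and the 50/50 ratio follows from drawing the same n for both.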

Thank you for your time and for sharing your thoughts.
