
Feature consideration: Change the default "group" header to reflect what the group column represents #10

@jlanej


When writing output files, we default to naming one of the header columns "group":

https://github.com/PankratzLab/kd-match/blob/435d6d159e399723974071eaddfe3cdb0792cf9d/src/main/java/org/pankratzlab/kdmatch/KDMatch.java#L114

The group column holds the category within which cases and controls must be matched. For example, if cases and controls had to be the same sex, the group would be male/female, 1/2, or similar. If they also had to come from the same sequencing center and share the same ancestry, the values would be concatenated into groups such as 2_UMGC_AFR or 1_UMGC_AFR. In that case, we could consider renaming the group header to something like "Group_Matched_within_Sex_SequencingCenter_Ancestry" (or whatever column names the corresponding input files contain). Written out, though, that becomes a very long header, so it may not be the best way forward.
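A minimal Java sketch of the two ideas above, assuming the matched-on values for each sample are available as a column-name-to-value map. `groupKey` reproduces the concatenated group value (e.g. 2_UMGC_AFR), and `groupHeader` derives the longer descriptive header from the input column names. The class and both helpers are hypothetical and not part of kd-match.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupHeaderSketch {

  // Hypothetical helper: joins the values of the matched-on columns with "_",
  // in the order the columns are listed (e.g. 2, UMGC, AFR -> "2_UMGC_AFR").
  static String groupKey(Map<String, String> row, List<String> matchColumns) {
    return matchColumns.stream().map(row::get).collect(Collectors.joining("_"));
  }

  // Hypothetical helper: derives a descriptive group header from the input
  // column names instead of the fixed default "group".
  static String groupHeader(List<String> matchColumns) {
    return "Group_Matched_within_" + String.join("_", matchColumns);
  }

  public static void main(String[] args) {
    List<String> cols = List.of("Sex", "SequencingCenter", "Ancestry");
    Map<String, String> row =
        Map.of("Sex", "2", "SequencingCenter", "UMGC", "Ancestry", "AFR");
    System.out.println(groupHeader(cols));   // Group_Matched_within_Sex_SequencingCenter_Ancestry
    System.out.println(groupKey(row, cols)); // 2_UMGC_AFR
  }
}
```

The header grows by one name for every column matched within, which makes the length concern concrete: three matched columns already produce a 50-character header.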

The updated header could then be passed to the methods that write the output files, either here:

```java
KDMatch.writeToFile(naiveMatches.stream(), outputBaseFileName,
    setConvert.stream().toArray(String[]::new),
    setConvert.stream().toArray(String[]::new), initialNumSelect);
```

or here:

```java
KDMatch.writeToFile(optimizedMatches.stream(), outputOptFileName,
    setConvert.stream().toArray(String[]::new),
    setConvert.stream().toArray(String[]::new), finalNumSelect);
```
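One way to thread a new header through these calls is an extra parameter that falls back to "group" when nothing is supplied, so existing output stays unchanged by default. The sketch below is hypothetical: it does not use kd-match's real `writeToFile` signature, it assumes a tab-delimited header with the group column first, and "colA"/"colB" are placeholders for the other output columns.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderLineSketch {

  // Current default name of the group column in the output files.
  static final String DEFAULT_GROUP_HEADER = "group";

  // Hypothetical: builds a tab-delimited header line whose first column is the
  // group column. A null or blank groupHeader falls back to the default, so
  // callers that pass nothing keep today's header.
  static String headerLine(String groupHeader, List<String> otherColumns) {
    String group = (groupHeader == null || groupHeader.trim().isEmpty())
        ? DEFAULT_GROUP_HEADER
        : groupHeader;
    List<String> columns = new ArrayList<>();
    columns.add(group);
    columns.addAll(otherColumns);
    return String.join("\t", columns);
  }

  public static void main(String[] args) {
    // "colA" and "colB" are placeholders, not kd-match's real output columns.
    List<String> others = List.of("colA", "colB");
    System.out.println(headerLine(null, others));
    System.out.println(headerLine("Group_Matched_within_Sex", others));
  }
}
```

Keeping the default as a fallback means the naive and optimized writers could adopt the new parameter independently.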
