Working with UK Biobank WES final release #48

Hello,
I am planning to work with the UK Biobank WES final release and want to start by generating a MatrixTable (MT), as this is required for performing a series of quality control (QC) steps on the data.
However, the dataset is very large (~15 TB for 500k samples), and my attempts to generate a single MT have not completed successfully. The longest run lasted 9 hours before it was terminated.
I would appreciate guidance on:
- Best practices for working with such large WES data
- Recommended instance type and number of nodes for generating MTs and for the later QC steps
- Whether it is better to split the data by chromosome, or to use some other strategy, to make MT generation feasible
- Any advice on resource configuration in Hail / RAP for this scale of data
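
To make the per-chromosome option concrete, here is a minimal sketch of what such a split could look like. It assumes the pVCF shards carry a `_c<chrom>_b<block>_` token in their file names; that pattern, the example paths, and the helper names `group_by_chromosome` and `write_chromosome_mt` are hypothetical and would need to be adapted to the actual layout. The Hail step uses only `hl.import_vcf` and `MatrixTable.write`, and I have not tested it at this scale.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed (hypothetical) shard naming: ..._c<chrom>_b<block>_...vcf.gz
SHARD_RE = re.compile(r"_c(?P<chrom>[0-9]+|X|Y)_b(?P<block>\d+)_")


def group_by_chromosome(paths: List[str]) -> Dict[str, List[str]]:
    """Group VCF shard paths by chromosome, ordered by numeric block index."""
    groups = defaultdict(list)
    for path in paths:
        match = SHARD_RE.search(path)
        if match is None:
            raise ValueError(f"unrecognised shard name: {path}")
        groups["chr" + match.group("chrom")].append((int(match.group("block")), path))
    # Sort numerically so block 2 comes before block 10.
    return {chrom: [p for _, p in sorted(items)] for chrom, items in groups.items()}


def write_chromosome_mt(chrom: str, paths: List[str], out_dir: str) -> None:
    """Import one chromosome's shards and write them as a separate MatrixTable."""
    import hail as hl  # imported lazily so the grouping step runs without Hail

    mt = hl.import_vcf(
        paths,
        force_bgz=True,                 # treat .vcf.gz inputs as block-gzipped
        reference_genome="GRCh38",
        array_elements_required=False,  # tolerate missing elements in array fields
    )
    mt.write(f"{out_dir}/{chrom}.mt", overwrite=True)
```

Each chromosome could then be written as its own job, for example by looping over `group_by_chromosome(all_paths).items()` and calling `write_chromosome_mt` on each entry, so that a failure only costs one chromosome rather than the whole run.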
My goal is to generate MTs that can be used for:
- Sample QC
- Variant QC
- Genotype QC
Thank you very much in advance for any insights or practical advice. Recommendations from those who have successfully worked with the 500k WES final release would be especially helpful.
