NFDI4Microbiota/ena_wizard_tool

Usage example:

python nfdi-ena-cli.py --metadata example.tsv --fasta-dir fasta --ena-user 'your username' --ena-password 'your password' --study-name 'study example' --study-title 'title for the study' --study-description 'description for the study'
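The flags in the command above could be handled with Python's standard `argparse` module. The following is a minimal sketch, not the tool's actual implementation: the help texts, the choice to make every flag required, and the function name `build_parser` are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage example.

    Assumption: every flag is required; the real tool may define defaults.
    """
    parser = argparse.ArgumentParser(prog="nfdi-ena-cli.py")
    parser.add_argument("--metadata", required=True, help="metadata table (TSV)")
    parser.add_argument("--fasta-dir", required=True, help="directory with FASTA files")
    parser.add_argument("--ena-user", required=True, help="ENA account name")
    parser.add_argument("--ena-password", required=True, help="ENA account password")
    parser.add_argument("--study-name", required=True)
    parser.add_argument("--study-title", required=True)
    parser.add_argument("--study-description", required=True)
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a shell.
    demo = build_parser().parse_args([
        "--metadata", "example.tsv",
        "--fasta-dir", "fasta",
        "--ena-user", "your username",
        "--ena-password", "your password",
        "--study-name", "study example",
        "--study-title", "title for the study",
        "--study-description", "description for the study",
    ])
    # argparse turns "--fasta-dir" into the attribute name "fasta_dir".
    print(demo.metadata, demo.fasta_dir, demo.study_name)
```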

ENA Automatic Submission System

This system aims to automate the validation and submission of metadata and sequencing data to the European Nucleotide Archive (ENA), following the metadata standards defined by the MIxS specification.

Features

  • Metadata validation:

    • Date format verification (ISO 8601).
    • Expected value checks and unit validation.
    • Controlled vocabulary and ontology validation (e.g., ENVO, CHEBI, NCBI Taxonomy).
  • Automated submission:

    • Upload of metadata and sequencing data files.
    • Integration with the ENA submission API (planned for future phases).
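As an illustration of the date format verification listed above, the sketch below checks a value with the standard library's `datetime.fromisoformat`. It is an assumption about how such a check could look, not the tool's own code, and the function name `is_iso8601` is hypothetical. It covers only calendar dates and timestamps; ISO 8601 intervals and reduced-precision dates would need extra handling.

```python
from datetime import datetime


def is_iso8601(value: str) -> bool:
    """Return True if value parses as an ISO 8601 date or timestamp.

    Limitation: intervals and reduced-precision dates are not covered here.
    """
    text = value.strip()
    # Older Python versions reject a trailing "Z", so map it to an explicit offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for sample in ("2013-03-25T12:42:31+01:00", "2018-05-11T20:00Z", "25/03/2013"):
        print(sample, is_iso8601(sample))
```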

Initial Scope

In its initial version, this system supports only terrestrial metadata.

The metadata fields and requirements are based on the MIxS specification and are described below.


Metadata Structure Overview

| Category | Metadata | Definition | Reference | Expected Value / Unit | Example |
|---|---|---|---|---|---|
| Project metadata | project_name | Name of the project within which the sequencing was organized | MIXS:0000092 | Free text string | Forest soil metagenome |
| Site metadata | collection_date | The time of sampling, either as an instance (single point in time) or interval. ISO8601 format compliant | MIXS:0000011 | YYYY-MM-DD | 2013-03-25T12:42:31+01:00 |
| | collected_by | Name of person or institute that collected the sample | ENA Reference | Free text string | UFZ - Centre for environmental research |
| | geo_loc_name | Geographic location (country/sea and region). Use INSDC/GAZ list | MIXS:0000010 | Free text or ontology | USA: Maryland, Bethesda / GAZ:00051071 |
| | lat | Latitude in decimal degrees (WGS84) | MIXS:0000009 | Decimal degrees, max 8 decimals | -41.373744 |
| | lon | Longitude in decimal degrees (WGS84) | MIXS:0000009 | Decimal degrees, max 8 decimals | 146.266145 |
| | elev | Elevation from Earth's surface in meters | MIXS:0000093 | Meter | 100 m |
| | alt | Altitude above Earth's surface | MIXS:0000094 | Meter | 100 m |
| | depth | Depth below surface (e.g., soil, sediment) | MIXS:0000018 | Meter | 100 m |
| | env_broad_scale | Major environmental system(s) (e.g., biome). Use EnvO terms | MIXS:0000012 | Ontology terms separated by " " | aquatic biome [ENVO:00002030] terrestrial biome [ENVO:00000446] |
| | env_local_scale | Environmental entities near sample. Use subclass of env_broad_scale | MIXS:0000013 | Ontology terms separated by " " | woodland biome [ENVO:01000175] tundra biome [ENVO:01000180] |
| | env_medium | Environmental materials in contact with the sample | MIXS:0000014 | Ontology terms separated by " " | arable soil [ENVO:00005742] bulk soil [ENVO:00005802] |
| | chem_administration | Chemicals applied to host or site. Use CHEBI IDs | MIXS:0000751 | CHEBI;timestamp; multiple values separated by " " | agar [CHEBI:2509];2018-05-11T20:00Z castor oil [CHEBI:140618];2023-12-07T17:00+02:00 |
| | temp | Environmental temperature | MIXS:0000113 | Degree Celsius | 25 degree Celsius |
| | salinity | Environmental salinity | MIXS:0000183 | Practical salinity unit or percentage | 25 practical salinity unit |
| | pH | Environmental pH | MIXS:0001001 | pH value | pH 7.2 |
| Sample metadata | samp_name | Local sample identifier (used in sequencing, unique per submitter) | MIXS:0001107 | Free text | Soil1Sample2Seq2 |
| | source_mat_id | Unique ID of the material sample used for extraction | MIXS:0000001 | Culture collection IDs or unique local ID | MPI012345 |
| | samp_size | Total amount of sample (volume, mass, area) | MIXS:0000001 | ml, g, m² | 2000 ml 1000 g soil |
| | temp | Sample temperature at time of sampling | MIXS:0000113 | Degree Celsius | 25 degree Celsius |
| | salinity | Total concentration of dissolved salts | MIXS:0000183 | Practical salinity unit or percentage | 25 practical salinity unit |
| | ph | pH of the sample or its aqueous phase | MIXS:0001001 | pH value | 7.2 |
| | samp_taxon_id | NCBI taxon ID of sample or control | MIXS:0001320 | NCBI Taxonomy ID | 749906 |
| | samp_collect_method | Method of sample collection | MIXS:0001225 | PMID, DOI, URL or free text | |
| | microbial_isolate | Was a microbial isolate cultured? | | Y/N | |
| | microb_cult_med | Microbial culture medium used, if applicable | MIXS:0001216 | Ontology terms or free text | minimal defined medium [MCO:0000881] |
| Host metadata | host_taxid | NCBI taxon ID of the host | MIXS:0000250 | NCBI Taxonomy ID | Homo sapiens [NCBI:txid9606] |
| | host_common_name | Common name of host | MIXS:0000248 | Free text | human |
| | host_height | Height of host | MIXS:0000264 | cm, mm, m | 177 cm |
| | host_length | Length of host | MIXS:0000256 | cm, mm, m | 100 cm |
| | host_tot_mass | Total mass of the host | MIXS:0000263 | kg, g | 77 kg |
| | host_body_site | Body site from where sample was collected | MIXS:0000867 | FMA or UBERON ontology | gut [FMA:45615] |
| | host_body_product | Substance produced by the host body (e.g. mucus, blood) | MIXS:0000867 | FMA or UBERON ontology | mucus [FMA:66938] blood plasma [UBERON:0001969] |
| | host_age | Age of the host at collection | MIXS:0000255 | year, day, hour | 28 y |
| | host_sex | Sex of the host | MIXS:0000811 | male, female, unknown | female |
| | host_diet | Diet of the host | MIXS:0000869 | Free text or ontology | omnivore [ecocore:00000082] |
| | host_disease_stat | Diagnosed disease(s) of the host | MIXS:0000031 | Free text or Disease Ontology | avian influenza |
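Two of the formats in the table lend themselves to simple checks: ontology terms written as `label [PREFIX:ID]`, and coordinates in decimal degrees with at most 8 decimals. The sketch below shows one possible way to handle them. It is not taken from the tool; the names `parse_terms` and `is_valid_coordinate` are hypothetical, and the ±90 and ±180 bounds are the usual WGS84 latitude and longitude ranges rather than something the table states. The term parser does not handle the `term;timestamp` pairs used by chem_administration.

```python
import re
from decimal import Decimal, InvalidOperation

# Matches "label [PREFIX:ID]", e.g. "arable soil [ENVO:00005742]".
TERM_RE = re.compile(r"(?P<label>[^\[\]]+?)\s*\[(?P<id>[A-Za-z]+:[A-Za-z0-9]+)\]")


def parse_terms(cell: str) -> list[tuple[str, str]]:
    """Extract (label, identifier) pairs from a cell holding one or more terms."""
    return [(m.group("label").strip(), m.group("id")) for m in TERM_RE.finditer(cell)]


def is_valid_coordinate(text: str, limit: int) -> bool:
    """Check a decimal-degree value: within +/-limit and at most 8 decimals.

    Use limit=90 for latitude and limit=180 for longitude.
    """
    try:
        number = Decimal(text.strip())
    except InvalidOperation:
        return False
    if not number.is_finite():
        return False
    # A negative exponent gives the number of digits after the decimal point.
    decimals = max(0, -number.as_tuple().exponent)
    return abs(number) <= limit and decimals <= 8


if __name__ == "__main__":
    print(parse_terms("aquatic biome [ENVO:00002030] terrestrial biome [ENVO:00000446]"))
    print(is_valid_coordinate("-41.373744", 90), is_valid_coordinate("146.266145", 180))
```

Using `Decimal` instead of `float` keeps the digits exactly as typed, so the decimal-place count reflects what the submitter entered.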

Future Work

  • Expansion to other MIxS packages (e.g., host-associated, built environment).
  • Full ENA submission automation (metadata XML generation, file uploads).
  • Graphical user interface (GUI) for simplified data upload.
