- Python >= 3.6
- The following python packages:
- networkx
- geopy
- validate_email
- xlsxwriter
- py3dns (on Windows this silently conflicts if dnspython is already installed)
- requests
- diskcache
- yapsy (see below if you run Python 3.12 or higher)
- whoosh
- roman
- typing-extensions
- openpyxl
- If you run into GraphQL errors when retrieving data from the Directory, upgrade the client with:
pip3 install --upgrade molgenis-emx2-pyclient
-
Verify installation:
python3 -m ensurepip -
For each of the above packages
pip3 install --upgrade <package> -
Ensure that file
/etc/resolve.confexists. If it does not exist create it with the following contents:
nameserver 8.8.8.8 -
Download the MOLGENIS Python API library:
pip3 install molgenis-py-client -
Install/update root certificates (also check install_certifi.py script)
pip3 install --upgrade certifi -
If you want support for checking mappings of ORPHA codes to ICD-10 codes for RD biobanks, you need to get en_product1.xml from http://www.orphadata.org/cgi-bin/ORPHAnomenclature.html
-
If you have Python 3.12 or higher, for the time being (December, 2024) you need a patched version of yapsy module to deal with instability in the Python - some stable components being removed from the Python core. Here is a quick howto:
git clone https://github.com/AmeyaVS/yapsy.git cd yapsy/package python -m build pip install --force-reinstall dist/Yapsy-2.0.0-py3-none-any.whl
The simplest way to run the default validation suite is:
python3 data-check.py
Purge all caches (directory + remote checks) and output both stdout and XLSX:
python3 data-check.py --purge-all-caches -X test_results.xlsx
Purge only the directory cache and output to XLSX only, with verbose logging:
python3 data-check.py -v --purge-cache directory -N -X test_results.xlsx
Enable ORPHA-to-ICD checks (requires en_product1.xml):
python3 data-check.py -O en_product1.xml
Debug mode with fresh caches:
python3 data-check.py -d --purge-all-caches
- full-text-search.py - full text search of the Directory using Whoosh with Lucene search syntax.
./full-text-search.py 'bbmri-eric:ID:UK_GBR-1-101'./full-text-search.py '"Cell therapy"~3'(note shell escaping of quotes)./full-text-search.py '*420*'./full-text-search.py --purge-cache directory --purge-cache index -v 'DE_*'./full-text-search.py 'myID' | perl -ne "while(<>) {if(m/^.*?'id':\\s+'(.+?)'.*$/) {print \\$1 . \"\\n\";}}"
- exporter-all.py - exports all biobanks/collections with aggregate counts and optional filters; can output withdrawn entities.
python3 exporter-all.py -X all.xlsx - exporter-bbmri-cohorts.py - BBMRI Cohorts network statistics plus warnings/errors aggregation.
python3 exporter-bbmri-cohorts.py -X bbmri_cohorts_stats.xlsx -XWE bbmri_cohorts_warnings.xlsx - exporter-cohorts.py - lists COHORT/POPULATION_BASED collections and summaries per country/biobank.
python3 exporter-cohorts.py -X cohorts.xlsx - exporter-country.py - counts biobanks/collections per country (stdout or XLSX).
python3 exporter-country.py - exporter-covid.py - COVID-related collections/biobanks with sample/donor totals.
python3 exporter-covid.py -X covid.xlsx - exporter-diagnosis.py - diagnosis inspection utility (development-focused logging).
python3 exporter-diagnosis.py -d > diagnosis-exporter.log 2>&1 - exporter-ecraid.py - ECRAID-relevant collections (BSL2/3 and pathogen material) and institutions.
python3 exporter-ecraid.py -X ecraid.xlsx - exporter-institutions.py - lists juridical persons (institutions) per country.
python3 exporter-institutions.py -X institutions.xlsx - exporter-mission-cancer.py - cancer and pediatric-cancer analytics using ICD/ORPHA mapping.
python3 exporter-mission-cancer.py -O en_product1.xml -X mission-cancer.xlsx - exporter-negotiator-orphans.py - finds collections with Negotiator representatives and auto-population candidates.
python3 exporter-negotiator-orphans.py negotiator_reps.xlsx -X negotiator_orphans.xlsx - exporter-obesity.py - obesity collections (including pediatric segmentation) with sample totals.
python3 exporter-obesity.py -X obesity.xlsx - exporter-pediatric.py - pediatric and pediatric-only collections based on age ranges.
python3 exporter-pediatric.py -X pediatric.xlsx - exporter-quality-label.py - quality label export for biobanks/collections and combined labels.
python3 exporter-quality-label.py
- get-contacts.py - contact generator for Negotiator invitation pipelines.
./get-contacts.py --purge-all-caches -X contacts.xlsx - COVID19DataPortal_XMLFromBBMRIDirectory.py - builds COVID-19 Data Portal XML from Directory collections.
python3 COVID19DataPortal_XMLFromBBMRIDirectory.py -x bbmriDirectory_Covid19DataPortal.xml - add_orphacodes.py - adds OrphaCodes to a Directory EMX export.
python3 add_orphacodes.py -d directory.xlsx -O en_product1.xml -o directory-with-orpha.xlsx - directory-stats.py - legacy biobank size estimation stats.
python3 directory-stats.py -N - directory-tables-modifier.py - import/delete records in staging tables using CSV and .env credentials.
python3 directory-tables-modifier.py -schema ERIC --csvImportData Biobanks.csv - geocoding_2022.py - generates geoJSON output from Directory data and config.
python3 geocoding_2022.py geocoding.config -o bbmri-directory-geojson - install_certifi.py - refreshes root certificates for Directory access.
python3 install_certifi.py