Rapid Integration and Visualization for Enhanced Research (RIVER) is an integrated ecosystem for data and computing, built as a monolith with a Python backend (BlackSheep) and a JavaScript frontend (Vite). While the current structure is monolithic, it is architected for potential refactoring into microservices. For scientific applications, RIVER aims to stay lightweight and act as a system controller connecting data, software, and users. For a quick recorded video demo, see: https://www.youtube.com/watch?v=boabEFNIkNA
RIVER consists of the following components:
- Backend: Asynchronous web server built with BlackSheep, a high-performance Python framework. Uses PostgreSQL as the database, Redis for caching, and Celery for background job processing and monitoring.
- Frontend: React.js application powered by Vite and Material UI (MUI) for the user interface.
- Traefik: Modern reverse proxy used for routing and load balancing.
The RIVER platform for my research group is deployed at https://platform.riverxdata.com. You can sign in with your own credentials to test the platform; however, use credentials that can be easily revoked after testing.
Currently, the RiverXData platform supports three main components:
- Version control: designed around a base class that can be extended to support additional version control types. Currently supports GitHub via a token.
- Storage: designed around a base class that can be extended to support additional storage types. Currently supports S3-compatible storage via IAM keys and an optional ARN.
- Computing: designed around a base class that can be extended to support additional computing types. Currently supports a plain Linux server or a SLURM scheduler via SSH, as sketched below.
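For intuition, here is a minimal sketch of how a job might be dispatched over SSH, which is the mechanism the computing component uses; the hostnames, user, and script path below are hypothetical placeholders, not the platform's actual configuration:

```bash
# Hypothetical example: submit a tool's main.sh to a remote SLURM cluster over SSH.
ssh river@slurm.example.org 'sbatch --job-name=river-demo --wrap "bash /data/tools/demo/river/main.sh"'

# For a plain Linux server, the same script could simply be executed directly:
ssh river@compute.example.org 'bash /data/tools/demo/river/main.sh'
```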
Tutorial on how to develop and add tools to the platform: https://www.youtube.com/watch?v=boabEFNIkNA
There are two types of tools that can be added to the platform: non-UI tools and web-based tools.
Once you add your GitHub credential, the platform retrieves the pipeline parameters from the repository and lets you add or modify them through the UI.
Each tool is required to have a river folder with a main.sh file as the script executor, as sketched below. Besides that, all nf-core pipelines are supported; their parameters are configured via nextflow_schema.json and the profiles in the conf folder.
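A minimal sketch of the expected layout and executor script, based on the description above (the directory and parameter names are placeholders; the template tool listed below shows the real convention):

```bash
#!/usr/bin/env bash
# river/main.sh — sketch of a "Hello world" executor (placeholder logic).
# Assumed tool layout, per the description above:
#   my-tool/
#   ├── river/
#   │   └── main.sh          # script executor, the platform's entry point
#   ├── conf/                # profiles (for nf-core pipelines)
#   └── nextflow_schema.json # parameter definitions (for nf-core pipelines)
set -euo pipefail
echo "Hello world"
# Echo every parameter the platform passes in (placeholder behavior).
for param in "$@"; do
    echo "param: $param"
done
```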
Non-UI tools:
- template: A "Hello world" template that prints the values of all input parameters.
- sarek: Variant calling pipeline for germline and somatic variants from whole genome, exome, or targeted sequencing data. Supports tumor/normal analyses.
- rnaseq: RNA-seq processing pipeline that includes quality control, alignment or pseudo-alignment, quantification, and generation of gene expression matrices.
- ampliseq: Amplicon sequencing pipeline for microbial community profiling, such as 16S rRNA gene sequencing.
- quantms: Quantitative proteomics pipeline for label-free and isobaric labeling analyses using both DDA and DIA data.
- taxprofiler: Taxonomic profiling pipeline for shotgun metagenomics, supporting multiple tools and producing standardized outputs.
- methylseq: DNA methylation analysis pipeline using bisulfite-treated sequencing data. Supports multiple aligners and provides comprehensive QC.
- circrna: Pipeline for detecting and quantifying circular RNAs (circRNAs) from RNA-seq data, including miRNA target prediction.
- mag: Metagenomic pipeline for assembling, binning, and annotating metagenome-assembled genomes (MAGs) from short or long reads.
- atacseq: ATAC-seq pipeline to identify open chromatin regions, perform peak calling, and assess data quality with various QC metrics.
- rnafusion: Fusion detection pipeline using RNA-seq data, combining results from multiple fusion detection tools into reports and visualizations.
Web-based tools:
- template: A Streamlit app that simulates gene expression data for BRCA1 and BRCA2 across two groups: cancer vs. normal.
- tf-finder: A wrapper for TFinder, an easy-to-use Python web tool for identifying Transcription Factor Binding Sites (TFBS) and Individual Motifs (IM).
- CARTAR: A wrapper for the CARTAR web server, designed to assist scientists in the in silico identification and validation of immunotherapeutic cell-surface targets for attacking tumor cells.
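As an illustration, a web-based tool's executor might simply launch the app server. This is a sketch only; the port variable and file names are assumptions, not the platform's confirmed contract:

```bash
#!/usr/bin/env bash
# river/main.sh — hypothetical executor for a Streamlit-based web tool.
set -euo pipefail
pip install -r requirements.txt                       # install the tool's dependencies
streamlit run app.py --server.port "${PORT:-8501}"    # PORT is a placeholder variable
```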
You should deploy your own platform using the tutorial below.
A .env file is required for deployment. For a quick setup during validation, the committed .env can be used; its Google Client ID is safe to use. The platform can be deployed with Docker Compose on a cloud server. To set up your own Google Client ID, follow here. For users to test after deployment, follow here.
## Developer
To simulate the appropriate services for testing purposes: by default, the committed .env targets staging only and supports the localhost setup. For binding a domain to a VPS, adjust the domain name in VITE_BACKEND_URL, FRONTEND_URL, and URL (see the example after the table).
Adjust your setup in the .env file. For a detailed explanation of the variables, see below:
| Variable Name | Description | Example Value |
|---|---|---|
| LETSENCRYPT_EMAIL | Email address for Let's Encrypt SSL certificate registration. | nttg8100@gmail.com |
| VITE_BACKEND_URL | Backend API URL for the frontend to connect to. | http://localhost |
| FRONTEND_URL | URL where the frontend is served. | http://localhost |
| VITE_APP_GOOGLE_CLIENT_ID | Google OAuth client ID for authentication. | 212676895890-3ad1thuq1kmenn32noc0kut7rl9lelk9.apps.googleusercontent.com |
| CACHE_DB_HOST | Hostname for the Redis cache used by Celery. | river-redis |
| BASE_API_HOST | Hostname for the backend API server. | river-backend |
| POSTGRES_DATABASE | Name of the PostgreSQL database. | river |
| POSTGRES_USER | PostgreSQL database username. | river |
| POSTGRES_PORT | PostgreSQL database port. | 5432 |
| POSTGRES_PASSWORD | PostgreSQL database password. | password |
| POSTGRES_HOST | Hostname for the PostgreSQL database server. | river-db |
| APP_ENV | Application environment (e.g., prod, dev). | prod |
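For example, the domain-related overrides mentioned above might look like this when binding the platform to a VPS. This is an illustrative sketch only: example.com is a placeholder, and the exact format of URL is an assumption since it is not listed in the table.

```bash
# .env overrides for a VPS deployment (illustrative values only).
# Backend API URL used by the frontend:
VITE_BACKEND_URL=https://example.com
# Where the frontend is served:
FRONTEND_URL=https://example.com
# Base domain (format assumed; not documented in the table above):
URL=example.com
```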
NOTE: For network communication in the "dev" environment, add the line `127.0.0.1 river-localstack` to /etc/hosts so the S3 storage is accessible everywhere.
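One way to add that entry from a shell:

```bash
# Append the LocalStack alias to /etc/hosts (requires sudo).
echo "127.0.0.1 river-localstack" | sudo tee -a /etc/hosts
```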
Only the backend has tests, written with pytest. For the credential tests, obtain the GitHub token here.
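A minimal sketch of supplying the token before running the credential tests; the GITHUB_TOKEN variable name is an assumption, not confirmed by this README, so check the test fixtures for the exact name expected:

```bash
# Provide a GitHub personal access token for the credential tests.
export GITHUB_TOKEN="<your-token>"   # variable name assumed, not confirmed by this README
make test-cred                       # credential test target (listed below)
```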
Use the provided Makefile to automate environment setup and service management.
- Frontend (Node.js 20.17.0): `make dev-frontend`
- Backend (Python 3.12.11): `make dev-backend`
- Traefik (v3.5.0): `make dev-traefik`
- SLURM (builds local SLURM Docker image): `make dev-slurm`

To set up all at once: `make dev`

- Start SLURM and Redis: `make start-dev-infra`
- Start local PostgreSQL DB and initialize/migrate: `make start-dev-db`
- Backend (dev mode): `make start-backend`
- Frontend: `make start-frontend`
- Traefik: `make start-traefik`
- Celery worker: `make start-celery`
- Start test infrastructure (LocalStack, Redis, test DB): `make start-test-infra`
- Run backend tests:
  - Auth: `make test-auth`
  - Organization: `make test-org`
  - Credential: `make test-cred`
  - Project: `make test-pro`
  - Storage: `make test-storage`
  - Public analysis: `make test-public-analysis`
  - Job: `make test-job`
  - All: `make test-all`
- Start SLURM: `make start-slurm`
- Start LocalStack (S3 simulation): `make start-localstack`
- Remove development DB volume: `make clean-dev-db`
- Deploy production stack: `make production`
Refer to the Makefile for additional targets and details.
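As an illustration, a typical local development sequence chaining the targets above might look like this; the ordering follows the target descriptions and is an assumption, not prescribed by the Makefile:

```bash
# One-time environment setup for all components.
make dev

# Bring up local infrastructure, then each service (typically in separate terminals).
make start-dev-infra   # SLURM and Redis
make start-dev-db      # local PostgreSQL + init/migrations
make start-backend     # backend in dev mode
make start-celery      # Celery worker
make start-frontend    # Vite dev server
make start-traefik     # reverse proxy
```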
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
