riverxdata/river

Rapid Integration and Visualization for Enhanced Research Platform

RIVER Platform Overview

Rapid Integration and Visualization for Enhanced Research (RIVER) is an integrated ecosystem for data and computing. It is currently built as a monolith with a Python backend (BlackSheep) and a JavaScript frontend (React with Vite), and it is structured so that it can later be refactored into microservices. For scientific applications, River aims to be lightweight and to serve as a system controller that connects data, software, and users. For a quick recorded video demo, see: https://www.youtube.com/watch?v=boabEFNIkNA

Overview

River consists of the following components:

  • Backend: Asynchronous web server built with BlackSheep, a high-performance Python framework. Utilizes PostgreSQL for the database, Redis for caching, and Celery for job task monitoring.
  • Frontend: React.js application powered by Vite and Material UI (MUI) for the user interface.
  • Traefik: Modern reverse proxy used for routing and load balancing.

Usage

The River Platform instance for my research group is deployed at https://platform.riverxdata.com. You can test the platform with your own credentials, but use credentials that can be easily revoked after testing.

Currently, the RiverXData platform supports three main components:

  • Version control: Built on a base class that can be extended to support more version control types. Currently supports GitHub via token.
  • Storage: Built on a base class that can be extended to support more storage types. Currently supports S3-compatible storage via IAM keys and an optional ARN.
  • Computing: Built on a base class that can be extended to support more computing types. Currently supports a Linux server or a SLURM scheduler via SSH.
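
The base-class design described above could look roughly like the following sketch for the computing component. The class names, the method name, and the registry are hypothetical and are not taken from the River codebase; only `bash` and SLURM's `sbatch` are real commands. The point is that supporting a new computing type would mean adding one subclass and one registry entry.

```python
from abc import ABC, abstractmethod


class ComputeBackend(ABC):
    """Hypothetical base class: one subclass per computing type."""

    @abstractmethod
    def build_submit_command(self, script_path: str) -> list[str]:
        """Return the command that would launch the script on this backend."""


class LinuxServerBackend(ComputeBackend):
    def build_submit_command(self, script_path: str) -> list[str]:
        # Plain Linux server: run the script directly with bash.
        return ["bash", script_path]


class SlurmBackend(ComputeBackend):
    def build_submit_command(self, script_path: str) -> list[str]:
        # SLURM scheduler: hand the script to sbatch.
        return ["sbatch", script_path]


# Hypothetical registry keyed by type name.
BACKENDS: dict[str, type[ComputeBackend]] = {
    "linux": LinuxServerBackend,
    "slurm": SlurmBackend,
}


def get_backend(kind: str) -> ComputeBackend:
    """Instantiate the backend registered under the given type name."""
    try:
        return BACKENDS[kind]()
    except KeyError:
        raise ValueError(f"unsupported computing type: {kind!r}") from None
```

The same pattern would apply to the version control and storage components, each with its own abstract base and concrete subclasses.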

How to add a tool on RIVER platform

Tutorial on developing and adding tools to the platform: https://www.youtube.com/watch?v=boabEFNIkNA

Two types of tools can be added to the platform: non-UI tools and web-based tools. After you add your GitHub credential, the platform retrieves the pipeline parameters from the repository and lets you add or modify them in the platform UI. A tool must contain a river folder with a main.sh file as the script executor. In addition, all nf-core pipelines are supported. Parameters are configured via nextflow_schema.json and the profile in the conf folder.
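
As a rough illustration of the layout requirement, the sketch below checks a local checkout for river/main.sh and lists the parameters declared in nextflow_schema.json. The function names are hypothetical. It also assumes that the schema file sits at the repository root and that parameters live under a top-level "properties" key or under groups in "definitions" or "$defs", as is common in nf-core schemas. The platform's own parsing may differ.

```python
import json
from pathlib import Path


def has_river_entrypoint(repo_root: Path) -> bool:
    """True if the checkout contains river/main.sh, the script executor."""
    return (repo_root / "river" / "main.sh").is_file()


def load_schema(repo_root: Path) -> dict:
    """Read nextflow_schema.json (assumed to be at the repository root)."""
    return json.loads((repo_root / "nextflow_schema.json").read_text())


def list_schema_parameters(schema: dict) -> dict[str, dict]:
    """Collect parameter definitions from a nextflow_schema.json-like dict.

    Assumption: parameters are stored under "properties", either at the top
    level or inside each group of "definitions" / "$defs".
    """
    params: dict[str, dict] = dict(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            params.update(group.get("properties", {}))
    return params
```

A repository that fails `has_river_entrypoint` would need a river folder with main.sh added before it could be registered as a tool.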

Non-UI tool template:

  • template: A "Hello world" template that prints the inputs of all parameters.
  • sarek: Variant calling pipeline for germline and somatic variants from whole genome, exome, or targeted sequencing data. Supports tumor/normal analyses.
  • rnaseq: RNA-seq processing pipeline that includes quality control, alignment or pseudo-alignment, quantification, and generation of gene expression matrices.
  • ampliseq: Amplicon sequencing pipeline for microbial community profiling, such as 16S rRNA gene sequencing.
  • quantms: Quantitative proteomics pipeline for label-free and isobaric labeling analyses using both DDA and DIA data.
  • taxprofiler: Taxonomic profiling pipeline for shotgun metagenomics, supporting multiple tools and producing standardized outputs.
  • methylseq: DNA methylation analysis pipeline using bisulfite-treated sequencing data. Supports multiple aligners and provides comprehensive QC.
  • circrna: Pipeline for detecting and quantifying circular RNAs (circRNAs) from RNA-seq data, including miRNA target prediction.
  • mag: Metagenomic pipeline for assembling, binning, and annotating metagenome-assembled genomes (MAGs) from short or long reads.
  • atacseq: ATAC-seq pipeline to identify open chromatin regions, perform peak calling, and assess data quality with various QC metrics.
  • rnafusion: Fusion detection pipeline using RNA-seq data, combining results from multiple fusion detection tools into reports and visualizations.

Web-based tool template:

  • template: A Streamlit app that simulates gene expression data for BRCA1 and BRCA2 between two groups: cancer vs normal.
  • tf-finder: A wrapper of TFinder, an easy-to-use Python web tool for identifying Transcription Factor Binding Sites (TFBS) and Individual Motifs (IM).
  • CARTAR: A wrapper of the CARTAR web server, designed to assist scientists in the in silico identification and validation of immunotherapeutic targets present on the cell surface for attacking tumor cells.

Deployment

You should deploy your own platform, using the tutorial below.

A .env file is required for deployment. For a quick validation setup, the committed .env can be used; the Google Client ID in it can be used safely. The platform can be deployed in the cloud using docker compose. For the Google Client ID, follow here. To test after deployment, follow the Developer section to simulate the appropriate services for testing purposes. By default, the .env is used for staging only, which supports the localhost setup. To bind a domain to a VPS, adjust the domain name in VITE_BACKEND_URL, FRONTEND_URL, and URL.

Adjust your setup in the .env file. For a detailed explanation of the variables, see below:

| Variable Name | Description | Example Value |
|---|---|---|
| LETSENCRYPT_EMAIL | Email address for Let's Encrypt SSL certificate registration. | nttg8100@gmail.com |
| VITE_BACKEND_URL | Backend API URL for the frontend to connect to. | http://localhost |
| FRONTEND_URL | URL where the frontend is served. | http://localhost |
| VITE_APP_GOOGLE_CLIENT_ID | Google OAuth client ID for authentication. | 212676895890-3ad1thuq1kmenn32noc0kut7rl9lelk9.apps.googleusercontent.com |
| CACHE_DB_HOST | Hostname for the Redis cache used by Celery. | river-redis |
| BASE_API_HOST | Hostname for the backend API server. | river-backend |
| POSTGRES_DATABASE | Name of the PostgreSQL database. | river |
| POSTGRES_USER | PostgreSQL database username. | river |
| POSTGRES_PORT | PostgreSQL database port. | 5432 |
| POSTGRES_PASSWORD | PostgreSQL database password. | password |
| POSTGRES_HOST | Hostname for the PostgreSQL database server. | river-db |
| APP_ENV | Application environment (e.g., prod, dev). | prod |
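
Before deploying, it can help to confirm that the .env file defines every variable listed above. The following is a minimal sketch, not part of the River codebase: the function names are hypothetical, and the parser handles only simple KEY=VALUE lines and `#` comments rather than the full .env syntax that docker compose accepts.

```python
# Variable names taken from the table above.
REQUIRED_VARS = [
    "LETSENCRYPT_EMAIL",
    "VITE_BACKEND_URL",
    "FRONTEND_URL",
    "VITE_APP_GOOGLE_CLIENT_ID",
    "CACHE_DB_HOST",
    "BASE_API_HOST",
    "POSTGRES_DATABASE",
    "POSTGRES_USER",
    "POSTGRES_PORT",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "APP_ENV",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return required variables that are absent or empty, in table order."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, calling `missing_vars(parse_env(open(".env").read()))` from the repository root would return an empty list when every variable in the table is set.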

Developer

NOTE: For network communication in the "dev" environment, add the line 127.0.0.1 river-localstack to /etc/hosts so that the S3 storage can be accessed from everywhere.
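
To check whether the mapping is already present, a small helper such as the one below could be used. It is only a sketch with a hypothetical function name: it scans hosts-file text for a line that maps the given address to the hostname and ignores comments.

```python
def hosts_maps(hosts_text: str, hostname: str, address: str = "127.0.0.1") -> bool:
    """True if some hosts-file line maps `address` to `hostname`."""
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split into address and host names.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address and hostname in fields[1:]:
            return True
    return False
```

For example, `hosts_maps(open("/etc/hosts").read(), "river-localstack")` should return True once the line has been added.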

Credential

Only the backend has tests, which use pytest. For the credential, obtain a GitHub token here.

Setting Up the Development Environment

Use the provided Makefile to automate environment setup and service management.

1. Install Dependencies

  • Frontend (Node.js 20.17.0):
    make dev-frontend
  • Backend (Python 3.12.11):
    make dev-backend
  • Traefik (v3.5.0):
    make dev-traefik
  • SLURM (builds local SLURM Docker image):
    make dev-slurm

To set up all at once:

make dev

2. Start Development Infrastructure

  • Start SLURM and Redis:
    make start-dev-infra
  • Start Local PostgreSQL DB and initialize/migrate:
    make start-dev-db

3. Start Services

  • Backend (dev mode):
    make start-backend
  • Frontend:
    make start-frontend
  • Traefik:
    make start-traefik
  • Celery Worker:
    make start-celery

4. Testing

  • Start test infrastructure (Localstack, Redis, Test DB):
    make start-test-infra
  • Run backend tests:
    • Auth: make test-auth
    • Organization: make test-org
    • Credential: make test-cred
    • Project: make test-pro
    • Storage: make test-storage
    • Public Analysis: make test-public-analysis
    • Job: make test-job
    • All: make test-all
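
When only a few suites need to run, the targets above can be driven from a short script. The sketch below is a hypothetical wrapper, not part of the repository: the short suite names are made up for illustration, the make targets are the ones listed above, and it assumes the script runs from the repository root where the Makefile lives. By default it only prints the planned commands.

```python
import subprocess

# Hypothetical short names mapped to the Makefile test targets listed above.
TEST_TARGETS = {
    "auth": "test-auth",
    "org": "test-org",
    "cred": "test-cred",
    "project": "test-pro",
    "storage": "test-storage",
    "public-analysis": "test-public-analysis",
    "job": "test-job",
    "all": "test-all",
}


def plan_commands(suites: list[str]) -> list[list[str]]:
    """Build the make invocations: test infrastructure first, then suites."""
    commands = [["make", "start-test-infra"]]
    for suite in suites:
        if suite not in TEST_TARGETS:
            raise ValueError(f"unknown test suite: {suite!r}")
        commands.append(["make", TEST_TARGETS[suite]])
    return commands


def run_suites(suites: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print the planned commands, or execute them when dry_run is False."""
    commands = plan_commands(suites)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing target.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_suites(["auth", "storage"], dry_run=False)` would start the test infrastructure and then run the two suites in order.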

5. SLURM and S3-Compatible Services

  • Start SLURM:
    make start-slurm
  • Start Localstack (S3 simulation):
    make start-localstack

6. Clean Up

  • Remove development DB volume:
    make clean-dev-db

7. Production Deployment

  • Deploy production stack:
    make production

Refer to the Makefile for additional targets and details.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

