Match your DNA data against an open-ended catalogue of DNA traits with private LLM-powered analysis
🔗 Repository: github.com/Monadic-DNA/Explorer
- Features
- Development
- Production Deployment
- Semantic Search Setup
- Premium Features & Payments
- License
- Semantic Search: LLM-powered semantic search understands the meaning of your queries (e.g., "memory loss" finds "cognitive decline" studies)
- Interactive exploration of GWAS Catalog studies with quality-aware filtering
- Upload and analyze your personal genetic data (23andMe, AncestryDNA, Monadic DNA)
- Private LLM analysis powered by Nillion's nilAI; your data is processed in a Trusted Execution Environment
- Premium Features: LLM-powered genetic analysis chat, Run All analysis, comprehensive reports
- Dual payment system: Credit/debit cards (Stripe) or stablecoins (USDC/USDT/DAI on Ethereum, Base, Arbitrum, Optimism, Polygon)
- Save and export your results
- Privacy-focused: All processing happens on your infrastructure (no third-party APIs for search)
- Set up PostgreSQL with the pgvector extension:

  ```bash
  # Create database
  createdb gwas_catalog

  # Apply schema (includes pgvector setup)
  psql gwas_catalog < sql/postgres_schema.sql
  ```

- Fetch and load GWAS Catalog data:
  - Download from https://www.ebi.ac.uk/gwas/api/search/downloads/alternative
  - Load the "All associations v1.0.2" TSV into the `gwas_catalog` table
  - See https://www.ebi.ac.uk/gwas/docs/file-downloads for details

- Apply indexes for better performance:

  ```bash
  psql gwas_catalog < sql/postgres_indexes.sql
  ```
The repository includes a Next.js single-page application for exploring GWAS studies.
```bash
# Set PostgreSQL connection string
export POSTGRES_DB="postgresql://user:password@localhost:5432/gwas_catalog"

# Install dependencies and start dev server
npm install
npm run dev
```

The development server defaults to http://localhost:3000.
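As a point of reference, here is a minimal sketch of how a server-side module might consume `POSTGRES_DB`, assuming the node-postgres (`pg`) client; the repository's actual data-access code may be organized differently.

```typescript
// lib/db.ts (illustrative sketch, not the repository's actual module)
import { Pool } from "pg";

// Reuse a single pool across API routes; POSTGres_DB holds the full connection string.
export const pool = new Pool({
  connectionString: process.env.POSTGRES_DB,
  // Managed providers typically require SSL in production; this heuristic is an assumption.
  ssl: process.env.POSTGRES_DB?.includes("sslmode=require")
    ? { rejectUnauthorized: false }
    : undefined,
});

// Example query against the gwas_catalog table.
export async function countStudies(): Promise<number> {
  const { rows } = await pool.query<{ count: string }>(
    "SELECT COUNT(*) AS count FROM gwas_catalog"
  );
  return Number(rows[0].count);
}
```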
When running npm run dev on localhost, the app automatically enables dev mode to speed up your development workflow:
What Auto-Loads:
- Genotype file - After uploading once, auto-loads on next session
- Results file - After loading/exporting once, auto-loads on next session
- Personalization password - Auto-unlocks encrypted personal data
How It Works:
- Chrome/Edge (Full Auto-Load):
  - Uses the File System Access API to store persistent file handles in IndexedDB (see the sketch after this list)
  - Files load automatically with zero interaction
  - Password stored in IndexedDB to auto-unlock personalization
- Brave/Firefox (Fallback Mode):
  - Brave disables the File System Access API by default for privacy
  - File pickers appear automatically on app load
  - Just select your files - still faster than manual navigation
  - Password auto-unlock works in all browsers
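As a rough illustration of the Chromium path, here is a minimal sketch of persisting a file handle with the File System Access API and IndexedDB; the database, store, and function names are hypothetical, and the app's real dev-mode module will differ.

```typescript
// Illustrative sketch of dev-mode handle persistence (names are hypothetical;
// TS types for showOpenFilePicker/queryPermission come from @types/wicg-file-system-access).
const DB_NAME = "dev_mode_sketch";

function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => req.result.createObjectStore("handles");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// After the user picks the genotype file once, persist its handle (Chromium only).
export async function rememberGenotypeFile(): Promise<File> {
  const [handle] = await window.showOpenFilePicker();
  const db = await openDb();
  db.transaction("handles", "readwrite").objectStore("handles").put(handle, "genotype");
  return handle.getFile();
}

// On the next visit, re-read the file with zero interaction (if permission persists).
export async function restoreGenotypeFile(): Promise<File | null> {
  const db = await openDb();
  const handle = await new Promise<FileSystemFileHandle | undefined>((resolve, reject) => {
    const req = db.transaction("handles").objectStore("handles").get("genotype");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  // Silent re-reads only work while read permission on the stored handle is still granted.
  if (!handle || (await handle.queryPermission({ mode: "read" })) !== "granted") return null;
  return handle.getFile();
}
```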
First-Time Setup:
```bash
npm run dev
# 1. Upload your genotype file (saves handle/marker)
# 2. Load or export results (saves handle/marker)
# 3. Set up personalization (saves password)
# Next load: Everything restores automatically!
```

Security Note:

- Dev mode ONLY activates when `NODE_ENV === 'development'` AND `hostname === 'localhost'`
- Password stored in plain text in IndexedDB (local only, never sent to server)
- Clear dev data: run `indexedDB.deleteDatabase('monaidc_dna_explorer_dev_mode')` in the browser console
Enable Full Auto-Load in Brave:
- Open `brave://settings/`
- Privacy and security → Site and Shields Settings → File System Access
- Add an exception for `http://localhost:3000`
- Set up your PostgreSQL database with the GWAS catalog data
- Set the `POSTGRES_DB` environment variable to your PostgreSQL connection string:

  ```bash
  export POSTGRES_DB="postgresql://username:password@host:port/database"
  # or for production with SSL:
  export POSTGRES_DB="postgresql://username:password@host:port/database?sslmode=require"
  ```

- Build and start the application:

  ```bash
  npm run build
  npm start
  ```

GWAS Database (Required):

- `POSTGRES_DB`: PostgreSQL connection string (required for all environments)
LLM Features:
- LLM Provider Selection: Configure in the UI (Menu Bar > LLM Settings button)
  - Nillion nilAI (Default): Privacy-preserving LLM in a Trusted Execution Environment
    - Requires the `NILLION_API_KEY` environment variable
  - Ollama (Local): Run LLM models on your own machine
    - Requires an Ollama installation with the gpt-oss-20b model
    - Configure address and port in UI settings
  - HuggingFace (Cloud): Cloud-based LLM via the HuggingFace Router
    - Configure API key directly in UI settings (stored in browser localStorage)
- Privacy: All providers send data directly from browser to LLM service - never through our servers
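To illustrate the browser-to-provider pattern (requests go straight from the browser to the LLM service, never through the app's servers), here is a minimal sketch against Ollama's documented `/api/chat` endpoint; the helper itself and its error handling are hypothetical, not the app's actual client code.

```typescript
// Hypothetical client-side helper: the browser talks to the LLM provider directly.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

export async function askLocalOllama(messages: ChatMessage[]): Promise<string> {
  // Address and port correspond to the values configured in the UI settings.
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-oss-20b", // model referenced in the Ollama notes above
      messages,
      stream: false,        // request a single JSON response instead of a stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.message.content; // non-streaming /api/chat returns { message: { content } }
}
```

For a direct browser call like this, the local Ollama instance must allow cross-origin requests from the app's origin (typically via the `OLLAMA_ORIGINS` setting).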
Authentication (Required for Premium):
- `NEXT_PUBLIC_DYNAMIC_ENVIRONMENT_ID`: Dynamic.xyz environment ID for logging in
Blockchain Payments (Required for Premium):
- `ALCHEMY_API_KEY`: Alchemy API key for blockchain indexer queries
- `NEXT_PUBLIC_EVM_PAYMENT_WALLET_ADDRESS`: EVM wallet address where users send ETH/USDC payments
- `NEXT_PUBLIC_SUBSCRIPTION_CACHE_HOURS`: Cache duration in hours (default: 1)
See .env.local.example for complete configuration details.
The complete database schema is provided in sql/postgres_schema.sql with pgvector support.
The schema includes:
- `gwas_catalog` table with an auto-incrementing `id` primary key
- `study_embeddings` table for semantic search (pgvector)
- `embedding_cache` table for query caching
- All necessary indexes, including HNSW for vector similarity search
To initialize a fresh database:
```bash
# Apply schema
psql $POSTGRES_DB < sql/postgres_schema.sql

# Apply indexes
psql $POSTGRES_DB < sql/postgres_indexes.sql
```

Architecture benefits:
- Simple integer foreign key JOINs (faster than string operations)
- Foreign key constraints ensure data integrity
- No redundant lookup tables needed
- Reduced storage and improved query performance
The application includes LLM-powered semantic search that understands the meaning of queries, not just keywords.
- PostgreSQL with pgvector (required)
- Python 3.8+ with GPU support (for initial embedding generation)
The complete database schema (including semantic search support) is in sql/postgres_schema.sql. See Database Schema section above for setup instructions.
Note: The pgvector extension is automatically enabled by the schema. Most managed PostgreSQL services (DigitalOcean, AWS RDS, etc.) allow extension creation by database owners.
Use your local GPU to generate embeddings for all studies:
```bash
# Install Python dependencies
pip install -r scripts/requirements.txt

# Generate embeddings for PostgreSQL - save local backup
POSTGRES_DB="postgresql://..." python scripts/generate-embeddings.py --save-local embeddings_backup.npz

# Load from local backup to new database (no GPU needed)
POSTGRES_DB="postgresql://..." python scripts/generate-embeddings.py --load-local embeddings_backup.npz

# Optional: Limit for testing
python scripts/generate-embeddings.py --limit 1000

# Adjust batch size based on GPU VRAM
python scripts/generate-embeddings.py --batch-size 256  # Default: 512
```

Time estimate: 20-60 minutes for 1M studies on a modern GPU (RTX 3080/4090)
Local backup benefits:
- Reuse embeddings for multiple databases
- No need to regenerate on database migration
- Transfer embeddings between environments
- Backup file size: ~500 MB compressed (for 1M studies, 512 dims)
The script uses nomic-embed-text-v1.5 with:
- 512 dimensions (33% storage savings, 0.5% quality loss vs 768)
- Matryoshka representation learning (efficient truncation)
- Task-specific prefixes (`search_document:` for studies)
The application automatically generates query embeddings on-the-fly using Transformers.js.
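A minimal sketch of that query-side step, assuming the @huggingface/transformers package (the older @xenova/transformers API is equivalent), the nomic-ai/nomic-embed-text-v1.5 model, and the `search_query:` prefix that complements the `search_document:` prefix noted above; the exact truncation and caching details in the app may differ.

```typescript
import { pipeline } from "@huggingface/transformers";

let extractor: any = null; // lazily-initialized feature-extraction pipeline

export async function embedQuery(query: string): Promise<number[]> {
  // Load the pipeline once; the model download is what the health check pre-warms after deploy.
  if (!extractor) {
    extractor = await pipeline("feature-extraction", "nomic-ai/nomic-embed-text-v1.5");
  }
  // nomic-embed expects a task prefix; queries use "search_query:".
  const output = await extractor(`search_query: ${query}`, { pooling: "mean", normalize: true });
  // Truncate the Matryoshka embedding to 512 dims to match vector(512), then re-normalize.
  const vec = Array.from(output.data as Float32Array).slice(0, 512);
  const norm = Math.hypot(...vec);
  return vec.map((v) => v / norm);
}
```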
For DigitalOcean App Platform (Node.js Buildpack):
- Configure deployment using the provided `.do/app.yaml`:
  - Node.js buildpack (auto-detected)
  - Health check with 60s timeout (allows model download)
  - Professional-XS instance (1 vCPU, 1 GB RAM)
- Set environment variables in the DO dashboard:
  - `POSTGRES_DB`: Your PostgreSQL connection string
- Deploy - push to GitHub or deploy via the DO CLI
- First deployment: Health check downloads the model (~30-50s) before routing traffic
- Subsequent deployments: Model re-downloads on each deploy (health check handles it)
Cold start behavior:
- First request after deploy: 10-20s (health check pre-warms model)
- Subsequent requests: <100ms (model already loaded)
- Model downloads: ~137 MB per deployment (cached in `/tmp/.transformers-cache`)
Alternative: Docker Deployment (Optional)
For faster cold starts and no repeated downloads, use the provided Dockerfile:
```yaml
# .do/app.yaml
services:
  - name: web
    dockerfile_path: Dockerfile # Switch to Docker
```

Benefits:
- Cold start: 10-20s (model pre-baked in image)
- Model downloaded once during build (not per deploy)
- Tradeoff: larger image size (+137 MB)
For local development:
```bash
npm install
npm run dev
```

The model downloads automatically on first search (~137 MB, cached in `.transformers-cache/`).
Try these queries to see semantic search in action:
- "memory loss" → finds studies about "cognitive decline", "dementia", "Alzheimer's"
- "heart attack" → finds "myocardial infarction", "coronary artery disease"
- "diabetes risk" → finds "type 2 diabetes", "insulin resistance", "hyperglycemia"
API Usage:
```bash
# Semantic search (default)
curl "http://localhost:3000/api/studies?search=alzheimer%20risk"

# Keyword search (fallback)
curl "http://localhost:3000/api/studies?search=alzheimer%20risk&semantic=false"
```

Two-tier caching for fast queries:
- Memory cache (100 hot queries): <1ms
- PostgreSQL cache (10K warm queries): 2-5ms
- Generation (cache miss): 50-100ms
Query flow:
User query → Check memory cache → Check DB cache → Generate embedding →
pgvector similarity search → Filter + rank → Return results
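A condensed sketch of that flow, assuming the node-postgres (`pg`) client, an in-memory Map as the hot tier, and the `embedding_cache` / `study_embeddings` / `gwas_catalog` tables from the schema; column names such as `embedding` and `study_id`, and the module path for `embedQuery`, are assumptions rather than the app's actual code.

```typescript
import { Pool } from "pg";
import { embedQuery } from "./embed-query"; // the Transformers.js helper sketched above (hypothetical path)

const pool = new Pool({ connectionString: process.env.POSTGRES_DB });
const memoryCache = new Map<string, number[]>(); // tier 1: ~100 hot queries in process memory

async function getQueryEmbedding(query: string): Promise<number[]> {
  // Tier 1: memory cache (<1 ms)
  const hot = memoryCache.get(query);
  if (hot) return hot;

  // Tier 2: PostgreSQL embedding_cache (2-5 ms); bump usage counters on a hit.
  const cached = await pool.query(
    `UPDATE embedding_cache
        SET access_count = access_count + 1, accessed_at = NOW()
      WHERE query = $1
      RETURNING embedding`,
    [query]
  );
  if (cached.rowCount) {
    const vec: number[] = JSON.parse(cached.rows[0].embedding); // pgvector's text form parses as a JSON array
    memoryCache.set(query, vec);
    return vec;
  }

  // Cache miss (50-100 ms): generate on the fly, then populate both tiers.
  const vec = await embedQuery(query);
  await pool.query(
    "INSERT INTO embedding_cache (query, embedding) VALUES ($1, $2::vector) ON CONFLICT DO NOTHING",
    [query, JSON.stringify(vec)]
  );
  memoryCache.set(query, vec);
  return vec;
}

// pgvector similarity search joined back to gwas_catalog via the integer foreign key.
export async function semanticSearch(query: string, limit = 50) {
  const vec = JSON.stringify(await getQueryEmbedding(query));
  const { rows } = await pool.query(
    `SELECT g.*, 1 - (e.embedding <=> $1::vector) AS similarity
       FROM study_embeddings e
       JOIN gwas_catalog g ON g.id = e.study_id  -- study_id column name is an assumption
      ORDER BY e.embedding <=> $1::vector
      LIMIT $2`,
    [vec, limit]
  );
  return rows;
}
```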
Storage requirements:
- Study embeddings: ~2 KB per study (512 dims × 4 bytes)
- 1M studies: ~2 GB embeddings + ~4 GB HNSW index = 6 GB total
- Query cache: ~2 KB per cached query (~20 MB for 10K queries)
- Compared to old architecture: Saves ~105 MB by eliminating redundant lookup table
- ✅ No third-party APIs: All embedding generation happens on your infrastructure
- ✅ Self-hosted models: Uses open-source nomic-embed-text-v1.5
- ✅ Query privacy: Search queries never leave your servers
- ✅ Cache encryption: Database cache uses standard PostgreSQL security
- ✅ Ephemeral processing: Query embeddings computed transiently (not logged)
Check embedding service status:
```bash
curl http://localhost:3000/api/health
```

Monitor cache performance:

```sql
-- PostgreSQL
SELECT
  COUNT(*) as total_queries,
  AVG(access_count) as avg_accesses,
  MAX(access_count) as most_popular_count
FROM embedding_cache;

-- Top 20 most popular queries
SELECT query, access_count, accessed_at
FROM embedding_cache
ORDER BY access_count DESC
LIMIT 20;
```

Clean up old cache entries (run periodically):
```
# Via API (requires auth)
curl -X POST http://localhost:3000/api/admin/cache-cleanup \
  -H "Authorization: Bearer $ADMIN_SECRET"

# Or manually in PostgreSQL
DELETE FROM embedding_cache
WHERE accessed_at < NOW() - INTERVAL '90 days'
   OR id IN (
     SELECT id FROM embedding_cache
     ORDER BY accessed_at ASC
     LIMIT (SELECT COUNT(*) - 10000 FROM embedding_cache)
   );
```

Slow first search after deployment:
- Model is loading (~5-10s). The health check at `/api/health` warms it up automatically.
"Vector dimension mismatch" errors:
- Ensure you used `--dimensions 512` when generating embeddings
- Check that the migration created a `vector(512)` column (not `vector(768)`)
Embeddings not found:
- Verify the schema was applied: `\d study_embeddings` should show the table exists
- Check that embeddings were generated: `SELECT COUNT(*) FROM study_embeddings;`
Poor search quality:
- Ensure the HNSW index was created: `\d+ study_embeddings` should show `idx_study_embeddings_embedding`
- Verify the pgvector extension is enabled: `SELECT * FROM pg_extension WHERE extname = 'vector';`
Monadic DNA Explorer offers premium features including LLM-powered genetic analysis chat, Run All analysis, and comprehensive reports.
The app supports two payment methods for premium subscriptions:
- Fixed $4.99/month - Standard recurring subscription
- Auto-renewal - Automatically renews monthly
- Instant activation - Subscription activates immediately
- Managed via Stripe - Cancel anytime through customer portal
- Setup: See `STRIPE_INTEGRATION.md` for detailed instructions
- Flexible amounts - Pay any amount ($1+ USD equivalent)
- One-time payment - No auto-renewal, top up when needed
- Supported chains: Ethereum, Base (recommended), Arbitrum, Optimism
- Accepted tokens: ETH and USDC
- Examples: $4.99 = 30 days, $10 = 60 days, $50 = 300 days
- Setup: See `STABLECOIN_PAYMENTS.md` for detailed instructions
Key Difference:
- Stripe: Fixed recurring subscription ($4.99/month, auto-renews)
- Stablecoin: Flexible prepaid (choose your amount, no auto-renewal)
Payment Stacking: Users can combine both! Subscribe with card for recurring billing, then add extra months with stablecoin payments as needed.
See PAYMENT_METHODS.md for detailed comparison.
Stripe Card Payments (Recurring):
- User logs in via Dynamic.xyz (for identity)
- User selects "Pay with Card" (fixed $4.99/month)
- Redirected to Stripe Checkout for secure payment
- Subscription created, payments recorded in PostgreSQL
- Auto-renews monthly, cancel anytime via Stripe portal
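For orientation, here is a minimal sketch of step 3 (creating the Checkout Session) using the official `stripe` Node SDK in a Next.js route handler; the `STRIPE_SECRET_KEY` and `NEXT_PUBLIC_APP_URL` variables, the price ID, and the route path are hypothetical placeholders - see `STRIPE_INTEGRATION.md` for the actual setup.

```typescript
// app/api/stripe/checkout/route.ts (hypothetical route; see STRIPE_INTEGRATION.md)
import Stripe from "stripe";
import { NextResponse } from "next/server";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY as string); // assumed env var

export async function POST(req: Request) {
  const { userId } = await req.json(); // Dynamic.xyz identity established in step 1

  const session = await stripe.checkout.sessions.create({
    mode: "subscription",
    line_items: [{ price: "price_premium_monthly_499", quantity: 1 }], // placeholder price ID
    success_url: `${process.env.NEXT_PUBLIC_APP_URL}/premium?status=success`,
    cancel_url: `${process.env.NEXT_PUBLIC_APP_URL}/premium?status=cancelled`,
    metadata: { userId }, // lets the webhook attribute the payment to the user
  });

  // The client redirects to session.url (Stripe Checkout) to complete payment.
  return NextResponse.json({ url: session.url });
}
```

The completed subscription would then be recorded in PostgreSQL when Stripe's webhook fires (step 4 above).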
Blockchain Stablecoin Payments:
- User logs in via Dynamic.xyz
- User sends ETH or USDC to payment wallet from connected wallet
- App queries Alchemy indexer to find all payments from user's wallet
- App uses Alchemy Prices API to get historical prices at transaction time
- App calculates the subscription: `days = (amountUSD / 4.99) * 30`
Combined Subscription:
- Both payment sources are checked and combined
- Total days = blockchain days + Stripe days
- Subscription status cached in localStorage for 1 hour
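A small sketch of how the combined entitlement follows from the formula above; function and field names are illustrative, and rounding down is an assumption (the README only gives `days = (amountUSD / 4.99) * 30`).

```typescript
const MONTHLY_PRICE_USD = 4.99;
const DAYS_PER_MONTH = 30;

interface OnchainPayment {
  amountUsd: number; // token amount × historical price from the Alchemy Prices API
}

// Days purchased by one-off stablecoin/ETH payments.
export function blockchainDays(payments: OnchainPayment[]): number {
  const totalUsd = payments.reduce((sum, p) => sum + p.amountUsd, 0);
  // Rounding down is an assumption; it reproduces the examples above.
  return Math.floor((totalUsd / MONTHLY_PRICE_USD) * DAYS_PER_MONTH);
}

// Combined entitlement: blockchain days + days covered by the Stripe subscription.
export function totalSubscriptionDays(payments: OnchainPayment[], stripeDays: number): number {
  return blockchainDays(payments) + stripeDays;
}

// Example: $4.99 ≈ 30 days, $10 ≈ 60 days, $50 ≈ 300 days (matches the examples above).
```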
See detailed setup guides:
- Stripe Payments: `STRIPE_INTEGRATION.md`
- Stablecoin Payments: `STABLECOIN_PAYMENTS.md`
Quick Start:
```bash
# Set required environment variables in .env.local
ALCHEMY_API_KEY=your_alchemy_api_key
NEXT_PUBLIC_EVM_PAYMENT_WALLET_ADDRESS=0xYourWalletAddress
NEXT_PUBLIC_DYNAMIC_ENVIRONMENT_ID=your_dynamic_environment_id
```

- ✅ Zero infrastructure costs (no database)
- ✅ No personal data storage (GDPR-friendly)
- ✅ Transparent (all payments verifiable on-chain)
- ✅ Flexible (users can top up anytime)
- ✅ Free tier API usage sufficient for 5,000+ daily active users
Dual License: This software is available under a dual licensing model:
Free for personal, educational, academic, and non-commercial use under the MIT License. See LICENSE-MIT.md for details.
Commercial use requires obtaining a commercial license. This includes:
- Use in commercial products or services
- Use by for-profit organizations
- Integration into commercial applications
- Revenue-generating activities
Contact us for commercial licensing:
- Email: hello@monadicdna.com
- Website: https://monadicdna.com
See LICENSE for full dual license details and LICENSE-COMMERCIAL.md for commercial license terms.
Cerezo M, Sollis E, Ji Y, Lewis E, Abid A, Bircan KO, Hall P, Hayhurst J, John S, Mosaku A, Ramachandran S, Foreman A, Ibrahim A, McLaughlin J, Pendlington Z, Stefancsik R, Lambert SA, McMahon A, Morales J, Keane T, Inouye M, Parkinson H, Harris LW. Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D998–D1005. https://doi.org/10.1093/nar/gkae1070