A workflow designed to clean fastq files for the SEACONNECT project
Updated Aug 21, 2019 - Python
Sievio is a Python toolkit that streams GitHub, local repositories, and other text/code sources into clean JSONL corpora for LLM pre-training, fine-tuning, or RAG. It includes structure-aware chunking, robust Unicode decoding, pluggable quality and safety screening, and optional dataset card generation and deduplication support.
Automated quality filtering for diabetic retinopathy images using adaptive, medically informed thresholds.