AURA (Atypical Understanding & Recognition for Accessibility) is a project dedicated to advancing speech recognition for individuals with atypical speech patterns. Leveraging the Speech Accessibility Project (SAP) dataset, we fine-tune state-of-the-art models to improve recognition accuracy and accessibility. Our approach is evaluated on both the SAP and TORGO datasets, demonstrating its robustness across diverse atypical speech scenarios.
Set up the environment called aura by running the following commands:
conda update conda -y
conda create --name aura python=3.10
conda activate aura
conda install ipykernel ipywidgets -y
python -m ipykernel install --user --name aura --display-name "aura"

Install the required packages by running the following command:
pip install -r requirements.txt
pip install -e .

To run LoRA fine-tuning, execute the following command:
bash lora_finetune.sh
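lora_finetune.sh wraps the training entry point. As a rough sketch of the underlying idea, here is how a LoRA adapter is typically attached with Hugging Face PEFT; the base model name and every hyperparameter below are illustrative assumptions, not the script's actual values:

from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Assumed base model for illustration; the real one is set in lora_finetune.sh.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
lora_config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only adapter weights train; the base stays frozen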
During training, you can check your GPU usage:

watch -n 0.5 -c gpustat -cp --color
Merge the LoRA weights with the base model using the following command:

bash merge_lora.sh
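Conceptually, merging folds the adapter deltas back into the base weights so the model can be served without PEFT. A minimal sketch using PEFT's merge_and_unload, assuming a Whisper base model and placeholder paths (merge_lora.sh defines the real ones):

from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Base model and paths are placeholders; see merge_lora.sh for the actual values.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "/path/to/lora/checkpoint")
merged = model.merge_and_unload()          # fold LoRA deltas into the base weights
merged.save_pretrained("/path/to/merged")  # standalone checkpoint, no adapter needed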
Evaluate the model on the SAP and TORGO datasets using the following command:

bash evaluate.sh
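Evaluation amounts to transcribing every manifest entry and scoring the hypotheses against the references. A sketch using the transformers ASR pipeline and jiwer for word error rate; the model and manifest paths are placeholders, and evaluate.sh may batch and score differently:

import json
from jiwer import wer
from transformers import pipeline

# Placeholder paths; evaluate.sh sets the real model and manifests.
asr = pipeline("automatic-speech-recognition", model="/path/to/merged")
refs, hyps = [], []
with open("dev.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        refs.append(sample["sentence"])
        hyps.append(asr(sample["audio"]["path"])["text"])
print(f"WER: {wer(refs, hyps):.3f}")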
The data is prepared in JSONL manifest files that feed into the training and evaluation pipelines. Each line in a manifest file represents a single audio sample with its transcription and metadata. The format is as follows:
{
    "audio": {
        "path": "/path/to/audio/file.wav"
    },
    "sentence": "Transcription of the audio file",
    "sentences": [],
    "duration": 14.07
}
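As a quick sanity check, a manifest can be inspected with the standard library alone (the file name here is a placeholder; any manifest generated below has the same shape):

import json

with open("train.jsonl") as f:  # placeholder name
    for line in f:
        sample = json.loads(line)
        print(sample["audio"]["path"], sample["duration"], sample["sentence"])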
The SAP dataset includes recordings from individuals with atypical speech patterns. First, download the SAP dataset to /path/to/sap and extract all the tar files:
cd /path/to/sap/Train
for i in $(seq -f "%03g" 0 16); do tar xvf SpeechAccessibility_2025-03-31_$i.tar; done
tar xvf SpeechAccessibility_2025-03-31_Train_Only_Json.tar
cd /path/to/sap/Dev
for i in $(seq -f "%03g" 0 2); do tar xvf SpeechAccessibility_2025-03-31_$i.tar; done
tar xvf SpeechAccessibility_2025-03-31_Dev_Only_Json.tar

Extracting these archives leaves a second level of tar files; extract all of them as well:
for f in *.tar; do [[ $f != SpeechAccessibility_* ]] && tar xvf "$f"; done

Before creating manifests, SAP audio files need to be preprocessed because some WAV files are not mono-channel. Convert them to mono with a 16kHz sample rate using:
python sap_mono_converter.py --input-dir /path/to/sap --sample-rate 16000 --output-suffix mono-16k
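The conversion itself is a downmix plus resample over every WAV file. A sketch of the core loop with librosa and soundfile, illustrating the idea rather than reproducing sap_mono_converter.py:

from pathlib import Path
import librosa
import soundfile as sf

src = Path("/path/to/sap")
dst = Path("/path/to/sap-mono-16k")  # mirrors the --output-suffix naming
for wav in src.rglob("*.wav"):
    audio, sr = librosa.load(wav, sr=16000, mono=True)  # downmix + resample in one call
    out = dst / wav.relative_to(src)
    out.parent.mkdir(parents=True, exist_ok=True)
    sf.write(out, audio, sr)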
After preprocessing, create the manifest files:

python prepare_sap.py --sap-dir /path/to/sap-mono-16k --output-dir /path/to/output

This will generate train.jsonl and dev.jsonl files for training and validation.
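Each manifest line pairs a WAV path with its transcript and duration. A sketch of how one entry could be built; the transcript argument is a hypothetical stand-in for the lookup prepare_sap.py performs against the SAP JSON metadata:

import json
import soundfile as sf

def manifest_entry(wav_path: str, transcript: str) -> str:
    # sf.info reads only the header, so duration is cheap to compute
    info = sf.info(wav_path)
    return json.dumps({
        "audio": {"path": wav_path},
        "sentence": transcript,
        "sentences": [],
        "duration": round(info.duration, 2),
    })

# Hypothetical usage; real transcripts come from the dataset's metadata.
with open("train.jsonl", "w") as f:
    f.write(manifest_entry("/path/to/sap-mono-16k/example.wav", "Transcription of the audio file") + "\n")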
The TORGO dataset contains speech recordings from individuals with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), grouped by dysarthria severity. Create the manifests with:
python prepare_torgo.py --torgo-dir /path/to/torgo --output-dir /path/to/output

This script processes the TORGO dataset and generates three JSONL files:
torgo_severe.jsonl: Contains recordings from speakers with severe dysarthria
torgo_moderate.jsonl: Contains recordings from speakers with moderate dysarthria
torgo_mild.jsonl: Contains recordings from speakers with mild dysarthria
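To sanity-check these splits before evaluating, each file can be tallied in a few lines (the duration field gives total audio time per severity level):

import json

for severity in ("severe", "moderate", "mild"):
    path = f"torgo_{severity}.jsonl"
    with open(path) as f:
        samples = [json.loads(line) for line in f]
    hours = sum(s["duration"] for s in samples) / 3600
    print(f"{path}: {len(samples)} samples, {hours:.2f} hours of audio")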
This project is licensed under the MIT License.