As a singer, I tend to record a lot of songs (bhajans specifically, songs like these) on my phone, sung both by other singers and by me. The result is a jumble of recordings with generic names like "My Recording 67.wav". My loved ones often ask me to send songs that I have sung, and I found it very difficult to find anything in this mess. I took the opportunity to solve this problem with machine learning.
I used voice recordings that I had gathered on my phone over the last year (~350 bhajans in total, ~80 sung by me, ~20 different singers). I developed a simple HTML/JS tool to help annotate the songs, shared subsets of the data with friends and family members, and within a few weeks I had a usable dataset. I then converted the dataset into 4-second spectrograms that could be fed into a deep neural net based on the VGGish model.
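Below is a minimal sketch of that preprocessing step, assuming librosa for feature extraction; the sample rate, mel-band count, and function name are placeholder choices for illustration, not the project's exact values.

```python
# Sketch: slice a recording into 4-second windows and compute log-mel
# spectrograms suitable as CNN input. The 16 kHz sample rate and 64 mel
# bands are assumptions, not the project's exact parameters.
import numpy as np
import librosa

SR = 16000      # assumed sample rate
WINDOW_SEC = 4  # snippet length used in the project

def audio_to_spectrograms(path):
    y, _ = librosa.load(path, sr=SR, mono=True)
    samples_per_window = SR * WINDOW_SEC
    specs = []
    # Drop the trailing partial window so every example has the same shape.
    for start in range(0, len(y) - samples_per_window + 1, samples_per_window):
        chunk = y[start:start + samples_per_window]
        mel = librosa.feature.melspectrogram(y=chunk, sr=SR, n_mels=64)
        specs.append(librosa.power_to_db(mel, ref=np.max))
    return np.stack(specs)  # shape: (n_windows, 64, time_frames)
```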
I used two different methods to identify my voice in a given snippet of audio:
- Generalizable model
  - Siamese network: trained a model that generates a "fingerprint" for a given singer. A new audio sample is compared against the fingerprint using a distance metric and is classified as my voice if the distance falls within a defined threshold (a minimal sketch follows this list).
- Non-generalizable models
  - Binary classifier: trained a model that predicts whether a given spectrogram is my voice or not.
  - Multi-class classifier: trained a model that predicts which of the singers present in the dataset a given spectrogram belongs to.
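Here is a minimal Keras sketch of the Siamese setup, assuming a small convolutional embedder as a stand-in for the VGGish-based network; the layer sizes and the 96x64 input shape are assumptions.

```python
# Sketch of the Siamese setup: a shared embedding network produces a
# "fingerprint" for each spectrogram; pairs are trained to predict
# same singer vs. different singer from the embedding distance.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_embedder(input_shape=(96, 64, 1), dim=128):
    # Small conv stack standing in for the VGGish-based embedder (assumption).
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(dim)(x)
    return Model(inp, out)

embedder = build_embedder()

# Twin inputs share the same embedder weights.
a = layers.Input(shape=(96, 64, 1))
b = layers.Input(shape=(96, 64, 1))
dist = layers.Lambda(
    lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]),
                                    axis=1, keepdims=True))
)([embedder(a), embedder(b)])
# A sigmoid over the distance yields a same/different probability.
pred = layers.Dense(1, activation="sigmoid")(dist)
siamese = Model([a, b], pred)
siamese.compile(optimizer="adam", loss="binary_crossentropy")
```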
- The non-generalizable models performed much better at identifying my voice: >99% accuracy and recall for both the binary and the multi-class classifier.
- The Siamese network performed very well at distinguishing between two artists (>90% accuracy on validation data). This, however, did not translate directly into stellar performance on the one-shot learning task. Using the average of the fingerprints generated for my spectrograms as my voice's fingerprint, I was able to identify my songs with ~70% accuracy (a sketch of this inference step follows).
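A minimal sketch of that inference step, reusing the `embedder` from the Siamese sketch above; the distance threshold is a placeholder to be tuned on validation data.

```python
# Average the embeddings of my reference spectrograms into a single
# fingerprint, then accept a new spectrogram when its distance to that
# fingerprint is under a threshold (threshold value is a placeholder).
import numpy as np

def make_fingerprint(embedder, my_spectrograms):
    # my_spectrograms: array of shape (n, 96, 64); add the channel axis.
    emb = embedder.predict(my_spectrograms[..., np.newaxis])
    return emb.mean(axis=0)

def is_my_voice(embedder, fingerprint, spectrogram, threshold=1.0):
    emb = embedder.predict(spectrogram[np.newaxis, ..., np.newaxis])[0]
    return np.linalg.norm(emb - fingerprint) < threshold
```

Averaging over many reference snippets smooths out per-snippet variation, and the threshold trades precision against recall.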
For details, check out the blog.
- Add more variety to the dataset and retrain (more female singers, more songs without percussion or supporting instruments)
- Evaluate different methods of generating a fingerprint for a singer
- Deploy model as a consumable API
- Keras implementation of VGGish: https://github.com/DTaoo/VGGish
- Gemmeke, J. F. et al., "Audio Set: An ontology and human-labeled dataset for audio events," ICASSP 2017
- Hershey, S. et al., "CNN Architectures for Large-Scale Audio Classification," ICASSP 2017
- One Shot Learning with Keras