Sandalots/YourNextAlbum

Deployed

Deployed at: https://yournextalbum-lwfn7mddzxpgygdmasti3z.streamlit.app

YourNextAlbum

  • pitchfork_album_scraper.py: Scrapes thousands of album reviews from Pitchfork using Selenium and BeautifulSoup. The scraper collects review text, scores, album/artist info, and album art URLs.
  • album_reviews_dataset_creator.py: Cleans and preprocesses the raw review data using NLTK (Natural Language Toolkit) for tokenization, stopword removal, lemmatization, and stemming, plus HTML/text cleaning and feature engineering (e.g., sentiment, mood, instrumentation, context).
  • album_reviews_text_sentiment_analyser.py: Uses NLP models (HuggingFace Transformers) to analyze review sentiment, summarize text, and extract features like mood, energy, instrumentation, production style, novelty, and more. Models used include DistilBERT for sentiment and BART for summarization.
  • album_recommender_model.py: Builds recommendation models using TF-IDF and SentenceTransformer embeddings. It combines review features, album metadata, and semantic search to match user prompts to relevant albums.
  • album_recommender_prompt_app.py: Streamlit web app for interactive recommendations. Users enter a prompt (e.g., “dreamy synthpop for studying”), and the app displays recommended albums with cover art, artist, score, and key highlights. The UI features custom CSS for a modern look and smooth animations.

Album Recommendation System Pipeline

  1. Scraping: pitchfork_album_scraper.py collects thousands of reviews from Pitchfork, saving them as CSV.
  2. Preprocessing: album_reviews_dataset_creator.py cleans and normalizes the data, engineering features for downstream analysis.
  3. Sentiment Analysis: album_reviews_text_sentiment_analyser.py applies NLP models and custom keyword lists to extract rich features from each review.
  4. Recommendation: album_recommender_model.py builds semantic and keyword-based models to match user queries to albums.
  5. User Interface: album_recommender_prompt_app.py provides a fast, interactive Streamlit UI for exploring recommendations.

Models & Features

  • Sentiment Analysis: DistilBERT (HuggingFace) for positive/negative/neutral sentiment.
  • Summarization: BART (HuggingFace) for concise review summaries.
  • Semantic Search: SentenceTransformer for embedding-based matching.
  • TF-IDF: For keyword-based similarity.
  • Feature Extraction: Custom logic for mood, energy, instrumentation, production, novelty, context, and more.
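The list above names two matching signals, keyword-based (TF-IDF) and embedding-based (SentenceTransformer). This README does not say how the two are combined, so the following is only a minimal sketch that assumes a simple weighted sum of cosine similarities. The function names, the `alpha` weight, and the toy vectors are all hypothetical and stand in for real TF-IDF rows and embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query_kw, query_emb, albums, alpha=0.5, top_k=5):
    """Rank albums by a weighted blend of keyword and semantic similarity.

    albums: list of (title, keyword_vector, embedding_vector) tuples.
    alpha:  assumed weight on the keyword score; (1 - alpha) goes to the
            semantic score. The real project may blend differently.
    """
    scored = []
    for title, kw_vec, emb_vec in albums:
        score = (alpha * cosine(query_kw, kw_vec)
                 + (1 - alpha) * cosine(query_emb, emb_vec))
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


# Toy data: 2-dimensional vectors in place of real TF-IDF rows and embeddings.
toy_albums = [
    ("A", [1.0, 0.0], [1.0, 0.0]),
    ("B", [0.0, 1.0], [0.0, 1.0]),
    ("C", [1.0, 1.0], [1.0, 1.0]),
]
ranking = hybrid_rank([1.0, 0.0], [1.0, 0.0], toy_albums)
```

Setting `alpha` closer to 1 would favour literal keyword overlap, while values closer to 0 would favour semantic similarity.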

Streamlit End to End Album Recommendation Prompt Application

  • Enter a prompt describing your desired album (genre, mood, context, etc.).
  • View recommended albums with cover art, artist, score, and highlights.
  • UI features include centered content, fade-in animations, custom button styling, and hidden Streamlit menu/deploy buttons.

Installation & Running the Album Recommendation System

python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python pitchfork_album_scraper.py  # Scrape reviews (avoid unless necessary: scraping all ~5000 album reviews takes a very long time)
python album_reviews_dataset_creator.py  # Preprocess with NLTK and build the dataset
python album_reviews_text_sentiment_analyser.py  # Extract sentiment and 8+ feature types from the review text
python album_recommender_model.py  # Build the model and recommend albums for a given prompt

streamlit run album_recommender_prompt_app.py  # Launch the Streamlit UI for prompting the model

Alternatively, run `./scripts/quickstart.sh` on Unix/macOS.
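Because the scraping step is slow, it can help to skip it when the raw CSV is already present. The sketch below runs the scripts in the order listed above and drops the scraper if `outputs/pitchfork_reviews.csv` exists. The script names and CSV path come from this README. The helper names and the skip rule are assumptions for illustration, and the sketch is not a description of what `quickstart.sh` does.

```python
import subprocess
import sys
from pathlib import Path

# Script order as listed in this README.
STEPS = [
    "pitchfork_album_scraper.py",
    "album_reviews_dataset_creator.py",
    "album_reviews_text_sentiment_analyser.py",
    "album_recommender_model.py",
]
# Raw scraper output, as named in the "Output CSV Files" section.
RAW_CSV = Path("outputs") / "pitchfork_reviews.csv"


def plan_steps(raw_csv_exists: bool) -> list[str]:
    """Return the scripts to run, skipping the slow scraper if its CSV exists."""
    if raw_csv_exists:
        return STEPS[1:]
    return list(STEPS)


def run_pipeline() -> None:
    """Run each planned script with the current interpreter, stopping on failure."""
    for script in plan_steps(RAW_CSV.exists()):
        subprocess.run([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root (with the virtual environment active) would execute the planned steps. The Streamlit app is still launched separately with `streamlit run`.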

Demo Video

A demonstration video of YourNextAlbum in action is included in this repository.

📹 View the demo video: demo.mov

Analysis Features from the Author's Album Review(s)

  • Sentiment & Consensus - Tone + score alignment
  • Instrumentation - Guitar, synth, drums, vocals, etc.
  • Mood & Energy - Sad/happy/dark + high/medium/low
  • Production - Polished/raw/experimental, analog/digital
  • Novelty - Innovative vs. derivative detection
  • Context - Party, study, workout, relaxation, etc.
  • Temporal - Vintage/contemporary/timeless
  • Polarizing - Divisive "love it or hate it" albums
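Several of the features above (for example Context and Mood) are described elsewhere in this README as coming from custom keyword lists. The following is a minimal sketch of how keyword-based tagging could work. The keyword sets and the `tag_contexts` helper are hypothetical, since the project's actual lists are not shown here.

```python
import re

# Hypothetical keyword lists; the project's real lists are not in this README.
CONTEXT_KEYWORDS = {
    "party": {"dance", "club", "party"},
    "study": {"ambient", "calm", "focus"},
    "workout": {"pounding", "driving", "adrenaline"},
}


def tag_contexts(review_text, keyword_map=CONTEXT_KEYWORDS):
    """Return sorted labels whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z']+", review_text.lower()))
    return sorted(label for label, keys in keyword_map.items() if words & keys)


tags = tag_contexts("A calm, ambient record built for the club.")
```

Matching on whole words avoids false hits from substrings (for example "clubbed" would not match "club"), at the cost of missing inflected forms unless the text is lemmatized first.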

Output CSV Files (outputs/ directory)

  • pitchfork_reviews.csv: Raw scraped reviews from Pitchfork. Contains all original review text, scores, album/artist info, and URLs as collected by pitchfork_album_scraper.py.
  • pitchfork_reviews_preprocessed.csv: Cleaned and preprocessed dataset. Text is normalized, tokenized, lemmatized, and stopwords are removed using NLTK. Used as input for feature extraction and sentiment analysis.
  • pitchfork_reviews_preprocessed_plus_sentiments.csv: Fully analyzed dataset with all extracted features and sentiment results. Includes columns for sentiment, mood, instrumentation, production, novelty, context, and more, generated by album_reviews_text_sentiment_analyser.py.

Model Files (models/ directory)

  • tfidf_vectorizer.pkl: Saved TF-IDF vectorizer used to transform review and album text into feature vectors for similarity search.
  • album_tfidf_matrix.pkl: Precomputed TF-IDF feature matrix for all albums, used for fast keyword-based recommendations.
  • album_semantic_embeddings.npy: SentenceTransformer embeddings for all albums, enabling semantic search and matching user prompts to albums based on meaning.

These model files are pre-saved and ship with the repository, so you can run the Streamlit application without building them yourself.
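To make the role of the TF-IDF files above more concrete, here is a plain-Python sketch of TF-IDF weighting and cosine matching. It is not the project's code: it assumes whitespace tokenization, a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and L2-normalized vectors. All function names and the toy documents are illustrative.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term in docs."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Turn text into an L2-normalized sparse TF-IDF vector (dict term -> weight)."""
    counts = Counter(t for t in text.split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def similarity(a, b):
    """Dot product of two normalized sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Toy corpus in place of real album text.
docs = ["dreamy synth pop", "raw garage rock", "dreamy ambient synth"]
idf = fit_idf(docs)
query = tfidf_vector("dreamy synth", idf)
sims = [similarity(query, tfidf_vector(d, idf)) for d in docs]
```

In this picture, the saved vectorizer corresponds to the fitted vocabulary and IDF weights, and the saved matrix corresponds to the per-album vectors, so only the query needs to be transformed at prompt time.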

Error Analysis

To analyze errors and evaluate the performance of the album recommendation system, use the provided error analysis script. This script runs a large set of prompts through the recommender, generates recommendations, and computes a variety of metrics and analyses to help you understand model performance and behavior.

Run the following command:

python3 analysis/error_analyze_recommender.py

The script outputs:

  • Recall@5: How many relevant albums were successfully recommended out of all possible relevant albums (for each prompt).
  • Precision@5: How many of the recommended albums are actually relevant.
  • nDCG@5 (Normalized Discounted Cumulative Gain): Measures ranking quality, giving higher scores for relevant albums ranked higher.
  • MRR@5 (Mean Reciprocal Rank): Indicates how early the first relevant album appears in the recommendations.
  • Per-genre breakdowns: Average metrics for each genre present in the recommendations.
  • Diversity and overlap analysis: Statistics and visualizations on unique genres/albums per prompt, overlap between prompts, and most commonly recommended albums and artists.
  • Detailed prompt-by-prompt results: For each prompt, shows the ground truth, recommended albums, and all computed metrics.
  • Top 20 artists: Lists the most frequently recommended artists.

This helps assess the performance of the YourNextAlbum models, as well as any biases or undesired behaviours they exhibit.
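The ranking metrics listed above have standard definitions, sketched below for a single prompt. This is not the code from `analysis/error_analyze_recommender.py`. It assumes binary relevance (an album is either in the ground-truth set or not) and a recommendation list without duplicates, and the function names are illustrative.

```python
import math


def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k if k else 0.0


def recall_at_k(recommended, relevant, k=5):
    """Fraction of all relevant albums that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(recommended, relevant, k=5):
    """1 / rank of the first relevant album in the top-k, or 0 if none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k=5):
    """DCG of the top-k divided by the DCG of an ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Toy example: placeholder album IDs, not real data.
recs = ["a", "b", "c", "d", "e"]
truth = {"b", "e", "z"}
```

MRR@5 is then the mean of `reciprocal_rank_at_k` over all prompts, and the other metrics are averaged over prompts in the same way.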

Requirements

  • Python 3.11 (recommended; other versions may cause dependency issues, for example with pyarrow)
  • pip (Python package manager)
  • System packages (install via Homebrew on macOS):
      • cmake (required for some Python packages)
      • openssl (required for secure connections)
      • apache-arrow (required for pyarrow)
  • Python packages (installed via pip install -r requirements.txt):
      • pyarrow (for Arrow/Parquet data support)
      • selenium (for web scraping automation)
      • beautifulsoup4, requests, lxml (for web scraping and parsing)
      • pandas, numpy (for data processing)
      • scikit-learn, sentence-transformers, transformers, torch (for ML/NLP)
      • streamlit (for the web app UI)
      • matplotlib, seaborn (for visualizations)
      • nltk (for text preprocessing)
      • watchdog (for file monitoring)

See requirements.txt for the full list of Python dependencies.

About

AI Album Recommendation Model using TF-IDF, SentenceTransformers & BART.
