Skip to content

SQL-based analysis of Spotify dataset to explore track performance, artist trends, and engagement metrics using PostgreSQL.

License

Notifications You must be signed in to change notification settings

uttkarshrajput/Spotify_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Spotify Dataset Analysis using SQL

Spotify Logo

Project Overview

This project focuses on analyzing a Spotify music dataset using SQL to understand track performance, artist trends, and engagement metrics. The goal of this project was to strengthen my SQL skills by working with a real-world dataset and applying concepts such as data exploration, aggregation, window functions, and query optimization.

The project covers the complete workflow — from understanding the dataset structure to writing optimized SQL queries and extracting meaningful insights.


Dataset Information

  • Source: Kaggle – Spotify Dataset

  • Data Type: Structured CSV data

  • Key Attributes:

  • Artist, Track, Album, Album Type

  • Audio features (danceability, energy, tempo, loudness, etc.)

  • Engagement metrics (views, likes, comments, streams)

  • Flags such as licensed and official_video


Tools & Technologies Used

  • Database: PostgreSQL

  • Query Language: SQL

  • SQL Concepts Applied:

  • Filtering & Aggregations

  • GROUP BY & HAVING

  • Subqueries & CTEs

  • Window Functions

  • Indexing & Query Optimization

  • Tool: pgAdmin 4


Project Approach

  • Explored and understood the dataset structure and columns

  • Created database tables and imported the dataset

  • Wrote SQL queries categorized into:

  • Easy (basic filtering and counts)

  • Medium (aggregations and grouping)

  • Advanced (CTEs, window functions, optimization)

  • Analyzed query performance using EXPLAIN ANALYZE

  • Improved performance by applying indexing techniques


Key Business Questions Answered

  • Which tracks have more than 1 billion streams?

  • What are the top 5 tracks with the highest energy levels?

  • How does engagement differ for official vs non-official videos?

  • Which artists and albums generate the highest views?

  • How do audio features like energy and liveness vary across albums?


Query Optimization

  • To understand performance tuning, I analyzed query execution plans using EXPLAIN ANALYZE.

  • Measured execution and planning time for queries

  • Created indexes on frequently queried columns (e.g., artist)

  • Compared query performance before and after indexing

  • Observed significant reduction in execution time after optimization

  • This helped me understand how indexing impacts query performance on large datasets.


Key Learnings

  • Practical application of SQL for data analysis

  • Writing efficient and optimized queries

  • Using window functions for advanced analytics

  • Understanding query execution plans

  • Working with real-world datasets and business-oriented questions


Key Insights

  • Tracks with official videos generally have higher engagement
  • High-energy tracks tend to receive more views
  • Indexing significantly reduced query execution time

Conclusion

This project enhanced my hands-on experience in SQL and data analysis. It demonstrates my ability to work with structured datasets, write optimized queries, and extract meaningful insights. The project reflects my readiness for entry-level roles in Data Analysis and Business Intelligence.

About

SQL-based analysis of Spotify dataset to explore track performance, artist trends, and engagement metrics using PostgreSQL.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published