YouTube Data Scraper

A collection of Python scripts that extract data from YouTube using Playwright. The video scraper saves its output as CSV, and the channel scraper saves its output as JSON.

Features

Video Scraper (`video.py`)

Extracts video metadata:
- Title
- Description
- Channel name
- View count
- Like count
- Upload date
- Duration
- Keywords
- And more...
Captures the top 5 comments for each video
Supports processing multiple videos
Tracks processed videos to avoid duplicates
CSV output compatible with Excel

Channel Scraper (`channel.py`)

Extracts channel information:
- Channel description
- Social media links (with resolved redirect URLs)
- Location
- Join date
- Subscriber count
- Video count
- View count
Saves data in JSON format

Requirements

Python 3.7+
Playwright
A stable internet connection

Installation

Clone the repository:

git clone https://github.com/xkchok/yt-data-scraper.git
cd yt-data-scraper

Install uv, then create the project's virtual environment from the lockfile:

pip install uv
uv sync

Add Playwright and install the Chromium browser:

uv add playwright
uvx playwright install chromium

Usage

Video Scraper

Add your video IDs to the video_ids list in video.py:

video_ids = [
    'c2tuxS3Pcto',  # Example video ID
    'kxs9Su_mbpU',  # Another video ID
    'mvkbCZfwWzA'   # Another video ID
]

Run the script:

uv run video.py

The video scraper will:

Create a timestamped CSV file for the data
Process each video and save its data immediately
Track processed videos in processed_videos.txt
Skip any previously processed videos
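The behaviour listed above can be sketched roughly as follows. This is an illustration only, not the code in video.py: the helper names (`timestamped_name`, `load_processed`, `mark_processed`, `append_row`), the tab-separated `video_id` and timestamp line format, and the filename timestamp format are all assumptions.

```python
import csv
from datetime import datetime
from pathlib import Path


def timestamped_name(prefix="video_data", ext="csv"):
    """Build a name like video_data_20240321_123456.csv (format is assumed)."""
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}.{ext}"


def load_processed(path):
    """Return the set of video IDs already recorded in the tracking file."""
    p = Path(path)
    if not p.exists():
        return set()
    ids = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        if line.strip():
            ids.add(line.split("\t")[0])  # ID is the first tab-separated field
    return ids


def mark_processed(path, video_id):
    """Append one line holding the video ID and the current ISO timestamp."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{video_id}\t{datetime.now().isoformat()}\n")


def append_row(csv_path, row, fieldnames):
    """Append a single row, writing the header only when the file is new."""
    is_new = not Path(csv_path).exists()
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Appending each row as soon as it is scraped means that an interrupted run keeps everything collected so far, and the tracking file lets the next run skip those videos.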

Channel Scraper

Edit the channel URLs in channel.py:

channels = [
    "https://www.youtube.com/@The_FirstTake",
    # Add more channels here
]

Run the script:

uv run channel.py

The channel scraper will:

Visit each channel's "About" page
Extract available information
Save the data to a JSON file
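The first and last of these steps could look roughly like the sketch below. It assumes the About page lives at the channel URL followed by `/about`; `about_url` and `save_channels` are hypothetical helper names, and the filename timestamp format is a guess. The extraction step depends on page selectors that are not described here, so it is left out.

```python
import json
from datetime import datetime
from pathlib import Path


def about_url(channel_url):
    """Build the About page URL for a channel URL (assumed URL layout)."""
    return channel_url.rstrip("/") + "/about"


def save_channels(channels, out_dir="."):
    """Write the records to channel_data_<timestamp>.json and return the path."""
    now = datetime.now()
    payload = {"scrape_time": now.isoformat(), "channels": channels}
    path = Path(out_dir) / f"channel_data_{now:%Y%m%d_%H%M%S}.json"
    # ensure_ascii=False keeps non-English channel text readable in the file
    path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```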

Output Files

Video Scraper

video_data_[timestamp].csv: Contains the extracted video data in CSV format
processed_videos.txt: Tracks which videos have been processed and when

Channel Scraper

channel_data_[timestamp].json: Contains the extracted channel data with this structure:

{
  "scrape_time": "2024-03-21T12:34:56.789012",
  "channels": [
    {
      "channel_name": "ChannelName",
      "url": "https://youtube.com/@ChannelName",
      "description": "Channel description...",
      "social_links": [
        {
          "title": "Link title",
          "text": "Link text",
          "url": "Actual URL (not YouTube redirect)"
        }
      ],
      "subscribers": "1.2M subscribers",
      "views": "100M views",
      "country": "Country name",
      "joined": "Joined date"
    }
  ]
}
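
The `url` field of each social link holds the link target rather than YouTube's redirect wrapper. One way such a wrapper might be unwrapped is sketched below. It assumes the redirect has the form `https://www.youtube.com/redirect?...&q=<encoded target>`; `resolve_redirect` is a hypothetical name, and channel.py may resolve links differently (for example, by following the redirect in the browser).

```python
from urllib.parse import parse_qs, urlparse


def resolve_redirect(url):
    """Return the target of a YouTube redirect link, or the URL unchanged."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtube.com") and parsed.path == "/redirect":
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]  # parse_qs has already percent-decoded the value
    return url
```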

Notes

Video Scraper

Videos are processed one at a time to reduce the risk of rate limiting
Failed videos won't be marked as processed
All text fields are cleaned and formatted for CSV compatibility
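
A minimal sketch of what this kind of cleaning might involve is shown below. The specific rules (collapsing whitespace and newlines into single spaces, writing with the `utf-8-sig` encoding so Excel detects UTF-8) are assumptions for illustration, and `clean_text` and `write_rows` are hypothetical names rather than functions from video.py.

```python
import csv
import re


def clean_text(value):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip()


def write_rows(path, rows, fieldnames):
    """Write cleaned rows to a CSV file that Excel reads as UTF-8."""
    # utf-8-sig prepends a byte order mark, which Excel uses to detect UTF-8
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow({k: clean_text(row.get(k)) for k in fieldnames})
```

The `csv` module quotes fields that contain commas or quotes on its own, so the cleaning step only needs to deal with characters that make rows hard to read in a spreadsheet, such as embedded newlines.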

Channel Scraper

Channels expose different amounts of information, so not every field will be populated for every channel

Error Handling

Both scripts include error handling for:

Network issues
Missing elements
Loading failures
File writing errors

Failed items will be reported but won't stop the scripts from processing other items.
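
The "report and continue" pattern described above can be sketched as a simple loop. `process_all` and its `scrape` argument are hypothetical; the actual scripts may structure their error handling differently.

```python
def process_all(items, scrape):
    """Run scrape(item) for every item; collect results and failures."""
    results, failures = [], []
    for item in items:
        try:
            results.append(scrape(item))
        except Exception as exc:  # broad on purpose: one bad item must not stop the run
            failures.append((item, str(exc)))
            print(f"Failed: {item}: {exc}")
    return results, failures
```

Returning the failures alongside the results makes it easy to print a summary at the end of a run or to retry only the items that failed.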

Contributing

Feel free to submit issues and enhancement requests!
