cawa102/Youtube-Transcript-Collector

YouTube Transcript Collector

A Google Colab notebook that collects YouTube video transcripts and combines them into a single text file.

Features

  • Collect by Channel: Enter a YouTube channel URL/ID to download transcripts from all of its videos, or only from videos published after a given date
  • Collect by Search: Search for videos by keyword, view results with view counts, and select specific videos
  • Custom Video List: Enter specific video URLs or IDs directly
  • Multi-language Support: Automatically detects and fetches available transcripts (English, Japanese, and more)
  • Formatted Output: Exports all transcripts to a single text file with video metadata

Quick Start

  1. Open in Google Colab: Click the button below to open the notebook

    Open In Colab

  2. Get a YouTube Data API Key (free):

    • Go to Google Cloud Console
    • Create a project and enable "YouTube Data API v3"
    • Create an API key under Credentials
  3. Run the notebook cells in order

Output Format

++++++++++++++[Video Title | 2024-01-15]
transcript text goes here...
==============END=================


++++++++++++++[Another Video | 2024-01-10]
another transcript text...
==============END=================
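
The layout above can be reproduced with a few lines of Python. This is a minimal sketch, not the notebook's actual code: the names `format_entry` and `combine_entries` are hypothetical, and the two blank lines between entries are an assumption read off the sample above.

```python
from typing import Iterable, Tuple

# Delimiters copied from the sample output above.
HEADER_PREFIX = "++++++++++++++"
FOOTER = "==============END================="


def format_entry(title: str, published: str, text: str) -> str:
    """Format one transcript as header line, body, and END marker."""
    return f"{HEADER_PREFIX}[{title} | {published}]\n{text}\n{FOOTER}"


def combine_entries(entries: Iterable[Tuple[str, str, str]]) -> str:
    """Join (title, date, text) tuples, separated by two blank lines (assumed)."""
    return "\n\n\n".join(format_entry(*entry) for entry in entries)


if __name__ == "__main__":
    print(combine_entries([
        ("Video Title", "2024-01-15", "transcript text goes here..."),
        ("Another Video", "2024-01-10", "another transcript text..."),
    ]))
```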

Requirements

  • Google Colab (recommended) or a local Python 3.x environment
  • YouTube Data API Key (free, 10,000 units/day quota)

Python Dependencies

youtube-transcript-api>=1.0.0
google-api-python-client
python-dateutil
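
For orientation, here is a rough sketch of how a transcript might be fetched with `youtube-transcript-api` 1.x. It assumes the instance-based `YouTubeTranscriptApi().fetch()` interface from that release line. The helper names `join_snippets` and `fetch_transcript_text`, and the `("en", "ja")` language preference, are illustrative and not taken from the notebook.

```python
from typing import Iterable, Sequence


def join_snippets(texts: Iterable[str]) -> str:
    """Collapse caption snippets into one whitespace-normalized string."""
    return " ".join(" ".join(t.split()) for t in texts if t.strip())


def fetch_transcript_text(video_id: str,
                          languages: Sequence[str] = ("en", "ja")) -> str:
    """Fetch a transcript in the first available preferred language.

    Needs network access and youtube-transcript-api>=1.0.0. Raises the
    library's own exceptions when captions are disabled or missing.
    """
    # Imported lazily so join_snippets works without the package installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    fetched = YouTubeTranscriptApi().fetch(video_id, languages=list(languages))
    # Each snippet carries .text, .start and .duration; only text is kept here.
    return join_snippets(snippet.text for snippet in fetched)
```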

Usage Options

Option 1: By Channel

  1. Enter a YouTube channel URL (supports multiple formats):

    • https://www.youtube.com/@ChannelHandle
    • https://www.youtube.com/channel/UCxxxxxx
    • Channel name (will search)
  2. Choose to download:

    • All videos from the channel
    • Only videos published after a specific date
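
One way to tell the three accepted input forms apart is sketched below. The function name `classify_channel_input` is hypothetical, and the notebook's own parsing may differ. Treating a bare `UC` plus 22 characters as a channel ID is an assumption about the usual ID shape.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Assumption: channel IDs look like "UC" followed by 22 URL-safe characters.
_CHANNEL_ID_RE = re.compile(r"UC[A-Za-z0-9_-]{22}")


def classify_channel_input(value: str) -> Tuple[str, str]:
    """Return (kind, identifier), kind being 'handle', 'channel_id' or 'search'."""
    value = value.strip()
    host = (urlparse(value).hostname or "").lower()
    if host == "youtube.com" or host.endswith(".youtube.com"):
        parts = [p for p in urlparse(value).path.split("/") if p]
        if parts and parts[0].startswith("@"):
            return ("handle", parts[0])
        if len(parts) >= 2 and parts[0] == "channel":
            return ("channel_id", parts[1])
    if value.startswith("@"):
        return ("handle", value)
    if _CHANNEL_ID_RE.fullmatch(value):
        return ("channel_id", value)
    # Anything else is treated as a channel name to look up via search.
    return ("search", value)
```

Only the `search` case has to go through the Search endpoint, which costs 100 units. Handles and channel IDs can be resolved with 1-unit calls.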

Option 2: By Search

  1. Enter a search keyword
  2. View results with title, channel, date, and view count
  3. Select videos:
    • Select all results
    • Select specific videos by number (e.g., 1,3,5,7-10)
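
A selection string such as `1,3,5,7-10` can be expanded into result numbers as sketched below. `parse_selection` is a hypothetical helper. Silently dropping out-of-range numbers and accepting reversed ranges are design choices made for this sketch, not documented notebook behavior.

```python
from typing import List


def parse_selection(spec: str, total: int) -> List[int]:
    """Expand '1,3,5,7-10' into sorted, unique 1-based result numbers."""
    chosen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start > end:
            start, end = end, start  # accept '10-7' as well
        # Keep only numbers that refer to an existing result.
        chosen.update(n for n in range(start, end + 1) if 1 <= n <= total)
    return sorted(chosen)


if __name__ == "__main__":
    print(parse_selection("1,3,5,7-10", total=10))
```

Non-numeric input raises `ValueError` from `int()`, which the caller can catch to prompt the user again.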

Bonus: Custom Video List

Enter specific video URLs or IDs directly to collect their transcripts.
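
Since both URLs and bare IDs are accepted, the input has to be normalized to a video ID first. The sketch below shows one possible approach. `extract_video_id` is a hypothetical name, the list of URL shapes it handles is an assumption, and the 11-character pattern reflects how video IDs look today, not a documented guarantee.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a URL or bare ID, or None if unrecognized."""
    value = value.strip()
    if _VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    return candidate if _VIDEO_ID_RE.fullmatch(candidate) else None
```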

API Quota Usage

API Call            Cost
Search              100 units
Channel info        1 unit
Video list          1 unit
Video statistics    1 unit

Daily quota: 10,000 units. Spent entirely on searches, that is 100 searches per day (10,000 / 100); spent on 1-unit calls, it covers thousands of video info requests.
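
To check whether a planned run fits in the daily quota, the costs above can be added up before any request is made. This is a small sketch: the dictionary keys and function names are illustrative labels for the calls listed above, not identifiers from the notebook or the API.

```python
from typing import Mapping

DAILY_QUOTA = 10_000

# Per-call costs as listed above; the keys are illustrative labels.
UNIT_COSTS = {
    "search": 100,
    "channel_info": 1,
    "video_list": 1,
    "video_statistics": 1,
}


def estimate_units(calls: Mapping[str, int]) -> int:
    """Total quota units for a mapping of call type to number of calls."""
    return sum(UNIT_COSTS[name] * count for name, count in calls.items())


def fits_in_daily_quota(calls: Mapping[str, int]) -> bool:
    """True if the planned calls stay within the daily quota."""
    return estimate_units(calls) <= DAILY_QUOTA


if __name__ == "__main__":
    plan = {"search": 5, "channel_info": 1, "video_list": 20, "video_statistics": 20}
    print(estimate_units(plan), fits_in_daily_quota(plan))
```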

Project Structure

YouTube-Transcript-Collector/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── YouTube_Transcript_Collector.ipynb  # Main notebook
├── CLAUDE.md                            # Development guidelines
└── docs/
    ├── requirement.md                   # Requirements specification
    ├── architecture.md                  # System architecture
    ├── api_reference.md                 # API reference
    ├── testing_guide.md                 # Testing guide
    └── changelog.md                     # Change history

Limitations

  • Videos with disabled captions cannot be processed
  • Some videos may not have transcripts available
  • API quota limits apply (10,000 units/day)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
