Podcast Cliff RSS

This project fetches and transcribes media files, usually from a podcast RSS feed. It uses Redis Queue (RQ) to process tasks asynchronously, and speeds up transcription by splitting the audio into chunks so that each chunk can be processed in parallel.

Transcription is handled by the Vosk API.
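
As a rough illustration of the chunking idea, the sketch below computes fixed-length chunk boundaries for an audio file and processes them in parallel with a thread pool. The 60-second chunk length, the `chunk_bounds` helper, and the `transcribe_chunk` placeholder are assumptions made for illustration. The project itself relies on RQ and Vosk for this work, and its actual chunking strategy may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(duration_s, chunk_s=60.0):
    """Return (start, end) offsets in seconds covering the whole duration."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_chunk(bounds):
    """Hypothetical placeholder for a real transcription call."""
    start, end = bounds
    return f"[text for {start:.0f}s-{end:.0f}s]"


def transcribe_in_parallel(duration_s, chunk_s=60.0, workers=4):
    """Process every chunk concurrently and join the results in order."""
    chunks = chunk_bounds(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the text stays in sequence.
        parts = list(pool.map(transcribe_chunk, chunks))
    return " ".join(parts)
```

Because `map()` returns results in input order, the chunks can finish in any order without scrambling the final transcript.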

Application Flow

feed_item_requested(url, title=None):
- Fetches episode metadata and enqueues a media download request.
media_download_requested(json_string):
- Downloads the media from the provided URL and saves it to disk.
- Enqueues the task media.new_file_present.
new_file_present(json_string):
- Simply enqueues audio.new_file_present for further processing.
fragment_saved(json_string):
- Receives saved fragment info, transcribes audio using transcribe_audio.
- Enqueues file.fragment_list_completed when all transcription files are ready.
fragment_list_completed(json_string):
- Compiles the full transcript from the audio fragments and saves it.
- Enqueues file.transcript_file_saved.
transcript_file_saved(json_string):
- Cleans up the media and transcript files by removing them from the disk.
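
The hand-off between stages can be sketched without Redis. In the minimal example below, an in-memory `deque` stands in for the RQ queue, and each handler enqueues the next stage by name with a JSON string payload. Only the first three stages are modelled. The payload fields (`media_url`, `path`), the `enqueue` and `run_worker` helpers, and the handler bodies are hypothetical, not the project's actual code.

```python
import json
from collections import deque

queue = deque()  # stand-in for the Redis-backed RQ queue
log = []         # records the order in which stages ran


def enqueue(task_name, payload):
    """Serialize the payload to JSON and queue it for the named stage."""
    queue.append((task_name, json.dumps(payload)))


def feed_item_requested(url, title=None):
    log.append("feed_item_requested")
    # A real implementation would parse the feed; this media URL is made up.
    enqueue("media_download_requested",
            {"media_url": "https://example.com/episode.mp3", "title": title})


def media_download_requested(json_string):
    item = json.loads(json_string)
    log.append("media_download_requested")
    # A real worker would download item["media_url"]; the path is made up.
    enqueue("new_file_present", {"path": "/tmp/episode.mp3"})


def new_file_present(json_string):
    json.loads(json_string)
    log.append("new_file_present")  # last stage modelled in this sketch


HANDLERS = {
    "media_download_requested": media_download_requested,
    "new_file_present": new_file_present,
}


def run_worker():
    """Drain the queue, dispatching each message to its handler."""
    while queue:
        task_name, json_string = queue.popleft()
        HANDLERS[task_name](json_string)
```

Calling `feed_item_requested("https://example.com/feed")` followed by `run_worker()` runs the three stages in sequence. Each stage only knows the name of the next one, which is what lets separate workers pick them up independently.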

Prerequisites

Docker and Docker Compose installed on your machine.

Installation

Clone the repository:

git clone https://github.com/jsamos/podcast-cliff.git
cd podcast-cliff

Build the Docker containers:
```
docker-compose build
```

Usage

Start the services:
```
docker-compose up
```

API

The API will be available at http://localhost:<API_PORT> (port 5001 in the example below). Use the following endpoint to enqueue a transcription job:

Endpoint: /transcribe/rss
Method: POST
Content-Type: application/json
Authentication: Basic Auth (use the API_TOKEN as the username, leave the password blank)
Body parameters:
- rss_url (required): The URL of the RSS feed
- title (optional): The title of the specific episode to transcribe

   curl -X POST http://localhost:5001/transcribe/rss \
   -H "Content-Type: application/json" \
   -H "Authorization: Basic $(echo -n ${API_USERNAME}:${API_PASSWORD} | base64)" \
   -d '{
     "rss_url": "https://feeds.captivate.fm/the-game-alex-hormozi/",
     "title": "Why Branding Makes You Money"
   }'
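
The same request can be built from Python with only the standard library. This is a sketch that follows the authentication description above (API_TOKEN as the username, blank password). The `build_transcribe_request` helper and its parameter names are hypothetical, and the response format is not covered here.

```python
import base64
import json
import urllib.request


def build_transcribe_request(base_url, api_token, rss_url, title=None):
    """Build (but do not send) a POST request for /transcribe/rss."""
    body = {"rss_url": rss_url}
    if title is not None:
        body["title"] = title
    # Basic Auth with the token as username and an empty password: "token:".
    credentials = base64.b64encode(f"{api_token}:".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/transcribe/rss",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Keeping construction separate from sending makes the headers and body easy to inspect first.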

CLI

RSS Feeds

Exec into the message-processor container to run the run_rss.py script:

docker compose exec message-processor bash
python run_rss.py <RSS_FEED_URL> [--title "<EPISODE_TITLE>"]

Replace <RSS_FEED_URL> with the actual URL of the RSS feed you want to fetch. For example:

python run_rss.py https://feeds.captivate.fm/the-game-alex-hormozi/

Tail the logs of the message-processor to see the processing logs:
```
docker compose logs -f message-processor
```

Optional Parameters

The run_rss.py script supports an optional parameter to specify the episode you want to fetch:

--title: Fetches a specific episode by its title. It uses fuzzy matching to find the episode, in case the title you paste is not an exact match.

Example:

To fetch an episode by title from the RSS feed:

python run_rss.py https://feeds.captivate.fm/the-game-alex-hormozi/ --title 'Why Branding Makes You Money'

Default Behavior:

If the --title parameter is not provided, the script will fetch the latest episode (the first item in the RSS feed).
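
The selection logic described above (latest episode by default, fuzzy match when a title is given) could look roughly like the sketch below. It uses `difflib` from the standard library as an assumption; the project's actual matching library, the `select_episode` name, and the 0.6 similarity cutoff are not taken from the source.

```python
import difflib


def select_episode(titles, wanted=None, cutoff=0.6):
    """Pick an episode title from a feed's titles (first item = latest).

    With no title requested, return the first item. Otherwise return the
    closest fuzzy match, or None if nothing is similar enough.
    """
    if not titles:
        return None
    if wanted is None:
        return titles[0]
    # Compare case-insensitively, then map back to the original title.
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(
        wanted.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

Returning `None` when nothing clears the cutoff lets the caller report a clear "episode not found" error instead of silently transcribing the wrong episode.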

Local files:

The app also supports transcribing local files using the run_stored.py script:

python run_stored.py <PATH_TO_FILE>

It currently supports MP3 and WAV files.

Contributing

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Make your changes.
Commit your changes (git commit -am 'Add new feature').
Push to the branch (git push origin feature-branch).
Create a new Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
