This project preprocesses audio, uploads it to Google Cloud Storage, transcribes it using Google Cloud Speech-to-Text, generates SRT subtitles, and adds these subtitles to a video file.
- Node.js installed on your machine.
- Google Cloud account and project set up.
- ffmpeg installed on your machine.
- Environment variables configured in a .env file.
Create a .env file in the root directory of your project and add the following variables:
GOOGLE_CLOUD_PROJECT_ID=your-project-id
GOOGLE_CLOUD_KEY_FILE=path-to-your-service-account-json-file
GOOGLE_CLOUD_STORAGE_BUCKET_NAME=your-bucket-name
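As a minimal sketch of how a script could check these settings at startup, the helper below reads the three variables and fails fast when one is missing. The name loadConfig and the returned field names are hypothetical, and the sketch assumes the values are already present in process.env (for example after a package such as dotenv has loaded the .env file); the project's actual loading code may differ.

```javascript
// Hypothetical helper: collects the required settings from an env object.
// Assumes the .env file has already been loaded into process.env.
const REQUIRED_VARS = [
  'GOOGLE_CLOUD_PROJECT_ID',
  'GOOGLE_CLOUD_KEY_FILE',
  'GOOGLE_CLOUD_STORAGE_BUCKET_NAME',
];

function loadConfig(env = process.env) {
  // Report every missing variable at once rather than one per run.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    projectId: env.GOOGLE_CLOUD_PROJECT_ID,
    keyFilename: env.GOOGLE_CLOUD_KEY_FILE,
    bucketName: env.GOOGLE_CLOUD_STORAGE_BUCKET_NAME,
  };
}
```

Calling loadConfig() before any other step means a misconfigured .env file is reported before audio processing starts.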
- Clone the repository:
  git clone https://github.com/mmiddletonn/VideoTranscription.git
  cd VideoTranscription
- Install the necessary dependencies:
  npm install
Run the script with the following command:
node index.js
The script performs the following steps:
- Preprocess Audio: Converts and preprocesses the audio file (audio.mp4) to remove noise and normalize levels, saving it as audio.mp3.
- Upload Audio: Uploads the preprocessed audio file (audio.mp3) to Google Cloud Storage.
- Transcribe Audio: Transcribes the audio file using Google Cloud Speech-to-Text and generates an SRT file (subtitles.srt) with word-level timestamps.
- Add Subtitles to Video: Adds the generated subtitles to the original video file (audio.mp4), producing an output video file (output_video_with_audio.mp4).
Preprocesses the input audio file and converts it to MP3 format.
- inputFilePath: Path to the input audio file (e.g., audio.mp4).
- outputFilePath: Path to the output MP3 file (e.g., audio.mp3).
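The preprocessing step could look roughly like the sketch below, which shells out to ffmpeg. The function names and the filter chain are assumptions, not the project's confirmed implementation: afftdn (FFT-based denoising) and loudnorm (loudness normalization) are standard ffmpeg audio filters chosen here to illustrate "remove noise and normalize levels".

```javascript
const { spawn } = require('child_process');

// Builds the ffmpeg argument list. The filter chain is an assumption:
// afftdn (denoise) followed by loudnorm (loudness normalization).
function buildPreprocessArgs(inputFilePath, outputFilePath) {
  return [
    '-y',                     // overwrite the output file if it exists
    '-i', inputFilePath,
    '-vn',                    // drop any video stream
    '-af', 'afftdn,loudnorm',
    '-codec:a', 'libmp3lame', // encode as MP3
    outputFilePath,
  ];
}

// Runs ffmpeg and resolves with the output path when it exits successfully.
function preprocessAudio(inputFilePath, outputFilePath) {
  return new Promise((resolve, reject) => {
    const child = spawn('ffmpeg', buildPreprocessArgs(inputFilePath, outputFilePath), {
      stdio: ['ignore', 'inherit', 'inherit'], // show ffmpeg's own progress output
    });
    child.on('error', reject); // e.g. ffmpeg is not on PATH
    child.on('close', (code) => {
      if (code === 0) resolve(outputFilePath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

Keeping the argument list in its own function makes the ffmpeg invocation easy to inspect without running ffmpeg.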
Uploads the preprocessed audio file (audio.mp3) to Google Cloud Storage.
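A minimal sketch of the upload step follows. It assumes `storage` is a Storage client from the @google-cloud/storage package (for example `new Storage({ projectId, keyFilename })`), and the function name uploadAudio and its return value are hypothetical. Taking the client as a parameter keeps the block free of third-party imports.

```javascript
const path = require('path');

// `storage` is assumed to be a Storage client from @google-cloud/storage.
// Uploads the local file under its base name and returns the gs:// URI,
// which is the form Speech-to-Text accepts for audio stored in a bucket.
async function uploadAudio(storage, bucketName, localFilePath) {
  const destination = path.basename(localFilePath);
  await storage.bucket(bucketName).upload(localFilePath, { destination });
  return `gs://${bucketName}/${destination}`;
}
```

With the example file names used in this README, the call would resolve to a URI of the form gs://your-bucket-name/audio.mp3.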
Transcribes the audio file stored in Google Cloud Storage and generates an SRT file (subtitles.srt) with subtitles.
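To illustrate how word-level timestamps can become SRT cues, here is a small sketch. The function names, the `{ word, start, end }` input shape, and the fixed words-per-cue grouping are assumptions for illustration. Speech-to-Text reports word offsets as duration objects, so this assumes they have already been converted to plain seconds.

```javascript
// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Groups words into cues of at most `wordsPerCue` words and renders SRT text.
// Each cue spans from its first word's start to its last word's end.
function wordsToSrt(words, wordsPerCue = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const chunk = words.slice(i, i + wordsPerCue);
    const start = formatSrtTime(chunk[0].start);
    const end = formatSrtTime(chunk[chunk.length - 1].end);
    const text = chunk.map((w) => w.word).join(' ');
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}\n`);
  }
  return cues.join('\n'); // blank line between consecutive cues
}
```

The resulting string can be written to subtitles.srt with fs.writeFileSync. Grouping by a fixed word count is the simplest option; splitting on pauses or punctuation would give more natural cues.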
Adds subtitles to the video file.
- inputVideo: Path to the input video file (e.g., audio.mp4).
- subtitlesFile: Path to the SRT subtitles file (e.g., subtitles.srt).
- outputVideo: Path to the output video file (e.g., output_video_with_audio.mp4).
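One way to add the subtitles is to burn them into the video frames with ffmpeg's subtitles filter, sketched below. Whether the project burns subtitles in or muxes them as a separate track is not stated here, so treat this as an assumption; the function name buildSubtitleArgs is hypothetical, and the filter requires an ffmpeg build with libass.

```javascript
// Builds ffmpeg arguments that render the SRT file onto the video.
// Assumption: subtitles are burned in rather than added as a soft track.
// Paths containing characters such as ':' or '\' need filter escaping,
// which this sketch does not handle.
function buildSubtitleArgs(inputVideo, subtitlesFile, outputVideo) {
  return [
    '-y',                                // overwrite the output file if it exists
    '-i', inputVideo,
    '-vf', `subtitles=${subtitlesFile}`, // draw the subtitles on each frame
    '-c:a', 'copy',                      // keep the original audio stream as-is
    outputVideo,
  ];
}
```

The returned array can be passed to child_process.spawn('ffmpeg', args). Burned-in subtitles are always visible, but the video stream has to be re-encoded.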
Errors encountered during each step are logged to the console.
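The per-step logging can be pictured with the following sketch, which runs async steps in order and logs the name of the step that failed. The runPipeline name, the `{ name, run }` step shape, and the choice to stop after the first failure are assumptions rather than the project's confirmed behavior.

```javascript
// Runs async steps in order. On failure, logs which step failed and stops.
// Returns true when every step succeeded, false otherwise.
async function runPipeline(steps, logger = console) {
  for (const { name, run } of steps) {
    try {
      await run();
    } catch (err) {
      logger.error(`Error during "${name}":`, err.message);
      return false;
    }
  }
  return true;
}
```

Stopping at the first failure is a deliberate choice in this sketch, since each later step depends on the output of the one before it.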
This project is licensed under the MIT License.