txt2speech

A set of utilities for experimenting with Google's Text-to-Speech API and generating spoken audio from TXT files. It can be used, for example, to generate a computer-narrated "audiobook" from a text for which no audio version exists.

Based on samples in googleapis/nodejs-text-to-speech.

Since this is a kind of "MVP" for my personal use case, some parts are hardcoded.

It works only with a Google Cloud account that has billing activated, but there is a free tier. Currently, about 1M characters of input text per month should be free (do recheck the current pricing, as this may change), so nothing is charged until that amount of processed text is exceeded. For my use cases, the free tier goes a long way.

As of 2025-05, the free tier includes Chirp 3 HD voices (1M characters free).
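To judge whether a text fits into the free tier before sending it, a rough estimate can help. The following is a minimal sketch, assuming the quota is counted in characters and is 1,000,000 per month, as stated above. The function name `estimateQuota` and the constant are hypothetical and not part of this repository, and Google's exact counting rules may differ.

```typescript
// Assumption: the free tier is 1,000,000 characters per month (recheck current pricing).
const FREE_TIER_CHARS_PER_MONTH = 1000000;

interface QuotaEstimate {
  chars: number;           // characters in the given text
  remaining: number;       // free characters left after processing it (may be negative)
  fitsInFreeTier: boolean; // true if the text does not exceed the assumed quota
}

// Estimate how much of the assumed monthly free tier a text would consume.
// `alreadyUsed` is the number of characters already processed this month.
function estimateQuota(text: string, alreadyUsed: number = 0): QuotaEstimate {
  // Count Unicode code points rather than UTF-16 code units.
  const chars = Array.from(text).length;
  const remaining = FREE_TIER_CHARS_PER_MONTH - alreadyUsed - chars;
  return { chars, remaining, fitsInFreeTier: remaining >= 0 };
}
```

For example, `estimateQuota(bookText, 250000)` reports whether a book still fits after 250,000 characters have already been used in the current month.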

See the --help option for a description of the parameters.

Preconditions:

Authenticate against Google Cloud:
- Create a service account.
- Point the GOOGLE_APPLICATION_CREDENTIALS environment variable at its credentials (key) file.

For MP3 merging to work, ffmpeg must be on the PATH.
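Both preconditions can be verified up front. The following is a minimal sketch, not code from this repository: `checkPreconditions` and `isFfmpegOnPath` are hypothetical names, and probing ffmpeg by running `ffmpeg -version` is an assumption about how to detect it.

```typescript
import { spawnSync } from "node:child_process";

// Probe for ffmpeg by trying to run "ffmpeg -version" (assumed detection method).
function isFfmpegOnPath(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Return a list of human-readable problems; an empty list means all checks passed.
// The ffmpeg probe is injectable so the function can be exercised without ffmpeg.
function checkPreconditions(
  env: Record<string, string | undefined>,
  ffmpegAvailable: () => boolean = isFfmpegOnPath,
): string[] {
  const problems: string[] = [];
  if (!env.GOOGLE_APPLICATION_CREDENTIALS) {
    problems.push("GOOGLE_APPLICATION_CREDENTIALS is not set");
  }
  if (!ffmpegAvailable()) {
    problems.push("ffmpeg was not found on the PATH (needed for MP3 merging)");
  }
  return problems;
}
```

Calling `checkPreconditions(process.env)` at startup and printing the returned list gives a clearer failure than an authentication or merge error later on.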

API

The "texttospeech.googleapis.com" API must be enabled in your Google Cloud project. REST reference: https://cloud.google.com/text-to-speech/docs/reference/rest/?apix=true
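To illustrate what a call to this API looks like, here is a minimal sketch that builds the JSON body for the REST method `text:synthesize`. It is not how this repository calls the API (the code is based on the Node.js client samples). The helper `buildSynthesizeRequest` is hypothetical, and the default voice name is an assumed example that should be checked against the current voice list.

```typescript
// REST endpoint of the synthesize method of the Text-to-Speech API (v1).
const SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize";

interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; name: string };
  audioConfig: { audioEncoding: "MP3" | "LINEAR16" | "OGG_OPUS" };
}

// Build the request body for plain-text input and MP3 output.
// Assumption: the voice name starts with its language code, e.g. "en-US-...".
function buildSynthesizeRequest(
  text: string,
  voiceName: string = "en-US-Chirp3-HD-Charon", // assumed example voice name
): SynthesizeRequest {
  // Take the first two dash-separated parts of the voice name as the language code.
  const languageCode = voiceName.split("-").slice(0, 2).join("-");
  return {
    input: { text },
    voice: { languageCode, name: voiceName },
    audioConfig: { audioEncoding: "MP3" },
  };
}
```

The body would be sent as an authenticated POST to `SYNTHESIZE_URL`; the response carries the audio as a base64-encoded `audioContent` field.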

Authentication with user credentials (an alternative to using a service account; NOT recommended, but possible e.g. for quick testing). The variable assignments below use fish shell syntax:

set USER x@x.com
set PROJECT_ID project-id

gcloud auth revoke $USER && gcloud auth login $USER
gcloud config set project $PROJECT_ID
gcloud auth application-default set-quota-project $PROJECT_ID
gcloud config set billing/quota_project $PROJECT_ID
gcloud auth application-default login

# when done, don't forget to revoke the application default credentials
gcloud auth application-default revoke

Voice demos (new Chirp 3: HD voices): https://cloud.google.com/text-to-speech/docs/chirp3-hd

Troubleshooting

This request contains sentences that are too long. Consider splitting up long sentences.

See fixLongLines(), which is currently commented out. In my case, the problem was that although each line contained a "." as required by Google TTS, some periods were immediately followed by further characters. For example, in the line "xx xx xx.12 sss sss sss.13 xx xx xx." the "." before each number was not recognized as the end of a sentence. The solution was to remove the numbers.
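The clean-up described above can be sketched as follows. This is not the repository's fixLongLines(); `stripNumbersAfterPeriods` is a hypothetical helper, and the assumption is that the offending numbers (for example footnote markers) directly follow a period that ends a word.

```typescript
// Remove digits that directly follow a sentence-ending period, e.g. "xx.12 sss" -> "xx. sss".
// The period must be preceded by a character that is not a digit, whitespace, or another
// period, so that decimal numbers such as "3.14" are left alone.
function stripNumbersAfterPeriods(line: string): string {
  return line.replace(/([^\d\s.])\.\d+(?=\s|$)/g, "$1.");
}

const cleaned = stripNumbersAfterPeriods("xx xx xx.12 sss sss sss.13 xx xx xx.");
console.log(cleaned); // xx xx xx. sss sss sss. xx xx xx.
```

After this step each period is followed by whitespace or the end of the line, so the sentences can be told apart.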

Versions

The tag v2021-ssml marks the older version of the code, which uses SSML to generate the audio.
