@mitcheb0219
Contributor

Added Ability to select Voice Model for ElevenLabs.

Added ability to use voice-styles for Elevenlabs and Azure backends.

  • ElevenLabs: Voice styles are user-generated. Users can click any tag to copy it to the clipboard for pasting into FFXIV chat.
    (Note: Voice tags require ElevenLabs' V3 voice model. Any message containing a style tag (e.g. [whispering]) is automatically forced onto the V3 model for that Say request.)

  • Azure: Voice styles are limited in scope and dependent on the selected voice. Users can click any tag to have it copied to clipboard for pasting into FFXIV Chat. Any mismatching voice tags will be ignored by the speech synthesizer.

Added command "/tttstyles" which will open the "Voice Styles" widget for the currently active backend. This widget leverages the existing backend-switching architecture.

These features allow the user to use more granular expressiveness with their TTS messages. Common use-cases would be for role-players or users who are non-verbal/shy.

mitcheb0219 and others added 4 commits December 31, 2025 13:29
Added Ability to select Voice Model for ElevenLabs.

Added ability to use voice-styles for Elevenlabs and Azure backends.

- Elevenlabs:  Voice styles are user-generated.  Users can click any tag to have it copied to clipboard for pasting into FFXIV Chat

- Azure: Voice styles are limited in scope and dependent on the selected voice.  Users can click any tag to have it copied to clipboard for pasting into FFXIV Chat

Added command "/tttstyles" which will open the widget for the currently active backend.

These features allow the user to use more granular expressiveness with their TTS messages.  Common use-cases would be for role-players or users who are non-verbal/shy.
Updated SSML generation to ignore styling tags if using SYSTEM backend.  Tags will still be stripped from the message.
throw new ElevenLabsMissingCredentialsException("No ElevenLabs authorization keys have been configured.");
}

var modelId = (text.Contains("[") && text.Contains("]")) ? "eleven_v3" : model; //if user uses SSML tags, force eleven_v3 model
Owner

This should be reflected in the configuration window somehow - it'd be poor UX to select a model and implicitly always use eleven_v3 just because SSML tags are being used. At the very least, it should visibly lock the model selector in the config window.

More importantly, this seems likely to hit non-SSML messages, too - [] aren't particularly uncommon characters to use in macros and such.

Contributor Author

@mitcheb0219 mitcheb0219 Jan 1, 2026

Maybe a tooltip in the styles widget would help communicate that. The swap to V3 if using a tag would only apply for that specific message. Any other messages sent without a tag would use the model selected in the preset.

As for identifying the tags, I agree that [] could often come up in unintended ways. Maybe I'll switch it to ~~tag~~.

Contributor Author

Do you think it would be better to have a toggle in the Elevenlabs backend that would say "enable voice styles" which would then lock the preset to V3?

The original design was just so a user wouldn't be locked into the more expensive model and could use it only in situations where a voice style was wanted.

Owner

> Do you think it would be better to have a toggle in the Elevenlabs backend that would say "enable voice styles" which would then lock the preset to V3?

I think this is a good idea, and we should have a tooltip, too - the tooltip can just be next to (or maybe on) that setting. We can also use ImRaii.Disabled() to disable interacting with the model selector directly.

Alternatively, we can go the other way around and have the checkbox be disabled (and non-interactable) unless the model is set to V3, with the checkbox tooltip stating that output styles are disabled unless V3 is selected.

// This regex captures the style name in group 1 and the text in group 2.
// It replaces the whole match with the SSML tag, effectively removing
// the [styleName] text from the spoken output.
text = Regex.Replace(text, @"\[([^\]]+)\]([^\[]*)", m =>
Owner

Noted elsewhere that this is too general, but also, where exactly are these style tags added to the text in the first place? I think this is perhaps not the right place for this code to live, but I'm not sure where the style tags come from, so I'm not sure yet.

Contributor Author

@mitcheb0219 mitcheb0219 Jan 1, 2026

In the code so far there are two scenarios:

Elevenlabs uses user-generated tags that their AI tries to meaningfully interpret. This is why the Elevenlabs styles widget depends on user input. Any styles can be added/removed, are listed alphabetically, and allow the user to easily copy a desired tag to clipboard before beginning their chat message.

Azure, on the other hand, has very restrictive tags that are specific to the voice selected (e.g. Jenny Neural). Some voices have no style options at all. Luckily, the available styles for each voice can be captured from the API call that returns the voices for the backend's dropdown box. So as the user changes voices in the currently active preset, the list of available styles in the widget updates as well. Unlike Elevenlabs, the user cannot make their own tags.
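
A rough sketch of the per-voice lookup described above, to make the "mismatching tags are ignored" behavior concrete. Everything here is hypothetical: the class name `AzureVoiceStyles`, the map `StylesByVoice`, and the placeholder voice names and style lists are illustrative only and are not taken from this PR or from Azure's voice list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AzureVoiceStyles
{
    // Illustrative data only: these voice names and style lists are placeholders.
    // In the plugin, this map would presumably be filled from the voice-list API call.
    public static readonly Dictionary<string, string[]> StylesByVoice =
        new Dictionary<string, string[]>
        {
            ["ExampleVoiceA"] = new[] { "cheerful", "whispering" },
            ["ExampleVoiceB"] = new string[0], // a voice with no style options
        };

    // Styles to list in the widget for the selected voice (empty if unknown).
    public static IReadOnlyList<string> StylesFor(string voice)
    {
        return StylesByVoice.TryGetValue(voice, out var styles)
            ? styles
            : Array.Empty<string>();
    }

    // True if the tag can be applied to this voice; otherwise the caller would drop it.
    public static bool IsSupported(string voice, string tag)
    {
        return StylesFor(voice).Contains(tag.Trim(), StringComparer.OrdinalIgnoreCase);
    }
}
```

With this shape, switching voices only changes the lookup key, so the widget's list and the tag check stay in sync.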

Contributor Author

@mitcheb0219 mitcheb0219 Jan 1, 2026

The user would add the tag to their message in the message body.

For example:

[whisper] I may have let him die on purpose.

The whisper tag would get interpreted but not actually read out in the resulting TTS.

Here's a small demo of it:

https://youtu.be/mLjpBeBzDTY?si=N9XJy952BTRH_lFP
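
For readers following along, here is a minimal sketch of how a tag like the one above could be turned into Azure-style SSML while keeping the tag itself out of the spoken text. It reuses the regex from the diff under review; the class name `StyleTagSsml`, the method `ToSsmlFragment`, and the `applyStyles` flag are hypothetical and not part of this PR. The `mstts:express-as` element is Azure's SSML element for speaking styles. Text before the first tag is passed through unescaped in this sketch.

```csharp
using System;
using System.Security;
using System.Text.RegularExpressions;

public static class StyleTagSsml
{
    // Group 1: the style name inside [...]; group 2: the text up to the next '['.
    private static readonly Regex StyleTag =
        new Regex(@"\[([^\]]+)\]([^\[]*)", RegexOptions.Compiled);

    // applyStyles = false models a backend without style support:
    // the tag is still stripped, but no SSML element is emitted.
    public static string ToSsmlFragment(string text, bool applyStyles)
    {
        return StyleTag.Replace(text, m =>
        {
            var style = SecurityElement.Escape(m.Groups[1].Value.Trim()) ?? "";
            // Drop only the leading space so words after a stripped tag do not run together.
            var spoken = SecurityElement.Escape(m.Groups[2].Value.TrimStart()) ?? "";
            if (!applyStyles || spoken.Length == 0)
            {
                return spoken;
            }
            return "<mstts:express-as style=\"" + style + "\">" + spoken + "</mstts:express-as>";
        });
    }
}
```

Calling `StyleTagSsml.ToSsmlFragment("[whisper] I may have let him die on purpose.", true)` wraps the sentence in an `express-as` element with `style="whisper"`; passing `false` returns just the sentence.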

Owner

Okay, I think I may have misunderstood this from the beginning - this is actually more complicated than I initially recognized. I thought this was something that would apply to all messages, but it only applies to people following a particular convention in their messages.

In that case, we probably need this to be configurable, unless this is just a convention that ~everyone already follows. I would think people would want to define the patterns used for extracting styles, but we would have some sort of reasonable defaults ([] might be fine). That would be something along the lines of how inclusion/exclusion rules are configured, but we might be able to simplify things for this.

Owner

One of the main things I'm thinking about is how this works for detecting styles from other people's messages - presumably people have pre-existing conventions for how they communicate styles, and I would expect there to be multiple conventions floating around, which is why some sort of configurable patterns makes sense to me. You might want to, say, plug in a handful of patterns that match the conventions of people you interact with regularly, for example.
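
To make the configurable-pattern idea concrete, here is one possible shape, offered as a sketch only. The class `StylePatterns`, the `Defaults` list, and the convention that every pattern exposes a named group `style` are assumptions for illustration; none of them exist in the plugin today.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StylePatterns
{
    // Hypothetical defaults. Each pattern must define a named group "style".
    // Order matters: the double-bracket form is tried before the single-bracket form.
    public static readonly IReadOnlyList<string> Defaults = new[]
    {
        @"\[\[(?<style>[^\[\]]+)\]\]", // [[whisper]]
        @"\[(?<style>[^\[\]]+)\]",     // [whisper]
    };

    // Returns the styles found and the message with all matched tags removed.
    // An invalid user-supplied pattern would throw ArgumentException here.
    public static (List<string> Styles, string Text) Extract(
        string message, IEnumerable<string> patterns)
    {
        var styles = new List<string>();
        foreach (var pattern in patterns)
        {
            message = Regex.Replace(message, pattern, m =>
            {
                styles.Add(m.Groups["style"].Value.Trim());
                return string.Empty;
            });
        }
        // Collapse the gaps left behind by removed tags.
        var cleaned = Regex.Replace(message, @"\s{2,}", " ").Trim();
        return (styles, cleaned);
    }
}
```

A user who talks with people using a different convention could then add their own pattern, for example `\*(?<style>[^*]+)\*` for `*sighs*`, without touching the defaults.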

[Backend]VoiceStylesUI.cs added for backends that will have style tags enabled.

Added "Voice Styles" button to each backend supporting styles to make the discovery aspect easier for the user.

Modified the tag to "[[ ]]" to reduce risk of accidental usage.
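
As a rough illustration of why the stricter delimiter helps, the per-message model choice could look something like the sketch below. The class and method names are hypothetical; only the `eleven_v3` model ID comes from the diff above. Unlike the earlier `Contains("[") && Contains("]")` check, this only switches models when a complete `[[tag]]` is present.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ElevenLabsModelChooser
{
    // Matches [[tag]] where the tag is non-empty and contains no brackets.
    private static readonly Regex DoubleBracketTag =
        new Regex(@"\[\[[^\[\]]+\]\]", RegexOptions.Compiled);

    // Uses eleven_v3 only for messages that carry a style tag;
    // every other message keeps the model selected in the preset.
    public static string Choose(string text, string presetModel)
    {
        return DoubleBracketTag.IsMatch(text) ? "eleven_v3" : presetModel;
    }
}
```

A macro line such as `/wait [1]` or a message with stray brackets would keep the preset's model under this check.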

Fixed bug with deleting Voice Presets.  This bug is present in the current release (1.34.0.0).  Please see adjustments in ``BackendUI.cs``

Fixed odd behavior when toggling to Elevenlabs backend when the last used preset has been deleted.  Preset dropdown was not rendering and forced the user to create a new preset in order to select any other existing presets.
@mitcheb0219 mitcheb0219 changed the title from "Added Model Selection for Elevenlabs _ Added Voice Styles for Elevenlabs and Azure" to "Added Model Selection for Elevenlabs _ Added Voice Styles for Elevenlabs, Azure, and OpenAI. Minor UI bugfixes" on Jan 2, 2026
mitcheb0219 and others added 3 commits January 1, 2026 22:14
Bug was causing weird behavior when deleting a voice preset from Azure, System, Elevenlabs, Kokoro, Uberduck, and GoogleCloud backends.

Added logic to select first preset from list if the list contained any presets and a preset has not already been selected.
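
The fallback described in this commit might be sketched as follows. The `VoicePreset` type and `PresetSelection.Resolve` are simplified stand-ins written for illustration, not the plugin's actual classes. The sketch also covers the deleted-preset case by treating an ID that no longer exists the same as no selection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the plugin's preset type (hypothetical).
public sealed class VoicePreset
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class PresetSelection
{
    // Returns the preset that should be shown as selected:
    // the current one if it still exists, otherwise the first preset in the list,
    // otherwise null when the list is empty.
    public static VoicePreset Resolve(IReadOnlyList<VoicePreset> presets, int? currentId)
    {
        var current = presets.FirstOrDefault(p => p.Id == currentId);
        return current ?? presets.FirstOrDefault();
    }
}
```

Resolving the selection before drawing the dropdown means a stale or missing ID never leaves the dropdown without something to render.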