What do you use for this?
Mine are made in Midjourney:
I use Stable Diffusion
Sir mechs a lot
Steps: 20, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 2748980831, Size: 768x1280, Model hash: 74dda471cc, Model: realvisxlV20_v20Bakedvae, Clip skip: 2, RNG: CPU, Version: v1.6.0
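In case anyone wants to reuse those settings programmatically: the part from `Steps:` onward is a plain comma-separated list of `Key: value` pairs, so it can be split into a dictionary with a few lines of Python. This is only a rough sketch; the function name `parse_generation_settings` is made up for illustration, and the naive split assumes no value contains `, ` itself, which holds for this line but may not hold for others.

```python
def parse_generation_settings(line: str) -> dict[str, str]:
    """Split a 'Key: value, Key: value' settings line into a dict.

    Hypothetical helper for illustration; assumes no value contains ', '.
    """
    settings = {}
    for field in line.split(", "):
        # partition() splits on the first ': ' only, so the key stays intact
        key, _, value = field.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


line = (
    "Steps: 20, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, "
    "Seed: 2748980831, Size: 768x1280, Model hash: 74dda471cc, "
    "Model: realvisxlV20_v20Bakedvae, Clip skip: 2, RNG: CPU, "
    "Version: v1.6.0"
)

settings = parse_generation_settings(line)

# Size is stored as 'WIDTHxHEIGHT', so split it into two integers
width, height = (int(n) for n in settings["Size"].split("x"))

print(settings["Sampler"], settings["Seed"], width, height)
```

All values come back as strings, so numeric fields such as `Steps` or `Seed` need an `int()` conversion before being passed to anything that expects numbers.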
I suppose that if someone builds a system where an LLM maps not just from the text to be spoken, but also from a text description of the voice to speech (think Tortoise TTS, but with a Stable Diffusion-style prompt describing the speaker), it’d be possible to hear SirMechsALot’s voice. That’d be interesting.