It doesn't work that way.
TTS language/voice files are just data: words and text. The engine is what actually synthesizes the voice.