I'm wondering which voice LLMs are currently the best available, since I haven't found a benchmark for this.
There aren’t many speech-to-speech models to begin with, and ASR and TTS are hard to benchmark because their quality is difficult to quantify.
Well, there are a few.
Thanks, John!