Recently, spoken language models (SLMs) have been highlighted as next-generation technology that surpasses the limitations of text-based language models by learning human speech without text to understand and generate linguistic and non-linguistic information.
They’re at 16 minutes. Let’s not jump to 24 hours just yet. Eventually, sure.