Looking under the hood at the tech stack that powers multimodal AI

Ryan chats with Russ d’Sa, cofounder and CEO of LiveKit, about multimodal AI and the technology that makes it possible. They talk through the tech stack required, including the use of WebRTC and UDP protocols for real-time audio and video streaming. They also explore the big challenges involved in ensuring privacy and security in streaming data, namely end-to-end encryption and obfuscation.

Credit: Alexandra Francis

Multimodal AI combines different modalities—audio, video, text, etc.—to enable more humanlike engagement and higher-quality responses from the AI model.

WebRTC is a free, open-source project that allows developers to add real-time communication capabilities that work on top of an open standard to their applications. It supports video, voice, and generic data.

LiveKit is an open-source project that provides scalable, multi-user conferencing based on WebRTC. It’s designed to provide everything developers need to build real-time voice and video applications. Check them out on GitHub.

Connect with Russ on LinkedIn or X and explore his posts on the LiveKit blog.

Stack Overflow user Kristi Jorgji threw inquiring minds a lifejacket (badge) by answering their own question: Error trying to import dump from mysql 5.7 into 8.0.23.

TRANSCRIPT

Add to the discussion