llm

Tragedy of the (data) commons

Ben chats with Shayne Longpre and Robert Mahari of the Data Provenance Initiative about what GenAI means for the data commons. They discuss the decline of public datasets, the complexities of fair use in AI training, the challenges researchers face in accessing data, potential applications for synthetic data, and the evolving legal landscape surrounding AI and copyright.

Detecting errors in AI-generated code

Ben chats with Gias Uddin, an assistant professor at York University in Toronto, where he teaches software engineering, data science, and machine learning. His research focuses on designing intelligent tools for testing, debugging, and summarizing software and AI systems. He recently published a paper about detecting errors in code generated by LLMs. Gias and Ben discuss the concept of hallucinations in AI-generated code, the need for tools to detect and correct those hallucinations, and the potential for AI-powered tools to generate QA tests.

The world’s largest open-source business has plans for enhancing LLMs

Ben and Ryan talk to Scott McCarty, Global Senior Principal Product Manager for Red Hat Enterprise Linux, about the intersection of large language models (LLMs) and open source. They discuss the challenges and benefits of open-source LLMs, the importance of attribution and transparency, and the revolutionary potential for LLM-driven applications. They also explore the role of LLMs in code generation, testing, and documentation.

The framework helping devs build LLM apps

Ben and Eira talk with LlamaIndex CEO and cofounder Jerry Liu, along with venture capitalist Jerry Chen, about how the company is making it easier for developers to build LLM apps. They touch on the importance of high-quality training data to improve accuracy and relevance, the role of prompt engineering, the impact of larger context windows, and the challenges of setting up retrieval-augmented generation (RAG).

How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben about how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs.

Are long context windows the end of RAG?

The home team is joined by Michael Foree, Stack Overflow’s director of data science and data platform, and occasional cohost Cassidy Williams, CTO at Contenda, for a conversation about long context windows, retrieval-augmented generation, and how Databricks’ new open LLM could change the game for developers. Plus: How will FTX co-founder Sam Bankman-Fried’s sentence of 25 years in prison reverberate in the blockchain and crypto spaces?

A leading ML educator on what you need to know about LLMs

Machine learning scientist, author, and LLM developer Maxime Labonne talks with Ben and Ryan about his role as lead machine learning scientist, his contributions to the open-source community, the value of retrieval-augmented generation (RAG), and the process of fine-tuning and unfreezing layers in LLMs. The team talks through various challenges and considerations in implementing GenAI, from data quality to integration.