Issue 225: LLM-on-LLM evaluation

From the blog

How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges of evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the needs and tradeoffs involved in selecting and fine-tuning LLMs.

How to succeed as a data engineer without the burnout

The key strategies for building a headache-free data platform.

Diverting more backdoor disasters

In the wake of the XZ backdoor, Ben and Ryan unpack the security implications of relying on open-source software projects maintained by small teams. They also discuss the open-source nature of Linux, the high cost of education in the US, the value of open-source contributions for job seekers, and what Apple is up to AI-wise.

Move faster and safer using feature flags on AWS

Learn to implement practical DevOps techniques based on how Amazon and AWS use feature flags to enhance the speed, availability, and security of their software. The discussion covers topics such as release flags, trunk-based development, and the A/B testing methods employed by Amazon.com and AWS services.

If everyone is building AI, why aren't more projects in production?

Ben talks with Shane McAllister, lead developer advocate at MongoDB; Stanimira Vlaeva, senior developer advocate at MongoDB; and Miku Jha, director of AI/ML and generative AI at Google Cloud, about the challenges and opportunities of operationalizing and scaling generative AI models in enterprise organizations.

Interesting questions

How was Rome able to conscript and equip 400k soldiers during the Second Punic War in a pre-industrial society?

If you want people to fight your wars, you need to make it worth their while.

Should I disclose a mental disorder that's been impacting my job to HR/my boss?

“Your employer doesn’t need to know the nature of the condition, but let them know if there are any accommodations you need in the meantime.”

Does the success of AI (Large Language Models) support Wittgenstein's position that "meaning is use"?

Well, that all depends on what you mean by "use." And "meaning."

After creating HTML, why did Tim Berners-Lee bother creating HTTP? Why didn't he just write an HTML renderer for an FTP client?

Have you ever tried using FTP without a nice client?