Issue 225: LLM on LLM evaluation

Welcome to ISSUE #225 of The Overflow! This newsletter is by developers, for developers, written and curated by the Stack Overflow team and Cassidy Williams. This week: how to run data engineering without burning out your team, why they bothered with HTTP instead of just shipping HTML over FTP, and what happens when burnout hits farmers.

From the blog

How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs.

How to succeed as a data engineer without the burnout

The key strategies for building a headache-free data platform.

Diverting more backdoor disasters

In the wake of the XZ backdoor, Ben and Ryan unpack the security implications of relying on open-source software projects maintained by small teams. They also discuss the open-source nature of Linux, the high cost of education in the US, the value of open-source contributions for job seekers, and what Apple is up to AI-wise.

Move faster and safer using feature flags on AWS

Learn to implement practical DevOps techniques based on how Amazon and AWS enhance the speed, availability, and security of their software through the use of feature flags. The discussion covers topics such as release flags, trunk-based development, and the A/B testing methods employed by Amazon.com and AWS services.

If everyone is building AI, why aren't more projects in production?

Ben talks with Shane McAllister, lead developer advocate at MongoDB, Stanimira Vlaeva, senior developer advocate at MongoDB, and Miku Jha, director, AI/ML and generative AI at Google Cloud, about the challenges and opportunities of operationalizing and scaling generative AI models in enterprise organizations.

Interesting questions

How was Rome able to conscript and equip 400k soldiers during 2nd Punic War in a pre-industrial society?

If you want people to fight your wars, you need to make it worth their while.

Should I disclose a mental disorder that's been impacting my job to HR/my boss?

“Your employer doesn’t need to know the nature of the condition, but let them know if there are any accommodations you need in the meantime.”

Does the success of AI (Large Language Models) support Wittgenstein's position that "meaning is use"?

Well, that all depends on what you mean by "use." And "meaning."

After creating HTML, why did Tim Berners-Lee bother creating HTTP? Why didn't he just write a HTML renderer for a FTP client?

Have you ever tried using FTP without a nice client?

Links from around the web

A single atom layer of gold

Scientists were able to make a thin sheet of gold that is literally a single atom thick. There are some cool applications for chemical production and conversion.

Anchor position tool

CSS Anchor Positioning is coming soon to a browser near you, and here's how it works!

America’s young farmers are burning out. I quit, too

A lot of us have fantasies of leaving tech and running off and starting a farm...but that's not as easy as it sounds.

Trip report: Node.js collaboration summit (2024 London)

Node runs a lot of the internet. Here's what's next.


Looking for the tools, technologies, and skills your team needs to evolve in the AI era? Stack Overflow's Industry Guide to AI has your answers.