
Stack Exchange knowledge is for everyone (and now available on Snowflake Marketplace)

Snowflake customers can now easily enrich their AI applications and agentic systems with some of the most trusted, highest-quality data available, while giving proper attribution to the community members who provide this content.


As generative AI tools proliferate throughout the software industry, we’ve seen the increased importance of training language models on good data. LLMs provide knowledge faster than most manual searches, but bad data doesn’t translate to knowledge—“garbage in, garbage out” has become the mantra of the AI industry. High-quality data makes LLMs perform accurately and efficiently; bad data is a liability.

That’s why we created our Knowledge Solutions product: to ground LLMs and other AI tools in the high-quality, validated, and trustworthy answers provided as part of Stack Overflow and the many Stack Exchange sites. Our approach was to ensure ethical, responsible use of data for community good while reinvesting in the communities that produced this wealth of knowledge.

So far, we’ve found several partners who share our vision, and the work our community does is helping to make their AI products more factually accurate. Last month, for example, we partnered with Moveworks to create a Stack Overflow integration available in their marketplace. Individual partnerships like this have helped get the process started, but we want to get our high-quality knowledge base into the hands of every company seeking data to build AI solutions in alignment with our vision of socially responsible AI.

Now we’re excited to announce that Stack Overflow data is available on the Snowflake Marketplace and can be supported as a Cortex Knowledge Extension. Snowflake customers can now easily enrich their AI applications and agentic systems, including Snowflake Intelligence, with some of the most trusted, highest-quality data available on both technical and non-technical topics, while giving proper attribution to the community members who provide this content.

About 150 Stack Exchange sites and stackoverflow.com are included, so if you want your AI application to be knowledgeable about everything from Ubuntu to cooking, we’ve got you covered. The data includes questions, answers, comments, tags, and votes: all the core data, written and validated by subject matter experts, plus metadata that provides quality signals. With minimal effort, all of this can be queried using natural language in Snowflake's highly scalable platform.
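
To make the idea of "quality signals" concrete, here is a minimal Python sketch of how an application might use vote counts to pick which answers to ground on, and carry attribution along with each one. The `Answer` record, its field names, and the helper functions are hypothetical and chosen for illustration only; they are not the schema of the Snowflake Marketplace listing or any Stack Overflow API.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    # Hypothetical record shape; field names are assumptions, not the real schema.
    author: str
    url: str
    score: int  # net votes, used here as a simple quality signal
    body: str


def top_answers(answers, min_score=1, limit=3):
    """Keep answers at or above min_score, highest-voted first, at most `limit`."""
    kept = [a for a in answers if a.score >= min_score]
    return sorted(kept, key=lambda a: a.score, reverse=True)[:limit]


def with_attribution(answer):
    """Return the answer text followed by a citation naming its author and link."""
    return f"{answer.body}\n(Source: {answer.author}, {answer.url})"
```

An application could pass the output of `with_attribution` to its model as context, so that any generated response can cite the community member and link behind it.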

“What gets me excited about a partnership with Snowflake is that this puts our data, which we already know is high-quality, into the hands of more experts across the world to use and to improve the world around them,” said Michael Foree, Director of Data Science and Data Platform at Stack Overflow. “Snowflake is a platform that I personally brought into Stack Overflow. They make it easy for people to work with data. By partnering with Snowflake, we're putting our valuable data into the hands of experts all over the world.”

The AI ecosystem is rapidly evolving, and we want to make sure our data has a place in it no matter where it goes. Research has shown that the structured data Stack Overflow produces is essential to accurate GenAI. Respecting the source of that data—our community—is central to our mission and future survival. By partnering with Snowflake, we’re ensuring that our treasury of knowledge across domains can drive advancement in the AI ecosystem and make us all more productive and more confident in the output of AI tools.

This partnership is also a big win for our community. Stack Overflow has become a trusted source of knowledge for a range of experts. The attribution requirement will recognize the work of those experts and increase the trust users have in the AI apps built upon it. Our CEO Prashanth Chandrasekar spoke about the value of trust at HumanX: “When people put their neck on the line by using these AI tools, they want to make sure they can rely on it. By providing attribution in links and citations, you're grounding these AI answers in real truth.”

GenAI has been one of the most exciting technologies in a long time. With help from our incredible community, we can make it more reliable as well.
