The next-generation SQL? Explore a new way to analyze, visualize, and share data

We chat with the CEO of Count about a new programming language his startup has created for data analysis.


Stack Overflow recently partnered with Count, a UK startup, to host an event where community members could use Count’s data platform to dig into the insights from our annual developer survey. The Stack Overflow community is one of the most active and open on the internet: our users have provided more than 27 million answers to over 18 million questions! Every year we also run our developer survey. This year, 90,000 coders took the time to answer a host of wide-ranging questions, helping us create one of the richest datasets about software development in the world. The results of this year’s survey are available for download as a CSV file or in our standard report. In May, we worked with Glitch to showcase how their platform can be used to examine and display our data. Now we're excited to be working with another new startup, Count, to provide another way to explore the full richness of the survey and find your own insights. Count has uploaded the full 2019 Developer Survey data and made it available, giving the Stack Overflow community a new way to investigate, visualize, and share what they find with others.

Digging in

On Tuesday we kicked off the release in Count with an event in London, giving community members early access to the platform to see what they could find.

Attendees were quickly able to drill down into the high-level statistics from our report, combining results from multiple survey questions to reveal more nuanced answers and create personalized views of the results relevant to their interests. You can see some of their discoveries below:

As the event was held in London, attendees were naturally very interested in developer salaries.

We know developers are in high demand, but one of the attendees was able to break down how quickly developers were landing new jobs by language. TypeScript, Scala, Elixir, Kotlin, and Ruby developers seem to be moving fastest.

Rust was the most loved language in the 2019 Developer Survey, with over 83% of developers who have used it substantially in the last year planning to keep doing so.
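For readers who want to reproduce a figure like this, the "most loved" percentage described above is the share of developers who have worked with a language and want to keep working with it. Below is a minimal Python sketch of that calculation. The function name `loved_share` and the toy respondent data are hypothetical and are not drawn from the survey.

```python
# Minimal sketch: "most loved" share = developers who used a language and
# want to keep using it, divided by all developers who used it.
# The respondent data below is a hypothetical toy example, not survey data.

def loved_share(worked_with: dict[int, set[str]],
                want_next_year: dict[int, set[str]],
                language: str) -> float:
    """Return the percentage of users of `language` who want to keep using it."""
    users = [r for r, langs in worked_with.items() if language in langs]
    if not users:
        return 0.0  # nobody used the language, so there is no share to report
    keen = [r for r in users if language in want_next_year.get(r, set())]
    return 100 * len(keen) / len(users)


worked_with = {
    1: {"Rust", "C"},
    2: {"Rust"},
    3: {"Python"},
    4: {"Rust"},
    5: {"Rust", "Go"},
}
want_next_year = {1: {"Rust"}, 2: {"Go"}, 4: {"Rust"}, 5: {"Rust"}}

print(loved_share(worked_with, want_next_year, "Rust"))  # 75.0
```

Four toy respondents used Rust and three of them want to continue, giving 75%.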

The data is available to the full Stack Overflow community at stackoverflow.count.co. You can dive into the examples above or build your own insights combining any aspect of the data you want. Be sure to let us know what you’ve found on Twitter using #devsurvey2019.

We had a great event in London, and after we finished, we asked Count co-founder and CEO Ollie Hughes to write a short essay explaining why his team decided to create a new service, and an entirely new programming language, for exploring, visualizing, and sharing insights into data. We’ll let him take it from here:

A bit of background context

Hey everyone, I’m Ollie Hughes from Count. I founded Count with my co-founder Oli (yes, two Olivers; we’ve been friends for 15 years, so it’s not as confusing as you might think) with the aim of making the digital world a bit easier to explore. Here’s why. Let’s start with something you probably already know: using data has become increasingly popular. Even a cursory look at the Stack Overflow Trends page will quickly show you the indomitable rise of Python, along with other specialist analytical languages, as more people learn how to use data better.

The share of questions about languages such as C++, JavaScript, and Java has declined as more analytical languages such as Python and R have risen in popularity. Interestingly, the share of SQL questions on SO has remained relatively constant over time, and R has only recently overtaken it. Pandas (Python’s most popular data analysis library) is following quickly behind. This growth is not surprising. The value of data within business has been well documented, and the amount of publicly available data has increased at least tenfold in the last decade. Add into the mix organizations like FiveThirtyEight and The Pudding, which engage millions of people with their interactive data journalism, and online communities like Kaggle, which help people learn new skills, and it’s easy to see why being a “data person” has never been more exciting.

So, what’s the problem?

With all this as a backdrop, it’s fair to ask what a new organization like Count thinks it can bring to the table, and even more understandable to ask why we believe it’s necessary to develop a completely new programming language that no one knows. Well, even with all the resources and tools mentioned above, we believe exploring data online (and we are specifically focusing on relational data for the moment) is still not that easy. Despite its value, and in contrast to other media types like video and images, relational data remains fragmented and hard to find, dotted around the web in different locations and formats. Analysing it in this state takes quite a bit of time and no small amount of skill, particularly if you’re combining data from different sources. And though there has been a large increase in people learning analytical languages like Python and R, only a relatively small proportion of people know them well.

In December 2017 (when we were getting stuck into this problem properly), we surveyed 6,000 adults across the UK to understand their level of comfort using data. The results showed that although the majority of respondents felt comfortable reading statistics, 84% didn’t feel comfortable doing basic data analysis. This puts free-form analysis of the data on the web out of reach for most people.

If you’re an individual or organization who wants to release data publicly online, this situation leaves you with limited options to properly show the value of your data. Either you publish the data:

  • In its raw form (such as a CSV or JSON file, or through an API), which gives your users complete flexibility in how they use it but also significantly limits your audience because of the time and skill it takes to explore it; or
  • in an aggregated, visual form that lets users instantly digest some information, but only for a small subset of pre-selected queries.

For example, Spotify publishes fascinating data on its most listened-to tracks by day. It provides the data in a table on its website or as a CSV download, but neither makes it easy for most users to dig into the data.

Recently, a few new platforms such as Observable and Glitch have started to address this trade-off. They offer users a really flexible way to build web apps or visualize data that others can then fork and tweak to explore in a different way. Both have grown highly enthusiastic communities in a short time and let users do far more than analyse data, but these platforms are oriented toward people who know, or want to learn, languages such as JavaScript.

So what’s different about Count?

When the Count team started to tackle this problem, we set out to find a solution that would offer the flexibility of a query language but that could appeal to as wide an audience as possible. We analyzed thousands of queries and interviewed hundreds of people all using different tools and datasets. We focused our research on descriptive data queries because they form a fundamental step in everyone’s workflow. Our research provided a number of key insights, but two were most fundamental in shaping how we optimized our language’s design:

  1. The vast majority of queries were made up of a core set of functionality which was far smaller than the full functionality most analytical tools offered.
  2. The most valuable insights came from queries that involved multiple stages or joins across different datasets.

These insights were consistent across sectors, levels of user expertise, and most data structures. A good example of what we found can be seen by browsing the queries written in Stack Exchange’s Data Explorer. The Data Explorer provides access to 29 tables which collectively give full visibility of the activity on the Stack Overflow site. Despite the range of questions this data can answer (and the no doubt higher-than-average capability of Stack Overflow users), the queries people actually write often use only a very small fraction of the full SQL language. Additionally, over half the queries join multiple tables together to drill down into various aspects of the platform’s usage.

A simple but powerful way to explore data

Based on our research we designed our language to have the following key features:

  1. It is a declarative language (like SQL) - allowing users to just ask for what they want.
  2. To further enhance the declarative nature, the language will join datasets automatically if users request data from multiple tables, no matter the number of intermediate tables between the requested tables.
  3. Unnecessary syntax has been removed to optimize the speed and simplicity of common analytical tasks.
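Feature 2 describes joining through any number of intermediate tables. One common way to implement that idea (not necessarily the way Count does it) is to treat the schema as a graph of foreign-key relationships and run a breadth-first search for the shortest join path between the requested tables. The Python sketch below does exactly that and emits a SQL `FROM ... JOIN` clause. The table and column names are hypothetical.

```python
from collections import deque

# Hypothetical schema: each tuple is a foreign-key link
# (table_a, column_a, table_b, column_b).
FOREIGN_KEYS = [
    ("respondents", "respondent_id", "languages", "respondent_id"),
    ("respondents", "country_id", "countries", "country_id"),
    ("countries", "region_id", "regions", "region_id"),
]


def build_graph(fks):
    """Turn foreign keys into an undirected adjacency list of join conditions."""
    graph = {}
    for a, col_a, b, col_b in fks:
        condition = f"{a}.{col_a} = {b}.{col_b}"
        graph.setdefault(a, []).append((b, condition))
        graph.setdefault(b, []).append((a, condition))
    return graph


def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([start])
    previous = {start: None}  # table -> (parent table, join condition)
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for neighbour, condition in graph.get(table, []):
            if neighbour not in previous:
                previous[neighbour] = (table, condition)
                queue.append(neighbour)
    if goal not in previous:
        raise ValueError(f"no join path from {start} to {goal}")
    # Walk back from the goal to rebuild the path in order.
    steps, node = [], goal
    while previous[node] is not None:
        parent, condition = previous[node]
        steps.append((node, condition))
        node = parent
    return list(reversed(steps))


def from_clause(graph, start, goal):
    """Render the join path as a SQL FROM/JOIN clause."""
    clause = f"FROM {start}"
    for table, condition in join_path(graph, start, goal):
        clause += f"\nJOIN {table} ON {condition}"
    return clause


GRAPH = build_graph(FOREIGN_KEYS)
print(from_clause(GRAPH, "languages", "regions"))
```

Asking for `languages` and `regions` together produces a chain through `respondents` and `countries` without the user naming either intermediate table. A production implementation would also have to deal with several equally short paths; this sketch simply takes the first one the search finds.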

The features provide users with two main benefits:

  • First, it allows datasets to be combined fluidly regardless of the complexity of the data model. This, along with the simplified syntax, makes the language much faster to write.
  • Second, the interface is inherently modular, allowing users to build up complex queries from a series of simple steps which (because of the automatic joining) they can quickly combine.

Below are some example insights gleaned from the 2019 Developer Survey by members of the Stack Overflow community at our London event earlier this week. I’ve included the query in both our language and SQL below each chart to highlight the difference. Alternatively, you can explore the language and the data yourself at stackoverflow.count.co.

The average salary of developers by the language they work with:

The query in Count and the equivalent in SQL:
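The Count and SQL queries for this chart are not reproduced in this text. As a rough, hypothetical illustration of what the SQL side involves, the sketch below uses Python’s standard-library `sqlite3` module on a made-up normalized schema: a `respondents` table and a `languages` table with one row per respondent/language pair, since a respondent can list several languages. The table names, columns, and sample rows are assumptions, not the real survey schema or results.

```python
import sqlite3

# Hypothetical normalized schema: one row per respondent, plus one row per
# (respondent, language) pair, since a respondent can list several languages.
SCHEMA = """
CREATE TABLE respondents (respondent_id INTEGER PRIMARY KEY, salary_usd REAL);
CREATE TABLE languages (
    respondent_id INTEGER REFERENCES respondents (respondent_id),
    language TEXT NOT NULL
);
"""

AVG_SALARY_BY_LANGUAGE = """
SELECT l.language, AVG(r.salary_usd) AS avg_salary
FROM languages AS l
JOIN respondents AS r ON r.respondent_id = l.respondent_id
WHERE r.salary_usd IS NOT NULL      -- skip respondents who gave no salary
GROUP BY l.language
ORDER BY avg_salary DESC
"""


def make_sample_db() -> sqlite3.Connection:
    """Build an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO respondents VALUES (?, ?)",
                     [(1, 60000), (2, 80000), (3, 100000), (4, None)])
    conn.executemany("INSERT INTO languages VALUES (?, ?)",
                     [(1, "Python"), (1, "SQL"), (2, "Python"),
                      (3, "Rust"), (3, "SQL"), (4, "Rust")])
    return conn


conn = make_sample_db()
for language, avg_salary in conn.execute(AVG_SALARY_BY_LANGUAGE):
    print(f"{language}: {avg_salary:,.0f}")
```

Even this small case needs an explicit join and filter in SQL; the automatic joining described earlier is meant to remove that boilerplate.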

The average number of minutes spent on Stack Overflow per week, by the number of years an individual has been coding:

The query in Count and the equivalent in SQL:
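This chart’s queries are likewise not reproduced in this text. A hypothetical SQL version, again run through Python’s `sqlite3` module on made-up data, can filter first and aggregate second. Writing those as two named steps with a common table expression (`WITH`) echoes the step-by-step style described earlier. The column names `years_coding` and `minutes_on_so_per_week` are assumptions, not the survey’s real fields.

```python
import sqlite3

# Hypothetical single-table schema with made-up rows; None marks an unanswered question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE respondents (
        respondent_id INTEGER PRIMARY KEY,
        years_coding INTEGER,
        minutes_on_so_per_week INTEGER
    )
""")
conn.executemany("INSERT INTO respondents VALUES (?, ?, ?)",
                 [(1, 2, 120), (2, 2, 60), (3, 10, 90), (4, 10, None), (5, 25, 30)])

AVG_MINUTES_BY_YEARS_CODING = """
WITH answered AS (
    -- Step 1: keep only respondents who answered both questions.
    SELECT years_coding, minutes_on_so_per_week
    FROM respondents
    WHERE years_coding IS NOT NULL
      AND minutes_on_so_per_week IS NOT NULL
)
-- Step 2: average the minutes within each years-of-coding group.
SELECT years_coding, AVG(minutes_on_so_per_week) AS avg_minutes
FROM answered
GROUP BY years_coding
ORDER BY years_coding
"""

for years, avg_minutes in conn.execute(AVG_MINUTES_BY_YEARS_CODING):
    print(years, avg_minutes)
```

Each step can be checked on its own before the next is added, which is the kind of incremental workflow the essay argues should be easier.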

We are excited to see what others do with the datasets in Count. You can read more about our plans, and give us feedback, at count.co or by following us on Twitter: @counthq.
