Google got Looker. Salesforce bought Tableau. But open source tools are rising in popularity across the world of business intelligence and data analysis.

In our 2019 Dev Survey, we asked what kind of content Stack Overflow users would like to see beyond questions and answers. The most popular response was “tech articles written by other developers.”
So starting this week, we will highlight articles written by your peers. If you have an idea and would like to submit a pitch, email pitches@stackoverflow.com. Our first piece comes from Alessio Civitillo, an Analytics Manager at TE Connectivity in Munich.

What To Make of Two Major Acquisitions

Recently we saw two major deals in the business intelligence space: Google paid $2.6 billion to acquire Looker, and Salesforce ponied up a whopping $15.7 billion for Tableau. Both acquired companies focused on offering cloud-based business intelligence (BI) tools, a space I know well: I spent one year rebooting a major Salesforce project for 500 users and the last two and a half years as an analytics manager at TE Connectivity, serving about 600 internal users.

So why did Salesforce and Google make these big acquisitions? The most obvious answer is that work with data increasingly happens across multiple departments, and many employees who are not well versed in programming or statistics are turning to dashboards like these to understand data and share insights. From sales to revenue operations to customer support, teams are recognizing the value of collecting and analyzing internal data.

So what comes next? Will these tools be easy to integrate, and will they actually make the product suites offered by Salesforce and Google Cloud more attractive to folks like me? Personally, I see the deals as defensive moves: a strategy to protect incumbent product portfolios by snapping up fast-growing competitors. If Salesforce and Google can tightly integrate these acquisitions, that may help consolidate usage among existing clients who previously worked with tools from multiple companies.

As I look at these acquisitions, however, I think another interesting trend is worth noting. While dashboards are great, the flexibility of simpler, open source tools is beginning to win out among developers like myself. For those willing to spend a little time learning to program, these tools can be a powerful alternative worth exploring.



Data, data, everywhere

Tableau has also captured the attention of marketing and sales departments at many companies worldwide. Many of those companies already use Salesforce and will benefit from tighter integration. Customers appreciate well-integrated solutions, and the benefits will be even greater for companies moving their infrastructure to the cloud. Eventually, Salesforce may move its analytics and reporting to the cloud and offer a solution that works with data across the board, not just its own datasets.

So what’s the right solution for your company? Salesforce and Tableau? Google Suite and Looker? Microsoft’s Power BI, Office, and Azure? The key thing to understand from a business analytics standpoint is that dashboards like these are just one relatively small part of the puzzle. ETL (extract, transform, load), data prep, and reporting operations are still handled by other tools, and this part of the stack still has a 1990s-era vibe. Many companies still push to keep this work in IT, but that tends to increase turnaround times and costs, a hard sell in a world where managers want things delivered quickly.

While Tableau and Looker are considered some of the best data exploration applications on the market today, they still feel like isolated solutions for BI managers. There is a growing realization within the business intelligence community that no dashboard will save the day, an interesting trend that I don’t believe has received much attention in the press. Every time you find yourself going back to Excel, you are acknowledging that what many business analysts want is the flexibility to design their own approaches and build custom tools that fit in-house problems.

Developing internal business applications is also becoming increasingly easy with solutions like Retool, part of an interesting new “rise of no code” trend. Building internal business tools without a big IT project is not a new idea; MS Access does exactly that. What is new is that tools like Retool make it easy to build web applications with a simple workflow.



At work, my team and I use these tools to build Salesforce and other business applications. One advantage, in my view, is that they offer a simpler way to build applications, but the other advantage brings us back to that magic word: integration. With Salesforce, you are locked into the Salesforce world, and pulling data from other systems can be hard. Tools like Retool must make connectivity a top priority to survive, so they are extremely good at integrating with other applications and databases.


Industry moving to open source tools

Isolated tools and processes don’t last long. Integration with existing processes and solutions is paramount. Tableau did not integrate well with the rest of the business analyst workflow and eventually felt like a very incomplete solution. Salesforce might be a great customer relationship management (CRM) system, but it largely lives in isolation and is used mostly by sales organizations, so in a way it can feel incomplete too.

As the analytics industry advances, it is important to keep this in mind. Any modern enterprise analytics solution requires orchestrating multiple tools, sold by multiple vendors, that don’t always work together as well as they should. This is an interesting opportunity for open source tools and for vendors that take integration more seriously, because open source solutions have a natural tendency to integrate well with each other and avoid lock-in.

Maybe that’s why Jupyter Notebooks are exploding in popularity right now. They combine the kind of live feedback users love with the power of a programming language, like Python, that has a rich ecosystem of libraries. With Jupyter, analysts can connect to pretty much everything, write to everything, and output all kinds of interesting things. For example, developers like Greg Reda have been using Jupyter for cohort analysis, a good approach when crunching data on customer acquisition and showing which subset of customers has the best lifetime value. Here you can see how little code it takes him to turn a finished analysis into a good-looking cohort chart:

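import matplotlib.pyplot as plt
import seaborn as sns

# Assumes user_retention is a pandas DataFrame of retention rates computed
# earlier in the analysis (rows: periods since acquisition, columns: cohorts)
sns.set(style='white')
plt.figure(figsize=(12, 8))
plt.title('Cohorts: User Retention')
# Transpose so each cohort becomes a row; mask empty cells so they stay blank
sns.heatmap(user_retention.T, mask=user_retention.T.isnull(), annot=True, fmt='.0%');
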
This produces a heatmap of retention percentages, one row per cohort, with empty cells left blank.

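The heatmap code assumes a retention table already exists. Below is a minimal sketch of how such a table could be computed, assuming the raw data arrives as (user_id, order_date) pairs. The function names and sample data are hypothetical, and this is a plain-Python illustration of the cohort idea, not Greg Reda's own pandas code: each user is assigned to the month of their first order, and retention is the share of each cohort that orders again N months later.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Turn a date into a running month number so months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def cohort_retention(orders):
    """Return {cohort_label: {period: share_of_cohort_active}}.

    orders is an iterable of (user_id, order_date) pairs. A user's cohort is
    the month of their first order; period 0 is that month, period 1 the next.
    """
    orders = list(orders)

    # Cohort assignment: earliest order month per user
    first_month = {}
    for user, d in orders:
        m = month_index(d)
        if user not in first_month or m < first_month[user]:
            first_month[user] = m

    # Cohort size: number of users acquired in each month
    cohort_size = defaultdict(int)
    for m in first_month.values():
        cohort_size[m] += 1

    # Unique active users per (cohort, months since first order)
    active = defaultdict(set)
    for user, d in orders:
        cohort = first_month[user]
        active[(cohort, month_index(d) - cohort)].add(user)

    # Divide active users by cohort size, labelling cohorts as YYYY-MM
    retention = defaultdict(dict)
    for (cohort, period), users in sorted(active.items()):
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        retention[label][period] = len(users) / cohort_size[cohort]
    return dict(retention)


# Hypothetical sample data: three users ordering between January and March 2019
orders = [
    ("a", date(2019, 1, 5)), ("b", date(2019, 1, 20)),
    ("a", date(2019, 2, 3)), ("c", date(2019, 2, 11)),
    ("b", date(2019, 3, 8)), ("c", date(2019, 3, 15)),
]
retention = cohort_retention(orders)
print(retention)
# {'2019-01': {0: 1.0, 1: 0.5, 2: 0.5}, '2019-02': {0: 1.0, 1: 1.0}}
```

Passing this nested dictionary to `pandas.DataFrame` would put the cohorts in columns and the periods in rows, with NaN where a cohort has no data yet, which is the orientation the `.T` in the heatmap code flips around.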
Open source is also catching up in the enterprise. Vega is a solid implementation of the “grammar of graphics”, a concept for defining data visualizations declaratively. Vega shares the same theoretical foundations as Tableau, has a Python implementation, and is already integrated with Jupyter. Vega is so good that Elastic made it an official part of its Kibana visualization platform last year.

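To illustrate what “declarative” means here, the sketch below builds a chart specification in Vega-Lite (the higher-level grammar built on top of Vega, swapped in here because its specs are shorter) as a plain Python dictionary and serializes it to JSON. The dataset is made up. Rendering is left to a Vega-Lite renderer, such as the Vega-Lite JavaScript library or, from Python, a library like Altair that generates specs of this shape; that step is outside this sketch.

```python
import json

# Hypothetical data: new users per monthly cohort, inlined into the spec
rows = [
    {"cohort": "2019-01", "users": 120},
    {"cohort": "2019-02", "users": 95},
    {"cohort": "2019-03", "users": 140},
]

# The chart is described, not drawn: data, a mark type, and encodings that
# map data fields to visual channels (x position, y position)
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "cohort", "type": "ordinal", "title": "Cohort"},
        "y": {"field": "users", "type": "quantitative", "title": "New users"},
    },
}

print(json.dumps(spec, indent=2))
```

Because the spec only states which fields map to which channels, changing `"bar"` to `"line"` or adding a `"color"` encoding changes the chart without touching any drawing code.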
OK, but what about analytics and BI in companies? Are we seeing a trend towards adoption of open source tools?

Airbnb is an example of a company that has built a custom in-house toolkit so that any employee, even those not familiar with SQL, can use data to make informed decisions. They called it Superset and open sourced it, and Superset is now in the process of becoming part of the Apache Software Foundation.

Netflix is another example of a company doubling down on open source for BI and analytics. Netflix software engineers even developed nteract, their own take on Jupyter, and have written a few interesting articles on using notebooks in production.

For business analytics managers like myself, the lesson is simple. Management might buy into good-looking dashboard tools, but the people actually working with the data need solutions that are easy to customize and integrate. In analytics, a complete solution goes from the raw data all the way to the dashboard, the commentary, and the insights. While services like Tableau and Looker are nice, mastering languages like SQL and Python will give you the ability to wrangle complex, often messy data into reporting that can be used across your company. New BI dashboards will come and go, and more cloud enterprise applications will arrive with great fanfare, but the ability to build tools suited to your in-house needs will never go out of style.
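As a concrete example of that kind of wrangling, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a real warehouse. The data, table names, and column names are all hypothetical. One SQL query does the cleanup (trimming and normalizing labels, parsing amounts stored as text, dropping missing values), the lookup that would be a vlookup in Excel, the sums, and a month-by-month pivot.

```python
import sqlite3

# Hypothetical raw export: inconsistent region labels, amounts stored as text
# with thousands separators, and one missing amount
raw_sales = [
    ("2019-06-03", " emea", "1,200.50"),
    ("2019-06-10", "EMEA ", "800"),
    ("2019-06-11", "Americas", None),
    ("2019-07-01", "americas", "2,000"),
    ("2019-07-15", "APAC", "450.25"),
]
# Hypothetical lookup table: which manager owns each region
managers = [("EMEA", "Anna"), ("AMERICAS", "Ben"), ("APAC", "Chen")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount TEXT)")
con.execute("CREATE TABLE managers (region TEXT PRIMARY KEY, manager TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", raw_sales)
con.executemany("INSERT INTO managers VALUES (?, ?)", managers)

query = """
WITH clean AS (
    -- Normalize labels, parse amounts, keep only the month, drop missing rows
    SELECT substr(sale_date, 1, 7)                 AS month,
           upper(trim(region))                     AS region,
           CAST(replace(amount, ',', '') AS REAL)  AS amount
    FROM sales
    WHERE amount IS NOT NULL
)
SELECT c.region,
       m.manager,  -- the lookup step
       -- conditional aggregation pivots months into columns
       round(sum(CASE WHEN c.month = '2019-06' THEN c.amount ELSE 0 END), 2) AS jun,
       round(sum(CASE WHEN c.month = '2019-07' THEN c.amount ELSE 0 END), 2) AS jul
FROM clean AS c
LEFT JOIN managers AS m ON m.region = c.region
GROUP BY c.region, m.manager
ORDER BY c.region
"""
report = con.execute(query).fetchall()
con.close()

for row in report:
    print(row)
# ('AMERICAS', 'Ben', 0.0, 2000.0)
# ('APAC', 'Chen', 0.0, 450.25)
# ('EMEA', 'Anna', 2000.5, 0.0)
```

Keeping these steps in a query rather than in manual spreadsheet edits means the same cleanup runs identically every time new data arrives.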

Author

Alessio Civitillo
Alessio Civitillo is an Analytics Manager at TE Connectivity. Based in Munich, he manages a team of highly motivated data analysts based in Krakow using tools such as Python, Redshift, Alteryx and Tableau. In his spare time he can be found reading, cooking or near water where he likes to swim or kayak. http://assyem.com/pages/about.html

Comments

  1. DANTE CIFUENTES says:

    Can you elaborate on why Tableau is isolated?

    1. I think Tableau has done a lot of work in connecting to many different data sources, but it has somewhat failed in the data prep space. Most data is not consumed as it comes from the source; it requires a lot of preparation (think of all the vlookups, sums, and pivots you do in Excel). By not making data prep easy, Tableau lives on an island: as long as data prep is done in other tools you are fine, but if you buy only Tableau and hope it will all work, it’s going to be a disappointment.

  2. Mike Honey says:

    You seem to be ignoring the elephant in the room: Power BI.

    It already covers all the requirements including ETL, data prep and reporting, integration with R & Python. It’s a no-code tool so accessible to a vastly larger audience of analysts, not just developers.

    Consider some recent stats:
    > 20 PB of data ingested / month.
    > 25m data models hosted

    https://youtu.be/D1AR2iL0DY8?t=284

    1. The point of the article is that more integrated tools tend to get better adoption. From that perspective PowerBI did well: a good part of its huge success is due to its solid integration with Excel and the rest of the Office 365 stack. However, last time I checked (some months ago), its ETL process downloaded all data to the desktop and its scheduling/automation was fairly basic. Tableau is covering its ETL weakness with Tableau Prep. So I find it hard to think there is a clear winner between Tableau and PowerBI; for me they are in the same boat. In my case we already had some adoption of Tableau, and the better Office 365 integration did not justify a switch. Would I choose PowerBI if I could start from zero today? I don’t know. PowerBI is fairly expensive once you start scaling to hundreds or thousands of users, and it’s not Microsoft’s core product: if you need a specific feature or fix, you might find yourself in a weak position dealing with a company that has a lot of other focus areas.

      That said, there is a lot happening with Azure, so if PowerBI gets more and more integrated into that stack it might have a real advantage over Tableau.

      1. Filipe Banzoli says:

        In my opinion, Power BI has another big drawback: Power BI analysts must have Windows PCs, since Power BI Desktop only works on Windows, and the Power BI scheduler also works only on Windows.

        At the end of the day we come back to the same argument Alessio made: open source tools and languages tend to be more flexible, and in a world with multiple environments and complexities it’s essential to know powerful languages such as Python and SQL.

      2. Mike Honey says:

        Above I probably should’ve said “low-code” as there are coding features available, but you can achieve a lot without writing any code.

        My point is the more accessible tools get the broadest adoption. If coding skills are required, you are immediately limited to a very small subset of people, who are less likely to also have subject matter expertise. A “task” becomes a “project”, needing a team and then a project manager.

        On the Power BI ETL process, you only described the “Import” scenario – Power BI has had DirectQuery (live querying of over 20 sources including most SQL / cube / big data platforms) since 2015. Alternatively with “Dataflows” you can run “Import” scenario processes in the cloud, delivering to a data lake.

        On scheduling, there is a REST API for refresh, so it’s more flexible.

        On costs and scale, with the Premium license you buy cloud capacity and can freely distribute to as many consumer users as that can support. Premium also features better scheduling. On any common scenario, Power BI is 4x – 10x cheaper than Tableau.

        1. My point is that there is no silver bullet and that eventually you need to look at the whole picture with integration as the number 1 priority. Truth is PowerBI is not going to integrate well in every situation.

          Also, price scales differently in PowerBI and it does get expensive. Their live connect is buggy and doesn’t work well. Also its ETL will download everything on desktop even on liveconnect. Dataflow seems to work only in Azure (as I said above if you go with Azure PBI does have some advantages over Tableau). We can continue point per point, but again if it works for you great, it’s just that I don’t believe it’s an “elephant in the room” and that it’s important to consider all things before adopting it.

  3. So Alessio, which tool in your opinion is ready to compete with the flexibility that Excel provides.

    1. First I would ask myself why I need an Excel alternative, and whether that is a good enough reason to look for something different.

  4. Mark Brown says:

    Hey Alessio, check out LookML. It is the new transformation layer, and as far as I know Looker is the only tool with a modeling layer like that. Coupled with something like Fivetran and Snowflake, it is a pretty potent solution.

    1. I have read about that tool, but for now we are happy generating the sql required for the in database transformations with Python. We have our own library for that and so far it’s looking promising. We also plan to use Airflow so leveraging Python and its libraries seems to make sense for now. But lookml seems to be going in a good direction.

  5. thanks for the information.

  6. Gunnar Wolpe says:

    What is “CRM” in the context of this article? It would be helpful if acronyms and initialisms were spelled out on the first instance that they appear in a document. I have an MBA and decades of experience, but I hadn’t a clue about what you were trying to say without performing a Web search, and even then only guessing what it might mean.

    1. Timothy (TRiG) says:

      Customer Relationship Management, I’d guess.

    2. Jack Flower says:

      Where do they hand out MBAs to people that have never even heard of CRM?

  7. The most popular response was “tech articles written by other developers.” So you decided to start with a post written by an Analytics Manager that is more of an industry analysis than a tech article? Sorry but at least from my perspective, that’s something entirely different…

  8. Sean Conroy says:

    Where does TIBCO’s Spotfire fit into your analysis? I’m surprised it’s not mentioned at all – the majority of the Oil & Gas space is completely dedicated to Spotfire.

    1. There are many products in this space, and the point of this article was to discuss more holistically how to build your analytics strategy. That said, I normally refer to Gartner to get a sense of how a tool is doing, so: https://cdnl.tblsft.com/sites/default/files/blog/mq_2018-500_0.png

      They don’t seem to be doing that well. Please understand this doesn’t mean anything without context, so in your specific case TIBCO might be the right choice.

  9. data astronaut says:

    This article has a bizarrely narrow viewpoint. “a mastery of languages like SQL and Python…” — to paraphrase: “analysts should learn Python and SQL”. Cool story — this point has been made all over the web since about 2015 (and a company such as DataCamp bases its entire business model on this simple idea). And comparing BI software like Looker and Tableau to Jupyter Notebooks or to Excel is vapid. Looker and Tableau are useful as graphical interfaces to a properly maintained (i.e. automated loading of raw data & transformation into dimensional models) warehouse. Jupyter Notebooks are better suited for ad hoc analysis or modeling. Excel is useful for finance/accounting, or if you’re effectively just doing back-of-the-napkin arithmetic. In short: these pieces are often complementary. If you think you can replace BI software with Jupyter notebooks, then you’re almost certainly doing it wrong. I am aware of no sophisticated company that would view these things as substitutes. Metrics/KPIs and data science are different processes and require different systems; the system for the former works best when it a) updates data in a 100% automated manner and b) provides a portal that business users (i.e. non-analysts) can use to inspect/segment/compare the data used for KPIs.

    1. Thanks for the comment and definitely the article is more of an introduction to analytics architecture today than a full detailed description. Hopefully my “2 cents” are that open source makes integration easier, not that Tableau can replace BI stacks. Think how easily you can create a workflow in Jupyter and move it into Airflow for scheduling and automation, you can’t easily integrate solutions like that with more standard vendor software.
