
One is not the loneliest number for API calls

Gil Feig, co-founder and CTO of Merge, joins the show to explore Merge's approach to reducing third-party APIs to a single call, the complexities of and need for data normalization, and the role that AI and MCP play in the future of API functionality.


Merge connects you to any third-party system for fast, secure integrations for your products and agents.

Connect with Gil on LinkedIn and X.

Shoutout to user Abhijit for winning a Lifeboat badge for their answer to 'Complex numbers in Python.'



TRANSCRIPT

[Intro music]

Ryan Donovan: Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I am your host, Ryan Donovan, and today we're talking about third-party APIs and how to reduce them to a single call, and maybe getting into a little bit of the AI aspect of all that, too. My guest today is Gil Feig, the co-founder and CTO at Merge. So, welcome to the show, Gil.

Gil Feig: Thanks for having me, Ryan. Happy to be here.

Ryan Donovan: So, top of the show, we'd like to get to know our guests. So, how did you get into software and technology?

Gil Feig: It's a good question. I have been coding since I was 12, and I didn't get into it because I was like, 'I want to go build some enterprise B2B SaaS app,' obviously. I was 12, I was playing video games, and I saw a bunch of people botting, and I was like, 'why am I playing manually if everyone else is botting?' So, I got into botting and realized that it was actually a deep, deep community. There were libraries to solve anti-bot behavior, things that could solve these Rubik's Cube puzzles that would pop up randomly. So, I got deep into that. Then I got accepted onto the team that works on the core libraries, when I was like 13 or 14, and it just really sparked this passion for software, and I haven't stopped since.

Ryan Donovan: Right. That's funny. And most people are like, 'oh, I wanna make more games.' You're like, 'I wanna play these games less.'

Gil Feig: That's exactly it.

Ryan Donovan: So, this passion for simplifying and automating processes has led you to Merge, right? Today, every networked web application is just a collection of APIs on the backend. I've seen and talked to a few folks trying to simplify accessing those APIs: data supergraphs, GraphQL, and various other things. Can you talk about how Merge solves the problem of, you know, reducing multiple points of entry to a single point?

Gil Feig: Yeah, absolutely. So, what led to us actually solving this problem was that we noticed, both at my company and my co-founder's previous company, that everyone was building into the same APIs repeatedly, especially in B2B. When you have to integrate, let's say, with QuickBooks for accounting software because you wanna push invoices to your customer, you also need to integrate with Xero, and NetSuite, and Sage, because your customers are all using different ones, and you wanna make sure you can support a broad base. So, that's what led to Merge. We integrate with what we call 'categories,' so verticals of software like HR, ticketing, file storage—file storage has especially become really popular—accounting, and so on. We have seven categories, and that's growing fast. Essentially, we make an opinionated, normalized data model that contains what is common between all of those platforms, and then we integrate with all of them and translate to that one format. It's obviously more complex than, you know, simple field mappings. You have platforms that have no notion of whole objects, you have platforms that have, you know, tons of net-new fields; so, we cover that commonality and then add a ton of features that let you use the full underlying functionality of the APIs as well.
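
To make the normalized-model idea concrete, here's a minimal sketch in Python. The payload shapes and field names below are hypothetical stand-ins, not Merge's actual schema or the real QuickBooks/Xero APIs; the point is just that each platform-specific shape gets translated into one opinionated type.

```python
from dataclasses import dataclass

@dataclass
class NormalizedInvoice:
    """One opinionated shape, regardless of the source platform."""
    remote_id: str       # the record's ID in the source system
    contact: str         # who the invoice is billed to
    total_amount: float
    currency: str

def from_quickbooks(raw: dict) -> NormalizedInvoice:
    # Hypothetical QuickBooks-style payload; real field names differ.
    return NormalizedInvoice(
        remote_id=str(raw["Id"]),
        contact=raw["CustomerRef"]["name"],
        total_amount=float(raw["TotalAmt"]),
        currency=raw["CurrencyRef"]["value"],
    )

def from_xero(raw: dict) -> NormalizedInvoice:
    # Hypothetical Xero-style payload; same data, different shape.
    return NormalizedInvoice(
        remote_id=raw["InvoiceID"],
        contact=raw["Contact"]["Name"],
        total_amount=float(raw["Total"]),
        currency=raw["CurrencyCode"],
    )
```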

Ryan Donovan: Yeah. I wanna dig into a couple of jargon words you got there: 'opinionated' and 'normalized.'

Gil Feig: Yes.

Ryan Donovan: What does that mean in terms of this data model?

Gil Feig: So, let's think maybe about ticketing systems like Jira and Asana. So, for example, Jira might have something called 'title' on a project or on a ticket, and Asana might have something called, you know, 'name,' right? That's the simplest case: both always get returned to our customer as the same normalized field. But then, you know, again, you have these certain fields that are just, like, really out there. In Jira, you have 'epics.' Those don't really exist in other platforms, and so we have to find ways to say, 'how does an epic fit into some sort of grouping of projects?' So, we have a generic grouping object that has types on it. That's really how we think about it. And then we also offer all the onboarding flows as well; so, our customer just pops up an iframe, it guides their customer through the onboarding, and that's all they see of Merge. At that point, their system is connected, Merge is pulling in all of that data, we're normalizing it—as in, translating it to our data models—and those data models are opinionated because we had to decide what to include. We can't include every field from every platform, or else you end up with this really sparse API where you have thousands of fields and, for each integration, only 20 come back. So, by doing that, we give people a way to really easily build products and features on top that they know will have broad coverage across all platforms.
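
As a rough illustration of both cases here (the simple rename, and odd concepts like epics folded into a generic grouping object), a hedged sketch; the field names and shapes are hypothetical, not Merge's real API:

```python
# Simple case: per-platform renames into one normalized field.
FIELD_MAPS = {
    "jira":  {"title": "name"},   # Jira calls it 'title'
    "asana": {"name": "name"},    # Asana already calls it 'name'
}

def normalize_ticket(platform: str, raw: dict) -> dict:
    return {
        target: raw[source]
        for source, target in FIELD_MAPS[platform].items()
        if source in raw
    }

# Harder case: a Jira epic has no cross-platform equivalent, so it
# becomes a generic grouping object with a type tag.
def normalize_epic(raw: dict) -> dict:
    return {"remote_id": raw["id"], "name": raw["name"], "type": "EPIC"}
```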

Ryan Donovan: So, it's a sort of consistent data object, no matter what the platform.

Gil Feig: Exactly.

Ryan Donovan: How much of that, sort of, normalization, or that opinionation, was manual?

Gil Feig: Almost all of it. These days, we can use AI to help research across the different APIs, but there's still a ton of nuance in each platform. There are things that aren't documented, and certain behaviors in the way people actually use platforms that might be surprising. An API might expose a field called 'Epic' to us, but it's never filled in because it's deprecated, or some buried functionality, and people do it a different way. So, it goes beyond just making that normalization. We do have to understand platform behavior as well.

Ryan Donovan: And for some of these solutions, I've seen where the single point of entry then sets off a chain reaction of calls, but I could also see the value of something like caching, or building out a sort of shared data store. Which direction have you all decided to go?

Gil Feig: So, Merge syncs the data, and there's a reason for this. Notably, let's say you're talking to some API... I'm not gonna throw any APIs under the bus right now, but there are a lot like this: with some of them, let's say you want to get a hundred invoices from an accounting system, you make one API request and you get those invoices with all the data you need; but there are others where you just fetch a list of a hundred IDs, and then for each ID, you have to go fetch the invoice details, and those might be incomplete, so you have to hit yet another endpoint. So, you might end up with, you know, like a hundred-plus API requests, and it's super inefficient. And so, in order for our customer to get that data back, and for it to be constantly updated, Merge is always syncing in the background and getting that data into that normalized format, ready for our customer to retrieve at any moment. We then do ongoing syncs where we're just diffing the data and emitting updates to our customer via webhook to say, 'hey, new data, data deleted,' that sort of thing.
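
A minimal sketch of that diff-and-emit pattern, assuming snapshots keyed by remote ID; in a real system each event would be POSTed to the customer's webhook endpoint rather than printed.

```python
def diff_sync(previous: dict, current: dict) -> list:
    """Compare the last synced snapshot with a fresh fetch and
    emit one event per change, keyed by remote ID."""
    events = []
    for remote_id, record in current.items():
        if remote_id not in previous:
            events.append({"type": "created", "id": remote_id, "data": record})
        elif previous[remote_id] != record:
            events.append({"type": "updated", "id": remote_id, "data": record})
    for remote_id in previous.keys() - current.keys():
        events.append({"type": "deleted", "id": remote_id})
    return events

# Usage: snapshots are {remote_id: record} dicts built by the sync.
old = {"1": {"total": 100}, "2": {"total": 50}}
new = {"1": {"total": 120}, "3": {"total": 75}}
for event in diff_sync(old, new):
    print(event)  # in production, deliver each one via webhook
```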

Ryan Donovan: Okay. You're not over there polling, you know, these 100 different IDs every day, or whatever.

Gil Feig: Exactly. It's– we differentiate between that initial sync and then the subsequent syncs, where the initial sync just, unfortunately, can slam a server if they don't have good access patterns. And we try to work very closely with these API providers to improve their access patterns. Obviously, it's really expensive for us to be running those things too.

Ryan Donovan: Yeah, was there a reason you went for webhooks over something like Kafka, or an event-driven system?

Gil Feig: We're considering Kafka and event-driven systems as well. Mostly, it was because our broad customer base tends to be familiar with webhooks. They tend to hook up to a lot of things. We have had customers, you know, obviously, receive our webhook and then turn that into a Kafka event. So, you know, you can do it that way, but we've moved upmarket. We're selling to larger and larger companies now, and we have been asked for it. So, it's definitely something that we're considering.
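
That webhook-to-Kafka relay pattern might look roughly like this with Flask and kafka-python; the route, topic name, and payload shape are assumptions for illustration.

```python
import json

from flask import Flask, request
from kafka import KafkaProducer  # pip install kafka-python

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/webhooks/merge", methods=["POST"])
def relay_webhook():
    # In production you'd verify the webhook signature first.
    event = request.get_json()
    producer.send("merge-sync-events", value=event)  # hypothetical topic
    return "", 204
```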

Ryan Donovan: Yeah. So, this is a problem that I think everybody's facing. What do you think is driving this sort of explosion of APIs that everybody has to integrate?

Gil Feig: So, I think the biggest reason is customer demand. You know, 10 or 15 years ago, it was rare to find an API, and if you did, it was on pretty bad technology. As people started using more platforms, they started seeing, 'oh, this platform I just bought integrates with this other platform I bought,' and it just became a market expectation. Obviously, dollars speak when it comes to people's roadmaps, and so, when you had sales teams saying, 'hey, we can't close these customers because they're saying we don't integrate with X platform they use,' that completely drove product teams to really emphasize these integrations. So, that's why we've seen companies double down, become more open, and understand that in a lot of cases, actually, when you're more open, you become a source of truth. And thus, people become reliant on you – they're stuck on you because all their other software is integrated tightly.

Ryan Donovan: Right, and I think in the current era, what's driving a lot of integrations is the sort of agent connections and tool use, and things like MCP have been set up to address that. What do you think about MCP access, as opposed to, like, centralized API access?

Gil Feig: So, I think, first of all: MCP is the protocol we were all waiting for, right? Everyone was tinkering with AI, and they were like, 'all right, I wanna make it call this API.' So you'd write a little tool that could make an API call and, you know, it was working. It wasn't amazing, but MCP came out, and I think for everyone, it was like, 'finally.' It's not that this is some heavyweight or extremely inventive protocol; it's just a protocol that we've all been waiting for. We just wanted something that the world was gonna adopt, and so I think that, you know, with MCP, we now have simpler ways to make these third-party API calls from an agent. But I think one of the biggest problems that we're still seeing is that MCP servers, one, are not built super well – a lot of them were built for marketing. Everyone wanted that marketing moment when MCP had its moment, so they put one engineer on it, and they used AI to convert their APIs to, you know, simple wrappers... And it's funny, you actually go test some of the really popular ones, some big software companies' MCP servers, and they fail on a basic request. So, we're excited for it. I think, actually, the biggest change we need to see for MCP to really take off is better access patterns from these third-party APIs, because ultimately, they can only do what the underlying API can do, right? They're just wrapping API calls and exposing them to the agent. And so, you know, if you try to ask it anything semantic, like, 'get me the ticket from our ticketing system that has the worst sentiment,' the only way it can possibly do that is to sync every ticket in the system and then vectorize them, or if it's a small enough dataset, use LLM context. But you know, regardless, you have to do full syncs of data to power anything beyond a question that's like, 'get me something with this specific ID or name.'
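
That 'tools can only do what the underlying API can do' point is easy to see in a deliberately SDK-free sketch; the endpoints and fields here are hypothetical:

```python
import requests  # the tool is just a thin HTTP wrapper

def get_ticket(ticket_id: str) -> dict:
    """Tool: fetch one ticket by ID. This maps 1:1 to an API call,
    so an agent can answer 'get me ticket 42' directly."""
    resp = requests.get(f"https://api.example.com/tickets/{ticket_id}")
    resp.raise_for_status()
    return resp.json()

def worst_sentiment_ticket() -> dict:
    """A semantic question has no matching endpoint. The only option
    is to pull every ticket and score them all: a full-dataset job,
    not a single wrapped API call."""
    tickets = requests.get("https://api.example.com/tickets").json()
    return min(tickets, key=lambda t: score_sentiment(t["description"]))

def score_sentiment(text: str) -> float:
    raise NotImplementedError("needs an LLM or embedding model")
```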

Ryan Donovan: Right. Yeah. That is interesting. But, you know, that is the nature of REST APIs, essentially, or RPC, or whatever they're using on the backend. Do you think there'll be the possibility of a new sort of API system that is AI-driven?

Gil Feig: I think it's possible. You know, there are ways to expose this even now, where you just have one endpoint that's like, 'send me an LLM prompt, some sort of query.' And I have all—we as a business—have all of our data vectorized, and so when you're asking these questions, we can search across it and respond to you with something a little bit more thought out. But, you know, companies aren't extremely incentivized to do that right now because it's so expensive to vectorize all that data and serve it up.
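
A toy version of 'vectorize everything and answer semantic queries': the hashing 'embedding' below is only a self-contained stand-in for a real embedding model, so the sketch runs without one.

```python
import numpy as np

def toy_embed(text: str, dims: int = 64) -> np.ndarray:
    """Stand-in embedding: hash character trigrams into a fixed vector.
    A real system would call an actual embedding model here."""
    vec = np.zeros(dims)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dims] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def semantic_search(query: str, records: list[str], top_k: int = 3) -> list[str]:
    matrix = np.stack([toy_embed(r) for r in records])
    scores = matrix @ toy_embed(query)  # dot product of unit vectors = cosine
    return [records[i] for i in np.argsort(scores)[::-1][:top_k]]

tickets = ["Login page crashes on submit", "Invoice totals are wrong", "App is slow"]
print(semantic_search("billing bug", tickets, top_k=1))
```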

Ryan Donovan: Yeah. I mean, you know, most APIs are basically a thin layer on top of a database anyway, and I think, like, you add vectors to that database and you can have this pretty powerful, you know, LLM API. But then you have the costs, right?

Gil Feig: Yeah, exactly. I mean, it is a lot of added expense across the board. I think it is powerful, but I– companies just aren't driven by, you know, necessarily offering all that power.

Ryan Donovan: Yeah. And I think like, you know– I read something the other day that 95% of all companies' AI efforts have failed. And it's just like, everybody's trying stuff, not a lot is working, everybody's sort of figuring out what the patterns are, and best practices are.

Gil Feig: Yeah, I think it's completely true. I saw that article too. It was 95% of internal pilots, and it's interesting; I think a big part of that, too, is because companies are just, you know, throwing people at this who haven't experimented, who haven't seen what good looks like, and so they kind of just roll with it and they're like, 'oh, it's 90% accurate, but the 10% is not good enough – we need to drop this.' And there are ways to get past that 10% that I think people just aren't exploring, from chaining calls to, you know, having agents that play off one another. So, I'm excited to see that. We at Merge, actually, have been pushing really heavily on this – our sort of journey in the AI evolution of the business is chaining LLM calls, using subagents, and all of that.

Ryan Donovan: Hmm. That's interesting. But, before we get to that, I wanna see about– you said they're trying to figure out what 'good' is, and that's something I've been thinking about in terms of AI and AI projects. How do you actually define what 'good' is?

Gil Feig: Yeah. 'Good' varies per company, and it's all about risk tolerance, right? A self-driving car cannot be 90% accurate when it takes a left turn. Whereas, you know, something that's giving someone sample weekend plans can be 90% accurate, and it's no big deal.

Ryan Donovan: Right. And the stuff you talked about—these improved AI pathways, chaining calls, agentic stuff—can you talk a little bit more about what that landscape looks like?

Gil Feig: Yeah, so, we've explored a lot of different ones here. You know, some chaining with, like, tool sets directly in code; we've experimented with some of the visual editors; and then we've experimented with some of the tools like, you know, Claude Code, for example, where you have these sort of agentic files, and they choose to call each other as necessary. You tell this agent, 'call that agent.' We've actually found that to be the lightest way, and a really enjoyable thing to work on. And we've had good results. We're working on a new project right now, a new, more MCP-focused AI product, and we are generating a lot of the connectors now. It requires some manual work; they're never 100% good enough, but it gets us close enough that it really does save us a ton of time.

Ryan Donovan: Mm-hmm. And something I've heard about—the, you know, not quite 100%—is that sometimes it's more effort to, kind of, disentangle the AI tech debt that it creates. Have you found that to be the case, or is it just that you understand how these AIs work now?

Gil Feig: So, I do think it can be really tough to detangle everything that's going on there. But you can use AI to help figure that out, and use AI to do that before it ever shows a result to you, right? So, let's say that you're using AI to generate one of these AI connectors. You don't necessarily need to say, 'all right, it generated it. Lemme go test now.' Instead, the next agent generates tests and runs them across all of the connectors, and those tests are built within guidelines, and maybe you have some static tests that run. So, you know, the results of those tests are fed back to an agent that goes and fixes things. And so, we do end up with some issues, but most of them are gone by the time all of the agents have run over the code.
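
Structurally, that pipeline might look like the loop below; all four helper functions are hypothetical stubs standing in for LLM-backed agents, just to show the shape of the chain.

```python
# Hypothetical LLM-backed helpers; real versions would call a model.
def generate_code(spec): return "def connector(): ..."
def generate_tests(spec): return ["test_connector"]
def run_checks(code, tests): return []   # empty list = everything passed
def fix_code(code, failures): return code

def generate_test_fix_loop(spec: str, max_rounds: int = 3) -> str:
    """One agent writes the connector, a second writes tests from the
    spec (not from the generated code), and failures feed a fixer."""
    code = generate_code(spec)
    tests = generate_tests(spec)
    for _ in range(max_rounds):
        failures = run_checks(code, tests)  # static checks + test run
        if not failures:
            return code                     # only now does a human review it
        code = fix_code(code, failures)
    raise RuntimeError("still failing after max rounds; escalate to a human")
```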

Ryan Donovan: Is there any sort of thing that needs to be human-done, that is, like, nobody but a human can do this part?

Gil Feig: It's a good question. So, okay, there's a few examples here. One would be actually testing and modifying the tool descriptions to see what the LLM is picking up on, because it can vary a lot per use case. You know, you ask the question, the LLM has to decide what tools to call, and poor descriptions on the tools, or descriptions that aren't totally relevant to the use case of the product itself, end up choosing the wrong tools, or not choosing tools at all. And so you do need to modify those for your use case quite a bit.
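
Since tool selection hinges on that description text, the difference between a vague and a use-case-tuned description can be as simple as this (a hypothetical tool registry):

```python
TOOLS = [
    {
        "name": "search_tickets",
        # Too vague: the model can't tell when to reach for it.
        "description": "Searches stuff.",
    },
    {
        "name": "search_tickets_v2",
        # Scoped to the product's actual use case, so the model
        # picks it for support-related questions and nothing else.
        "description": (
            "Full-text search over customer support tickets. "
            "Use for questions about bugs, complaints, or ticket status. "
            "Not for billing or HR data."
        ),
    },
]
```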

Ryan Donovan: Yeah. We just released our developer survey results for this year—about a month ago, as of this recording—and one of the things we found with AI is that people are using it more but trusting it less. It's people sort of realizing this accuracy gap. What are y'all doing to build that trust in, to reduce the inaccuracies?

Gil Feig: Yeah. I think it's generally playing agents off one another and letting them fight until it gets to a relatively solid solution. You know, you gain trust when code that's generated passes static tests that are written. You don't gain trust, I've realized, when it passes tests that the AI also wrote, because those tests are often– you go in and look at them and it's like, 'assert 1 == 1,' and you're like, 'all right, well, that's always going to pass.' So, I think for us it's been continuing to modify prompts, continuing to get to a better place where we actually see that the code output is good enough, and that's how we're starting to build more and more trust. But, I think it would be crazy to 100% trust AI to just go write everything right now, and actually, a good example is on our new product: we generated some stuff, and it was just publicly returning an API key from an endpoint. But, obviously, we're software engineers, we're doing code reviews, we're looking – we caught that right away. But it just, you know, does some crazy things sometimes that aren't just a little bug, right? That truly could mean the end of your business.
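
The 'assert 1 == 1' failure mode is worth seeing side by side with a test that actually constrains behavior (pytest-style; `normalize_ticket` is the hypothetical function under test from the earlier sketch):

```python
def test_tautology():
    assert 1 == 1  # always green; verifies nothing about the code

def test_normalized_ticket_keeps_its_name():
    # A static, human-written constraint on the generated connector.
    raw = {"title": "Login page crashes on submit", "id": "42"}
    ticket = normalize_ticket("jira", raw)  # hypothetical function under test
    assert ticket["name"] == "Login page crashes on submit"
```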

Ryan Donovan: Right. Yeah. I mean, I think when you have software engineers working with these sort of code creation agents, it could be very powerful; but we did an experiment here, where we had our junior writer try to create something with code gen, and she created an app, but when she showed it to software engineers, they were like, 'what the hell is going on here?' And I see, you know, these sort of entrepreneurs thinking that they can just develop code with a, you know, code gen tool and run with it.

Gil Feig: Yeah. It's funny – it reminds me of, I don't know if you're a big Twitter or X person, but the levelsio account, where, you know– this guy vibe-codes a whole game, and then he's bragging and talking all about it, and then gets DDoSed and is like, 'I don't know what to do, what's going on here?' And, like, any software engineer was just like, 'quickly, put it in front of Cloudflare.' But, you know, those sorts of things are just really funny to see.

Ryan Donovan: Yeah. I wonder, you know, is there gonna be a new sort of 'prompting best practices,' where somebody has this huge template that they're just putting in there to think of the whole OWASP Top 10 and make sure you fix them?

Gil Feig: Yeah. Like what happened with the Tea app... I don't know if you saw that too. Tea is an app that kind of blew up on the App Store very recently; it's a place for women to share experiences that they've had with men on dates and whatnot. And they made the women, when they signed up, upload, you know, an ID or a picture of themselves, and it was all posted to a public S3 bucket with directory listing enabled–

Ryan Donovan: Oh my.

Gil Feig: So, you could just see every file, the whole thing, you know, millions of women's IDs and photos, and, you know, men were upset about this app, and so they were, you know– it was just really a dangerous situation for such a rookie mistake.

Ryan Donovan: Yeah.

Gil Feig: And that was all vibe coded, by the way. So, that's the problem.
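
For what it's worth, that particular failure mode is preventable with a few lines of boto3: block all public access at the bucket level and serve files only through expiring presigned URLs. The bucket and key names below are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Block every form of public access on the bucket.
s3.put_public_access_block(
    Bucket="my-app-uploads",  # placeholder bucket name
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Serve files through short-lived presigned URLs instead of public reads.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-app-uploads", "Key": "ids/user-123.jpg"},
    ExpiresIn=300,  # five minutes
)
```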

Ryan Donovan: Yeah, the pure vibe coding does seem like a problem. But I wonder, you know– another article we posted a while back asked, 'how do you get junior developers when everybody's vibe coding?' Right? Like, the junior dev was the sort of apprentice to the senior devs, and now that junior dev role can be filled by vibe coding. Like, how do you get junior developers?

Gil Feig: It's a really good question, and I guess the question is, you know – I think we're in this weird intermediate period where AI's not quite good enough to completely replace them. But yeah, we're gonna end up with maybe that weird thing you see in the population curves of countries, where there's not enough funnel going up into the senior and staff roles. So, we better hope that AI moves fast. And maybe we do need a little bit of, like, the whole math class, 'you can't use a calculator!' even though everyone's like, 'but I always can,' you know?

Ryan Donovan: Yeah. You gotta go up there, show your work, you gotta, you know, write it by hand. On the other hand, I think there is a world where everybody is sort of an architect, right? Everybody's just dropping a spec in, and the real coding becomes writing a really good spec.

Gil Feig: Yeah. I mean, I think that's true, but even the specs – AI's getting pretty good, right? Like, obviously, it doesn't know all the details, but you spill out the project that you wanna build in, like, 3 sentences, and it's pretty good.

Ryan Donovan: Yeah. I mean, it sort of speaks to how much though, like, everybody wants to solve the same problems, right? Everybody's like, 'gimme a whiteboarding app,' and it's like, 'yeah, man, there's dozens of those.'

Gil Feig: Yeah. Obviously, some of the deeper things we're doing, like, we're doing a refactor of our whole data layer right now, right? Like, splitting across multiple databases and adding Dynamo, and just adding tons of different services. Yeah. AI is not good at that.

Ryan Donovan: Well, good. There's still a home for software engineers out there somewhere.

Gil Feig: For sure.

Ryan Donovan: You talked about the MCP servers – the sort of ‘next thing’ you're looking at. What do you think the future of APIs will be? I know we've sort of touched on a little bit of speculation, but what's the deep future of it?

Gil Feig: Yeah, so, I think when it comes to, like, the actual protocols and whatever, I just don't think it really matters, right? Like, ultimately, we're trying to transfer data across. But what does matter is, again, the access patterns, more specifically. And so, I think for AI, or for APIs, to really have that next evolution that enables a lot, we need better access patterns. We need the ability to search data on these APIs, and not just a fuzzy search. You do see some fuzzy or exact-match searches in APIs, but we need semantic search. We need– beyond even Elasticsearch, like, it would be incredible if everyone had, you know, endpoints to fetch from a vectorized lookup. I think those sorts of things are where we need to go. I don't know if we're gonna get there, again, just because of the sheer expense of doing that across all your data – you'd probably need to start charging for your APIs, and that makes you weaker in the competitive market. So, I'm not positive yet, but we need those better access patterns; that's what I can say.

Ryan Donovan: Yeah. I think people have been chasing a semantic web for, you know, as long as the web has been around.

Gil Feig: Yeah. Yeah, it's true.

Ryan Donovan: So, if you were to design the ideal API with the best sort of access patterns, what would be on your wishlist?

Gil Feig: Hmm. Okay, so it would have core data models, CRUD operations, all that good stuff. It would have no need to ever run N+1 queries, right? You never wanna have to fetch model-by-model – you should get pages, and those pages of data should have submodels and everything expanded out, if needed, right? That should all be customizable. GraphQL is one way to do that. REST APIs have, you know, the 'expand' parameter, so you can do all of that. Then the two additions would be an Elasticsearch-style fuzzy, you know, text-match-type lookup, and then a more semantic search endpoint for each of the data models as well. And then of course, there's also things like – you need rich webhooks. I can go through a million of these. Like, we need to know what data was deleted without having to re-sync full datasets – there's a lot of that stuff as well.
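
Pulling that wishlist together as HTTP endpoints, a sketch (Flask, with hypothetical models and stubbed-out search backends):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical backends; real ones would hit a database and search indexes.
def fetch_page(cursor): return {"results": [], "next_cursor": None}
def attach_line_items(page): pass
def text_index_lookup(q): return []
def vector_index_lookup(q): return []

@app.route("/invoices")
def list_invoices():
    # Cursor pagination plus an 'expand' parameter, so one page can
    # carry submodels inline and no client ever runs N+1 queries.
    page = fetch_page(request.args.get("cursor"))
    if "line_items" in request.args.get("expand", "").split(","):
        attach_line_items(page)
    return jsonify(page)

@app.route("/invoices/search")
def fuzzy_search():
    # Elasticsearch-style fuzzy text matching.
    return jsonify(text_index_lookup(request.args["q"]))

@app.route("/invoices/semantic-search")
def semantic_search():
    # The wishlist item: a vectorized, semantic lookup per data model.
    return jsonify(vector_index_lookup(request.args["q"]))
```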

Ryan Donovan: Yeah. The sort of update that people do through polling or various event-based systems – like, what's changed.

Gil Feig: Yeah, exactly. And right now, you know, you have this dilemma, especially when data's deleted, which—with GDPR and everything—makes it a big pain, because if something's deleted from the system, you have no way of knowing; there's no update. The only way of knowing is to re-sync the entire dataset, or hope that they have some form of webhook that can let you know data was deleted.
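
Without deletion webhooks, that detection reduces to a set difference over a full re-sync, which is exactly why it's so expensive:

```python
def find_deletions(synced_ids: set[str], remote_ids: set[str]) -> set[str]:
    """Anything we hold locally that the provider no longer returns
    must have been deleted upstream; GDPR means we should purge it too."""
    return synced_ids - remote_ids

# Detecting one deletion still requires listing the ENTIRE remote dataset.
print(find_deletions({"a", "b", "c"}, {"a", "c"}))  # {'b'}
```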

Ryan Donovan: Oof. There's another GDPR issue. It's amazing how many things GDPR can touch.

Gil Feig: Yeah. It impacts every aspect of the business.

Ryan Donovan: Yeah.

Gil Feig: Honestly, for the best, but it is hard to do.

Ryan Donovan: Well, it's that time of the show again, ladies and gentlemen, where we shout out somebody who came onto Stack Overflow, dropped some knowledge, shared some curiosity, and earned themselves a badge. Today, we're shouting out a brand-new, fresh Lifeboat badge: somebody who found a question that was sinking with a score of negative three or less, dropped an answer that brought the question back up, and earned 20 points on that answer. So congrats to Abhijit for answering 'Complex numbers in Python.' If you're curious about that, we'll have it in the show notes. I am Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have questions, concerns, comments, et cetera, please email me at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn.

Gil Feig: Thank you so much for having me. I'm Gil Feig, again, co-founder and CTO at Merge. You can reach me anytime, Gil Feig on X, and also, feel free to shoot me a LinkedIn connection request.

Ryan Donovan: All right. Thank you for listening, everyone, and we'll talk to you next time.
