How event-driven architecture solves modern web app problems
Web-based applications have come a long way since we used to serve static HTML content from servers. Nowadays, applications are much more complex and use multiple frameworks, data centers, and technologies. In the last couple of years, we’ve seen two concepts dominate the IT market:
- moving our apps to the cloud;
- implementing a microservices architecture.
These ideas have shaped the way we design and build software. In a way, we are no longer building applications; instead, we are building platforms. Apps no longer share the same computational space. Instead, they have to communicate with each other through lightweight communication protocols, like REST APIs or RPC calls. This model made it possible to create some amazing software like Facebook, Netflix, Uber, and so many others.
In this article, we’ll discuss some of the problems driving innovation in modern web development. Then we’ll dive into the basics of event-driven architecture (EDA), which tries to address these problems by thinking about back-end architecture in a novel way.
The problems facing the modern web
Each technology has to handle the challenges that always-on, multi-user, asynchronous applications face today:
Availability
Instead of one application, we now have many: dozens, maybe even hundreds, of linked services, and each one of them has to be ready to do its job 24/7. How can we do that? Most often, the service is horizontally scaled to multiple instances, sometimes across multiple data centers, making it highly available. Requests coming into the service are then routed evenly across all instances, typically by a load balancer. Some deployment tools also offer self-healing capabilities: if one instance goes down, the tool creates another instance to take its place.
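To make the “routed evenly across all instances” step concrete, here is a minimal Python sketch of round-robin routing that skips instances marked unhealthy. The `Instance` and `RoundRobinRouter` names are hypothetical, and real deployments delegate this job to a load balancer or service mesh; replacing failed instances (self-healing) is left to the deployment tool and is not modeled here.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    name: str
    healthy: bool = True  # flipped to False when a health check fails


class RoundRobinRouter:
    """Spread requests evenly across instances, skipping unhealthy ones."""

    def __init__(self, instances: list[Instance]) -> None:
        self._instances = instances
        self._ring = cycle(instances)

    def route(self) -> Instance:
        # Try each instance at most once per request before giving up.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instance available")


instances = [Instance("a"), Instance("b"), Instance("c")]
router = RoundRobinRouter(instances)
print([router.route().name for _ in range(6)])  # ['a', 'b', 'c', 'a', 'b', 'c']

instances[1].healthy = False  # simulate instance "b" going down
print([router.route().name for _ in range(4)])  # ['a', 'c', 'a', 'c']
```

As long as one instance is healthy, every request still gets served, which is exactly the availability property described above.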
Scalability
Scalability has much in common with availability. Availability is all about making sure that at least one instance of the service is up and running, ready to serve incoming requests. Scalability, on the other hand, is focused on performance: if one application is overloaded, we can create new instances of it to accommodate the increased number of requests. But scaling out applications is not free of challenges, especially when we deal with stateful applications, because state held in one instance’s memory is not visible to the others.
Single source of truth
Before microservices, maintaining a single source of truth was simple: all the data resided in one place, typically some sort of relational database. But when multiple services share a database, you can create problems such as cross-team dependencies on schema changes, or performance issues. A common pattern to solve this is to use a database per service. Distributing the source of truth this way helps keep the architecture clean, but we now have to deal with distributed transactions and the complexity of maintaining multiple databases.
Synchronous
In a typical request-response scenario, the client waits for the server to respond; it blocks all its activities until it receives a response or the timeout expires. If we take this behaviour and put it in a microservices architecture using chained calls across the system, we can easily end up with what I call “Microservices Hell.” Everything starts with one service call; let’s call it service A. But then service A needs to call service B, which calls service C, and the fun goes on. The problem with this behaviour is that if a service has resources blocked (e.g. a thread is hanging), timeouts accumulate along the chain, because each caller has to wait for every service below it. If we allow a 500 ms timeout per service and there are five service calls in the chain, then the first service needs a 2500 ms (2.5 second) timeout, whereas the last service needs only a 500 ms timeout.
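The timeout arithmetic above can be checked with a small sketch. It assumes, as the paragraph does, a fixed 500 ms allowance per hop and ignores network overhead and retries; `timeout_budgets` is a hypothetical helper.

```python
def timeout_budgets(per_hop_ms: int, chain_length: int) -> list[int]:
    """Timeout each service in a synchronous call chain must allow.

    Service i (0-based) waits on its own hop plus every hop below it,
    so its timeout has to cover (chain_length - i) hops.
    """
    return [per_hop_ms * (chain_length - i) for i in range(chain_length)]


print(timeout_budgets(500, 5))  # [2500, 2000, 1500, 1000, 500]
```

The budget grows linearly with the depth of the chain, and every additional synchronous hop raises the timeout of every caller above it.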
Introducing event-driven architecture
Event-driven architecture (EDA) is a software architecture paradigm promoting the production, detection, consumption of, and reaction to events.
-Wikipedia
In classic three-tier applications, the core of the system was the data(base). In EDA, the focus shifts towards events and how they flow through the system. This shift allows us to completely change the way we design applications and to tackle the problems mentioned above.
Before seeing how EDA does that, let’s define what exactly an event is. An event is an action that triggers either a notification or some kind of change in the state of the application. A light has been switched on (notification), the thermostat turned off the heating system (notification), a user changed his address (state change), or one of your friends changed his phone number (state change). All of these are events, but that doesn’t mean we should add them to an event-driven solution. For an event to be added, it must be relevant to the business. A user placing a new order is a relevant event for that specific business, but that same user eating pizza for lunch is not.
Which events are relevant to a business might be obvious when you think about them, but some might not be, especially events that occur as a reaction to other events. To discover the events flowing through the system, you can use a technique called Event Storming: bring together the stakeholders of an application (from software engineers to business people and domain experts) and map out all the business processes as specific events. Once all the business processes are mapped, engineering teams can use the result as requirements for building their applications.
Now that we know what events are and how they can be identified, let’s look at how they can solve the common problems mentioned earlier.
Events flow in a single direction, from a producer to a consumer. Compare this with a REST call: the event producer never expects a response from the consumer, while a REST call always returns a response. No response means no need to block code execution until something else happens. This makes events asynchronous by nature and removes the chained timeouts described above, since the producer never waits on the consumer.
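A minimal sketch of this fire-and-forget behaviour, using Python’s `asyncio.Queue` as a stand-in for real messaging infrastructure (an assumption for illustration only): the producer enqueues an event and moves on without waiting for the consumer to process it.

```python
import asyncio


async def produce(queue: asyncio.Queue, event: dict) -> None:
    # Fire and forget: enqueue the event and return without awaiting any reply.
    queue.put_nowait(event)


async def consume(queue: asyncio.Queue, handled: list) -> None:
    while True:
        event = await queue.get()
        await asyncio.sleep(0.01)  # simulate slow processing on the consumer side
        handled.append(event["type"])
        queue.task_done()


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    handled: list = []
    consumer = asyncio.create_task(consume(queue, handled))

    await produce(queue, {"type": "OrderPlaced", "order_id": 42})
    # The producer is already free to continue, although the consumer
    # has not processed the event yet.
    assert handled == []

    await queue.join()  # demo only: wait until the consumer has caught up
    consumer.cancel()
    return handled


print(asyncio.run(main()))  # ['OrderPlaced']
```

The `queue.join()` call exists only so the demo can show the result; a real producer would simply carry on with its own work.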
Events happen as the result of an action, so there is no target system. We can’t really say that service A sends events to service B; what we can say is that service B is interested in the events produced by service A. But other parties may be interested as well, like service C or D.
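The idea that services B, C, and D subscribe to service A’s events, rather than A targeting them, can be sketched as a tiny in-memory publish/subscribe hub. `EventBus` and the topic name are hypothetical and not the API of any real broker; delivery here is synchronous and in-process for simplicity.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Producers publish to a topic; every subscriber of that topic gets the event."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer names no target; the bus delivers the event
        # to whoever registered interest in the topic.
        for handler in self._subscribers[topic]:
            handler(event)


received: dict[str, list[dict]] = defaultdict(list)
bus = EventBus()
for service in ("B", "C", "D"):
    bus.subscribe("service-a.events", lambda e, s=service: received[s].append(e))

bus.publish("service-a.events", {"type": "UserAddressChanged", "user_id": 7})
print(sorted(received))  # ['B', 'C', 'D']
```

Adding a fourth interested service means one more `subscribe` call; service A does not change.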
So how can we make sure that an event triggered by one system reaches all the interested services? Most of the time, this problem is solved by message brokers. A broker is nothing more than an application that acts as a middleman between the event generator (the application that created the event) and the event consumer. This decouples the applications, which takes care of the availability issue I talked about earlier in the post. If an application is temporarily unavailable, then, as long as the broker retains the events, the application will start consuming and processing them when it comes back online, catching up with all the events triggered while it was down.
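Catching up after downtime only works if the broker keeps events around. The sketch below assumes a hypothetical broker that retains every event and stores, per consumer group, the offset of the next unread event. As a simplification, the offset advances as soon as a batch is returned; real brokers typically let the consumer commit its offset after processing.

```python
class RetainingBroker:
    """Keeps every published event and tracks a read offset per consumer group."""

    def __init__(self) -> None:
        self._log: list[dict] = []
        self._offsets: dict[str, int] = {}

    def publish(self, event: dict) -> None:
        self._log.append(event)

    def poll(self, group: str) -> list[dict]:
        # Return everything this group has not seen yet, then advance its offset.
        start = self._offsets.get(group, 0)
        batch = self._log[start:]
        self._offsets[group] = len(self._log)
        return batch


broker = RetainingBroker()
broker.poll("billing")  # billing is online; nothing to read yet
# billing goes down while two events are published...
broker.publish({"type": "OrderPlaced", "order_id": 1})
broker.publish({"type": "OrderPlaced", "order_id": 2})
# ...and receives both, in order, when it comes back.
print([e["order_id"] for e in broker.poll("billing")])  # [1, 2]
print(broker.poll("billing"))  # []
```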
What about storage? Can events be stored in a database, or will there be something else in place? Events can definitely be stored in databases, but by doing so, they lose their “event” aspect. After an event happens, we cannot change it, so events are immutable. Databases, on the other hand, are mutable: we can change data after it has been inserted.
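Immutability can be expressed directly in the event type. A minimal sketch using a frozen dataclass (the `AddressChanged` event is hypothetical); note that `frozen=True` prevents reassigning fields but does not deep-freeze mutable field values such as lists or dicts.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class AddressChanged:
    user_id: int
    new_address: str
    occurred_at: datetime


event = AddressChanged(7, "1 Main St", datetime.now(timezone.utc))

try:
    event.new_address = "2 Side St"  # an event that already happened cannot be edited
except FrozenInstanceError:
    print("events are immutable")

# A correction is recorded as a *new* event rather than an edit of the old one.
correction = replace(event, new_address="2 Side St", occurred_at=datetime.now(timezone.utc))
```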
A better approach is to store events in Event Logs. An event log is nothing more than a centralized data store where events are kept as a sequence of immutable records, also called a log. Imagine the log as a journal, where each new event is appended to the end. We can always recreate the latest state by replaying all the events in the log from the beginning up to the present.
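A minimal sketch of an append-only event log and of rebuilding the latest state by replaying it from the beginning. The bank-account domain and the `EventLog` class are hypothetical, chosen only because the resulting state (a balance) is easy to check.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str  # "deposited" or "withdrew"
    amount: int


class EventLog:
    """Append-only journal: events are added at the end and never updated or deleted."""

    def __init__(self) -> None:
        self._records: list[Event] = []

    def append(self, event: Event) -> int:
        self._records.append(event)
        return len(self._records) - 1  # offset of the new record

    def replay(self) -> Iterator[Event]:
        yield from self._records


def current_balances(log: EventLog) -> dict[str, int]:
    # Rebuild the latest state by folding over every event from the start.
    balances: dict[str, int] = {}
    for e in log.replay():
        delta = e.amount if e.kind == "deposited" else -e.amount
        balances[e.account] = balances.get(e.account, 0) + delta
    return balances


log = EventLog()
log.append(Event("alice", "deposited", 100))
log.append(Event("alice", "withdrew", 30))
log.append(Event("bob", "deposited", 50))
print(current_balances(log))  # {'alice': 70, 'bob': 50}
```

The balances are never stored anywhere; they are derived from the log whenever they are needed.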
The only bit I haven’t covered yet is scalability. Services built with the event-driven mindset are designed to be deployed in a multi-instance scenario. Since the state is stored in the event log, the service itself is stateless, which allows surgical scaling of any particular service we want.
The only exception to this pattern is services that have to create materialized views. In essence, a materialized view represents the state of an event log at a point in time, which makes the data easier to query. Coming back to our scalability issue: a materialized view is nothing more than events aggregated into a table-like format, but where do we store these tables? Most often, we see these aggregations performed in memory, which automatically turns our service into a stateful one. A quick and easy solution is to add a local database to each service that creates materialized views. This way, the state is stored in the database, and the service is once again stateless.
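The local-database approach can be sketched with Python’s built-in `sqlite3` module: each event is projected into a table through an upsert. The `orders_per_user` view and the event shapes are hypothetical, and the `ON CONFLICT ... DO UPDATE` syntax requires SQLite 3.24 or later.

```python
import sqlite3

# Hypothetical events, e.g. read from the event log.
events = [
    {"type": "OrderPlaced", "user_id": 1, "total": 20},
    {"type": "OrderPlaced", "user_id": 2, "total": 15},
    {"type": "OrderPlaced", "user_id": 1, "total": 5},
]

# The view lives in the service's local database, not in process memory.
# ":memory:" is only for this demo; a real service would use a file or server.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders_per_user ("
    " user_id INTEGER PRIMARY KEY, order_count INTEGER, total_spent INTEGER)"
)


def apply(event: dict) -> None:
    """Project one event into the materialized view (insert or update the user's row)."""
    if event["type"] != "OrderPlaced":
        return
    db.execute(
        "INSERT INTO orders_per_user (user_id, order_count, total_spent) VALUES (?, 1, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET "
        "order_count = order_count + 1, total_spent = total_spent + excluded.total_spent",
        (event["user_id"], event["total"]),
    )


for e in events:
    apply(e)
db.commit()

print(db.execute("SELECT * FROM orders_per_user ORDER BY user_id").fetchall())
# [(1, 2, 25), (2, 1, 15)]
```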
Although event-driven architecture has existed for more than 15 years, only recently has it gained massive popularity, and there is a reason for that. Most companies are going through a “digital transformation” phase, and with it come crazy requirements. The complexity of these requirements forces engineers to adopt new ways of designing software, ones that involve less coupling between services and lower maintenance overhead. EDA is one solution to these problems, but it is not the only one. Also, you should not expect everything to be solved by adopting EDA. Some features may still require good old-fashioned synchronous REST APIs or storing data in a relational database. Check what is best for you and design it appropriately!
Comments
Great post, Bogdan! Given the breadth and depth of the architectural concerns the event-driven approach tries to address, it’s a great summary of the problem, especially of how it fits into the larger context.
I wish there were a series of posts or podcasts dedicated to this topic, linking to other major topics like distributed transactions, eventual consistency, and so on…
I’d appreciate it if anyone could add a few links or references, as I’m always on the hunt for related topics and design patterns.
Thank you!
Great post
Podcasts on the topic would really help.
Anil, other academics and I have been exploring problems related to event-driven architecture.
I’ve written blog posts about my work in this space, on race conditions [1] and denial of service [2, 3]. See the references of the articles for other related work. I also came across [4], which improves on my work in [1].
Visualization is another hard topic for event-driven programming. [5] is an interesting read.
Drop me a line (contact info via http://people.cs.vt.edu/davisjam) if you have comments, questions, or pet peeves/suggestions for research topics!
[1] https://medium.com/@davisjam/nodefz-eurosys17-2150730ecdfe?source=friends_link&sk=20ee75d1d734699a5334ed34b8240301
[2] https://medium.com/@davisjam/a-sense-of-time-for-javascript-and-node-js-68c9114f5d48?source=friends_link&sk=47eaa4e1c89b4b2e8525e66af2e8b5b6
[3] https://nodejs.org/ru/docs/guides/dont-block-the-event-loop/
[4] https://users-cs.au.dk/~amoeller/papers/noderacer/paper.pdf
[5] http://blogs.ubc.ca/karthik/files/2014/06/saba-icse16.pdf
Short, concise and to the point, well written.
A great article on some of the key problems in modern web development and how designing applications with an event-driven architecture can solve some of these problems in a more novel way 👍
Interesting topic. I work in an event-driven architecture, and my current problem is document/file handling. The idea is to use SharePoint as a document store (important for our users’ collaboration). But not all users of the platform are SharePoint users. Also, I have different sites in SharePoint where the documents are located, and each access to the store/file should include a one-time token.
My first idea was for the business service (after it has checked the permissions) to make a REST call to the file service to get this one-time token. But that would break the event-driven architecture. So what would you suggest?
Hey Eduard, as I mentioned at the end of the article, EDA cannot solve all problems. It looks like your scenario may be one of those problems, and that is OK! You will never find a purely event-driven company. Most of the time, you will see a mix of eventing and synchronous APIs.
There are 3 types of communication between services: events, queries, and commands. Events represent an action that happened in the past, and there is no response attached to them. A query represents an action happening now, and there is always a response. Finally, a command represents an action that will happen sometime in the (near) future, and there is a possible response attached to it.
Looking at your scenario (accessing a file while presenting a token), I can identify (at least) 2 actions: a query (retrieve the file) and a command (generate a one-time token).
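The three kinds of communication described above can be written down as distinct message types, mapped here onto Eduard’s scenario. This is a hypothetical sketch of the distinction, not a prescribed schema; the message names and payloads are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Event:
    """Something that already happened; no response is attached."""
    name: str
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Query:
    """A request for data right now; there is always a response."""
    name: str
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Command:
    """A request for something to happen soon; a response is possible."""
    name: str
    payload: dict = field(default_factory=dict)


Message = Union[Event, Query, Command]


def reply_policy(message: Message) -> str:
    if isinstance(message, Event):
        return "none"
    if isinstance(message, Query):
        return "always"
    return "optional"


# Eduard's scenario: a command to mint the token, then a query for the file.
token_request = Command("GenerateOneTimeToken", {"file_id": "report.docx"})
file_request = Query("GetFile", {"file_id": "report.docx", "token": "<token>"})
# The resulting fact, published for anyone who is interested:
accessed = Event("FileAccessed", {"file_id": "report.docx", "user": "ana"})

for m in (token_request, file_request, accessed):
    print(type(m).__name__, m.name, reply_policy(m))
```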
On the other hand, if your scenario was something like this: “retrieve all the users that have accessed these files”, things would have been much smoother for EDA. Possible flow: file accessed by X user -> send an event to the event store -> use the event store to create a materialized view containing all the users who have accessed this file.
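The suggested flow can be sketched as a projection over `FileAccessed` events. The event shape and `users_by_file` are hypothetical; in practice the view would be persisted (for example in the service’s local database) rather than rebuilt on every request.

```python
from collections import defaultdict

# Hypothetical events read from the event store.
event_store = [
    {"type": "FileAccessed", "file_id": "report.docx", "user": "ana"},
    {"type": "FileAccessed", "file_id": "report.docx", "user": "ben"},
    {"type": "FileAccessed", "file_id": "plan.xlsx", "user": "ana"},
    {"type": "FileAccessed", "file_id": "report.docx", "user": "ana"},
]


def users_by_file(events) -> dict[str, set[str]]:
    """Materialized view: for each file, the set of users who accessed it."""
    view: dict[str, set[str]] = defaultdict(set)
    for e in events:
        if e["type"] == "FileAccessed":
            view[e["file_id"]].add(e["user"])
    return dict(view)


print(sorted(users_by_file(event_store)["report.docx"]))  # ['ana', 'ben']
```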
How would an event-driven system work in these use cases?
1. I want to build an item search form.
2. Let’s say I want the user to create an account and log in immediately.
3. I want the user to log in immediately if their username and password are correct.
There are many more scenarios like these. How do they fit into an event-driven design? Everybody on the internet says you should use event-driven architecture, but nobody answers the questions I asked above.
1. You would query a projection stored in something like Elasticsearch.
2/3. For parts of your system that require an immediate response without CQRS, you can send a response back immediately and write the event to your event store at the same time; look up the outbox pattern. You can keep a local data store for aggregates that need this kind of instant response. Additionally, event sourcing/CQRS as a combined approach is not ideal for every part of a system; in practice, it makes a lot more sense in some parts of your system than in others. Being able to decouple the way your application layer communicates with your domain layer is really important here, because you should be able to build an API for your system that is processed either as synchronous calls or as CQRS-style calls, depending on your needs, even though ultimately both roll up into root-aggregate-based events. Typically, you can use some kind of event handler or translation handler to convert non-conforming application-layer information into the event-sourced system you’ve built.
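A minimal sketch of the outbox pattern mentioned above, using Python’s built-in `sqlite3`: the new user row and the `UserRegistered` event are written in one local transaction, the caller gets an immediate answer, and a separate relay step publishes pending events later. The table names, `register`, and `relay_outbox` are hypothetical. Because a crash between publishing and marking a row as published would resend that row, this gives at-least-once delivery, so consumers should tolerate duplicates.

```python
import json
import sqlite3
from typing import Callable

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT NOT NULL,
                         published INTEGER NOT NULL DEFAULT 0);
    """
)


def register(username: str, password_hash: str) -> dict:
    # State change and event go into ONE local transaction:
    # either both are stored or neither is.
    with db:
        db.execute("INSERT INTO users VALUES (?, ?)", (username, password_hash))
        event = {"type": "UserRegistered", "username": username}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    # Immediate, synchronous answer: the user can log in right away.
    return {"status": "created", "username": username}


def relay_outbox(publish: Callable[[dict], None]) -> int:
    """One pass of a relay: push unpublished events to the broker, then mark them."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: list[dict] = []
print(register("ana", "<hash>"))
print(relay_outbox(sent.append), sent)
```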