When to use gRPC vs GraphQL
TLDR: Use GraphQL for client-server communication and gRPC for server-to-server. See the Verdict section for exceptions to this rule.
I’ve read a lot of comparisons of these two protocols and wanted to write one that is comprehensive and impartial. (Well, as impartial as I and my reviewers can make it.) I was inspired by the release of connect-web (a TypeScript gRPC client that can be used in the browser) and a popular HN post entitled GraphQL kinda sucks. My personal history of communication protocols built on top of layer 7:
- REST (Rails and Express)
- ➡️ DDP (Meteor’s WebSocket protocol)
- ➡️ GraphQL (which I wrote a book about)
- ➡️ gRPC (which I use at Temporal)
Background
gRPC was released in 2016 by Google as an efficient and developer-friendly method of server-to-server communication. GraphQL was released in 2015 by Meta as an efficient and developer-friendly method of client-server communication. They both have significant advantages over REST and have a lot in common. We’ll spend most of the article comparing their traits, and then we’ll summarize each protocol’s strengths and weaknesses. At the end, we’ll know why each is such a good fit for its intended domain and when we might want to use one in the other’s domain.
Comparing gRPC and GraphQL features
Interface design
Both gRPC and GraphQL are Interface Description Languages (IDLs) that describe how two computers can talk to each other. They work across different programming languages, and we can use codegen tools to generate typed interfaces in a number of languages. IDLs abstract away the transport layer; GraphQL is transport-agnostic but generally used over HTTP, while gRPC uses HTTP/2. We don’t need to know about transport-level details like the method, path, query parameters, and body format, as we do in REST. We just need to know a single endpoint that we use our higher-level client library to communicate with.
Message format
Message size matters because smaller messages generally take less time to send over the network. gRPC uses protocol buffers (a.k.a. protobufs), a binary format that just includes values, while GraphQL uses JSON, which is text-based and includes field names in addition to values. The binary format combined with less information sent generally results in gRPC messages being smaller than GraphQL messages. (While an efficient binary format is feasible in GraphQL, it’s rarely used and isn’t supported by most of the libraries and tooling.)
Another aspect that affects message size is overfetching: whether we can request only specific fields or will always receive all fields (“overfetching” fields we don’t need). GraphQL always specifies in the request which fields are desired, and in gRPC, we can use FieldMasks as reusable filters for requests.
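To illustrate, here’s a minimal TypeScript-style sketch of a request that uses a field mask; the GetEntityRequest message, its fieldMask field, and the GetEntity method are made up for this example:
// Ask the server to return only the name and behavior fields of the entity
const request = new GetEntityRequest({
  id: '123',
  fieldMask: { paths: ['name', 'behavior'] }, // the server filters the response to these paths
})
const entity = await service.GetEntity(request)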
Another benefit to gRPC’s binary format is faster serializing and parsing of messages compared to that of GraphQL’s text messages. The downside is that it’s harder to view and debug than the human-readable JSON. We at Temporal use protobuf’s JSON format by default for the visibility benefit to developer experience. (That loses the efficiency that came with the binary format, but users who value the efficiency more can switch to binary.)
Defaults
gRPC also doesn’t include default values in messages (GraphQL can include defaults for arguments, though not for request fields or response types). This is another factor in gRPC messages’ smaller size, and it also affects the DX of consuming a gRPC API: there’s no distinction between leaving an input field unset and setting it to its default value, and the default value is based on the type of the field. All booleans default to false, and all numbers and enums default to 0. We can’t default the `behavior` enum input field to `BEHAVIOR_FOO = 2`: we have to either put the default value first (`BEHAVIOR_FOO = 0`), which means it will always be the default in the future, or follow the recommended practice of having a `BEHAVIOR_UNSPECIFIED = 0` enum value:
enum Behavior {
BEHAVIOR_UNSPECIFIED = 0;
BEHAVIOR_FOO = 1;
BEHAVIOR_BAR = 2;
}
The API provider needs to communicate what `UNSPECIFIED` means (by documenting “unspecified will use the default behavior, which is currently FOO”), and the consumer needs to think about whether the server’s default behavior may change in the future (if the server saves the provided `UNSPECIFIED` / 0 value in some business entity the consumer is creating, and the server later changes the default behavior, the entity will start behaving differently) and whether that change would be desired. If it wouldn’t be desired, the client needs to set the value to the current default. Here’s an example scenario:
service ExampleGrpcService {
rpc CreateEntity (CreateEntityRequest) returns (CreateEntityResponse) {}
}
message CreateEntityRequest {
string name = 1;
Behavior behavior = 2;
}
If we do:
const request = new CreateEntityRequest({ name: 'my entity' })
service.CreateEntity(request)
we’ll be sending `BEHAVIOR_UNSPECIFIED`, which, depending on the server implementation and future changes, might mean `BEHAVIOR_FOO` now and `BEHAVIOR_BAR` later. Or we can do:
const request = new CreateEntityRequest({ name: 'my entity', behavior: Behavior.BEHAVIOR_FOO })
service.CreateEntity(request)
to be certain the behavior is stored as `FOO` and will remain `FOO`.
The equivalent GraphQL schema would be:
type Mutation {
createEntity(name: String, behavior: Behavior = FOO): Entity
}
enum Behavior {
FOO
BAR
}
When we don’t include `behavior` in the request, the server code will receive and store `FOO` as the value, matching the `= FOO` default in the schema above:
graphqlClient.request(`
  mutation {
    createEntity(name: "my entity") {
      id # assuming Entity has an id field; a selection set is required for object-type results
    }
  }
`)
It’s simpler than the gRPC version: we know what will happen when the field isn’t provided, and we don’t need to consider whether to pass the default value ourselves.
Other types’ defaults have other quirks. For numbers, sometimes the default 0 is a valid value, and sometimes it will mean a different default value. For booleans, the default false results in negatively named fields. When we’re naming a boolean variable while coding, we use the positive name. For instance, we’d usually declare `let retryable = true` rather than `let nonRetryable = false`. People generally find the former more readable, as the latter takes an extra step to understand the double negative (“`nonRetryable` is `false`, so it’s retryable”). But if we have a gRPC API in which we want the default state to be retryable, then we have to name the field `nonRetryable`, because the default of a `retryable` field would be `false`, like all booleans in gRPC.
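For instance (a sketch with a made-up RetryPolicy message, shown as generated TypeScript client code):
// Because proto3 booleans default to false, leaving nonRetryable unset means
// "retryable," which is the default behavior we want for this API
const policy = new RetryPolicy({ maxAttempts: 3 }) // nonRetryable is false => retryable
const noRetries = new RetryPolicy({ maxAttempts: 3, nonRetryable: true }) // opt out of retries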
Request format
In gRPC, we call methods one at a time. If we need more data than a single method provides, we need to call multiple methods. And if we need response data from the first method in order to know which method to call next, then we’re doing multiple round trips in a row. Unless we’re in the same data center as the server, that causes a significant delay. This issue is called underfetching.
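For example, here’s a sketch of two dependent gRPC calls, each costing a full client-server round trip; the services, methods, and fields are hypothetical:
// Round trip 1: fetch the user
const user = await userService.GetUser(new GetUserRequest({ id: '123' }))
// Round trip 2: we can't ask for the orders until we have user.orderIds
const orders = await orderService.ListOrders(new ListOrdersRequest({ ids: user.orderIds }))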
This is one of the issues GraphQL was designed to solve. It’s particularly important over high-latency mobile phone connections to be able to get all the data you need in a single request. In GraphQL, we send a string (called a document) with our request that includes all the methods (called queries and mutations) we want to call and all the nested data we need based on the first-level results. Some of the nested data may require subsequent requests from the server to the database, but they’re usually located in the same data center, which should have sub-millisecond network latency.
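Here’s a sketch of fetching the same data in a single GraphQL request; the user and orders fields are from a hypothetical schema, not one defined in this article:
const data = await graphqlClient.request(`
  query {
    user(id: "123") {
      name
      orders {
        # nested data is resolved server-side, close to the database
        id
        total
      }
    }
  }
`)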
GraphQL’s request flexibility lets front-end and back-end teams become less coupled. Instead of the front-end developers waiting for the back-end developers to add more data to a method’s response (so the client can receive the data in a single request), the front-end developers can add more queries or nested result fields to their request. When there’s a GraphQL API that covers the organization’s entire data graph, the front-end team gets blocked waiting for backend changes much less frequently.
The fact that the GraphQL request specifies all desired data fields means that the client can use declarative data fetching: instead of imperatively fetching data (like calling `grpcClient.callMethod()`), we declare the data we need next to our view component, and the GraphQL client library combines those pieces into a single request and provides the data to the components when the response arrives and later when the data changes. The parallel for view libraries in web development is using React instead of jQuery: declaring how our components should look and having them automatically update when data changes instead of imperatively manipulating the DOM with jQuery.
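Here’s a rough sketch of declarative data fetching with React and Apollo Client; the entity query field is assumed, not part of a real schema:
import { gql, useQuery } from '@apollo/client'

const ENTITY_QUERY = gql`
  query GetEntity($id: ID!) {
    entity(id: $id) {
      id
      name
    }
  }
`

function EntityName({ id }: { id: string }) {
  // Declare the data this component needs; Apollo Client fetches it, caches it,
  // and re-renders the component when the cached data changes
  const { data, loading } = useQuery(ENTITY_QUERY, { variables: { id } })
  if (loading || !data) return <p>Loading…</p>
  return <p>{data.entity.name}</p>
}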
Another effect GraphQL’s request format has is increased visibility: the server sees each field that’s requested. We can track field usage and see when clients have stopped using deprecated fields, so that we know when we can remove them as opposed to forever supporting something that we said we’d get rid of. Tracking is built into common tools like Apollo GraphOS and Stellate.
Forward compatibility
Both gRPC and GraphQL have good forward compatibility; that is, it’s easy to update the server in a way that doesn’t break existing clients. This is particularly important for mobile apps that may be out of date, but also necessary in order for SPAs loaded in users’ browser tabs to continue working after a server update.
In gRPC, you can maintain forward compatibility by numbering fields, adding new fields with new numbers, and not changing the types or numbers of existing fields. In GraphQL, you can add fields, deprecate old fields with the `@deprecated` directive (and leave them functioning), and avoid changing optional arguments to be required.
Transport
Both gRPC and GraphQL support the server streaming data to the client: gRPC has server streaming, and GraphQL has Subscriptions and the directives @defer, @stream, and @live. gRPC’s HTTP/2 also supports client and bidirectional streaming (although we can’t do bidirectional when one side is a browser). HTTP/2 also has improved performance through multiplexing.
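As a sketch, a server-streaming call with a generated @grpc/grpc-js client might look like this; the watchEntities method and its messages are hypothetical:
// A server-streaming call returns a stream of messages for a single request
const call = client.watchEntities(new WatchEntitiesRequest({ id: '123' }))
call.on('data', (update) => console.log('received update', update))
call.on('error', (err) => console.error(err))
call.on('end', () => console.log('server closed the stream'))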
gRPC has built-in retries on network failure, whereas in GraphQL, retrying might be included in your particular client library, like Apollo Client’s RetryLink. gRPC also has built-in deadlines.
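For example, retries can be added to a GraphQL client with Apollo Client’s RetryLink, and a gRPC deadline is passed as a call option in @grpc/grpc-js; in this sketch, grpcClient, request, and getEntity are assumed to exist:
import { RetryLink } from '@apollo/client/link/retry'

// GraphQL: retry failed operations up to 5 times with exponential backoff
// (this link goes in front of the HTTP link in the Apollo Client link chain)
const retryLink = new RetryLink({ attempts: { max: 5 } })

// gRPC: pass a deadline so the call fails with DEADLINE_EXCEEDED
// if it hasn't completed within 5 seconds
grpcClient.getEntity(request, { deadline: Date.now() + 5000 }, (err, response) => {
  if (err) console.error('call failed', err)
})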
There are also some limitations of the transports. gRPC is unable to use most API proxies like Apigee Edge that operate on HTTP headers, and when the client is a browser, we need to use a gRPC-Web proxy or Connect (while modern browsers do support HTTP/2, there aren’t browser APIs that allow enough control over the requests). By default, GraphQL doesn’t work with GET caching: much of HTTP caching works on GET requests, and most GraphQL libraries default to using POST. GraphQL has a number of options for using GET, including putting the operation in a query parameter (viable when the operation string isn’t too long), build-time persisted queries (usually just used with private APIs), and automatic persisted queries. Cache directives can be provided at the field level (the shortest value in the whole response is used for the Cache-Control header’s `max-age`).
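As a sketch of one of those GET options, Apollo Client’s automatic persisted queries can be sent over GET so HTTP caching applies (the endpoint URI is a placeholder):
import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client'
import { createPersistedQueryLink } from '@apollo/client/link/persisted-queries'
import { sha256 } from 'crypto-hash'

// Send a hash of the operation instead of the full query string, and use GET
// for hashed queries so that HTTP/CDN caches can store the responses
const persistedQueryLink = createPersistedQueryLink({ sha256, useGETForHashedQueries: true })

const client = new ApolloClient({
  link: persistedQueryLink.concat(new HttpLink({ uri: '/graphql' })),
  cache: new InMemoryCache(),
})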
Schema and types
GraphQL has a schema that the server publishes for client devs and uses to process requests. It defines all the possible queries and mutations and all the data types and their relations to each other (the graph). The schema makes it easy to combine data from multiple services. GraphQL has the concepts of schema stitching (imperatively combining multiple GraphQL APIs into a single API that proxies parts of the schema) and federation (each downstream API declares how to associate shared types, and the gateway automatically resolves a request by making requests to downstream APIs and combining the results) for creating a supergraph (a graph of all our data that combines smaller subgraphs / partial schemas). There are also libraries that proxy other protocols to GraphQL, including gRPC.
Along with GraphQL’s schema comes further developed introspection: the ability to query the server in a standard way to determine what its capabilities are. All GraphQL server libraries have introspection, and there are advanced tools based on introspection like GraphiQL, request linting with graphql-eslint, and Apollo Studio, which includes a query IDE with field autocompletion, linting, autogenerated docs, and search. gRPC has reflection, but it’s not as widespread, and there’s less tooling that uses it.
The GraphQL schema enables a reactive normalized client cache: because each (nested) object has a type field, types are shared between different queries, and we can tell the client which field to use as an ID for each type, the client can store data objects normalized. This enables advanced client features, such as a query result or optimistic update triggering updates to view components that depend on different queries that include the same object.
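A sketch of configuring that normalized cache with Apollo Client, assuming the Entity type from the earlier example has an id field:
import { InMemoryCache } from '@apollo/client'

const cache = new InMemoryCache({
  typePolicies: {
    // Entity objects are stored once, keyed by id, so a query result or
    // optimistic update that includes an Entity updates every other cached
    // query (and every view component) that references the same object
    Entity: { keyFields: ['id'] },
  },
})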
There are a few differences between gRPC and GraphQL types:
- proto3, the latest version of protocol buffers as of writing, does not have required fields: instead, every field has a default value. In GraphQL, the server can differentiate between a value being present and absent (null), and the schema can indicate that an argument must be present or that a response field will always be present.
- In gRPC, there is no standard way to know whether a method will mutate state (vs GraphQL, which separates queries and mutations).
- Maps are supported in gRPC but not in GraphQL: if you have a data type like `{[key: string]: T}`, you need to use a JSON string type for the whole thing (see the sketch after this list).
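For instance, a map could be passed through GraphQL as a JSON-encoded string; in this sketch, the setMetadata mutation and its String argument are made up:
// Client side: serialize the map into a String argument; the server can JSON.parse it
const metadata: { [key: string]: string } = { color: 'blue', size: 'large' }
await graphqlClient.request(
  `mutation SetMetadata($metadata: String!) {
    setMetadata(metadata: $metadata)
  }`,
  { metadata: JSON.stringify(metadata) }
)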
A downside of GraphQL’s schema and flexible queries is that rate limiting is more complex for public APIs (for private APIs, we can allowlist our persisted queries). Since we can include as many queries as we’d like in a single request, and those queries can ask for arbitrarily nested data, we can’t just limit the number of requests from a client or assign cost to different methods. We need to implement cost analysis rate limiting on the whole operation, for example by using the graphql-cost-analysis library to sum individual field costs and pass them to a leaky bucket algorithm.
Summary
Here’s a summary of the topics we’ve covered:
Similarities between gRPC and GraphQL
- Typed interfaces with codegen
- Abstract away the network layer
- Can have JSON responses
- Server streaming
- Good forward compatibility
- Can avoid overfetching
gRPC
Strengths
- Binary format:
  - Faster transfer over network
  - Faster serializing, parsing, and validation
  - However, harder to view and debug than JSON
- HTTP/2:
  - Multiplexing
  - Client and bidirectional streaming
- Built-in retries and deadlines
Weaknesses
- Need proxy or Connect to use from the browser
- Unable to use most API proxies
- No standard way to know whether a method will mutate state
GraphQL
Strengths
- Client determines which data fields it wants returned. Results in:
  - No underfetching
  - Team decoupling
  - Increased visibility
- Easier to combine data from multiple services
- Further developed introspection and tooling
- Declarative data fetching
- Reactive normalized client cache
Weaknesses
- If we already have gRPC services that can be exposed to the public, it takes more backend work to add a GraphQL server.
- HTTP GET caching doesn’t work by default.
- Rate limiting is more complex for public APIs.
- Maps aren’t supported.
- Inefficient text-based transport
Verdict
Server-to-server
In server-to-server communication, where low latency is often important, and more types of streaming are sometimes necessary, gRPC is the clear standard. However, there are cases in which we may find some of the benefits of GraphQL more important:
- We’re using GraphQL federation or schema stitching to create a supergraph of all our business data and decide to have GraphQL subgraphs published by each service. We create two supergraph endpoints: one external to be called by clients and one internal to be called by services. In this case, it may not be worth it for services to also expose a gRPC API, because they can all be conveniently reached through the supergraph.
- We know our services’ data fields are going to be changing and want field-level visibility on usage so that we can remove old deprecated fields (and aren’t stuck with maintaining them forever).
There’s also the question of whether we should be doing server-to-server communication ourselves at all. For data fetching (GraphQL’s queries), it’s the fastest way to get a response, but for modifying data (mutations), things like Martin Fowler’s “synchronous calls considered harmful” (see sidebar here) have led to using async, event-driven architecture with either choreography or orchestration between services. Microservices Patterns recommends using the latter in most cases, and to maintain DX and development speed, we need a code-based orchestrator instead of a DSL-based one. And once we’re working in a code-based orchestrator like Temporal, we no longer make network requests ourselves; the platform reliably handles it for us. In my opinion, that’s the future.
Client-server
In client-server communication, latency is high. We want to be able to get all the data we need in a single round trip, have flexibility in what data we fetch for different views, and have powerful caching, so GraphQL is the clear winner. However, there are cases in which we may choose to use gRPC instead:
- We already have a gRPC API that can be used, and the cost of adding a GraphQL server in front of that isn’t worth the benefits.
- JSON is not a good fit for the data (e.g. weâre sending a significant amount of binary data).
I hope this article aided your understanding of the protocols and when to use them! If you’d like to learn more about GraphQL, check out their site or my book, The GraphQL Guide. For more about gRPC, here’s their site and documentation.
Thanks to Marc-André Giroux, Uri Goldshtein, Sashko Stubailo, Morgan Kestner, Andrew Ingram, Lenny Burdette, Martin Bonnin, James Watkins-Harvey, Josh Wise, Patrick Rachford, and Jay Miller for reading drafts of this.
Tags: api, graphql, gRPC
12 Comments
Great article, thanks! There is one problematic thing in both frameworks and that is file download/upload. If I’m correct, both frameworks have no notion of binary payload, so file upload/download is typically handled through other means. It is sad that file handling cannot be expressed in the frameworks Interface Description Language and that it requires other endpoints and documentation.
I don’t know about gRPC, but it’s definitely possible to do binary download/upload with GraphQL.
It’s just not common and you’ll usually see GraphQL servers using base64 string for files instead.
I personally use base64 for files on our GraphQL servers and the 30% reduction in transfer speed isn’t a big deal, at least in our use cases.
But if your users download/upload files often then I would recommend using a binary format, rather than base64 string.
Since GraphQL is text-based, you have to Base64 encode (for example with Apollo’s Upload scalar and apollo-upload-client) or have a different endpoint. For (non-web) gRPC, you can client-stream the file in binary chunks.
Either way, if the client is a web browser or mobile app, I’d do client-side upload directly to the blob store like S3, Cloudinary, etc.
More info here:
https://graphql.guide/server/extended-topics/file-uploads/
I don’t know about Apollo, but the go graphql implementation allows uploading via multipart/formdata without having to Base64 encode the data.
Excellent comparison between the two protocols!
First, I want to appreciate your efforts! It’s a good one!
Secondly, I want you to know that I’m a newbie! So in case I got it wrong!
In my situation, I’m trying to connect to my clients through WebSocket, as a third party to their social vendors, to collect their social information and user-generated data.
And give them a chance to post content through my channel, as well as to receive feedback on that post.
And all generated social data is to be used for ranking on my web application! Please, which possible combination is the best fit?
1. GraphQL for the client, and gRPC and Go in the back end? Or otherwise?
2. And please, which methodology will be feasible for the data manipulation? Assume each post should be recognized as a score (assume “2”, maybe using a melware!), and likes as +1, and all scores should be forwarded to the users’ ranking algorithm, which will be used for ranking by the highest in a group, like a leaderboard? Thank you for sharing.
Nicely written and quite informative. Thanks Loren.
What about using the “server for client” pattern where the client devs create their own little API gateway server that receives requests from the browser and does all the fetching necessary to create a custom-tailored response for the client? That way the backend devs don’t have to deal with the headache of trying to have good performance and security for every possible GraphQL query, and the front-end devs can still have their network payloads tailored to the needs of the specific view the user is looking at.
The article briefly touched on this when talking about persisted queries for private servers.
Essentially it means that you specify all allowed queries on the server side and the clients have to choose the operation to perform.
It’s a simple way to cover your points about clients having their customized requests and backend devs not having to care about handling any possible query.
> If we already have gRPC services that can be exposed to the public, it takes more backend work to add a GraphQL server.
You could say the same thing in reverse: if you already have a GraphQL server, it would be more work to add a gRPC server.
> HTTP GET caching doesn’t work by default.
That depends on the tool that you’re using on top of GraphQL.
If you’re using a tool that can automatically handle HTTP GET caching then it does work by default.
And you’re unlikely to use GraphQL without any tooling on top of it.
> Rate limiting is more complex for public APIs.
True, but there is probably an npm package for it.
I’ve only done rate limiting on individual fields up until now, so I can’t say for sure if this is a big deal or not.
> Maps aren’t supported.
MAPS ARE SUPPORTED, and they seem to be used fairly often.
Maps may not be built-in, but you can find an npm package for it. (Or a package from another source if you’re not using Node.js.)
Even coding your own support for maps should be fairly straightforward.
You said “you need to use a JSON string type for the whole thing”
If you’re talking about something like “JSONObject” from “graphql-scalars”, I wouldn’t call that a “string type”, since you’re outputting an object, same as everything else in the response.
> Inefficient text-based transport.
You can definitely use a binary transfer protocol with GraphQL.
But from my experience, binary formats like BSON don’t really reduce the payload size enough to be worth the effort, and browsers are in general faster at parsing JSON than binary, since JSON parsing is native and a binary parser would have to be implemented in JavaScript.
Honestly the entire article reads like it’s written by a gRPC dev who was forced to use GraphQL.
One could say your comment was written by a GraphQL dev that has not done s2s calls and was forced to read a gRPC vs GQL article 🤷
Honestly, I don’t think “30% perf reduction”, “there’s an npm package for that”, and “coding your own support” are even close to good answers. You don’t want any of this in s2s.
But we agree for sure that browsers’ binary support sucks. That’s not gRPC’s fault, though, which is why GQL/REST is best for client-server anyway.
> “30% perf reduction”
In my comment I said binary transfer was definitely possible. I only mentioned that we don’t do it because it didn’t seem to make much of a difference in our case.
> “there’s an npm package for that”, and “coding your own support”
I don’t see how you can make an argument about both of those answers at the same time.
And gRPC / GraphQL are also npm packages, so I don’t see how you can complain that “there’s an npm package for that” is not a good answer.
I personally prefer it when features are split into their own packages, because then you don’t need to install the ones you don’t need.