Alleviating GraphQL Performance Anxiety


04 Jan 2022

Posted by Ben Teese

Learning how to write a GraphQL server is one of the biggest challenges I’ve seen for those who are new to GraphQL. This is because it requires a change in mindset, especially if you are accustomed to writing REST servers. Specifically, instead of thinking in terms of implementing individual endpoints in isolation, you have to think in terms of implementing an entire GraphQL schema that can be queried in any way.

This need for flexibility can cause anxiety for some developers, especially when it comes to performance. To deal with these concerns, some developers panic and pre-emptively reduce the flexibility of their GraphQL schema to head off anticipated issues. But in doing so, they start to negate the chief benefit of using GraphQL in the first place: being able to flexibly and efficiently access data over a network as slow and laggy as the internet. Taken to the extreme, developers can even end up in a situation where they’re paying all of the additional cost of using GraphQL, but not reaping any of the benefits.

It’s true that strictly speaking, the easiest way to write a GraphQL server isn’t necessarily the most performant, especially in comparison to writing an equivalent REST server. This is part of the trade-off of using GraphQL. However, it’s also true that the majority of the time, this performance difference won’t be noticeable to the end user. Furthermore, even if it turns out that there is an actual performance problem, there are plenty of mechanisms available to deal with it, without compromising your schema.

In this post I’m going to show you an example of a common source of performance anxiety for GraphQL server developers, and how it can trick them into thinking that they’re going to have to make adjustments to their schema to deal with it. I will then present an alternate approach that allows the best of both worlds – a performant solution and the simplest schema possible. In this case I’ll be using Apollo Server to write the server. I’ll be assuming you have some knowledge of both GraphQL and Apollo Server.

An Example

Imagine that your data graph needs to be able to contain information about your customers’ names and addresses. In that case, a simple GraphQL schema to satisfy these requirements might look something like this:

import { gql } from "apollo-server"

const typeDefs = gql`
  type Customer {
    id: ID!
    firstName: String
    lastName: String
    streetAddress: String
    postcode: String
    city: String
    country: String
  }

  type Query {
    customer(id: ID!): Customer
  }
`
...

Given a particular customer ID, this schema will let us query for all of the information we need about the customer. So far, so good.

However, imagine that when it comes to implementing a server that can satisfy this schema, we discover that there are actually two different REST endpoints for getting the data:

/customers/{customerId}
/customers/{customerId}/address

The first endpoint returns the firstName and lastName of the customer with the given ID. The second endpoint returns the streetAddress, postcode, city and country of the customer with the given ID.

It’s easy enough to write an Apollo RESTDataSource that fetches this data for us from a REST endpoint:

import { RESTDataSource } from "apollo-datasource-rest"
...
class CustomersDataSource extends RESTDataSource {
  constructor() {
    super()
    this.baseURL = 'https://customers.example.com'
  }

  findById(id) {
    return this.get(`/customers/${id}`)
  }

  findAddressById(id) {
    return this.get(`/customers/${id}/address`)
  }
}
...

We could then write a resolver to get all of this data:

...
const resolvers = {
  Query: {
    async customer(_, { id }, { dataSources }) {
      return {
        id,
        ...(await dataSources.customers.findById(id)),
        ...(await dataSources.customers.findAddressById(id))
      }
    }
  }
}
...

And package the schema, datasource and resolvers up into a simple Apollo Server instance:

...
const server = new ApolloServer({ 
  typeDefs,
  resolvers,
  dataSources: () => ({
    customers: new CustomersDataSource()
  })
});

// Launch the server
server.listen().then(({ url }) => {
  console.log(`🚀  Server ready at ${url}`);
});

However, most developers will quickly see that this implementation won’t be particularly performant. It will call findAddressById even if the user didn’t actually ask for any address fields. So if I run this query:

query GetCustomerName($id: ID!) {
  customer(id: $id) {
    firstName
    lastName
  }
}

then the server will still call the /customers/{customerId}/address endpoint, even though we’re not actually using any of the data from that endpoint.
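
To see the over-fetching concretely, here’s a minimal sketch that swaps the real data source for a stub. StubCustomersDataSource and resolveNameOnly are hypothetical names of my own (not part of Apollo), and the customer data is made up. The stub just records which endpoints it gets asked for, and the resolver is the same one shown earlier:

```javascript
// Hypothetical stand-in for CustomersDataSource: records each endpoint it is
// asked for instead of making a real HTTP call.
class StubCustomersDataSource {
  constructor() {
    this.calls = []
  }

  async findById(id) {
    this.calls.push(`/customers/${id}`)
    return { firstName: "Ada", lastName: "Lovelace" }
  }

  async findAddressById(id) {
    this.calls.push(`/customers/${id}/address`)
    return {
      streetAddress: "1 Example St",
      postcode: "3000",
      city: "Melbourne",
      country: "Australia"
    }
  }
}

// Same resolver as earlier: it always awaits both endpoints.
const resolvers = {
  Query: {
    async customer(_, { id }, { dataSources }) {
      return {
        id,
        ...(await dataSources.customers.findById(id)),
        ...(await dataSources.customers.findAddressById(id))
      }
    }
  }
}

// Simulates a client that only selects firstName and lastName.
async function resolveNameOnly(id) {
  const dataSources = { customers: new StubCustomersDataSource() }
  const customer = await resolvers.Query.customer(null, { id }, { dataSources })
  return {
    result: { firstName: customer.firstName, lastName: customer.lastName },
    calls: dataSources.customers.calls
  }
}

// Log the endpoints that were hit for a name-only query.
resolveNameOnly("42").then(({ calls }) => console.log(calls))
```

The logged list includes the address endpoint, even though none of the address data ends up in the result.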

Faced with this realisation, many developers who are new to GraphQL will assume that the only way they can work around it is to adjust their GraphQL schema. Usually, they decide to create a separate new type for the address, so that it more closely mirrors the structure of the REST endpoints:

type Customer {
  id: ID!
  firstName: String
  lastName: String
  address: Address
}

type Address {
  streetAddress: String
  postcode: String
  city: String
  country: String
}

type Query {
  customer(id: ID!): Customer
}

This schema can then be implemented by adding a custom resolver for the address field on the Customer type:

export const resolvers = {
  Query: {
    async customer(_, { id }, { dataSources }) {
      return {
        id,
        ...(await dataSources.customers.findById(id))
      }
    }
  },
  Customer: {
    address(customer, _, { dataSources }) {
      return dataSources.customers.findAddressById(customer.id)
    }
  }
}

(If you’re new to writing custom resolvers and are a little hazy as to what’s going on here, I recommend reading the Apollo Server resolver docs)

So now the GetCustomerName query won’t trigger a call to /customers/{customerId}/address. It will only call /customers/{customerId}.
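
If it isn’t obvious why the address resolver gets skipped, the following rough sketch might help. It’s a toy model of field resolution, not Apollo Server’s actual execution engine. resolveFields is a hypothetical helper of my own that only invokes a resolver when the corresponding field appears in the selection, and otherwise reads the property straight off the parent object (which is roughly what a default resolver does). The data source is a stub with made-up data:

```javascript
// Toy model of GraphQL field resolution. For each selected field, use the
// custom resolver if one exists, otherwise read the property off the parent.
async function resolveFields(typeResolvers, parent, selection, context) {
  const result = {}
  for (const field of selection) {
    const resolver = typeResolvers[field]
    result[field] = resolver
      ? await resolver(parent, {}, context)
      : parent[field]
  }
  return result
}

// Hypothetical stubbed data source that records each address lookup.
const addressCalls = []
const context = {
  dataSources: {
    customers: {
      async findAddressById(id) {
        addressCalls.push(id)
        return {
          streetAddress: "1 Example St",
          postcode: "3000",
          city: "Melbourne",
          country: "Australia"
        }
      }
    }
  }
}

// The custom resolver for the address field on the Customer type.
const customerResolvers = {
  address(customer, _, { dataSources }) {
    return dataSources.customers.findAddressById(customer.id)
  }
}

// What the Query.customer resolver would have returned.
const parent = { id: "42", firstName: "Ada", lastName: "Lovelace" }

// GetCustomerName: address is not selected, so its resolver never runs.
resolveFields(customerResolvers, parent, ["firstName", "lastName"], context)
  .then(result => console.log(result, addressCalls.length))
```

Adding "address" to the selection is the only thing that causes findAddressById to be called.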

To actually get address info, you’d have to run a query like this:

query GetCustomerNameAndAddress($id: ID!) {
  customer(id: $id) {
    firstName
    lastName
    address {
      streetAddress
      postcode
      city
      country
    }
  }
}

Problem solved, eh? Well, I guess so. But to solve it, we had to tweak our GraphQL schema. This breaks the Principled GraphQL concept of having an Abstract, Demand-Oriented Schema. In other words, we’ve let the details of an underlying REST API service dictate the shape of our schema. The result is a schema that is more complex than it needs to be. Furthermore, this problem will compound as we try and unify more and more services behind a single schema, because each service will inevitably have its own quirks and ways of doing things. The goal of GraphQL is to hide all of these differences from the client, but we’re doing precisely the opposite here.

What to do instead

So how can we have the best of both worlds: a simple schema and decent performance? Well, let’s revert to our original schema where the address fields are directly on the Customer type. We’ll then write custom resolvers for each of the address fields:

export const resolvers = {
  Query: {
    async customer(_, { id }, { dataSources }) {
      return {
        id,
        ...(await dataSources.customers.findById(id))
      }
    }
  },
  Customer: {
    // Each address field awaits the address lookup, then picks out its own value
    async streetAddress(customer, _, { dataSources }) {
      return (await dataSources.customers.findAddressById(customer.id)).streetAddress
    },
    async postcode(customer, _, { dataSources }) {
      return (await dataSources.customers.findAddressById(customer.id)).postcode
    },
    async city(customer, _, { dataSources }) {
      return (await dataSources.customers.findAddressById(customer.id)).city
    },
    async country(customer, _, { dataSources }) {
      return (await dataSources.customers.findAddressById(customer.id)).country
    }
  }
}

At first glance this might look like a potentially worse solution. If you ask for more than one address field in a single query, won’t it call the /customers/{customerId}/address endpoint repeatedly?

In short, the answer is no. This is because RESTDataSource uses an in-memory cache to store the results of past fetches. This cache only lives for as long as each incoming GraphQL request (this is why the dataSources argument we provide to the ApolloServer constructor is a function – it gets re-invoked for each incoming request), but that’s long enough for what we need.

So now, we can run the query:

query GetCustomerNameAndAddress($id: ID!) {
  customer(id: $id) {
    firstName
    lastName
    streetAddress
    postcode
    city
    country
  }
}

and the /customers/{customerId}/address endpoint will only get called when the first address field is resolved. When the other three are resolved, the cached value will be used.
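
To make the caching behaviour more concrete, here is a minimal sketch of request-scoped memoization. It’s a simplified stand-in and not RESTDataSource’s real implementation: MemoizingDataSource, its fetchJson parameter, fakeFetchJson and resolveAddressFields are all hypothetical names, and the address data is made up. The idea is that the promise for each path is stored the first time it’s requested, so later resolvers in the same request reuse it rather than fetching again:

```javascript
// Simplified stand-in for a request-scoped data source. It memoizes the
// promise for each path, so repeated GETs within one request share one fetch.
class MemoizingDataSource {
  constructor(fetchJson) {
    this.fetchJson = fetchJson // hypothetical: (path) => Promise<object>
    this.memoized = new Map()
  }

  get(path) {
    if (!this.memoized.has(path)) {
      this.memoized.set(path, this.fetchJson(path))
    }
    return this.memoized.get(path)
  }

  findAddressById(id) {
    return this.get(`/customers/${id}/address`)
  }
}

// Fake backend that counts how many times it is actually hit.
let fetchCount = 0
async function fakeFetchJson(path) {
  fetchCount += 1
  return {
    streetAddress: "1 Example St",
    postcode: "3000",
    city: "Melbourne",
    country: "Australia"
  }
}

// One resolver per address field, each doing its own address lookup.
const addressFields = ["streetAddress", "postcode", "city", "country"]
const Customer = Object.fromEntries(
  addressFields.map(field => [
    field,
    async (customer, _, { dataSources }) =>
      (await dataSources.customers.findAddressById(customer.id))[field]
  ])
)

// Resolves all four address fields for a single simulated request.
async function resolveAddressFields(id) {
  // A fresh data source per request, mirroring the dataSources function
  // passed to the ApolloServer constructor.
  const context = {
    dataSources: { customers: new MemoizingDataSource(fakeFetchJson) }
  }
  const values = await Promise.all(
    addressFields.map(field => Customer[field]({ id }, {}, context))
  )
  return Object.fromEntries(addressFields.map((field, i) => [field, values[i]]))
}

// Log the resolved address and the number of backend hits so far.
resolveAddressFields("42").then(address => console.log(address, fetchCount))
```

Four field resolvers run, but the fake backend is only hit once per simulated request. A second request gets a fresh data source, so it triggers one more fetch of its own.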

This is precisely what Apollo datasources are designed for. However, it mightn’t be immediately apparent or intuitive, especially if you have worked with REST servers in the past. Tools like RESTDataSource let us push the solution back into the server and keep the schema as we’d like it, but we have to know how to use them.

In this case, the tradeoff is between schema complexity, implementation complexity, and performance. The schema is simpler, but for the performance to be improved, the implementation is more complex. However, this is the right tradeoff to make, because there are many potential clients that are affected by the schema design, but only one server needs to implement that design in a performant manner. Consequently, it makes sense for the complexity to live in the server.

Wrapping Up

The example I’ve presented here isn’t the only type of GraphQL performance anxiety that I’ve witnessed. I’ve seen developers become so concerned about the possibility of rogue deep queries that they’ve removed bidirectional parent/child relationships from their schema and instead opted to have two variations of each type – one with access to a parent object, and the other with access to children. This left me wondering why they bothered to use GraphQL instead of REST in the first place. The saddest part is that there are many mitigations that you can use to block rogue queries, without compromising your schema. If GitHub can manage to protect their public GraphQL API without nobbling the schema, then so can you.
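
As one illustration of that kind of mitigation, here is a deliberately crude sketch of a query depth limit. It estimates depth by counting brace nesting in the query text, which assumes the query contains no braces inside string literals or input-object arguments. A production server would more likely inspect the parsed query instead. estimateQueryDepth and assertDepthWithinLimit are hypothetical names, and the orders field in the sample query is invented for the example:

```javascript
// Rough estimate of query depth by tracking selection-set brace nesting.
// Assumption: no braces appear inside string literals or input-object
// arguments. Inspecting the parsed query would be more robust.
function estimateQueryDepth(query) {
  let depth = 0
  let maxDepth = 0
  for (const char of query) {
    if (char === "{") {
      depth += 1
      maxDepth = Math.max(maxDepth, depth)
    } else if (char === "}") {
      depth -= 1
    }
  }
  return maxDepth
}

// Hypothetical guard that a server could run before executing a query.
function assertDepthWithinLimit(query, maxAllowedDepth) {
  const depth = estimateQueryDepth(query)
  if (depth > maxAllowedDepth) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${maxAllowedDepth}`)
  }
  return depth
}

// A query that bounces between a hypothetical parent/child relationship.
const deepQuery = `
  query {
    customer(id: "42") {
      orders {
        customer {
          orders {
            customer { firstName }
          }
        }
      }
    }
  }
`

try {
  assertDepthWithinLimit(deepQuery, 4)
} catch (error) {
  console.log(error.message)
}
```

The point is that the limit lives in the server, so the bidirectional relationship can stay in the schema.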

Having now spent many years building GraphQL servers, I’m at a point where if somebody on my team raises concerns that a particular schema design will not be performant, I tell them not to worry. Instead, I say that I am confident that, once we’ve gotten the schema exactly as we want it, then we’ll be able to find a way to implement it so that it meets our performance needs.

So the next time you find yourself experiencing GraphQL performance anxiety, take a deep breath and step back. Ask yourself whether it’s really going to be a problem. And if it is going to be a problem, consider what mechanisms are available to you to work around it, without compromising your schema. Your clients will thank you for it.

Tags:

Apollo Server, javascript

Ben Teese

ben.teese@shinesolutions.com

I'm a Senior Consultant at Shine Solutions.
