Tracking and fixing performance issues in graphql-ruby

If you’re working as a Ruby/Rails developer and want to implement a GraphQL API in your application, you’re most likely using the graphql-ruby gem.

It’s a popular library that implements a GraphQL server in Ruby. Unfortunately, it shares a problem with much of the Rails ecosystem: it makes it very, very easy to shoot yourself in the foot and create queries and mutations that are slow and hit both your server and your database hard.

In this blog post, I will show you some of the techniques I use to find performance problems, and the best solution I have found so far for fixing them.

Finding performance bottlenecks

At Helpling we’re happily using AppSignal for performance and incident tracking. Thankfully, the graphql-ruby gem includes support for tracing queries, including support for this specific platform. You can enable it by adding:

use(GraphQL::Tracing::AppsignalTracing)

to your GraphQL schema (the class that inherits from GraphQL::Schema).

But what should you do if you don’t already use this kind of instrumentation, or if you want to track very specific cases? In that case, the easiest solution is to just… run your application with the browser’s Network tab open. Use a database that has a lot of records instead of only 2-3 test ones. If your development database is tiny, it may be a good idea to make a copy of the production database, run it locally and use it for testing. Just don’t forget to anonymize the data before starting the app so you don’t send any test emails or push notifications to your users!

Chrome DevTools gives you plenty of timing information that you can use to track slow requests. You can also copy a single request using Copy -> Copy as cURL and re-run the query in your terminal. I usually wait until the app is fully loaded, then check the requests one by one and re-run each of them separately, to make sure the time a query takes is not affected by other things happening at the same time.

Once you find a slow query, you can start figuring out which fields exactly are the slow ones. For that I usually reach for GraphiQL, an additional tool that lets you run arbitrary GraphQL queries against your app, and rack-mini-profiler, to get a better idea of which SQL queries are being run.

For example, let’s say that we have a long, complex query that looks like this:

query fetchCustomerRelationships {
  customerRelationships {
    id
    cancelled
    provider {
      id
      firstname
      lastname
      live
      agency {
        id
        name
      }
      premium
    }
    address {
      id
      address
      postcode
    }
    series {
      id
      live
      hasPendingChangeRequests
      startTime
    }
  }
}

For my particular case, a query similar to this one took 10 seconds to run. That is a lot! I started bisecting the query by commenting out blocks of fields until it looked like this:

query fetchCustomerRelationships {
  customerRelationships {
    # ... rest of the query commented out
    series {
      # id
      # live
      hasPendingChangeRequests
      # startTime
    }
  }
}

This particular part of the code took around 3 seconds and, for the user I was testing with, ran hundreds of SQL queries. The implementation of hasPendingChangeRequests reused a method in the Series model that was built with a single Series in mind and was not optimized for use on a collection. This is a common problem when working with GraphQL: you want to reuse existing code to keep things DRY, but you don’t fully see the performance implications because they’re hidden behind several layers of additional code.

Our implementation looked like this:

def has_pending_change_requests?
  # Two queries per Series: one for its own change requests,
  # one for change requests attached to the Series' events.
  change_requests.outstanding_pending.any? ||
    ChangeRequest.outstanding_pending.joins(:event).where(events: { series_id: id }).any?
end

As you can imagine, this caused a pretty bad N+1 (or rather 2N+1) issue where for every Series we counted a bunch of rows in the database, ultimately slowing everything down to a crawl.
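To see why that scales so badly, here is a toy plain-Ruby sketch of the 2N+1 pattern, with a simple counter standing in for real SQL queries (all numbers and names here are purely illustrative):

```ruby
# Plain-Ruby illustration of the 2N+1 pattern; the counter stands in for SQL queries.
query_count = 0
run_query = -> { query_count += 1 }

run_query.call                  # 1 query: load the list of series
series_ids = (1..100).to_a      # pretend we got 100 series back
series_ids.each do
  run_query.call                # per series: check its own change requests
  run_query.call                # per series: check change requests via events
end

puts query_count                # 2 * 100 + 1 = 201 queries for a single GraphQL request
```

Doubling the number of series doubles the query count, which is exactly what we observed in production-sized data.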

Amazing, we found our first problem! But… how to solve it?

Solving N+1 in GraphQL

The code in this section may look advanced, but it’s conceptually very simple.

GraphQL (Ruby) uses lazy execution to allow you to batch some operations. This is a very powerful feature, but its API is a bit cumbersome, which is why you should use graphql-batch, an amazing gem from Shopify that massively simplifies the process.

graphql-batch uses a technique you might have heard of if you’re a full-stack developer, or if you’ve ever heard your JavaScript brethren discuss it: promises. Since other people have put it better than I could, allow me to quote:

A promise is an object that may produce a single value some time in the future: either a resolved value, or a reason that it’s not resolved (e.g., a network error occurred). A promise may be in one of 3 possible states: fulfilled, rejected, or pending. Promise users can attach callbacks to handle the fulfilled value or the reason for rejection.

https://medium.com/javascript-scene/master-the-javascript-interview-what-is-a-promise-27fc71e77261
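If promises are new to you, here is a minimal plain-Ruby sketch of the idea (a deliberately tiny toy, not the actual promise implementation graphql-batch relies on):

```ruby
# Toy promise: callbacks attached with #then run once the value arrives.
class TinyPromise
  def initialize
    @callbacks = []
  end

  def then(&block)
    @callbacks << block
    self # allow chaining
  end

  def fulfill(value)
    # Run every attached callback, feeding each one the previous result.
    @value = @callbacks.reduce(value) { |acc, cb| cb.call(acc) }
  end

  attr_reader :value
end

promise = TinyPromise.new.then { |n| n + 1 }
promise.fulfill(41)
puts promise.value  # => 42
```

The important property is that the work attached with `then` is deferred until `fulfill` is called, which is what lets a batching library collect many requests before resolving any of them.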

The way it works is that graphql-batch and graphql-ruby will run your code in three phases:

  • during the first phase, every time your field is requested you can save a value to an aggregator, for example the ID of an association
  • during the second phase, once all the fields have been requested, you use the previously aggregated values to fetch the data and build an object with the results
  • during the third phase, the object from the second phase is used to resolve the promises, so each field can fetch its specific result from it
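The three phases above can be sketched as a toy batching loader in plain Ruby (an illustrative stand-in, not the real graphql-batch API):

```ruby
# Toy batching loader illustrating the three phases described above.
class ToyLoader
  def initialize(&batch_block)
    @ids = []
    @batch_block = batch_block
  end

  # Phase 1: each field registers its id and gets back a lazy thunk.
  def load(id)
    @ids << id
    -> { batch_result[id] }
  end

  private

  # Phase 2: once everything is collected, run the batch block a single time.
  def batch_result
    @batch_result ||= @batch_block.call(@ids)
  end
end

# Pretend the block is one batched SQL query returning { id => value }.
loader = ToyLoader.new { |ids| ids.to_h { |id| [id, id.odd?] } }
thunks = [1, 2, 3].map { |id| loader.load(id) }

# Phase 3: forcing the thunks resolves each field from the shared result.
puts thunks.map(&:call).inspect  # => [true, false, true]
```

The batch block runs exactly once no matter how many fields called `load`, which is the whole trick behind eliminating the N+1.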

Now let’s see how this looks in code. In the graphql-batch README you can find an example RecordLoader, but ours will be even simpler:

class Loaders::Aggregator < GraphQL::Batch::Loader
  def load(*args, &block)
    # Remember the block so we can use it to batch-fetch the data later.
    @block = block
    super
  end

  def perform(ids)
    # Called once with all the collected ids; the block builds the shared result,
    # and every promise is fulfilled with that same object.
    consumed_aggregator = @block.call(ids)
    ids.each { |id| fulfill(id, consumed_aggregator) }
  end
end

“So how do we use all this?!”, you may ask. It’s actually not that hard.

Let’s get back to our initial field:

field :has_pending_change_requests, Boolean, null: false, method: :has_pending_change_requests?

The first phase is to gather the IDs:

Loaders::Aggregator.load(object.id)

The second phase is to build our object that includes all the results we will require:

Loaders::Aggregator.load(object.id) do |series_ids|
  [
    ChangeRequest
      .outstanding_pending
      .joins(:event)
      .where('events.series_id IN (?)', series_ids)
      .pluck('events.series_id'),
    ChangeRequest
      .outstanding_pending
      .where('series_id IN (?)', series_ids)
      .pluck(:series_id),
  ].flatten.sort.uniq
end

And the third phase is to then use this object to get the results we need:

# ...
].flatten.sort.uniq
end.then do |series_with_pending_change_requests|
  series_with_pending_change_requests.include?(object.id)
end

If you were to run our code now, you would notice that only the two pluck queries above run, once each instead of twice per Series, and the whole thing is much faster.

You may be surprised by the way the aggregator was implemented in the second step. You may even start scratching your head, wondering whether this will actually be less performant, or where the hell the JOINs went. This particular code was chosen after carefully benchmarking different options and discovering that OR in PostgreSQL is really, really slow in some cases, but that’s a topic for a different blog post. It’s important to make sure your solutions actually improve things, so always benchmark; never assume and change things blindly.
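Ruby’s built-in Benchmark module is enough to get started. Here is a hedged sketch comparing two ways of building the same sorted, deduplicated id list, with plain arrays standing in for real query results (sizes and values are made up):

```ruby
require 'benchmark'

# Stand-in data: two overlapping id lists, as the two plucks might return.
a = (1..50_000).to_a
b = (25_000..75_000).to_a

Benchmark.bm(14) do |x|
  x.report('flatten + uniq') { (a + b).sort.uniq }
  x.report('set union (|)')  { (a | b).sort }
end

# Whichever wins on your data, verify both candidates agree before switching:
(a + b).sort.uniq == (a | b).sort  # => true
```

The last line is the part people skip: a faster implementation that returns different results is not an optimization.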

The aggregator loader we wrote is very simple but already powerful. For your own code, you will likely want to experiment with different loaders to simplify and deduplicate your code. The case described here returned an array, but what if you need a hash?

For example, let’s say we need to fetch the count of all live events of a user. A naive implementation could look like this:

field :live_events_count, Integer, null: false

def live_events_count
  object.events.live.count
end

This implementation is simple, but it also introduces an N+1: one count query per user. If we use the aggregator we created a moment ago, it could look like this:

def live_events_count
  Loaders::Aggregator.load(object.id) do |user_ids|
    Event.live.where(user_id: user_ids).group(:user_id).count
  end.then do |counters|
    # Users with no live events are absent from the hash, so default to 0.
    counters[object.id] || 0
  end
end
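One detail worth knowing here: `.group(:user_id).count` returns a hash mapping each user id to its count, and users with no live events are simply absent from it. A plain-Ruby stand-in (illustrative data, no database) shows the shape:

```ruby
# Plain-Ruby stand-in for Event.live.where(...).group(:user_id).count
live_events = [
  { user_id: 1 }, { user_id: 1 }, { user_id: 3 }
]

counters = live_events
  .group_by { |e| e[:user_id] }
  .transform_values(&:count)

puts counters[1]          # => 2
puts counters[2].inspect  # => nil -- user 2 has no live events, hence the `|| 0` guard
```

For a `null: false` Integer field, a missing key must be translated to 0 before it reaches the client.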

As you can see, graphql-batch is a very powerful library that allows you to solve performance bottlenecks relatively easily, with just a bit more code. Don’t be afraid to take your time, learn more about your app, and use this library to improve its performance and reduce its database load.

Happy coding!
