From time to time we run into issues with database load spiking, and we usually narrow it down to a rake task or two hammering the database with lots of queries, or we notice that an operation that shouldn't take much memory or CPU is suddenly slowing down an entire Heroku dyno. What's going on, and how do we improve it?
Many techniques can help with improving a codebase; in this article I'm only going to focus on some of them.
Benchmarking methods
In our codebase, we have several gems that can help us find the problem or evaluate potential solutions.
Important note about warmup
When benchmarking your code you may notice that whatever you benchmark first will usually take longer and use more of everything. Depending on the situation you may want to warm up your caches first to make sure you're not skewing your results towards whatever runs later (and benefits from all sorts of caches). This can usually be done by running the same code for both cases before you start measuring.
Keep in mind, though, that in some cases warming up can ruin the results, because the code you want to test may itself depend on those caches. In that case it's usually better to run each case in a separate process or thread, and sometimes it's even a good idea to clear database caches as well. An example here would be a rake task where the first SELECT is usually slower but subsequent calls hit the cache 99% of the time. If you're running the task once a day, the chance of hitting the cache is low, so your testing should accommodate that.
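As a minimal sketch of the warmup idea in plain Ruby (the method name `expensive_operation` and the iteration counts are made up for illustration), you can run the code once before measuring so the measured run doesn't pay one-time costs like autoloading or cold caches:

```ruby
require "benchmark"

# Hypothetical expensive operation standing in for a rake task or query.
def expensive_operation
  (1..10_000).reduce(:+)
end

# Warmup: run once before measuring so the timed run doesn't pay
# one-time costs (autoloading, JIT, cold caches).
expensive_operation

elapsed_ms = Benchmark.realtime { 100.times { expensive_operation } } * 1000
puts format("took %.2fms", elapsed_ms)
```

When comparing two implementations, warm up both before timing either, so neither benefits unfairly from caches the other populated.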
Benchmark.ms
This is the simplest way of benchmarking things – it takes the block of code and checks how many milliseconds it took to run. Don’t let the simplicity steer you away – in many cases this will be enough to spot a glaring problem or compare the time it takes two algorithms to run.
⚠️ You may be inclined to just use start = Time.now and subtract it later, but this is not a good idea. Time.now reads the wall clock, which is not monotonic – it can jump forwards or backwards (for example due to NTP adjustments) while the test is running. Benchmark.ms uses a monotonic clock under the hood.
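If you do want to time things by hand in plain Ruby, the safe equivalent is to read the monotonic clock directly (this is a sketch; the workload inside the loop is arbitrary):

```ruby
# Read a monotonic clock that never jumps, unlike Time.now.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)

1_000.times { "some work".upcase }

elapsed_ms = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
puts "#{elapsed_ms}ms"
```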
Example:
```ruby
result1 = Benchmark.ms do
  10.times do
    Rake::Task['expensive_operation_a'].invoke
    Rake::Task['expensive_operation_a'].reenable
  end
end

result2 = Benchmark.ms do
  10.times do
    Rake::Task['expensive_operation_b'].invoke
    Rake::Task['expensive_operation_b'].reenable
  end
end

puts "A took #{result1}ms"
puts "B took #{result2}ms"
```
The problem with this approach is that you need to handle running the code over multiple iterations yourself to make sure the results are representative.
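One way to handle the iterations is a small helper like the sketch below. It is not part of any library – `average_ms` is a made-up name – and it uses plain-Ruby `Benchmark.realtime` (Benchmark.ms itself comes from ActiveSupport):

```ruby
require "benchmark"

# Hypothetical helper: run a block several times and report the
# average wall-clock time per iteration, in milliseconds.
def average_ms(iterations: 10)
  total = Benchmark.realtime { iterations.times { yield } }
  (total / iterations) * 1000
end

avg = average_ms(iterations: 50) { Array.new(1_000) { rand }.sort }
puts format("average: %.3fms", avg)
```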
Now let’s talk about the gems that we have in our codebase and how they can help us deliver better code.
benchmark-ips
benchmark-ips is a bit of a Benchmark.ms on steroids. It allows us to specify a warmup step, set how many seconds we want it to run, and gives us a nice detailed report of the resulting iterations per second with a comparison between different implementations. This is especially useful when testing things that should run as fast as possible – for example code in a tight loop – or when benchmarking hot paths of the application, like code that is run in an API endpoint.
Example:
```ruby
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("implementation_a") do
    SomeExpensiveCodeA.call
  end

  x.report("implementation_b") do
    SomeExpensiveCodeB.call
  end

  x.compare!
end
```
And example report:
```
Calculating -------------------------------------
    implementation_a    71.254k i/100ms
    implementation_b    68.658k i/100ms
-------------------------------------------------
    implementation_a     24.012M (± 8.7%) i/s -    120.155M
    implementation_b     23.959M (± 9.5%) i/s -    114.246M

Comparison:
    implementation_a: 24011974.8 i/s
    implementation_b: 23958619.8 i/s - 1.00x slower
```
benchmark-memory
This gem helps check how much memory was used and how many objects were created while the code was running. This is especially helpful when measuring the impact the code will have on overall memory consumption (which matters most in HTTP-related code and less so in rake tasks, unless they're starting to outgrow the Heroku dyno or machine on which they're running).
Example:
```ruby
require "benchmark/memory"

# Warmup step – necessary so autoloading code does not affect the results
Rake::Task["expensive_operation_a"].execute
Rake::Task["expensive_operation_b"].execute
Rake::Task["expensive_operation_a"].reenable
Rake::Task["expensive_operation_b"].reenable

Benchmark.memory do |x|
  x.report("original") do
    Rake::Task["expensive_operation_a"].execute
    Rake::Task["expensive_operation_a"].reenable
  end

  x.report("improved") do
    Rake::Task["expensive_operation_b"].execute
    Rake::Task["expensive_operation_b"].reenable
  end

  x.compare!
end
```
And example report:
```
Calculating -------------------------------------
            original   219.152M memsize (   812.306k retained)
                         2.396M objects (     2.040k retained)
                         50.000 strings (    50.000 retained)
            improved     8.336M memsize (     3.461k retained)
                       149.698k objects (    46.000 retained)
                         50.000 strings (    11.000 retained)

Comparison:
            improved:    8335728 allocated
            original:  219151974 allocated - 26.29x more
```
stackprof
Stackprof is an advanced tool that allows us to profile specific calls in Ruby code and generate flamegraphs. We can either use it around a specific piece of code or enable it as Rack middleware.
There are already good articles about how to use this tool efficiently so I’m not going to go into too many details:
- StackProf: The Holy Grail of Rails Profiling
- A Little Ruby-Land Profiling with StackProf
- https://github.com/tmm1/stackprof#stackprof
rack-mini-profiler
Rack-mini-profiler is an extremely valuable tool for figuring out why your Rails controllers are slow. It shows how long each controller action took and where it spent the most time, and lists the SQL queries that were run along with their durations, so you can find both long queries and N+1s. It also integrates with the aforementioned stackprof and memory_profiler.
Writing faster code
Use simpler objects when possible
Most of the performance problems in Rails applications stem from a library that starts with an A and ends with ctiveRecord. You can massively speed up your code by avoiding instantiating new AR objects whenever possible.
The simplest examples are usually replacing things like association.map(&:id) with association.pluck(:id), but plucking can be a much more powerful concept than that.
Let’s say we need to move the data from one table to another based on a common user ID. A naive implementation would look like this:
```ruby
CustomerProfile.find_each do |profile|
  relationship = Relationship.find_by!(customer_user_id: profile.customer_user_id)
  relationship.update!(
    cleaning_details: profile.cleaning_details
  )
end
```
This code will not only be inefficient in terms of SQL queries, but it will also create a lot of ActiveRecord objects, only to copy some data from one place to another.
We can avoid creating all those ActiveRecord objects (and save some SQL queries as a bonus) by pulling the data first, creating a map with user IDs as keys, and then going over all elements and fetching the data from that map:
```ruby
profiles_data = CustomerProfile.pluck(:customer_user_id, :cleaning_details)

profiles_map = profiles_data.each_with_object({}) do |(customer_user_id, cleaning_details), acc|
  acc[customer_user_id] = cleaning_details
end

Relationship.where(customer_user_id: profiles_map.keys).find_each do |relationship|
  relationship.update!(cleaning_details: profiles_map[relationship.customer_user_id])
end

# the above code could be optimized even further,
# but let's keep it like this for the sake of the example
```
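The core of this trick is the lookup map itself, which works the same way outside ActiveRecord. A sketch with plain arrays standing in for the plucked rows (the data is made up):

```ruby
# Plucked rows come back as [key, value] pairs; build a hash keyed by id.
profiles_data = [[1, "weekly"], [2, "biweekly"], [3, "monthly"]]

profiles_map = profiles_data.each_with_object({}) do |(user_id, details), acc|
  acc[user_id] = details
end

# Each lookup is now an O(1) hash access instead of a SQL query per record.
profiles_map[2] # => "biweekly"
```

For pairs like this, `profiles_data.to_h` is an even shorter way to build the same hash.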
One thing to note here is that mistakes often come from not knowing all the available ActiveRecord methods – it's usually a good idea to search for a better solution instead of just writing code that "works" and pushing it to the repository, hoping nobody will notice during review.
One such example would be code that works like this:
```ruby
users.find_in_batches(batch_size: 1000) do |batch|
  ids = batch.map(&:id)
  # ...
end
```
even though the ActiveRecord batching API includes methods such as in_batches that yield a relation object:
```ruby
users.in_batches(of: 1000) do |relation|
  ids = relation.pluck(:id)
  # ...
end
```
though in this case, we could just as well do:
```ruby
user_ids = users.pluck(:id)

user_ids.each_slice(1000) do |ids|
  # ...
end
```
and save ourselves a lot of unnecessary extra SELECT queries.
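A quick plain-Ruby sketch of what each_slice does with the plucked ids (the id range is made up):

```ruby
user_ids = (1..2_500).to_a

# each_slice yields plain arrays of at most 1000 ids – no extra
# SELECT per batch, since all ids were fetched in one query.
batches = user_ids.each_slice(1000).to_a
batches.map(&:size) # => [1000, 1000, 500]
```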
Preload associations
This one is a staple of all performance-related blog posts and articles, but it's still surprisingly common to make this mistake and introduce N+1 queries, especially when the query happens several layers of code away and the problem isn't obvious at the call site.
The simplest way to introduce an N+1 problem is to loop through a collection where every iteration does another SQL query, for example:
```ruby
cancelled_relationships = Relationship.cancelled.order(cancelled_at: :desc).first(10)

cancelled_relationships.each do |relationship|
  RelationshipMailer.send_followup_email(relationship.customer_user).deliver_later
end
```
The code above will fetch the 10 latest cancelled relationships and then, for each of them, fetch the customer_user association – 11 SQL queries in total. This is costly from both the server's perspective (it wastes time on round-trips to the database) and the database's perspective (it's under higher load from fetching all this data one by one).
There are multiple ways to do eager loading in Rails depending on what you need it for and what you're trying to accomplish, but if you're unsure, the easiest way is to trust the Rails defaults and go with includes:
```ruby
cancelled_relationships = Relationship.includes(:customer_user).cancelled.order(cancelled_at: :desc).first(10)

cancelled_relationships.each do |relationship|
  RelationshipMailer.send_followup_email(relationship.customer_user).deliver_later
end
```
This will load all of the associated customer users in two queries – one for the relationships, a second for the users – making the code more performant.
⚠️ We used to have a gem called bullet to detect these automatically but it was slowing down the app in development considerably so after a quick discussion we decided to remove it.
Avoid N+1 queries by batching whenever possible
This is a mistake that is very easy to make – sometimes we need to run things that are more complex than a simple association that can be pushed to includes, eager_load, or preload. In these cases, developers tend not to think about the performance implications of their code and blindly introduce N+1 queries.
An example of this behavior would be a loop like this:
```ruby
customers = Customer.active

customers.each do |customer|
  customer.update future_events_count: customer.events.future.count
end
```
While this will work, the more customers you have the slower this code will get, so you may have just left a ticking bomb in the codebase that your work buddies will have to defuse two years down the line – not cool!
All of this could’ve been avoided by once again mapping the numbers first:
```ruby
customers = Customer.active
customer_ids = customers.pluck(:id)

events_counts = Event.future.where(customer_user_id: customer_ids).group(:customer_user_id).count

customers.each do |customer|
  # customers with no future events are absent from the hash, hence the default
  customer.update future_events_count: events_counts.fetch(customer.id, 0)
end
```
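The shape of what `.group(...).count` returns can be sketched with plain data (the ids here are made up): it is a hash from the grouping key to a count, built in a single pass, and keys with zero matches are simply absent:

```ruby
# Stand-in for the grouped SQL count: one user id per future event.
event_user_ids = [1, 1, 2, 3, 3, 3]

events_counts = event_user_ids.tally
# => {1 => 2, 2 => 1, 3 => 3}

# A customer with no events has no key at all, so default to 0.
events_counts.fetch(4, 0) # => 0
```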
Sometimes the queries we're doing already have complex grouping inside and chaining an extra group would break things. In this case, it's usually OK to pluck the results and create a map in Ruby instead, as long as we remember not to instantiate too many ActiveRecord models.
Keep in mind that complex queries may be hidden under seemingly innocent methods so always check what is happening under the hood before making a decision one way or another.
GraphQL Batch loader
In our codebase, we are using graphql-batch to avoid N+1 queries in deeply nested GraphQL queries.
To use it, you pass the ActiveRecord class, the name of the association, and the object from which you want to pull that association, and the gem handles batching the SQL queries for us:
```ruby
# this class inherits from GraphQL::Batch::Loader;
# in the graphql-batch docs they call it RecordLoader
Loaders::Association.for(Event, :customer_user).load(event)
```
Cache counters and calculate in Ruby
This one is a bit tricky, but bear with me – whenever you run .count on an association, it does an extra SELECT to grab the data from the database. That's usually exactly what you need, except when you're counting inside an array you already have, or counting similar things that differ only by another column's value.
For example this code:
```ruby
events = current_user.events

return {
  all_events: events.count,
  future_events: events.future.count,
  past_events: events.past.count
}
```
will do three SQL queries to fetch all the counters, while this code:
```ruby
events_data = current_user.events.pluck(:end_time)

return {
  all_events: events_data.size,
  future_events: events_data.count { |end_time| end_time >= Time.zone.now },
  past_events: events_data.count { |end_time| end_time < Time.zone.now }
}
```
will only do one.
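The counting itself is plain Ruby once the timestamps are in memory; a sketch with made-up times (Time.now standing in for Time.zone.now, which needs Rails):

```ruby
now = Time.now

# Plucked end_time values: one past event, two future ones.
end_times = [now - 3600, now + 3600, now + 7200]

future = end_times.count { |t| t >= now }
past   = end_times.count { |t| t < now }

# All three counters come from one in-memory array, not three queries.
[end_times.size, future, past] # => [3, 2, 1]
```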
⚠️ Keep in mind that doing it this way is a trade-off – you are trading future maintainability (your code is now aware of how past and future are implemented) for performance, so don't do it prematurely unless it becomes a real problem.
There are many ways to improve the performance of a Rails application and even more ways to accidentally make it worse. Keep experimenting and learning how your specific app behaves and where the performance bottlenecks are. Good luck!