From time to time we run into issues with database load spiking, and we usually narrow it down to a rake task or two hammering the database with lots of queries, or we notice that an operation that shouldn't take much memory or CPU is suddenly slowing down an entire Heroku dyno. What's going on, and how do we improve it?
Many techniques can help with improving a codebase; in this article I'm only going to focus on some of them.
Benchmarking methods
In our codebase, we have several gems that can help us find the problem or evaluate potential solutions.
Important note about warmup
When benchmarking your code you may notice that whatever you benchmark first will usually take longer and use more of everything. Depending on the situation you may want to warm up your caches first to make sure you're not skewing your results towards whatever runs later (and benefits from all sorts of caches). This can usually be done by running the same code for both cases before you start measuring.
Keep in mind, though, that in some cases warming up can ruin the results, because the code you want to test may itself depend on those caches. In that case it's usually better to run each case in a separate process or thread, and sometimes it's even a good idea to clear database caches as well. An example here would be a rake task where the first SELECT is usually slower but subsequent calls hit the cache 99% of the time. If you're running the task once a day, the chance of hitting the cache is low, so your testing should accommodate that.
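As a minimal sketch of the warmup idea in plain Ruby (the method name `expensive_operation` and the iteration counts are made up for illustration), you can run the code once before measuring so the measured run doesn't pay one-time costs like autoloading or cold caches:

```ruby
require "benchmark"

# Hypothetical expensive operation standing in for a rake task or query.
def expensive_operation
  (1..10_000).reduce(:+)
end

# Warmup: run once before measuring so the timed run doesn't pay
# one-time costs (autoloading, JIT, cold caches).
expensive_operation

elapsed_ms = Benchmark.realtime { 100.times { expensive_operation } } * 1000
puts format("took %.2fms", elapsed_ms)
```

When comparing two implementations, warm up both before timing either, so neither benefits unfairly from caches the other populated.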
Benchmark.ms
This is the simplest way of benchmarking things – it takes the block of code and checks how many milliseconds it took to run. Don’t let the simplicity steer you away – in many cases this will be enough to spot a glaring problem or compare the time it takes two algorithms to run.
⚠️ You may be inclined to just use start = Time.now and subtract it later, but this is not a good idea. Time.now reads the wall clock, which is not monotonic – it can jump forwards or backwards (for example due to NTP adjustments) while the test is running. Benchmark.ms uses a monotonic clock under the hood.
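If you do want to time things by hand in plain Ruby, the safe equivalent is to read the monotonic clock directly (this is a sketch; the workload inside the loop is arbitrary):

```ruby
# Read a monotonic clock that never jumps, unlike Time.now.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)

1_000.times { "some work".upcase }

elapsed_ms = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
puts "#{elapsed_ms}ms"
```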
Example:
```ruby
result1 = Benchmark.ms do
  10.times do
    Rake::Task['expensive_operation_a'].invoke
    Rake::Task['expensive_operation_a'].reenable
  end
end

result2 = Benchmark.ms do
  10.times do
    Rake::Task['expensive_operation_b'].invoke
    Rake::Task['expensive_operation_b'].reenable
  end
end

puts "A took #{result1}ms"
puts "B took #{result2}ms"
```
The problem with this approach is that you need to handle running the code over multiple iterations yourself to make sure the results are representative.
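One way to handle the iterations is a small helper like the sketch below. It is not part of any library – `average_ms` is a made-up name – and it uses plain-Ruby `Benchmark.realtime` (Benchmark.ms itself comes from ActiveSupport):

```ruby
require "benchmark"

# Hypothetical helper: run a block several times and report the
# average wall-clock time per iteration, in milliseconds.
def average_ms(iterations: 10)
  total = Benchmark.realtime { iterations.times { yield } }
  (total / iterations) * 1000
end

avg = average_ms(iterations: 50) { Array.new(1_000) { rand }.sort }
puts format("average: %.3fms", avg)
```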
Now let’s talk about the gems that we have in our codebase and how they can help us deliver better code.
benchmark-ips
benchmark-ips is a bit of a Benchmark.ms on steroids. It allows us to specify a warmup step, set how many seconds we want it to run, and gives us a nice detailed report of the resulting iterations per second with a comparison between different implementations. This is especially useful when testing things that should run as fast as possible – for example code in a tight loop – or when benchmarking hot paths of the application, like code that is run in an API endpoint.
Example:
```ruby
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("implementation_a") do
    SomeExpensiveCodeA.call
  end

  x.report("implementation_b") do
    SomeExpensiveCodeB.call
  end

  x.compare!
end
```
And example report:
```
Calculating -------------------------------------
    implementation_a    71.254k i/100ms
    implementation_b    68.658k i/100ms
-------------------------------------------------
    implementation_a     24.012M (± 8.7%) i/s -    120.155M
    implementation_b     23.959M (± 9.5%) i/s -    114.246M

Comparison:
    implementation_a: 24011974.8 i/s
    implementation_b: 23958619.8 i/s - 1.00x slower
```
benchmark-memory
This gem helps check how much memory was used and how many objects were created while the code was running. This is especially helpful when measuring the impact the code will have on overall memory consumption (which matters most in HTTP-related code and less so in rake tasks, unless they're starting to outgrow the Heroku dyno or machine on which they're running).
Example:
```ruby
require "benchmark/memory"

# Warmup step – necessary so autoloading code does not affect the results
Rake::Task["expensive_operation_a"].execute
Rake::Task["expensive_operation_b"].execute
Rake::Task["expensive_operation_a"].reenable
Rake::Task["expensive_operation_b"].reenable

Benchmark.memory do |x|
  x.report("original") do
    Rake::Task["expensive_operation_a"].execute
    Rake::Task["expensive_operation_a"].reenable
  end

  x.report("improved") do
    Rake::Task["expensive_operation_b"].execute
    Rake::Task["expensive_operation_b"].reenable
  end

  x.compare!
end
```
And example report:
```
Calculating -------------------------------------
            original   219.152M memsize (   812.306k retained)
                         2.396M objects (     2.040k retained)
                         50.000 strings (    50.000 retained)
            improved     8.336M memsize (     3.461k retained)
                       149.698k objects (    46.000 retained)
                         50.000 strings (    11.000 retained)

Comparison:
            improved:    8335728 allocated
            original:  219151974 allocated - 26.29x more
```
stackprof
Stackprof is an advanced tool that allows us to profile specific calls in Ruby code and generate flamegraphs. We can either use it around a specific piece of code or enable it as Rack middleware.
There are already good articles about how to use this tool efficiently so I’m not going to go into too many details:
- StackProf: The Holy Grail of Rails Profiling
- A Little Ruby-Land Profiling with StackProf
- https://github.com/tmm1/stackprof#stackprof
rack-mini-profiler
Rack-mini-profiler is an extremely valuable tool for figuring out why your Rails controllers are slow. It shows how long each controller action took and where it spent the most time, and lists the SQL queries that were run along with their durations, so you can find both long queries and N+1s. It also integrates with the aforementioned stackprof and memory_profiler.
Writing faster code
Use simpler objects when possible
Most of the performance problems in Rails applications stem from a library that starts with an A and ends with ctiveRecord. You can massively speed up your code by avoiding instantiating new AR objects whenever possible.
The simplest examples are usually replacing things like association.map(&:id) with association.pluck(:id), but plucking can be a much more powerful concept than that.
Let’s say we need to move the data from one table to another based on a common user ID. A naive implementation would look like this:
```ruby
CustomerProfile.find_each do |profile|
  relationship = Relationship.find_by!(customer_user_id: profile.customer_user_id)
  relationship.update!(
    cleaning_details: profile.cleaning_details
  )
end
```
This code will not only be inefficient in terms of SQL queries, but it will also create a lot of ActiveRecord objects, only to copy some data from one place to another.
We can avoid creating all those ActiveRecord objects (and save some SQL queries as a bonus) by pulling the data first, creating a map with user IDs as keys, and then going over all elements and fetching the data from that map:
```ruby
profiles_data = CustomerProfile.pluck(:customer_user_id, :cleaning_details)

profiles_map = profiles_data.each_with_object({}) do |(customer_user_id, cleaning_details), acc|
  acc[customer_user_id] = cleaning_details
end

Relationship.where(customer_user_id: profiles_map.keys).find_each do |relationship|
  relationship.update!(cleaning_details: profiles_map[relationship.customer_user_id])
end

# the above code could be optimized even further,
# but let's keep it like this for the sake of the example
```
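The core of this trick is the lookup map itself, which works the same way outside ActiveRecord. A sketch with plain arrays standing in for the plucked rows (the data is made up):

```ruby
# Plucked rows come back as [key, value] pairs; build a hash keyed by id.
profiles_data = [[1, "weekly"], [2, "biweekly"], [3, "monthly"]]

profiles_map = profiles_data.each_with_object({}) do |(user_id, details), acc|
  acc[user_id] = details
end

# Each lookup is now an O(1) hash access instead of a SQL query per record.
profiles_map[2] # => "biweekly"
```

For pairs like this, `profiles_data.to_h` is an even shorter way to build the same hash.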
One thing to note here is that mistakes often come from not knowing all the available ActiveRecord methods – it's usually a good idea to search for a better solution instead of just writing code that "works" and pushing it to the repository, hoping nobody will notice during review.
One such example would be code that works like this:
```ruby
users.find_in_batches(batch_size: 1000) do |batch|
  ids = batch.map(&:id)
  # ...
end
```
even though the ActiveRecord batching API includes methods such as in_batches that yield a relation object:
```ruby
users.in_batches(of: 1000) do |relation|
  ids = relation.pluck(:id)
  # ...
end
```
though in this case, we could just as well do:
```ruby
user_ids = users.pluck(:id)

user_ids.each_slice(1000) do |ids|
  # ...
end
```
and save ourselves a lot of unnecessary extra SELECT queries.
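A quick plain-Ruby sketch of what each_slice does with the plucked ids (the id range is made up):

```ruby
user_ids = (1..2_500).to_a

# each_slice yields plain arrays of at most 1000 ids – no extra
# SELECT per batch, since all ids were fetched in one query.
batches = user_ids.each_slice(1000).to_a
batches.map(&:size) # => [1000, 1000, 500]
```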
Preload associations
This one is a staple of all performance-related blog posts and articles, but it's still surprisingly common to make this mistake and introduce N+1 queries, especially when the query happens several layers of code away and the problem isn't obvious at the call site.
The simplest way to introduce an N+1 problem is to loop through a collection where every iteration does another SQL query, for example:
```ruby
cancelled_relationships = Relationship.cancelled.order(cancelled_at: :desc).first(10)

cancelled_relationships.each do |relationship|
  RelationshipMailer.send_followup_email(relationship.customer_user).deliver_later
end
```
The code above will fetch the 10 latest cancelled relationships and then, for each of them, fetch the customer_user association – 11 SQL queries in total. This is costly from both the server's perspective (it wastes time on round-trips to the database) and the database's perspective (it's under higher load from fetching all this data one by one).
There are multiple ways to do eager loading in Rails depending on what you need it for and what you're trying to accomplish, but if you're unsure, the easiest way is to trust the Rails defaults and go with includes:
```ruby
cancelled_relationships = Relationship.includes(:customer_user).cancelled.order(cancelled_at: :desc).first(10)

cancelled_relationships.each do |relationship|
  RelationshipMailer.send_followup_email(relationship.customer_user).deliver_later
end
```
This will load all of the associated customer users in two queries – one for the relationships, a second for the users – making the code more performant.
⚠️ We used to have a gem called bullet to detect these automatically but it was slowing down the app in development considerably so after a quick discussion we decided to remove it.
Avoid N+1 queries by batching whenever possible
This is a mistake that is very easy to make – sometimes we need to run things that are more complex than a simple association that can be pushed to includes, eager_load, or preload. In these cases, developers tend not to think about the performance implications of their code and blindly introduce N+1 queries.
An example of this behavior would be a loop like this:
```ruby
customers = Customer.active

customers.each do |customer|
  customer.update future_events_count: customer.events.future.count
end
```
While this will work, the more customers you have the slower this code will get, so you may have just left a ticking bomb in the codebase that your work buddies will have to defuse two years down the line – not cool!
All of this could’ve been avoided by once again mapping the numbers first:
```ruby
customers = Customer.active
customer_ids = customers.pluck(:id)

events_counts = Event.future.where(customer_user_id: customer_ids).group(:customer_user_id).count

customers.each do |customer|
  # customers with no future events are absent from the hash, hence the default
  customer.update future_events_count: events_counts.fetch(customer.id, 0)
end
```
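The shape of what `.group(...).count` returns can be sketched with plain data (the ids here are made up): it is a hash from the grouping key to a count, built in a single pass, and keys with zero matches are simply absent:

```ruby
# Stand-in for the grouped SQL count: one user id per future event.
event_user_ids = [1, 1, 2, 3, 3, 3]

events_counts = event_user_ids.tally
# => {1 => 2, 2 => 1, 3 => 3}

# A customer with no events has no key at all, so default to 0.
events_counts.fetch(4, 0) # => 0
```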
Sometimes the queries we're doing already have complex grouping inside and chaining an extra group would break things. In this case, it's usually OK to pluck the results and create a map in Ruby instead, as long as we remember not to instantiate too many ActiveRecord models.
Keep in mind that complex queries may be hidden under seemingly innocent methods so always check what is happening under the hood before making a decision one way or another.
GraphQL Batch loader
In our codebase, we are using graphql-batch to avoid N+1 queries in deeply nested GraphQL queries.
To use it, you pass the ActiveRecord class, the name of the association, and the object from which you want to pull that association, and the gem handles batching the SQL queries for us:
```ruby
# this class inherits from GraphQL::Batch::Loader;
# in the graphql-batch docs they call it RecordLoader
Loaders::Association.for(Event, :customer_user).load(event)
```
Cache counters and calculate in Ruby
This one is a bit tricky, but bear with me – whenever you run .count on an association, it does an extra SELECT to grab the data from the database. That's usually exactly what you need, except when you're counting inside an array you already have, or counting similar things that differ only by another column's value.
For example this code:
```ruby
events = current_user.events

return {
  all_events: events.count,
  future_events: events.future.count,
  past_events: events.past.count
}
```
will do three SQL queries to fetch all the counters, while this code:
```ruby
events_data = current_user.events.pluck(:end_time)

return {
  all_events: events_data.size,
  future_events: events_data.count { |end_time| end_time >= Time.zone.now },
  past_events: events_data.count { |end_time| end_time < Time.zone.now }
}
```
will only do one.
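The counting itself is plain Ruby once the timestamps are in memory; a sketch with made-up times (Time.now standing in for Time.zone.now, which needs Rails):

```ruby
now = Time.now

# Plucked end_time values: one past event, two future ones.
end_times = [now - 3600, now + 3600, now + 7200]

future = end_times.count { |t| t >= now }
past   = end_times.count { |t| t < now }

# All three counters come from one in-memory array, not three queries.
[end_times.size, future, past] # => [3, 2, 1]
```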
⚠️ Keep in mind that doing it this way is a trade-off – you are trading future maintainability (your code is now aware of how past and future are implemented) for performance, so don't do it prematurely unless it becomes a real problem.
There are many ways to improve the performance of a Rails application and even more ways to accidentally make it worse. Keep experimenting and learning how your specific app behaves and where the performance bottlenecks are. Good luck!