Materialized Views to Improve Aggregations Performance

Recently I was working on a project where it looked something like this: One model (movie) had a one-to-many relation with another model (user_rating), but I was only interested about the average of the ratings of the users.

DB structure

My initial approach was to do something like:

# frozen_string_literal: true

class Movie < Sequel::Model
  include ActiveModel::Serialization

  one_to_many :movie_ratings

  plugin :association_dependencies, movie_ratings: :destroy

  nested_attributes :movie_ratings

  def avg_rating
    return nil if movie_ratings.empty?
    sum_of_ratings = movie_ratings.sum(:rating)
    sum_of_ratings.to_f / movie_ratings.length
  end
end

Getting the average of the ratings by loading the records and doing the math was behaving fine for a single Movie but doing the same on a collection was super super slow even with eager loading. I had more tha 3.000.000 records for user_ratings.

One idea was to use pagination, and query only the Movies that are display in the current page of the list I had in the UI. That would offload the problem of the slow query to the user, every time the user would be filtering/re-sorting the movies the application would have to send a new request to the server, fetch the records from the DB and display them to the user. This would have a noticeable delay for the user, so it was not ideal.

My second thought was to use a Materialied View. Materialized View is a pre-computed query that is stored for later use with the goal to not re-peat the query again, the drawback of-course is that the data that you are getting may not be fresh enough, but in my case, the exact user-rating was not so critical and didn’t had to be 100% accurate. Ofcourse you probably do not want to do the same when you show to the user something that has to be 100% accurate e.g. the balance in his bank account.

An example of the query can be found here

So first I created a migration to create the materialized view. It was looking something like this:

# frozen_string_literal: true
Sequel.migration do
  up do
    execute <<-SQL
      CREATE MATERIALIZED VIEW movie_ratings_aggregations AS
        SELECT movies.id as movie_id, 
          avg(user_ratings.rating)::float as avg_rating, 
          count(user_ratings.rating) as ratings_count,
          now() AS last_refresh
        FROM user_ratings
        JOIN movies ON user_ratings.movie_id = movies.id
        GROUP BY movies.id
        ORDER BY movies.id;
    SQL
  end

  down do
    execute <<-SQL
      DROP MATERIALIZED VIEW movie_ratings_aggregations;
    SQL
  end
end

Next thing was to create a model to make the interaction with the view easier.

# frozen_string_literal: true
class MovieRatingsAggregation < Sequel::Model(:movie_ratings_aggregations)
  def readonly?
    true
  end

  def self.refresh_view
    Sequel::Model.db.refresh_view(:movie_ratings_aggregations)
  end
end

Important is to create a refresh method so we have the ability to refresh the view whenever we want, either by a cronjob or with callbacks after something changes in our main table.

Then we can create a one-to-one relationship with our main model

# frozen_string_literal: true

class Movie < Sequel::Model
  include ActiveModel::Serialization

  one_to_one  :movie_ratings_aggregation, key: :movie_id

  ...
  ...

We can refresh the view when new ratings are assigned, although in that case we have to be careful to not over-do it.

# frozen_string_literal: true
class MovieRating < Sequel::Model
  many_to_one :movies

  def after_save
    MovieRatingsAggregation.refresh_view
  end

  def after_destroy
    MovieRatingsAggregation.refresh_view
  end

  ...
  ...

end

That’s it, combined with eager loading as well can boost significanlty the performance of the application.

Published 7 Feb 2020

Software Engineering Lead, Certified AWS Solutions Architect. Opinions are my own and not necessarily the views of my employer.
Avraam Mavridis on Twitter