09 Jul 2025
While we can piece together Reports and Dashboards using ActiveRecord, it can be more efficient to use SQL to transform data and dry-struct
to turn query results into Ruby objects with strong typing.
ActiveRecord is a great ORM for typical CRUD operations. While it offers a handful of methods to aggregate data such as .group
, .sum
, and .joins
, they are not sufficient for complex reporting queries.
That’s where raw SQL with Common Table Expressions (CTEs) shines. If you’re not familiar with CTEs (Common Table Expressions), they’re like temporary views that let you transform data from one shape into another. Here we transform operational data into a format suitable for reporting.
Zipline users can send messages with associated tasks to multiple teams and track readership and task completion. In this first report, we want to display how many people read messages, grouped by team.
Let’s use CTEs to compute:
message_readership
: percent of recipients who read each messagemessage_team_metrics
: readership per communication per teamteam_readership
: average readership per teamThe transformation flow looks like this:
Here is the SQL query with CTEs:
WITH message_readership AS (
SELECT
m.id AS message_id,
COUNT(r.*) FILTER (WHERE r.read_at IS NOT NULL) * 1.0 / COUNT(r.*) AS readership
FROM messages m
JOIN recipients r ON r.message_id = m.id
GROUP BY m.id
),
message_team_metrics AS (
SELECT
mt.message_id,
mt.team_id,
mr.readership
FROM message_teams mt
JOIN message_readership mr ON mr.message_id = mt.message_id
),
team_readership AS (
SELECT
t.id AS team_id,
t.name as team_name,
AVG(mt.readership) AS avg_readership
FROM teams t
JOIN message_team_metrics mt ON mt.team_id = t.id
GROUP BY t.id, t.name
)
SELECT * FROM team_readership
The resulting rows will hydrate a collection of DryStructs
:
class TeamReadership < Dry::Struct
attribute :team_id, Types::Integer
attribute :team_name, Types::String
attribute :avg_readership, Types::Float
end
rows = ApplicationRecord.connection.exec_query(team_readership_sql)
structs = rows.map { |row| TeamReadership.new(row.symbolize_keys) }
With this approach, we write a readable SQL query that hydrates a collection of strongly typed ruby objects. We can then build the view layer using DryStructs
which is way easier than manipulating a collection of obscure hashes returned by a long SQL query.
Here is a more complex example where we want a dashboard showing metrics at three levels:
We’re going to implement this as one SQL query with CTEs to perform transformations in two stages:
We then use Postgres json functions to return these as a json document.
WITH message_readership AS (
SELECT
m.id AS message_id,
COUNT(r.*) FILTER (WHERE r.read_at IS NOT NULL) * 1.0 / COUNT(r.*) AS readership
FROM messages m
JOIN recipients r ON r.message_id = m.id
GROUP BY m.id
),
message_tasks AS (
SELECT
t.message_id,
COUNT(*) FILTER (WHERE t.completed) * 1.0 / COUNT(*) AS execution
FROM tasks t
GROUP BY t.message_id
),
message_team_metrics AS (
SELECT
mt.message_id,
mt.team_id,
mr.readership,
mtasks.execution
FROM message_teams mt
JOIN message_readership mr ON mr.message_id = mt.message_id
LEFT JOIN message_tasks mtasks ON mtasks.message_id = mt.message_id
),
overall_metrics AS (
SELECT
COUNT(DISTINCT message_id) AS total_messages,
AVG(readership) AS avg_readership,
AVG(execution) AS avg_execution
FROM message_team_metrics
),
metrics_per_message AS (
SELECT
m.id AS message_id,
m.subject,
AVG(mt.readership) AS readership,
AVG(mt.execution) AS execution
FROM messages m
JOIN message_team_metrics mt ON mt.message_id = m.id
GROUP BY m.id, m.subject
),
metrics_per_team AS (
SELECT
t.id AS team_id,
t.name,
AVG(mt.readership) AS readership,
AVG(mt.execution) AS execution
FROM teams t
JOIN message_team_metrics mt ON mt.team_id = t.id
GROUP BY t.id, t.name
),
dashboard AS (
SELECT
row_to_json(om.*) AS overall_metrics,
json_agg(DISTINCT row_to_json(mm.*)) AS metrics_per_message,
json_agg(DISTINCT row_to_json(tm.*)) AS metrics_per_team
FROM overall_metrics om, metrics_per_message mm, metrics_per_team tm
)
SELECT * FROM dashboard;
Here are the corresponding DryStructs:
class MessageMetric < Dry::Struct
attribute :message_id, Types::Integer
attribute :subject, Types::String
attribute :readership, Types::Float
attribute :execution, Types::Float
end
class TeamMetric < Dry::Struct
attribute :team_id, Types::Integer
attribute :name, Types::String
attribute :readership, Types::Float
attribute :execution, Types::Float
end
class OverallMetric < Dry::Struct
attribute :total_messages, Types::Integer
attribute :avg_readership, Types::Float
attribute :avg_execution, Types::Float
end
class Dashboard < Dry::Struct
attribute :overall_metrics, OverallMetric
attribute :metrics_per_message, Types::Array.of(MessageMetric)
attribute :metrics_per_team, Types::Array.of(TeamMetric)
end
Let’s hydrate these Structs with the JSON document returned by Postgres:
row = ApplicationRecord.connection.exec_query(dashboard_sql).first
parsed = row.transform_values { JSON.parse(_1) }
dashboard = Dashboard.new(parsed.transform_keys(&:to_sym))
We now have a fully structured dashboard in Ruby objects with validated types that we can use in the view layer.
p
| Readership:
= number_to_percentage(@dashboard.overall_metrics.avg_readership)
= render "metrics_per_message", metrics: @dashboard.metrics_per_message
= render "metrics_per_team", metrics: @dashboard.metrics_per_team
We can easily inject ActiveRecord scopes in our SQL queries so that we don’t duplicate business logic:
sql_scope = Message.last_30_days.for_teams(current_team).to_sql
sql = <<~SQL
WITH messages AS (
#{sql_scope}
),
...
SQL
Such complex SQL queries can be slow, so we often use EXPLAIN to understand what’s causing poor performance. Here are the strategies I use most of the time:
CREATE TEMP TABLE team_messages AS (
SELECT ... FROM ...
);
CREATE INDEX ON team_messages (team_id);
WITH message_metrics as (
SELECT ... FROM team_messages ...
)
...
SELECT * FROM ...
Using raw SQL unlocks the possibility of leveraging a data warehouse (Redshift, BigQuery, SnowFlake) to generate reports and dashboards from petabytes of data in seconds.
One of the best parts of using DryStruct
for dashboards is that it decouples the data shape from the data source.
We can hydrate a DryStruct
from:
This is incredibly powerful.
It means we can design and test dashboards without needing access to real data. We can render meaningful samples in staging, ship working demos to sales, and then swap in a real query when we go to production.
It’s also great for writing unit tests — we don’t need to mock a database, we just instantiate structs with test values.
Whether the data comes from petabytes in BigQuery or ten lines of Faker::Lorem
, the rest of the app stays the same.
You can build high-performance, structured dashboards in Rails using:
… and nothing prevents you from fetching data from a Data Warehouse or to inject fake data into your dashboards and reports!
Questions? Comments? 👉 pcreux@ruby.social.
You might also be interested in: