Or - how I stopped worrying and learned to aggregate SQL queries. This is a pattern that can be used when you have multiple SQL SELECT
queries Q_(1)...Q_(n), each returning zero or one rows, except for the first which must return a single row, where Q_(n) can depend on the result of Q_(1)...Q_(n-1), and you want to reduce the number of database roundtrips.
1. The Pattern
For each query Q in Q1, Q2, ..., Qn:
-
Rewrite the projection of Q to have names that are unique for the entire set of queries. For example, if you have two queries that select a user record:
SELECT id, name, email FROM Users WHERE ...
turn them into:
SELECT id AS id1, name AS name1, email AS email1 FROM Users WHERE ... SELECT id AS id2, name AS name2, email AS email2 FROM Users WHERE ...
-
If Q depends on the results in previous queries, write:
LEFT OUTER JOIN LATERAL ( ...insert Q here... ) ON true
If Q does not depend on any previous query results:
LEFT OUTER JOIN ( ...insert Q here... ) ON true
2. What it Does
Let's start with the first subquery. It results in a single row. You now have a single record:
Q_(1,a), ..., Q_(1,z)
Let's call this the "current aggregate result". As we go to the second to nth subquery, let's call it the m-th subquery, the LEFT OUTER JOIN LATERAL
now does two things for you. Starting with LATERAL
, it lets you use the values in the current aggregate result in the subquery, so in your query you can use any of the fields Q_(1,a)...Q_((m-1),z).
After that subquery has run you have the current aggregate result on your left and the new result set on the right, where the right hand side can be an empty result set with zero rows. That is, one of:
Q_(1,a), ..., Q_((m-1),z), Q_(m,a), ..., Q_(m,z)
Q_(1,a), ..., Q_((m-1),z), nothing
This is where the LEFT OUTER JOIN ... ON
true comes in to glue these two pieces together. Since it's not an INNER JOIN
, it can handle the case where the new result is empty - the fields are simply set to null. At the end of this step the current aggregate result will be:
Q_(1,a), ..., Q_(m,z)
...and we can repeat the process for the next subquery.
2.1. What About "ON true"?
The function of the ON true
is purely to satisfy the SQL-grammar. In order to extend the current single-row result set with the result set being joined without imposing any join conditions we have to supply an always-true join condition and a simple true
is what we're after.
3. Multiple Row Subqueries
The total number of rows that are returned will be the product of the number of rows in each subquery, with empty sub-result sets counting as single-row result sets. From this follows that you can end up with a combinatorial explosion.
A general way to collapse a multi-row result set into a single row is to use the array_agg
function. Multiple arrray_agg:s are guaranteed to see the input in the same order[a], so in principle you could array_agg
all the fields and then reconstruct the result set client-side by treating it as a struct-of-arrays to array-of-structs transformation.
To unpack an array_agg
'd value in a subquery, use . For example, if you have a
session_permissions
table session_permissions(session_id, permission_id)
with permission ids held by the user, and you want to output the permission names from a permissions(id,name)
table, given a session id:
SELECT array_agg(permission_id) AS permission_ids
FROM session_permissions
WHERE session_id = '<session id>'
LEFT OUTER JOIN LATERAL
SELECT array_agg(name) AS permission_names
FROM permissions
JOIN unnest(permission_ids) AS permisssion_id
ON permission_id = permissions.id
ON true;