New ask Hacker News story: Better SQL JOINs

Better SQL JOINs
31 by JoelJacobson | 17 comments on Hacker News.
I think foreign keys could be valuable to improve how we write SQL joins, in the special but common case when joining on columns that exactly match a foreign key. The idea is to add a new ternary operator, which would be allowed only in the FROM clause. It would take three operands: 1) referencing_table_alias 2) foreign_key_constraint_name 3) referenced_table_alias POSSIBLE BENEFITS * Eliminate risk of joining on the wrong columns Although probably an uncommon class of bugs, a join can be made on the wrong columns, which could go undetected if the desired row is included by coincidence, such as if the test environment might only contain a single row in some table, and the join condition happened to be always true. * Conciser syntax In a traditional join, you have to explicitly state all columns for the referencing and referenced table. This is somewhat addressed by the USING join form, but USING has other drawbacks, why I tend to avoid it except for one-off queries. When having to use fully-qualified table aliases, that adds even further to the verboseness. * Makes abnormal joins stand out If joining on something else than foreign key columns, or some inequality expression, such joins will continue to be written in the traditional way, and will therefore stand out and be more visible, if all other foreign key-based joins are written using the new syntax. When reading SQL queries, I think this would be a great improvement, since the boring normal joins on foreign keys could be given less attention, and focus could instead be made on making sure you understand the more complex joins. SYNTAX Syntax is hard, but here is a proposal to start the discussion: from_item join_type from_item WITH [referencing_table_alias]->[foreign_key_constraint_name] = [referenced_table_alias] [ AS join_using_alias ] EXAMPLE To experiment with the idea, I wanted to find some real-world queries written by others, to see how such SQL queries would look like, using traditional joins vs foreign key joins. I came up with the idea of searching Github for "LEFT JOIN", since just searching for "JOIN" would match a lot of non-SQL code as well. Here is one of the first examples I found, a query below from the Grafana project [1] [1] https://ift.tt/3mTwLQX SELECT p.*, ? AS resource_id, ur.user_id AS user_id, u.login AS user_login, u.email AS user_email, tr.team_id AS team_id, t.name AS team, t.email AS team_email, r.name as role_name FROM permission p LEFT JOIN role r ON p.role_id = r.id LEFT JOIN team_role tr ON r.id = tr.role_id LEFT JOIN team t ON tr.team_id = t.id LEFT JOIN user_role ur ON r.id = ur.role_id LEFT JOIN user u ON ur.user_id = u.id WHERE p.id = ? Here is how the FROM clause could be rewritten: FROM permission p LEFT JOIN role r WITH p->permission_role_id_fkey = r LEFT JOIN team_role tr WITH tr->team_role_role_id_fkey = r LEFT JOIN team t WITH tr->team_role_team_id_fkey = t LEFT JOIN user_role ur WITH ur->user_role_role_id_fkey = r LEFT JOIN "user" u WITH ur->user_role_user_id_fkey = u WHERE p.id = 1; In PostgreSQL, the foreign keys could also be given shorter names, since they only need to be unique per table and not per namespace. I think a nice convention is to give the foreign keys the same name as the referenced table, except if the same table is referenced multiple times or is self-referenced. Rewriting our example, using such naming convention for the foreign keys: FROM permission p LEFT JOIN role r WITH p->role = r LEFT JOIN team_role tr WITH tr->role = r LEFT JOIN team t WITH tr->team = t LEFT JOIN user_role ur WITH ur->role = r LEFT JOIN "user" u WITH ur->user = u WHERE p.id = 1;

Comments