Double-check your Rails migrations’ auto-generated SQL

Suppose we want to create a join table in a Ruby on Rails application between two models, Foo and Bar. There’s plenty of material available online describing join tables in detail (the Wikipedia page on associative entities is pretty good), but to begin, let’s go over a brief recap and an example:

Join tables implement many-to-many relationships between entities in a relational database. An entry in a join table will hold the primary keys of the entities it associates. For example, the linked Wikipedia article above (at time of writing) shows that we can use a Registration entity to implement a join table, and this join table holds a relationship between a Student and a Course. This Registration entity contains the primary keys of a Student and a Course.

Having seen an example, let’s a join table between a Foo and a Bar. A join table between Foo and Bar will hold the primary key of a Foo and the primary key of a Bar (ie. the IDs of these two models). Searching through the Ruby on Rails documentation, we may find that create_join_table is a method that is available inside of a database migration. To use this method, we’d write a migration like so:

If we run the migration, we do get a working join table in our database. However, if we look at the underlying SQL this migration generates, it might not quite match what we expect to get. In this blog post, I’m using SQLite (the default option provided by Rails), but we would see similar results when using a different database like PostgreSQL. The generated SQL is as follows:

There are two interesting things to note about the above SQL.

Firstly, our join table doesn’t actually use a foreign key constraint. This means that our SQL database will let us insert any integer into the values of foo_id and bar_id, even if such an ID doesn’t exist in our tables for Foo and Bar.

Secondly, the generated SQL has NOT NULL constraints, meaning our join table entries must always have an integer value set for foo_id and bar_id. This makes sense in some, but not all, use cases.

In the Student and Course example from the beginning of the article, NOT NULL fits the use case – it doesn’t make sense to have a Registration with a Course but no Student, or a Registration with a Student but no Course.

However, let’s look at a slightly different use case – suppose we’re trying to keep track of floating software licensing, where we limit the number of users who have permission to use some software at a given time. In this case, we may have a join table named something like LicenseAssignment between a User and a SoftwareApplication. In this case, however, it makes sense that User could be NULL – it would just mean that not all of our licenses are currently being used. In this case, we wouldn’t want a NOT NULL constraint.

As of writing, most hits on the first page of Google for “rails join table tutorial” use database migrations that implement pretty different results from what we see above. I’ve excluded database indexes for brevity, but for the most part, they look like so:

This migration generates the following SQL:

If we use the above SQL, we can rely on our database to enforce some level of data integrity via its foreign key constraints, and we support the use case where a Foo ID or a Bar ID can be NULL.

So, what’s the lesson behind all of this? Double-check your Rails migrations’ auto-generated SQL, and make sure it fits your assumptions and use case. Superficially, the two Rails migrations outlined in the article look similar, but their underlying data models are actually very different.

Start your journey towards writing better software, and watch this space for new content.