Coding ● May 2015
Building the optimal users database model
In the past, I’ve often wrestled with designing an optimal relational database model to represent users in a multi-tenant system. I’ve wanted to get the model just right—enough structure to accomodate future enhancements without bloating the schema with excessive JOINs, overly-wide tables, or any other number of typical relational-database “sins”.
Over four years into our current iteration of DoneDone, I can happily say that the database model we’ve built has been resilient to all sorts of feature updates we’ve made over the years.* I wanted to share our thought process behind how it’s designed, starting with a simple database model and evolving it into the model we currently use.
A simple model
Let’s start with the most bare-bones relational model to represent users in a system like ours. Naturally, we’ll start with a table called Users
. The attributes within Users
fall within one of three general groups of data.
- Personal information: Name, email address, phone, avatar, Twitter handle, etc.
- Login credentials: Username, password salt and hash, and any 3rd-party auth tokens
- Foreign keys: Most likely, one to an
Accounts
table to link a user to an account.
So, our starting model looks something like this, with unique constraints on items like Username
and EmailAddress
:
I’ve used many systems whose model follows closely to this two-table architecture. But, there’s one very large drawback to this system: The foreign key from Users
to Accounts
restricts a user’s access to only one account in the system. If someone wants to have access to multiple accounts, they are forced to create a new user record with another email address. That quickly becomes a nuisance to manage for the user.
Associating users to multiple accounts
So, unless I am 100% sure that the system will never allow a user access to multiple accounts (and, sometimes, even if I am 100% sure), I prefer breaking this relationship out as a many-to-many by introducing an association table in between. In DoneDone, we call this a membership.
The Memberships
table holds foreign keys to both Users
and Accounts
, so that we can both associate any number of users to an account (e.g. employees in a company) as well as any number of accounts to a user (e.g. a user’s freelance account along with their company account).
We can also add membership-specific columns to Memberships
. In DoneDone, for instance, the Memberships
table can hold columns like an office phone number or fax, as well as an account-specific email address. This way, we can send profile information and password reset links to the user’s global email address, stored in Users
, while sending email updates on specific DoneDone issues to the account-level email address, stored in Memberships
.
Additionally, the Memberships
table can hold information like roles—a user might be an administrator in one account, a regular user in another account, and an account owner in a third account.
Finally, the Memberships
table helps clarify the intent of certain database relationships involving a user. For instance, in DoneDone, we tie system alerts to a membership record. This lets us create a specific alert to a person’s specific account (say, a billing notice to an account owner). However, we tie issue relationships to a user record. If a person creates an issue, and then leaves a company, we can delete their Memberships
record while still preserving their relationship to that issue.
As I mentioned earlier, even if I’m fairly certain that users will not belong to multiple accounts, I still prefer the additional table in the design. If you avoid it now, should the business one day decide to support multiple accounts for a user, you not only need to massage your data into a new data model, you’ll need to provide some mechanism for a user to link once-disparate accounts together.
Improving performance and database integrity
This three-table architecture provides a lot of flexibility, but there are still other ways we can improve. In DoneDone, we segment off the authentication information away from the Users
table into a separate Logins
table. We then add a required foreign key from Logins
to Users
(a login must be associated to an existing user). Since a user can only have one set of credentials, we could further tighten the data integrity by placing a unique constraint against the foreign key.
There are a few different reasons why I prefer this separation.
First, it provides a cleaner way of representing users in a system who do not have credentials to login. If we don’t extract credentials to a separate table, we’re left with only the unsavory option of allowing NULL
values for usernames and password salts/hashes. Even further, many database systems (Sql Server 2008 and prior) do not support an out-of-the-box “unique if not null” constraint. This means that you can’t place a unique constraint on, say, username, if you had more than one unregistered user (i.e. two or more records with username equal to NULL
). As a workaround, you might be left with managing uniqueness on usernames solely outside of the database—a game I’d rather not play.
You may be asking why you’d ever want users in your system who can’t login. Where would this come into play? Here are two examples:
- A closed system that pre-populates the database with all allowable users. This would be common for a company intranet, or any kind of in-house system. When a user registers, they must match on personal information tied to an unregistered user already in the database.
- A system that allows partial access to features without requiring an explicit login. This, in fact, is how we manage Public Issues in DoneDone. Anyone who sends a public issue email into DoneDone will have a user record logged in our database, but they won’t necessarily have credentials tied to them.
Secondly, creating a separate Logins
table lets you keep credential information in as thin a table as possible. This will only help with indexing and any lazy selects (select * from Logins
). Since the Logins
table will usually have significantly more reads than writes, the more efficient those reads are, the better.
Finally, while you could argue placing the foreign key on Users
(since this is a one-to-one relationship), I find more benefits placing it on Logins
. This way, we can make the relationship required, which again avoids adding any nullable columns or “unique if not null” issues in your data model.
Adding access to groups of users
The last step we’ve made in DoneDone is to add a Companies
table between Memberships
and Accounts
. This lets us further differentiate access levels between groups of users in an account. For example, in DoneDone, we can mark a company as an admin company, and only allow users associated to the admin company to be administrators or account owners. A Companies
table also would allow you to easily make certain messages private amongst a user’s respective company.
With this in place, here is the current user data model we’ve used in DoneDone.
We’ve been really happy with this level of architecture in DoneDone thus far, and I hope this helps your product as well.
*–Much credit goes to former DoneDone colleague Mustafa Shabib for his influence on the data model design that’s largely withstood significant changes since it was released.
Originally published May 27, 2015 at DoneDone. Go to the next essay in Coding, “Programming's three life lessons”.