When Creating a Database, keep things Simple

Posted on January 27, 2012 by

Web applications can be complicated, so can the databases that drive them. Do they have to be? No, not if you incorporate simplicity into your design.

This post will focus on keeping your database schema simple.

Know the Application you’re Building

Sometimes a database is complicated because the system is not fully understood – plain and simple. Sorry to break this to you but you really, really need to understand every aspect of an application: the purpose of the application (why it’s needed); all the data the application wants to output (what and how it wants users to see the data); all the data the application wants from users (and what the plan is for that data once it’s received); as well as all all the little things in between.

If you don’t understand the application inside out you can’t build a data model that will accurately reflect it. In situations like these you will end up having to hack your database, adding in extra fields and new tables as you start to see that things aren’t working. This can lead to a messy, messy database.

Plan your database by understanding what it is you’re building.

Look for Similarities in Tables – Make Connections

After you understand what it is you’re building and you start the process of designing your tables, look at the fields within each table and make note of similarities. Sometimes you’ll find that you have tables that pretty much do the same thing, but for different parts of your application.

In situations like these you can merge these tables together into one and use a “type” identifier to indicate which part of the application a particular record belongs to.

For example, you have the following four tables: books, movies, book_sales, movie_sales. Look at the similarities in the book_sales and movie_sales table. If all they do is capture the purchase and the ID of the corresponding book or movie table, it would make more sense to have just a sales table along with a product_type field to indicate if the sale was a book or a movie. A record within the sales table is now dependent on the foreign key and the product_type value.

Reducing the number of tables your system needs by introducing special fields like this helps you in a number of ways. For one, it reduces the amount of code you have to write; for another, it keeps things simple for future expansion (always think scalability).

Think of the overhead involved in the addition of 10 other widgets that would be sold (such as games, shoes, coffee mugs, etc.). Can you imaging having to create all those extra tables – and then have to incorporate these new tables into your code? Ouch.

But hey! That product_type field would start to now contain a whole bunch of different values – one for each widget. Would it make sense to abstract this out and use a separate table for product_types?

Good thinking! Read on ….

Cut Down the Overhead – Use Tables that describe the Relationship

So now you have a single sales table, but wow, all those different product_type values are going to look pretty ugly – and what about referential integrity? If I have sales records with no product types I’m in trouble! This is where a table that describes a relationship comes in.

There are two types of tables for modeling data: an entity table that, “models the object” and a relation table that – you guessed it – “models the relationship” But, what does this mean?

Basically, you have tables that are used for storing data (like your widgets and the sales of those widgets), and you have tables that form the connective glue between these objects. In our example, a product_types table would help identify all our different widget tables to our sales table.

By storing the widget_id and the product_type_id into the sales table, you are able to have every record in your sales table connect to the correct widget.

Side Note: Your mind might be racing about how to combine many sales for many products into one order shipment. In this case you would want to create an orders table and store the order_id in each sale record to be shipped.

This is where, “know what you’re building” comes into play.

Look at Your Table Fields

Ever see a table called organizations and see the following fields:

While there’s nothing wrong with this, you should consider the amount of typing you have to do in your SQL statements. That’s a lot of typing to pull in specific data, not to mention the possibility of spelling orgnaization wrong.

Consider using shorter names within your tables that still convey the correct meaning.

For example, if your table is called organizations, your fields could be id, name, primary_address, secondary_address, etc.. The fact that these fields belong to the organization table is kind of implied by the fact that they’re in the organization table.

When using these fields in other tables (such as the id as a foreign key), then it makes sense to call it organization_id. In your SQL queries, prefixing the field name with the table that it belongs to will help keep things clear. For example:

select o.id as organizational_id, o.name as organization_name, e.id as employee_id, e.first_name as employee_first_name, e.last_name as employee_last_name from organizations o left join employees e on o.id=e.organization_id where e.state=1;

Other Things To Consider

While what I’m going to talk about below might not reduce the size or complexity of your database, the following fields are very valuable to have for each record in your table:

Knowing When a Record Was Created and the last time it was Modified

Having a date to indicate when a record was created is important for book keeping. Obviously you’d want to know when your sales were made, but what about when a widget was first introduced in the system? This kind of knowledge would help for report gathering. Also, knowing when a record was last modified can help when something was entered in wrong – people would want to know when it happened as much as why it happened.

Controlling the State of your Record

State fields are a good idea too as they allow you to easily indicate if a record is active, inactive or deleted (a soft delete). By building this into your data model you are ensuring that you can control the state of records within it.

This is important when your asked to urgently suspend a widget from the online sales system because of a recall or because it’s no longer supported; yet the client would still like it listed within their sales reports. Setting up your application in a way that allows you to control the state of records through the use of a state field will make this task much easier.