Database Design for Web Applications
RDMS Design for Websites
Published 30 Sep 2008
Designing the database for a web application is critical. Get it right at the beginning, and everything will run smoothly/smoother. Discover you need to change the structure half-way through a project, and it's a nightmare.
The theory on relational databases can be somewhat dry, but's important to remember that it's theory with a very practical application.
Modelling
Designing the database should be the first thing you do, before writing any code. Step away from the keyboard, get a pen and some paper, and begin analysing the system and determine the major objects. There's a tutorial on data modelling from the University of Texas, and www.cs.sfu.ca/CC/354/zaiane/ material/notes/Chapter2/node1.html.
Draw the ERM diagram. You don't need fancy software for this, pen & paper will do or the Draw application in the OpenOffice suite has some nice diagramming tools. I'm often sceptical about diagrams in software engineering, as it often seems more about selling expensive software and a folder of paper that no-one will ever look at again than being useful, but entity relationship diagrams are an exception.
You'll end up with major entities, such as Users, Widgets, Orders, and some minor entities such as Categories, Countries, etc. Note that the entities will normally map nicely to classes when writing the PHP code later (assuming you're using OOP methodology), thus killing two documentation needs with one diagram.
Internationalisation
If the application/website is going to be multilingual (or maybe will need to be in the future), you need to think about this now. For example, translating category names. You can simply have multiple fields, such as
Table: categories
| id | name_en | name_de | name_sv |
-----------------------------------
or you can have a Language entitiy (with a many-to-many relationship with the categories). You'll also have to consider collation later on (what character set to be used for text fields).
Many-to-Many
You need to break up any many-to-many relationships. For example, an article has one or more tags, a tag has one or more articles. You'll need an artificial "link" entity inbetween the two. Article on many-to-many in mySQL.
Normalise
It's important to normalise the tables to avoid data duplication and code complexity later. I've come accross databases that haven't normalised, and it's a right mess. I usually normalise to the 3rd Form for web applications, though with some exceptions.For example, an Invoice may have a status of New, Paid, or Canceled. When it's a small duplication like this, where the values will seldom (or never) change, I will leave this in rather than create another table. I will then have the field as a flag (TINYINT (1) UNSIGNED), and define an array in the application settings file later. This is a design decision, and when to do it comes a bit with experience.
Data Dictionary
Now it's time to document the fields. Hopefully you've received some sample data from your client, or you're able to produce some yourself.
Naming Conventions
Names should be short and descriptive. If it's not clear from the name, write some comments (it'll help when you come to look at it again in 6 months time). Use lowercase. Table names should be plural, field names singular. Don't use spaces in names, use an underscore if it's really necessary.
Don't duplicate the table name in the field name. I'm not sure why some people do this (though I suspect it's a MS Access induced disease) - maybe it's when people have problems with JOINs.
`items`.`itemname`
is superfluous,
`items`.`name`
is sufficiently clear, and saves plenty of typing later on.
Data Format
Text Fields
Unique to databases used for websites/applications, is that text data will usually be displayed as part of an (X)HTML document. We usually make the assumption that data will be viewed at lot more frequently than edited (indeed, one of the reasons for mySQL's popularity is it's performance under this assumption with the default storage engine). Therefore, it's more efficient that text data is stored as HTML.
In practive, this means running htmlspecialchars() on singleline data (VARCHAR) fields, maybe nl2br() for multiline data (TEXT fields), before insertion.
This will need to be converted back to plain text when editing, before the data is displayed inside a form.
Field Sizes & Limits
You'll need to give yourself some slack when defining maximum sizes, to allow for expansion or changes in specification (e.g. the phone company starts issuing 14 digit numbers next year, but you've only space for 12 in the database...).
Bear in mind that mySQL will silently chomp text data that's too long, so if your email field is only VARCHAR(16) big there'll be problems when mistermongomeryburns@veryverylongdomain.net tries to register on the site.
- Integer fields should normally be UNSIGNED, unless you actually need negative numbers.
- For a small range of numbers, e.g. 1-100, TINYINT(1) UNSIGNED will suffice.
- Use a TINYINT with possible values of 1 or 0 instead of a boolean - it'll save problems in PHP later.
- Don't go to the other extreme of caution and be wasteful, e.g. VARCHAR(255) for all text fields.
- DECIMAL(10,2) will be ok for currency, unless you're dealing with Bill Gate's bank account.
Primary Keys
An unsigned integer field called id set to auto-increment works well in most cases. E.g. `users`.`id`, `products`.`id`, `invoice`.`id`. Avoid using VARCHAR as the primary key.
Don't use anything that may possibley change - e.g. an email field in `users` maybe unique, and thefore suitable as a primary key, but when if the user changes email address you'll need to update all other tables that reference this. Remember that mySQL doesn't enforce foreign key constraints for you normallly.
"Link" tables, that is tables used to break a many-to-many relationship, will have a composite primary key consisting of the 2 foreign keys of the respective tables.
Indexes
You won't notice them at the beginning, but they'll come into play when you're in production with thousands of rows and hundreds of concurrent users. Carefully index fields that will be searched on frequently.
A simplified example: an articles table has an approved flag field. Only articles that are approved are shown on the website:
SELECT * FROM `articles` WHERE `approved`=1 ORDER BY `title` LIMIT 10
The approved field is a good candidate to be indexed.
Every silver lining has it's cloud - indexing requires more storage space, and will slow UPDATE & DELETE statements, so don't go overboard and index everything. Indexing foreign keys if often a good idea, especially when using JOINs.
Creating unique indexes is good for enforcing data integrity (such as `users`.`email`), again with the slight performance cost (not when SELECTing.
Implement the Database
A couple of things to remember. Set suitable DEFAULT values when creating the tables. Just as it's important to document source code, remember to add comments to the table and the fields - foreign key, data range, etc.
Add a resonable amount of test data. Run through the use cases. If everythings ok, then it's time to start coding the application.
Tags: