Universally Unique Identifiers and You! (Part 2)

By March 2, 2010Insights

In part 1 we went into the why of UUIDs, now lets get into the how.
Representation of UUIDs in MySQL
UUIDs are intrinsically fairly large, so storage space is something to consider. Large database rows result in more cache misses, more disk seeks, slower backups, etc. There are some techniques you can use to optimize storage if this is a concern for you.
It’s tempting to store UUIDs are some sort of text string since we see them in that format (e.g. 550e8400-e29b-41d4-a716-446655440000) so often, but it’s also a waste of space: it takes 36 bytes to store the string in its default format and a UUID just represents a 16 byte number! It’s also a waste of processing time to store UUIDs as strings since mysql then has to worry about string lengths, character sets and collations. Instead, you can use a BINARY(16) column to be efficient. You might think 16 bytes is a lot of space to use up for just storing an identifier, but it’s only 4x larger than the default INT or 2x larger than a BIGINT.
An annoyance when you’re developing an app or dealing with data full of binary UUIDs is that the bytes are rendered raw when using MySQL on the command line so you end up with control codes in the rendered string and get fun things like the bell characters and misaligned columns. Use MySQL’s HEX and UNHEX functions to create some nice human readable representations and you’re off to the races. If you’re dealing with tables frequently you could create a database view for easy querying.
UUIDs and table layout
Also, consider the issue of your on-disk table layout. At small scales, you do not need to worry about that, but as you scale up, you should really understand the details of how your database manages disk space.
UUIDs are not always the best choice for a primary key. Since MySQL loads data from disk in batches based on primary key, you could end up with wasted memory and unnecessary disk reads. To alleviate this, you can add another column for the primary key (a standard auto-increment column is a good candidate here as it clusters data well), and put an index on your UUID column, but refer to data by only its UUID. Of course the index takes up disk space and slows down reads, but it is often more important to keep chronologically related data together for reads.
UUIDs in Ruby
We use the uuidtools gem.
UUIDTools can sometime have trouble finding your machine’s mac address (needed to generate some UUIDs), so we use the macaddr gem.

require ‘macaddr’
require ‘uuidtools’
UUIDTools::UUID.mac_address = Mac.addr

UUIDTools::UUID the key methods you’re going to need are:

  • to_s – the formatted one with dashes
  • hexdigest – like to_s but without dashes. We use this for memcache keys since you can hit the key length limit surprisingly easily.
  • raw – the raw byte string. Store this in MySQL.

If you are having trouble getting the byte strings into your database, you can use the MySQL literal “x’#{uuid.hexdigest()}'” in your queries to have the database server parse the bytes for you.
UUIDs are a useful tools for distributed applications. They take some thought to implement well, but hopefully we’ve helped you get a jumpstart on that.