Universally Unique Identifiers and You! (Part 1)

03/02/2010
Courtesy of Flickr's Smithsonian Institution

The computer scientist Phil Karlton is reputed to have said, “there are only two hard things in Computer Science: cache invalidation and naming things.” Universally Unique Identifiers (UUIDs) are a way of naming things such that each name is, for all practical purposes, guaranteed to be unique everywhere. A UUID is a string that looks like this: “F7C27314-5CFC-4E36-8F87-C8CC8DDAC0B1”. I generated that string with the uuidgen application, and it is universally unique. I just Googled it, and it does not appear anywhere else on the Internet. If it does when you Google it now, then it was copied from this document. That’s the power of UUIDs: when you see the same UUID in two different places, you know that both refer to the same object. If I ever wish to find copies of this blog post, Googling that UUID will turn them up.
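If you don’t have uuidgen handy, most languages can do the same job. Here is a minimal sketch in Python using the standard library’s uuid module. The variable names are just for illustration, and I am assuming you want a random (version 4) UUID like the one quoted above.

```python
import uuid

# uuid4() builds a version 4 UUID from random bits, much like running uuidgen.
new_id = uuid.uuid4()

# The canonical text form is 36 characters: 32 hex digits in five hyphenated groups.
text = str(new_id).upper()
print(text)

# Parsing the UUID quoted in this post gives an equal value regardless of letter case.
post_id = uuid.UUID("F7C27314-5CFC-4E36-8F87-C8CC8DDAC0B1")
same_id = uuid.UUID("f7c27314-5cfc-4e36-8f87-c8cc8ddac0b1")
print(post_id == same_id)
```

Every run prints a different string on the first line, which is the whole point.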

What is a UUID?

A UUID is a 128-bit number, and the most common kind (version 4) is almost entirely random: 122 of its bits are chosen at random. UUIDs are virtually guaranteed to be unique for much the same reason that it is very unlikely that hundreds of people will win the same lottery draw: simple statistics. Here’s what Wikipedia has to say about it: “In other words, only after generating 1 billion UUIDs every second for the next 100 years, would the probability of creating just one duplicate be about 50%. The probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.”
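You can sanity-check that kind of claim yourself with the standard “birthday problem” approximation. The sketch below is my own back-of-the-envelope version, not Wikipedia’s calculation. It assumes version 4 UUIDs with 122 random bits, and the function names are made up for this post.

```python
import math

RANDOM_BITS = 122  # assumption: version 4 UUIDs, which carry 122 random bits
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def collision_probability(count, bits=RANDOM_BITS):
    """Birthday-bound approximation: p is about 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(count ** 2) / (2 * 2 ** bits))

def count_for_even_odds(bits=RANDOM_BITS):
    """Number of UUIDs at which the approximation reaches 50%."""
    return math.sqrt(2 * 2 ** bits * math.log(2))

n50 = count_for_even_odds()
years = n50 / 1e9 / SECONDS_PER_YEAR  # at one billion UUIDs per second
print(f"{n50:.2e} UUIDs, about {years:.0f} years")
```

Under these assumptions the even-odds point lands at roughly 2.7 × 10^18 UUIDs, or a little under a century at a billion per second, which is the same order of magnitude as the quoted figure.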

Why do we care about identifiers that are universally unique?

We want identifiers to be universally unique when they are going to be generated by multiple distributed systems that are not necessarily in constant communication with each other. UUIDs are used in Amazon’s SimpleDB, for example, because SimpleDB is a distributed database: internally, its nodes are not all in one location. This is in contrast to using an auto-increment column in a single-server relational database implementation, where every thread depends on a single memory location to ensure that identifiers are unique.

Another benefit of identifiers being unique across systems is that datasets can be partitioned and merged between systems without the headache of reconciling identifiers that overlap. As an example, you could take “vehicle” objects from a car racing game and migrate them into a database with “sword” objects from a fantasy game. This gives us great flexibility with our underlying resource allocation.
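To make the merging point concrete, here is a small sketch. The two dictionaries stand in for tables in two unrelated games, and the records and field names are invented for illustration.

```python
import uuid

# Hypothetical records from two independent systems, each assigning its own keys.
vehicles = {uuid.uuid4(): {"kind": "vehicle", "name": "rally car"}}
swords = {uuid.uuid4(): {"kind": "sword", "name": "longsword"}}

# With auto-increment keys, both tables would likely start at 1 and collide.
auto_vehicles = {1: "rally car"}
auto_swords = {1: "longsword"}
overlap = auto_vehicles.keys() & auto_swords.keys()
print("auto-increment keys that clash:", overlap)

# With UUID keys, the merge is a plain union and no key needs to be rewritten.
merged = {**vehicles, **swords}
print("records after merge:", len(merged))
```

Neither system had to know the other existed when it handed out its keys, and nothing has to be renumbered at merge time.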

When to use UUIDs

There are basically only two ways to name things and have the names be universally unique. One is to use a hierarchical system like URLs, where there is a process of “nested registries”. For example, a URL like http://www.facebook.com/ayogo involves a DNS registry for “.com”, which gives Facebook license to use “facebook”; a registry for Facebook.com sub-domains, which binds “www.facebook.com”; and a registry for Facebook “usernames”, which binds “ayogo”.

This is obviously a very useful and popular system of universal naming, and it has the advantage of having names that are more human-readable than UUIDs. But it depends on the management of the hierarchical registry system. When it comes to domain names and Facebook usernames, the management of the database costs millions of dollars per year. Sometimes you’d like to achieve the universal uniqueness without the overhead of a registry. UUIDs are the standard technology for accomplishing that.

Summary

Naming is hard. But if you can live with UUIDs’ verbosity and poor readability, they are actually quite a bit simpler than hierarchical universal naming schemes.

In Part 2, we will go into some detail as to how to implement UUIDs in your application.