This weekend Dr Nic shared with us a well written article discussing the shortcomings of auto-incrementing (serial) keys and an alternative approach. While discussing the article the question came up about how to go about using UUID keys in our applications and what we have to do within PostgreSQL in order to use them. PostgreSQL out of the box defines a UUID Data Type which is a great start. We then have three options for generating UUID’s,
- Within your application code
- Within the database using the extension uuid-ossp
- Within the database using the extension pgcrypto
Generating them within our application will work quite nicely and is a good way to go, unless you want the database to handle creating these automatically for you which is usually where our point of view.
Only use uuid-ossp if you have a real and specific need for the functions it provides If your only need is to generate and index them then you do not need uid-ossp. (Update: See below.) For generating UUID’s within the database an easy place to start is to use the gen_random_uuid ()
function from the pgcrypto extension.
So how do we use this function? First we need to create the extension in the database we wish to use it with,
CREATE EXTENSION pgcrypto;
This is how you load pre-compiled shared library code which adds functionality into your PostgreSQL database.
Note specifically that the extension must be created (loaded) once for each database in which you wish to use it. Once it has been loaded into a running instance of the database server it will be there for use from then on spanning restarts.
Also be aware that if you have dumped and restored the database from one server to another running instance then depending on the method of the dump/restore you may need to load it into this new instance after restoring.
Once you have done this you can generate a random id,
SELECT gen_random_uuid();
Which returns a UUID Data Type.
Let’s create a table with a UUID primary key and see how we use the gen_random_uuid()
function to populate our IDs for us,
CREATE SCHEMA IF NOT EXISTS snw;
CREATE TABLE snw.contacts(
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
email TEXT
);
We can now add entries into our newly created Stark & Wayne Contacts table,
INSERT INTO snw.contacts (name,email) VALUES
('Dr Nic Williams','drnic'),
('Brian Mattal','brian'),
('Wayne E. Seguin','wayneeseguin'),
('Long Nguyen','long'),
('Bill Chapman','bill'),
('Chris Weibel','chris'),
('Jeremey Budnack','jrbudnack'),
('Ruben Koster','rkoster'),
('Jamie Van Dyke','jamie'),
('Quintessence Anx','qanx'),
('McGowan','mcg'),
('高,秀娇 (XJ)','xj'),
('Geoff Franks','geoff'),
('Van Nguyen','vnguyen'),
('John Longanecker','jlonganecker')
;
INSERT 0 15
Let’s examine our table now,
SELECT * FROM snw.contacts;
id | name | email
--------------------------------------+------------------+---------------
0d60a85e-0b90-4482-a14c-108aea2557aa | Dr Nic Williams | drnic
39240e9f-ae09-4e95-9fd0-a712035c8ad7 | Brian Mattal | brian
9e4de779-d6a0-44bc-a531-20cdb97178d2 | Wayne E. Seguin | wayneeseguin
66a45c1b-19af-4ab5-8747-1b0e2d79339d | Long Nguyen | long
bc8250bb-f7eb-4adc-925c-2af315cc4a55 | Bill Chapman | bill
200393bc-8aaa-45a8-9093-80c4792348cd | Chris Weibel | chris
cd881764-bea1-4249-b86d-f8fb8182eec1 | Jeremey Budnack | jrbudnack
970972dd-dce8-4c65-a85b-63735ada0fc9 | Ruben Koster | rkoster
1c225a3a-2c70-4d95-b87f-f086cbd20366 | Jamie Van Dyke | jamie
9f0bb16e-fc25-47f3-b60a-635b6224225a | Quintessence Anx | qanx
9788c636-936e-4dd6-b9d5-f340329142bd | McGowan | mcg
3d8b664f-ef5f-4587-a45c-f2991a1fc029 | 高,秀娇 (XJ) | xj
f618f5a8-380e-44fb-9b4e-b3286f29dcc8 | Geoff Franks | geoff
35cb41dd-3edb-4483-a1fe-fc315243d2f8 | Van Nguyen | vnguyen
74b37cd4-75aa-4871-b17b-a5160428e589 | John Longanecker | jlonganecker
(15 rows)
We see that each row has a UUID id
field which is the primary key and was automatically generated for us.
Discussion
It was pointed out by @drewblas (thank you Drew!) that using gen_random_uuid()
from pgcrypto
has a negative side effect with respect to keyspace fragmentation on disk for the tables. Drew told us that:
Random produces very fragmented inserts that destroy tables. Use
uuid_generate_v1mc()
[instead] … the keys are seq because they’re time based. So all inserts go to the same data page without random io.
This makes sense due to the random probability distribution of the keys, it should be fragemented. This fragmentation however is not so good for efficiency of the database system itself. In order to gain the benefits of using UUID primary keys with lower keyspace fragmentation perhaps Drew points out that it is better to use uuid_generate_v1mc()
from the uuid-ossp
extension because underneath it is using a time based seq algorithm which you can read about in the postgresql documentation.