Primary key in uniqueidentifier
Published on: 20th Jun 2012
Updated on: 15th May 2021
Overview
We used to define the primary key in integer type. Usually, an auto-increment primary key field. This was convenient until one day that we were working on a web project. We quickly realized that using integer type for primary key is going to cause some issues.
Some issues causes by integer as primary key
-
Issue #1 - passing the primary key in the URL when requesting the record.
For example, the primary key value for customer A is "1001". If the user wants to edit the customer A's profile, the URL will look like this,
http://localhost/customer.aspx?id=1001
So, the page will look for the customer whose primary key value is 1001.
Some DBA-s argued that the primary key should not be an auto-increment integer value. Instead, they recommended "customer account number" or "customer account code" which is in alphanumeric. This is not helpful because anyone who sees the URL will be able to make a guess and retrieve what they want. For example
http://localhost/customer.aspx?id=AB001 http://localhost/customer.aspx?id=AB002
Unless the data is allowed public access. Otherwise, using these primary keys in the URL will be a big issue from the security point of view.
-
Issue #2 - bottleneck of inserting new records.
For example, you are running a system that requires handling many transactions per seconds. Whether you are using "identity" or generating the next ID value that is stored in the last ID table, it creates a bottleneck while inserting new records. This is because the insert process will not be able to determine the new value until it gains the exclusive lock on the resources. The bottleneck will become worse if you have a long process that involves many tables.
-
Issue #3 - multiple databases.
In some cases, you might have to run the same system in multiple locations and each location has its own database. These databases require to be merged into a centralized database at a later time. Regardless of the merging interval, by using integer data type in the primary key, you are facing a problem with a question: what is the appropriate range of values to be assigned to each location?
How to solve this puzzle
After trial and error with integer primary key and nvarchar (i.e., add an "outlet code" field to make the composite key), we found an easier way to solve the above problem. We decided to use UNIQUEIDENTIFIER data type (or GUID in .Net framework).
-
It solves issue #1.
When showing the UNIQUEIDENTIFIER value in the URL query string, it's hard for the user to guess the primary key value for other customers.
-
It solves issue #2.
The bottleneck problem also disappears because UNIQUEIDENTIFIER value can be generated at any time by calling
NEWID
(MSSQL) function and without locking any resources. -
It solves issue #3.
Finally, merging the records from various databases will not be a problem and you don't have to scratch your head to look for the appropriate range of values to be allocated or design a composite key that can survive in multiple databases hosted in different servers.
It's time to make a judgement call
In fact, using UNIQUEIDENTIFIER data type comes with some costs.
-
It requires more storage space because UNIQUEIDENTIFIER requires 16 bytes storage. In fact, the storage price is getting cheaper and cheaper.
-
The searching takes more time. In fact, with proper indexing or increase the hardware specification will improve the query performance.
-
The value is fragmented which slows down the query. This happens when we are using the
NEWID()
function to generate a new random value causing the record to be inserted into any page in the table. In fact, you may rebuild or reorganise the index to improve the performance. -
In MSSQL, the primary key is a clustered index. This means, inserting new records will cause the MSSQL engine to look for the appropriate page position and insert the new record. With the improvement in storage read and write speed, this issue will become less important.
Reviews on 15th May 2021
As of 15th May 2021, we never face any issue with this implementation.
-
First of all, the projects that we have done do not require high performance inserting new records. So, it is nearly impossible to have the
NEWID()
generate the same value. -
The query performance has been degraded when the table has more records. In fact, we review the index periodically and come out with a better index. So, this is not an issue.
-
Merging the records from various databases into a central database has never been an issue.
Jump to #MSSQL blog
Author
Lau Hon Wan, software developer.