Stop Using ID Columns In Database Schemas

Auto-incrementing integer ID columns have long been the default way to uniquely identify records in a database schema. That default is now being questioned, because sequential IDs can lead to insecure and hard-to-scale database designs.

Sequential IDs are predictable, and they become difficult to coordinate in large systems where many tables, services or servers need to generate keys independently. Instead, a common recommendation is to use UUID or ULID columns: 128-bit identifiers, usually stored as strings, that can be generated anywhere. Random (version 4) UUIDs are non-sequential, while ULIDs sort by creation time.

This approach can improve security and make identifiers easier to generate across distributed systems, at the cost of some storage and indexing efficiency.

Common Uses For Integer ID Columns

Integer IDs, also known as integer identifiers or keys, are commonly used in databases for several reasons.

Unique Identification: The primary use of integer IDs in databases is to uniquely identify each record or row. In a database table, each row represents a unique piece of data or information. To easily identify and differentiate these records, each of them is assigned a unique integer ID.

Relational Mapping: Integer IDs are essential in relational databases for creating relationships between different tables. For instance, if there are two tables – one for ‘Customers’ and another for ‘Orders’, the ‘Orders’ table may include a ‘Customer_ID’ column to show which customer placed which order.

Efficiency: Integer IDs are more efficient than textual identifiers. Computers process integers faster than text. Therefore, using integer IDs can make database operations such as searching, sorting, and indexing much quicker.

Auto Numbering: Most databases support autonumber or auto-incrementing features for integer IDs. This means the database automatically assigns a unique ID to each new record, reducing the risk of human error and duplication.

Space Saving: Integers take up less storage space compared to strings. This can be a significant advantage in large databases where efficiency and storage optimization are important.
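
To put rough numbers on the storage point, the sketch below measures fixed-width encodings in Python. It is only an approximation of what a database stores per key, since real engines add their own row and index overhead, and the variable names are illustrative.

```python
import struct
import uuid

# Fixed-width integer encodings, similar in size to typical INT and BIGINT columns.
int32_bytes = len(struct.pack("<i", 2_147_483_647))              # 32-bit signed integer
int64_bytes = len(struct.pack("<q", 9_223_372_036_854_775_807))  # 64-bit signed integer

example = uuid.uuid4()
uuid_binary_bytes = len(example.bytes)  # raw 128-bit form
uuid_text_bytes = len(str(example))     # canonical hyphenated text form

print(int32_bytes, int64_bytes, uuid_binary_bytes, uuid_text_bytes)  # 4 8 16 36
```

A UUID stored as text is therefore several times larger than an integer key, which is why some databases offer a native binary UUID type.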

However, there are some potential drawbacks or challenges as well. One is the possibility of running out of numbers if the database grows very large. Another is that integer IDs reveal the order in which records were created, which might be a security concern in some cases.

History of Integer ID Columns

The history of Integer ID Columns dates back to the early days of database systems. These columns have traditionally served as primary keys in a database table, providing a unique identifier for each record.

Table IDs are assigned automatically by the database engine (MySQL, PostgreSQL, etc.) in sequential order as new records are added. The use of integer ID columns is largely rooted in their efficiency, as integers are smaller and faster for a system to compare and index than most other column data types.

Over the years, the use of integer ID columns has evolved with advancements in technology and shifts in database design philosophies, including the adoption of UUIDs (Universally Unique Identifiers) and other non-integer identifiers. Integer ID columns nevertheless remain a common and critical component in many relational database systems.

Sometimes it makes sense to keep integer IDs in your table schema and add a UUID column alongside them. This gives you the efficiency of indexing a numeric column internally, plus the added security of using UUIDs when your application gets and sets data.
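
A minimal sketch of that hybrid layout, using Python's built-in sqlite3 module. The table name, column names and helper functions are assumptions made for illustration; the same idea applies to any relational database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key, used for joins
        public_id TEXT NOT NULL UNIQUE,               -- the only ID the application exposes
        name      TEXT NOT NULL
    )
    """
)


def create_customer(name: str) -> str:
    """Insert a customer and return the public UUID, never the integer ID."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO customers (public_id, name) VALUES (?, ?)",
        (public_id, name),
    )
    return public_id


def find_customer(public_id: str):
    """Look a customer up by public UUID; returns (id, name) or None."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE public_id = ?",
        (public_id,),
    ).fetchone()


alice_public_id = create_customer("Alice")
row = find_customer(alice_public_id)
print(row)  # (1, 'Alice')
```

The integer key stays inside the database for joins and indexing, while URLs and API responses only ever carry the UUID.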

The trade-off is storage: integer database columns take up less space than UUID strings, so keeping the integer as the internal key helps optimise storage and indexing in large databases.

Drawbacks of Using Integer IDs

Enumeration and security risks

Using integer ID columns is a common practice for identifying unique records, but it presents a potential security risk known as enumeration: if ID values are generated sequentially, they become predictable, and anyone who sees one valid ID can guess its neighbours.

An attacker can exploit this predictability by iterating over the ID values. When the application also fails to check that the caller is allowed to access the requested record, the result is an “Insecure Direct Object Reference” (IDOR) vulnerability, which can lead to unauthorised data access, information leakage and potential data manipulation. It’s crucial to enforce authorisation on every request and to avoid exposing sequential IDs, so that the risk of enumeration attacks is mitigated.
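
The difference between an endpoint that trusts the ID and one that checks ownership can be sketched as follows. The in-memory INVOICES store, the user names and both function names are hypothetical and stand in for a real database and session layer.

```python
# Hypothetical in-memory "invoices" table keyed by sequential integer ID.
INVOICES = {
    1: {"owner": "alice", "total": 120},
    2: {"owner": "bob", "total": 75},
    3: {"owner": "alice", "total": 310},
}


def get_invoice_insecure(invoice_id: int):
    # No ownership check: any caller can read any invoice by guessing IDs.
    return INVOICES.get(invoice_id)


def get_invoice_checked(invoice_id: int, current_user: str):
    # Return the invoice only if it exists AND belongs to the caller.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice["owner"] != current_user:
        return None
    return invoice


# Someone logged in as "bob" simply walks the ID range 1..3.
leaked = [i for i in range(1, 4) if get_invoice_insecure(i) is not None]
visible_to_bob = [i for i in range(1, 4) if get_invoice_checked(i, "bob") is not None]

print(leaked)          # [1, 2, 3]
print(visible_to_bob)  # [2]
```

Unpredictable IDs make the walk itself harder, but the ownership check is what actually closes the hole.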

Scalability Issues When Using Integer IDs

Using integer IDs can lead to scalability issues when the system needs to handle a larger amount of data, because the ID range is capped by the maximum value the integer type can hold. A signed 32-bit integer, for example, tops out at 2,147,483,647.

Once this limit is reached, no new IDs can be generated until the column is migrated to a larger type, which hinders the system’s ability to scale. For instance, in a database table with an integer ID as the primary key, the number of rows can never exceed the maximum integer value. This creates a potential bottleneck in systems that require constant growth or expansion.
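
How quickly a table reaches that ceiling depends on the insert rate. The back-of-the-envelope sketch below assumes a steady 1,000 inserts per second, a figure chosen purely for illustration, and the function name is likewise hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value: 2,147,483,647
INT64_MAX = 2**63 - 1  # largest signed 64-bit (bigint) value
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_exhausted(max_id: int, inserts_per_second: int) -> float:
    """Years until a sequential ID reaches max_id at a constant insert rate."""
    return max_id / inserts_per_second / SECONDS_PER_YEAR


years_32 = years_until_exhausted(INT32_MAX, 1_000)
years_64 = years_until_exhausted(INT64_MAX, 1_000)

print(f"32-bit: {years_32 * 365:.1f} days")
print(f"64-bit: {years_64:,.0f} years")
```

Under those assumed numbers the 32-bit range lasts under a month, while the 64-bit range lasts hundreds of millions of years, so exhaustion is mainly a concern for 32-bit keys.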

Performance can also degrade as the data grows, since searching, sorting, and indexing operations take longer on larger tables, although this applies to any key type and is not specific to integers.

Limitations of Integer IDs In Distributed Systems

Integer IDs are problematic when used in distributed systems. Integer columns are finite and can be exhausted, especially in large-scale systems where billions of unique IDs might be needed. When several nodes generate sequential integers independently, two different entities can be assigned the same ID, leading to data inconsistency.

Sequential, predictable integer IDs also make the system more vulnerable to enumeration. And when a web application is distributed across multiple servers or locations, coordinating unique integer IDs requires a shared sequence or some other form of synchronisation, which can become a bottleneck and a source of data synchronisation issues.

Reduce ID Collisions

The potential for integer ID collisions in large datasets is a genuine concern in data management. ID collisions occur when two or more distinct data entries share the same integer ID, for example when records from separately numbered databases are merged, causing confusion and potential data corruption.

As the number of independent ID sources increases, the probability of such collisions also rises, particularly if the range of possible integer IDs is limited. This can lead to significant problems in data integrity and reliability, making it challenging to accurately track or retrieve individual data entries.

To prevent integer ID collisions, implement and test a strategy such as using a larger integer type (bigint), switching to globally unique identifiers such as UUIDs, or deriving a string identifier with a hash function (which makes collisions very unlikely rather than impossible).
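
The collision problem is easy to reproduce. In the sketch below, two hypothetical nodes each run their own counter starting at 1, the way two uncoordinated auto-increment columns would, and are then compared with nodes that generate random UUIDs instead.

```python
import itertools
import uuid


def make_node_counter():
    # Each node keeps its own sequence starting at 1, with no coordination.
    return itertools.count(1)


node_a = make_node_counter()
node_b = make_node_counter()

ids_a = [next(node_a) for _ in range(5)]
ids_b = [next(node_b) for _ in range(5)]
integer_collisions = set(ids_a) & set(ids_b)

# The same two nodes generating version 4 UUIDs, again without coordination.
uuids_a = [uuid.uuid4() for _ in range(5)]
uuids_b = [uuid.uuid4() for _ in range(5)]
uuid_collisions = set(uuids_a) & set(uuids_b)

print(sorted(integer_collisions))  # [1, 2, 3, 4, 5]
print(len(uuid_collisions))
```

Every integer ID collides as soon as the two datasets are merged, whereas a collision between random UUIDs is possible in theory but vanishingly unlikely in practice.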

Benefits of UUID and ULID Columns

What are the advantages of UUIDs and ULIDs?

UUIDs (Universally Unique Identifiers) and ULIDs (Universally Unique Lexicographically Sortable Identifiers) come with a set of advantages.

UUIDs provide a high level of uniqueness, reducing the probability of collision, making them ideal for distributed systems where uniqueness across multiple systems is needed. They can be generated offline and in a decentralised way, removing the need for a central authority or coordination between different microservices.

ULIDs not only offer uniqueness but also give the advantage of being time-stamped and lexicographically sortable. This means you can sort them chronologically, making them useful for systems where ordering of data is important. The timestamp feature also allows for easy tracing and debugging.
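
To show why ULIDs sort chronologically, here is a simplified generator: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The function names are illustrative, and the sketch leaves out the monotonic ordering within a single millisecond that dedicated ULID libraries usually provide.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet (no I, L, O or U); characters are in ascending ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Build a 26-character ULID-style string from a timestamp plus randomness."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness           # 128 bits in total
    chars = []
    for _ in range(26):                                 # 26 characters x 5 bits each
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """Recover the millisecond timestamp from the first 10 characters."""
    value = 0
    for ch in ulid[:10]:
        value = value * 32 + CROCKFORD.index(ch)
    return value


earlier = new_ulid(1_700_000_000_000)
later = new_ulid(1_700_000_000_001)
print(earlier < later)             # True: string order follows time order
print(ulid_timestamp_ms(earlier))  # 1700000000000
```

Because the timestamp occupies the leading characters, a plain string sort or a standard index orders ULIDs by creation time, and the timestamp can be read back for tracing and debugging.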

Using composite keys for complex data models

Using composite keys for complex data models is an effective method for managing and accessing intricate databases. A composite key, which is a combination of two or more columns, is used to create a unique identifier for each row within a database.

This allows for more specific, granular access to data, as well as more efficient indexing and data retrieval. In complex data models where there may be multiple layers of relationships and dependencies, composite keys can significantly enhance the precision and flexibility of data operations.

While they can provide powerful benefits, composite keys also require careful management to ensure data integrity and avoid redundancy or other complications.
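
As a small illustration, the sketch below declares a composite primary key with Python's built-in sqlite3 module. The order_items table and its columns are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)  -- the pair of columns identifies a row
    )
    """
)

conn.execute("INSERT INTO order_items VALUES (1, 10, 2)")
conn.execute("INSERT INTO order_items VALUES (1, 11, 1)")  # same order, different product

try:
    # Same (order_id, product_id) pair again: the composite key rejects it.
    conn.execute("INSERT INTO order_items VALUES (1, 10, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

Neither column is unique on its own; uniqueness is enforced on the combination, so no separate ID column is needed for this table.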

Frequently Asked Questions

Is it bad practice to expose database auto-incrementing IDs?

Yes, it is generally considered bad practice to expose auto-incrementing database IDs. They reveal roughly how many records exist and how quickly new ones are created, and they make it easy to guess other valid IDs. It is recommended to use a separate unique identifier, such as a UUID, for any public-facing URLs or references.

Why are integer IDs often used as database primary keys?

Integer IDs are often used as database primary keys because they offer several advantages:

Efficiency: Integer IDs use less storage space compared to other data types like strings. They take up less memory and require fewer CPU cycles to process, resulting in faster database operations.

Indexing: Integer IDs can be easily indexed, allowing for efficient searching and sorting of data. Indexing improves query performance by reducing the time it takes to locate and retrieve specific records.

Performance: Integer comparisons and calculations are computationally faster than string manipulations. Using integer IDs as primary keys can significantly improve performance when performing join operations or complex queries involving multiple tables.

Overall, using integer IDs as primary keys provides a balance of efficiency, performance, and simplicity in database design and operations.

When would you use a UUID or ULID instead of an integer?

You would use a UUID or ULID instead of an integer when you need a globally unique identifier for your entities or records in a distributed web application.

What are the security implications of using integer IDs in APIs?

Using integer IDs in APIs can have security implications, as they can potentially expose sensitive information or lead to security vulnerabilities. Here are some possible security risks associated with using integer IDs in APIs:

Information Disclosure: Integer IDs can inadvertently reveal sensitive information, such as user data, by allowing attackers to guess or enumerate valid IDs. This can result in unauthorised access to private resources.

ID Manipulation or Enumeration: Attackers may attempt to manipulate integer IDs to gain unauthorised access or perform malicious actions. For example, they can increment or decrement IDs to access other users’ data or perform actions reserved for privileged accounts.

Brute-force Attacks: Integer IDs that follow a predictable pattern (e.g., sequential numbers) can be subjected to brute-force attacks. Attackers can systematically iterate through possible IDs to discover valid ones and exploit the associated resources.

Insecure Direct Object References (IDOR): Integer IDs can lead to IDOR vulnerabilities, where an attacker manipulates the ID parameter to access unauthorised resources. This can occur if the API does not properly validate or authorise requests based on the provided ID.

Denial of Service (DoS) Attacks: Integer IDs that are used as input for resource-intensive operations can be abused by attackers to trigger DoS attacks. By sending large or maliciously crafted IDs, attackers can overwhelm system resources, causing performance issues or service disruption.

To mitigate these security risks, it is recommended to implement proper access controls, input validation, and authorisation mechanisms in APIs. Using unpredictable identifiers (e.g., random UUIDs) instead of integer IDs can help improve security, but it complements these controls rather than replacing them.

Are there any GDPR or PCI compliance issues when using integer IDs?

Using integer IDs does not inherently pose GDPR or PCI compliance issues. It is important to note that compliance with GDPR and PCI DSS involves a holistic approach and various contributing factors beyond just the use of integer IDs in your database schema.

It is essential to ensure that proper security measures and protocols are in place to protect personal data and sensitive information. It is recommended to consult with legal and compliance experts to assess your specific situation and ensure compliance with applicable regulations.

How do UUIDs enhance security compared to integer IDs?

UUIDs (Universally Unique Identifiers) enhance security compared to integer IDs in several ways. How a UUID is generated depends on its version: version 1 combines a timestamp with a node identifier (often the MAC address of the device), while version 4 is built almost entirely from random bits.

The randomness of version 4 makes it extremely difficult for attackers to predict or guess the next value, making these UUIDs more resistant to enumeration attacks. UUIDs are designed to be globally unique, meaning that the chance of two UUIDs colliding is exceedingly low.

Integer IDs are sequential and predictable, making them susceptible to ID tampering attacks. A random UUID is drawn from such a large space (122 random bits) that guessing another valid ID is impractical, although the server must still check authorisation on every request.

Integer IDs can sometimes reveal information about the structure or order of entities in a system. Random UUIDs do not reveal any such information, making it harder for attackers to exploit. Time-based identifiers such as version 1 UUIDs and ULIDs, by contrast, do embed their creation time.
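
The difference between UUID versions can be checked with Python's standard uuid module. This is a short sketch with illustrative variable names; which version a given database or framework generates by default varies, so treat it as a way to inspect values rather than a statement about any particular system.

```python
import uuid

time_based = uuid.uuid1()    # version 1: timestamp plus a node identifier
random_based = uuid.uuid4()  # version 4: random bits

print(time_based.version, random_based.version)  # 1 4

# A version 1 UUID exposes its embedded fields, which is why it can leak
# when (and potentially on which machine) a record was created.
print(time_based.time)  # count of 100-nanosecond intervals since 1582-10-15
print(time_based.node)  # 48-bit node identifier

# Every UUID is 128 bits, written as 32 hex digits plus 4 hyphens.
print(len(str(random_based)))  # 36
```

For identifiers that are exposed publicly, version 4 is usually the safer choice because it carries no timestamp or hardware information.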

Conclusion

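Sequential integer IDs are compact, fast and simple, but they are predictable, finite and hard to coordinate across distributed systems. UUIDs and ULIDs address those weaknesses at the cost of extra storage, and combining an internal integer key with a public UUID is often a workable middle ground. Evaluate these trade-offs and choose the ID system that best fits your specific use case.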