ROWID Reuse in SQLite Databases

Recently on one of the list-serves I'm on; a question was raised about whether ID's can be reused in an SQLite database. Whilst the database in question does not reuse ID's, it does bring up the question of does SQLite ever reuse ID's? The short answer is under certain circumstances YES they can! and the long answer requires an understanding of how rowid's work in SQLite, so let's get into that.

ROWID's Explained

For faster sorting and searching, SQLite implemented a special field called rowid that uniquely identifies a record in a table. This unique identifier is considered the true primary key and is what is actually used by the underlying B-tree storage mechanism to look up records in a table. The only exception to this is what they call a WITHOUT ROWID table, which uses the declared primary key as the identifier.

When a table has an integer primary key declared, then this field becomes an alias of the rowid, so both fields will contain the same value. In fact, if you look at a record at the physical level for a table that has an integer primary key, the serial type for the primary key column will be 0 which means NULL.

Note: The declared field type for an integer primary key must be the word INTEGER exactly. If the field is just INT, BIGINT, INT64 or something similar then it will be treated as its own field rather than becoming an alias for the rowid.

If the Primary Key is a non-integer field such as text or if a composite key is defined (multiple columns combined as a primary key) then the Primary Key and rowid will be different, but still the rowid is considered the true primary key for searching and sorting purposes.

SQLite does allow tables to be created without a primary key. So even if that is the case, rowid is used to uniquely identify records.

How are ROWID's Assigned?

The configuration of a primary key determines the behavior of how rowid's are assigned. The default behavior is to assign a new record with a rowid that is the largest EXISTING rowid value + 1. This is also the way rowid's are assigned if the primary key is a non-integer or if no primary is declared.

If the primary key has been set to AUTOINCREMENT then new records will be assigned a rowid that is the largest EVER rowid used + 1. In this scenario an internal SQLite table called sqlite_sequence will be present in the database and used to track the highest rowid.

Default ROWID Assignment Example

I created a real simple database with 1 table that has Contact_ID as an Integer Primary Key, so this field becomes an alias of the rowid.

CREATE TABLE [Contacts]([Contact_ID] INTEGER PRIMARY KEY, [Contact_FName] TEXT NOT NULL, [Contact_LName] TEXT NOT NULL, [Contact_Number] TEXT);









I added 4 records to the table with the Contact_ID as 1 through 4.













I deleted the contact "Mark Lorenzo" and as expected, I now have a gap in my ID values. As most forensic examiners know, a gap in id's indicates a deletion by either the user or by some automated application process like data expiration.


I insert a new record, the id value that was assigned will be 5 as the largest Existing ROWID is 4.








AUTOINCREMENT Example

I created the same Contacts table again, but this time I configured the Contact_ID as autoincrementing primary key.

CREATE TABLE [Contacts]([Contact_ID] INTEGER PRIMARY KEY AUTOINCREMENT, [Contact_FName] TEXT NOT NULL, [Contact_LName] TEXT);

As I used AUTOINCREMENT, the sqlite_sequence table was created. This table is an internal table that tracks the largest rowid ever used in tables that have auto-incrementing primary keys.

The sqlite_sequence table contains 2 fields:

  • Name: Name of the table with an autoincrementing integer primary key
  • seq: The largest rowid EVER used in the table. Incremented by 1 for each record that is inserted into the table.


I inserted the same initial 4 records to this database, deleted the record for Mark Lorenzo, and then inserted a new record. The rowid's came out the same as the previous example where we have a gap in the values.









The entry for the Contacts table in the sqlite_sequence table is updated to 5.







How can ROWID's be reused?

We have established how SQLite assigns rowid's and because rowid is considered the true primary key for a table you will never find 2 allocated records with the same rowid. Let's now look at how rowid's can be reused.

The scenario where a rowid can be reused is when the table primary key is not set to autoincrement, and the last inserted record(s) are deleted. With this scenario, As the sqlite_sequence table is not tracking the highest rowid ever used, it takes the highest EXISTING rowid. 

I will walkthrough an example in my contacts database where I didn't set the Contact_ID to autoincrement. I deleted the last 2 contacts I inserted with Holly Berry being id 5 and Brian Nice being 4.






I then inserted 2 new records into the table. From the database perspective it doesn't know that ID's 3 - 5 have been previously used, so the new records are assigned ID 3 and 4.







Identifying Reused ID's

The identification of reused ID's is dependent on if the deIeted record can be recovered from the main database file or the journal files.

First step is to determine if the rowid's can be reused in the table of interest (no sense in going down the rabbit hole if you don't need to). Easiest way is to look for the presence of the sqlite_sequence table and if it is present, query it to see what tables are being tracked.

SELECT * FROM sqlite_sequence

Once you have established that there is the possibility that the rowid can be reused, you will need to compare the active records with the deleted records that your tool was able to recover. Be mindful that your tool must be able to recover the row_id for a record in order for this to work.

I used my SQLite tool to extract records from the main database file and the associated WAL file. 

Note: The Analysis_ID, Source, Frame, Page and File_Offset columns are auto generated by my tool and may not be provided by the tool that you use.


We can clean up this output multiple ways. If you don't have the additional columns like I have, a SELECT DISTINCT will work, but if your tool does provide additional columns, you can use the GROUP BY function to only return 1 version of each record based on the actual table fields.

SELECT
    Source,
    File_Offset,
    Contact_ID, 
    Contact_FName, 
    Contact_LName, 
    Contact_Number 
FROM Contacts
GROUP BY Contact_ID, Contact_FName, Contact_LName, Contact_Number
ORDER BY Contact_ID;













As you can see above, by doing a simple data recovery I have multiple records that have the same ID.

Contact ID 3 - Mark Lorenzo 
Contact ID 3 - Tammy Randall
Contact ID 4 - Brian Nice
Contact ID 4 - Gail Shaffer

This of course was a small database, so it was easy to find the reused id's in the output, but what if there are thousands of records? 

This is where you need to get creative with SQLite querying by using a Common Table Expression (CTE) and a Sub Query in a JOIN. This took me a few days to write and looks overly complex, but basically it compares the records to identify the unique records and finds the most recent instance of that exact record based on the analysis id. It then only returns records where the same ID is used multiple times.

-- Identifies the current version of a record
-- Add Table Specific Columns to suit your purpose
-- If your tool doesn't assign a unique id for each record use the ID field in the ORDER BY clause
WITH MostRecentRecord AS (
    SELECT *, 
           ROW_NUMBER() OVER (
               PARTITION BY Contact_ID, Contact_FName, Contact_LName, Contact_Number 
               ORDER BY Analysis_ID DESC 
           ) AS row_num
    FROM Contacts
),

-- Filters the results so only 1 unique instance of each record is returned
-- Add Table Specific Columns to suit your purpose
FilteredResults AS (
    SELECT MostRecentRecord.Source, MostRecentRecord.File_Offset, MostRecentRecord.Analysis_ID, 
           MostRecentRecord.Contact_ID, MostRecentRecord.Contact_FName, MostRecentRecord.Contact_LName, MostRecentRecord.Contact_Number
    FROM MostRecentRecord
WHERE MostRecentRecord.row_num = 1
)

-- Output only records that appear more than once in the filtered results
-- Add Table Specific Columns to suit your purpose
SELECT DISTINCT
    FilteredResults.Analysis_ID,
    FilteredResults.Source, 
    FilteredResults.File_Offset,
FilteredResults.Contact_ID,
    FilteredResults.Contact_FName, 
    FilteredResults.Contact_LName, 
    FilteredResults.Contact_Number
FROM FilteredResults
JOIN (
    SELECT Contact_ID
    FROM FilteredResults
    GROUP BY Contact_ID
    HAVING COUNT(*) > 1
) reusedID ON FilteredResults.Contact_ID = reusedID.Contact_ID
ORDER BY FilteredResults.Contact_ID;

Running this query will output records where the ID is the same but there are differences in the other columns.











I Hope this all makes sense, and if the concept of ID reuse is new to you, you now have a workflow to potentially isolate reused ID's in a table. The queries I provided can be used as a template, just change the column and table names based on your needs. 

If you have any questions, just reach out to me on social media!  


Comments

Popular posts from this blog

The Duck Hunters Guide - DuckDuckGo Android Cheat Sheet

The Duck Hunters Guide - Blog #5 - Bookmarks & Favorites (Android)

SQBite Beta Release