Introducing SQBite (Alpha) - Python Tool for Extracting Records from SQLite Databases


A few days ago, I released the alpha code for a new Python tool I have been working on that extracts records from SQLite databases. Whilst I understand how SQLite databases and their corresponding journal files work and have a solid understanding of the physical structure of the files, I am fairly new to Python (less than a year), and some of the functions I wrote presented challenges that I eventually worked through.

Over my digital forensics career, I have used many tools (both paid and open-source) to examine SQLite databases at a forensic level, and whilst many do an "okay" job of parsing records, I have found that many miss a lot of data (or simply don't present it to the examiner) that could be critical evidence during an examination.

The purpose of this tool is not to reinvent the wheel for SQLite database forensics, but to be used for validation and as an educational tool for the Advanced Applied Database Forensics class that I teach at Spyder Forensics (my employer).

For those reading this who understand how SQLite works: if you access a database using the standard SQLite libraries, you only get the allocated records; you can't pull out deleted records or all versions of a record. With this in mind, I built this tool to walk through the database files and extract the data at a physical level.
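To illustrate what "physical level" means, here is a minimal sketch (my own illustration, not SQBite's actual code) of reading the fixed 100-byte header at the start of the main database file, which gives you the page size and page count needed to walk the pages directly instead of going through the query engine:

import struct

def read_db_header(path):
    """Parse the 100-byte header at the start of a SQLite main database file."""
    with open(path, "rb") as f:
        header = f.read(100)
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite main database file")
    page_size = struct.unpack(">H", header[16:18])[0]
    if page_size == 1:  # the stored value 1 means a 65536-byte page
        page_size = 65536
    page_count = struct.unpack(">I", header[28:32])[0]  # in-header page count
    return page_size, page_count

page_size, page_count = read_db_header("Evidence/Photos.sqlite")
print(f"{page_count} pages of {page_size} bytes")

From here, page N occupies bytes (N-1)*page_size through N*page_size of the file, which is what makes a physical walk of the database possible.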

For those who haven't seen my social media posts, the tool can be downloaded from the Spyder Forensics GitHub: SQBite Code

Alpha Version Functionality

This first code release is an alpha version, so my focus was on parsing allocated records in the main database file and Write-Ahead Log, along with building out a framework for future functionality such as parsing of freelist pages, freeblocks, and unallocated space, plus some analysis functionality.

The tool is super simple to use and in its current state doesn't require any additional Python modules.

python SQBite.py -i [path to main db] -w [path to wal (optional)] -o [output path]

Example Usage: python SQBite.py -i Evidence\Photos.sqlite -w Evidence\Photos.sqlite-wal -o PhotosDatabaseExtraction.csv

Main Database File Parsing

1. Parses the sqlite_master table to identify the tables in the database and determine their root pages.

Note: WITHOUT ROWID tables use index b-trees to store their records, which is currently not supported. Virtual tables are also skipped, as their contents are not stored physically in the database file.
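For allocated schema entries, the same table-to-root-page mapping can be cross-checked with Python's built-in sqlite3 module (a validation aid only; it goes through the query engine and therefore only sees live schema entries):

import sqlite3

# List each table's name and root page from the live schema.
conn = sqlite3.connect("Evidence/Photos.sqlite")
for name, rootpage in conn.execute(
        "SELECT name, rootpage FROM sqlite_master WHERE type = 'table'"):
    print(name, rootpage)
conn.close()

Virtual tables show a root page of 0 here, which is consistent with their contents not being stored in the file.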

2. Traverses each table b-tree and extracts allocated records from each leaf page. Overflow pages are also traversed if a record payload is too large to fit on a single leaf page.

Note: I used the code for decoding varints and determining serial types from this WAL crawler script: https://github.com/PythonForensics/Learning-Python-for-Forensics-Second-Edition/tree/01b348bc831055c3571f8ef203e75d733beac907/Chapter12
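For context, varints are the variable-length integers (1 to 9 bytes, big-endian) that SQLite uses for payload lengths, row IDs, and serial types, and each b-tree page starts with a small header followed by an array of 2-byte cell pointers. A minimal sketch of both pieces (my own illustration; the linked script's implementation differs in detail):

import struct

def decode_varint(buf, offset=0):
    """Decode one SQLite varint; returns (value, number of bytes consumed)."""
    value = 0
    for i in range(9):
        byte = buf[offset + i]
        if i == 8:
            return (value << 8) | byte, 9  # the 9th byte contributes all 8 bits
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # a clear high bit ends the varint
            return value, i + 1

def cell_offsets(page, page_number):
    """Return the cell pointer array of a table b-tree page."""
    hdr = 100 if page_number == 1 else 0  # page 1 also holds the file header
    page_type = page[hdr]  # 0x0D = table leaf, 0x05 = table interior
    cell_count = struct.unpack(">H", page[hdr + 3:hdr + 5])[0]
    # Interior pages have a 12-byte header (the last 4 bytes are the
    # rightmost child pointer); leaf pages have an 8-byte header.
    ptrs = hdr + (12 if page_type == 0x05 else 8)
    return [struct.unpack(">H", page[ptrs + 2*i:ptrs + 2*i + 2])[0]
            for i in range(cell_count)]

Each cell offset points at a record cell within the page; on a table leaf page the cell begins with a varint payload length and a varint row ID, followed by the record itself.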


Write-Ahead Log Parsing (Optional)

1. Parses WAL frames that contain table leaf pages and extracts allocated records (overflow pages are not currently parsed).

2. All frames are checked to determine the page type:
  • Index b-trees (interior and leaf pages) are skipped
  • Pointer map pages are identified if auto_vacuum is enabled, but they are also skipped
  • Table interior pages are identified and stored to determine table assignments
3. Determines the table for each page (see the sketch after this list).
  • Parses the sqlite_master table in the main database file to identify tables and determine their root pages.
  • Walks backwards through the WAL frames to identify a table interior page that references the leaf page in that frame.
  • Repeats this process until a root page is identified (essentially traversing the b-tree from bottom to top).
  • If a table cannot be identified from the WAL, the b-trees are traversed in the main database file to identify the table (from top to bottom).
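Here is a minimal sketch of that bottom-up walk, under simplifying assumptions (a single WAL file, hypothetical function names, and page 1 ignored since it would need a 100-byte header offset; SQBite's real logic handles more cases):

import struct

def wal_frames(path):
    """Yield (frame_index, page_number, page_bytes) for each frame in a WAL file."""
    with open(path, "rb") as f:
        wal_header = f.read(32)
        page_size = struct.unpack(">I", wal_header[8:12])[0]
        index = 0
        while True:
            frame_header = f.read(24)  # page number, commit size, salts, checksums
            if len(frame_header) < 24:
                break
            page_number = struct.unpack(">I", frame_header[0:4])[0]
            yield index, page_number, f.read(page_size)
            index += 1

def interior_children(page):
    """Child page numbers referenced by a table interior page (type 0x05)."""
    cell_count = struct.unpack(">H", page[3:5])[0]
    children = [struct.unpack(">I", page[8:12])[0]]  # rightmost child pointer
    for i in range(cell_count):
        off = struct.unpack(">H", page[12 + 2*i:14 + 2*i])[0]
        children.append(struct.unpack(">I", page[off:off + 4])[0])
    return children

def table_for_leaf(leaf_page_no, frames, roots):
    """Walk parents bottom-up until a known root page is reached.

    frames is a list of (index, page_number, page_bytes); roots maps a
    root page number to a table name (taken from sqlite_master).
    """
    page_no = leaf_page_no
    while page_no not in roots:
        parent = None
        for _, number, page in reversed(frames):  # newest frame first
            if page[0] == 0x05 and page_no in interior_children(page):
                parent = number
                break
        if parent is None:
            return None  # fall back to a top-down walk of the main db file
        page_no = parent
    return roots[page_no]

Usage would look something like frames = list(wal_frames("Evidence/Photos.sqlite-wal")) followed by table_for_leaf(8, frames, roots).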

Output

The current output is a CSV file that combines all the records that were parsed. There are columns for where the record was parsed from, the SQLite page number, the table name, the file offset, and then the columns that were parsed from the record.
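Illustratively, the combined output is shaped like this (hypothetical layout and values; the actual header names may differ):

Record Source,Page Number,Table Name,File Offset,<table columns...>
Main Database,8,moz_bookmarks,...
WAL Frame 1,8,moz_bookmarks,...
WAL Frame 2,8,moz_bookmarks,...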

By combining the records, you can potentially find deleted records that still exist in either the main database file or an older WAL frame.

Using the example below from the places.sqlite database for the Tor Browser, I have 3 versions of Page 8. The oldest version is in the main database file, and there are 2 frames that contain newer versions of the page in the WAL file. This means that 2 transactions occurred which changed the content of Page 8.

Going through the output, Row ID 27 for GammaGoblin exists in the Main Database file and in only 1 frame in the WAL file. This would indicate that the record was deleted.

Moving down to Row ID 32 for Download music, movies.., this record only exists in the 2 WAL frames that contain Page 8. This would indicate that the record was added to the table, but as a WAL checkpoint hasn't occurred, it hasn't made it over to the main database file yet.

We can go further and determine the order of the transactions. As Row ID 32 exists in both WAL frames and not in the Main Database file, the user added this bookmark entry first. That leaves the bookmark entry for Row ID 27, which exists in the first version of Page 8 in the WAL but not the second, so it was deleted after Row ID 32 was added.
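Putting the three versions of Page 8 side by side makes the sequence clear:

Version of Page 8        Row ID 27 (GammaGoblin)   Row ID 32 (Download music, movies..)
Main database (oldest)   Present                   Absent
WAL frame 1              Present                   Present
WAL frame 2 (newest)     Absent                    Present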

Ultimately, I would like to add this type of analysis into the tool, as all the information needed to determine the transactions is stored in the WAL file; we just need to interpret the changes made to a page with each commit.

Final Thoughts

I look forward to hearing everyone's feedback on the project, and I'm truly excited about the potential for this tool as my knowledge of Python increases.

The beta version, which I plan to release around March 2025, will have basic record recovery functionality for freeblocks, freelist pages, and unallocated space. The code is built; I just need to incorporate it and rigorously test it, along with other bug fixes and code refactoring.
