The Realm Files - Vol 2 - Physical Structure Overview
The challenge, forensically speaking, is linking the Clusters back to their corresponding Tables (Classes) and Columns (Properties) within those tables. To accomplish this, we must traverse the hierarchy beginning at the top-level node, then follow each reference through the series of nested arrays of Tables and Cluster Trees until we reach the Clusters that store the individual object records.
While I won’t be covering how to navigate and parse arrays at the physical level in this post, understanding the conceptual layout will provide the foundation for the more technical topics I’ll explore in future installments of this series.
The diagram below provides a high-level view of a node within a Realm database. This is a simplified representation, and real-world nodes become increasingly complex as more tables and objects are added to the database.
Copy-on-Write Architecture
To complicate matters further, Realm maintains two distinct nodes to support its copy-on-write architecture, which ensures that each database commit is both atomic and crash-safe. When a write transaction occurs, Realm does not modify the existing node in place. Instead, it allocates new space in the file and writes the updated arrays, tables, and clusters into a new node structure. Once all changes are written, the database updates the inactive Top Reference in the file header to point to the new node, then flips the active flag to mark it as the current version. The previously active node remains intact and becomes the inactive snapshot. This alternating process guarantees that one complete, uncorrupted Group is always available, even if a crash or power loss occurs during a write operation.
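To make the two Top References and the active flag more concrete, here is a minimal Python sketch of reading them from the file header. The byte layout is an assumption on my part (two 8-byte little-endian Top References, a 4-byte mnemonic, two file-format bytes, one reserved byte, and a flags byte whose lowest bit selects the active slot), and the name parse_header is hypothetical. Verify the layout against the file you are examining before relying on it.

```python
import struct

HEADER_SIZE = 24  # assumed: two 8-byte top refs plus an 8-byte info block


def parse_header(header: bytes) -> dict:
    """Split an assumed 24-byte Realm header into its top refs and flags."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header too short")
    # "<" means little-endian with no padding: 8 + 8 + 4 + 1 + 1 + 1 + 1 = 24
    top_ref_0, top_ref_1, mnemonic, fmt_0, fmt_1, _reserved, flags = struct.unpack(
        "<QQ4sBBBB", header[:HEADER_SIZE]
    )
    active_slot = flags & 1  # assumed: bit 0 selects which top ref is active
    top_refs = (top_ref_0, top_ref_1)
    return {
        "mnemonic": mnemonic,
        "file_formats": (fmt_0, fmt_1),
        "active_top_ref": top_refs[active_slot],
        "inactive_top_ref": top_refs[1 - active_slot],
    }


# Synthetic header for illustration only; these values are not from a real file.
sample = struct.pack("<QQ4sBBBB", 1000, 2000, b"T-DB", 24, 24, 0, 1)
info = parse_header(sample)
print(info["active_top_ref"], info["inactive_top_ref"])
```

With the flag bit set to 1 in the synthetic header, the second slot is treated as the active Top Reference and the first as the inactive one.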
During an examination, we want to parse both nodes, as each represents a complete and internally consistent snapshot of the database at a specific point in time. The active node reflects the most recent committed state, while the inactive node preserves the version of the database as it stood before the last write transaction. By comparing the two, an examiner can recover deleted or modified records and reconstruct the changes between commits.
Using the iOS Replika app as an example, if we walk the arrays that make up the two nodes, we can identify arrays that are no longer part of the active node. There were a total of 84 arrays in the inactive node that were not in the active node. These arrays represent structures that changed between commits.
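The comparison described above boils down to a tree walk followed by a set difference. The sketch below assumes the arrays have already been parsed into a simple in-memory tree; the ArrayNode class and both function names are hypothetical stand-ins, not part of any Realm tooling. Offsets 99968 and 125832 are the two Cluster offsets from this example database, while the remaining offsets are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArrayNode:
    """Hypothetical in-memory stand-in for an already-parsed Realm array."""
    offset: int
    children: list[ArrayNode] = field(default_factory=list)


def collect_offsets(root: ArrayNode) -> set[int]:
    """Return the file offsets of every array reachable from root."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.offset in seen:
            continue  # arrays can be shared, so visit each offset once
        seen.add(node.offset)
        stack.extend(node.children)
    return seen


def only_in_inactive(active_root: ArrayNode, inactive_root: ArrayNode) -> list[int]:
    """Offsets referenced by the inactive node but not by the active node."""
    return sorted(collect_offsets(inactive_root) - collect_offsets(active_root))


# An unchanged array (made-up offset 4096) is referenced by both nodes.
shared = ArrayNode(4096)
inactive = ArrayNode(90000, [ArrayNode(99968), shared])   # 90000 is made up
active = ArrayNode(120000, [ArrayNode(125832), shared])   # 120000 is made up

print(only_in_inactive(active, inactive))
```

Only the arrays that were rewritten during the last commit show up in the result; anything both nodes still reference drops out of the difference.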
One of these is the data array (Cluster) at offset 99968, which sits 5 levels deep from the root node. It holds a 976-byte blob that consists of a concatenated string table containing 88 strings.
If we walk the active node and identify the new version of this data array (Cluster), now at offset 125832, we find a 1952-byte blob that contains the updated version of the string table, which now has 176 strings. This means that part of the last transaction involved adding strings to the string table.
This is just a basic example of what’s possible when parsing a Realm database at the physical level. By understanding how Realm uses its copy-on-write architecture, and combining that knowledge with advanced analysis, we can identify the specific changes made to the database during the last transaction.
Unallocated Regions
Now we move on to the unallocated regions of the file, which are areas that exist outside of the active and inactive nodes. After a new commit is finalized and the active Top Reference is switched, Realm performs a cleanup process to manage the unused space left behind by older nodes. Because each commit writes modified arrays to new locations rather than overwriting existing ones, portions of the file that belong to the inactive node may eventually become obsolete. Realm marks these regions as free space and reuses them for future allocations during subsequent write transactions. This reclamation occurs gradually, allowing older snapshots to remain intact until they are no longer referenced by active transactions.
From a forensic perspective, this behavior is significant because remnants of outdated or deleted objects can persist in unallocated regions of the file long after they have been removed from the active node. This means that old arrays containing data relevant to an investigation may still be recoverable.
Using the iOS Replika app as an example, if we walk all the arrays that make up the two nodes and identify where each array physically starts and ends, any bytes that are not occupied by an array are considered unallocated regions of the file. At offset 52616 there is a 480-byte unoccupied region that contains 2 arrays (signified by AAAA).
- 5fd0e3b1e5e78b00079b7b5e
- 5fd0e3b1e5e78b00079b7b8b
- 5fe363a1c32a7b000701fd84
- 60128fb8e0704a00068aa367
- 612ccb95e0704a00072b79f7
- 61851af90c81a60007fa391a
- 61bc94e00c81a6000779b2bb
- 61bc94e00c81a6000779b3cf
- 61bc94e50c81a6000779bfb0
- 5fd0e3b1e5e78b00079b7cb1
- 5fd0e3b1e5e78b00079b7cb4
- 5fe363a1c32a7b000701fe11
- 60128fc7e0704a00068aa416
- 612ccb95e0704a00072b79f8
- 6155ad710c81a60007334e15
- 61e6bb7a7045d800066fbbc1
- 61bc94e00c81a6000779b3e4
- 61e6bb7a7045d800066fbbad
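The gap-finding step described in this section can be sketched in a few lines of Python. This assumes we already have a list of (start, end) byte extents for every array in both nodes, with end being exclusive. The function names are hypothetical, and treating the bytes AAAA as an array marker is a heuristic based on the observation above, so hits inside payload data are possible. Only the offset 52616 and the length 480 come from this example database; the other extents are made up.

```python
def find_unallocated(extents, file_size, start=0):
    """Return (offset, length) gaps not covered by any (start, end) extent.

    Each extent's end is exclusive. `start` lets the caller skip the header.
    """
    gaps = []
    cursor = start
    for begin, end in sorted(extents):
        if begin > cursor:
            gaps.append((cursor, begin - cursor))
        cursor = max(cursor, end)  # tolerate overlapping or nested extents
    if cursor < file_size:
        gaps.append((cursor, file_size - cursor))
    return gaps


def count_marker(data: bytes, offset: int, length: int, marker: bytes = b"AAAA") -> int:
    """Count non-overlapping marker hits inside one region (heuristic only)."""
    return data[offset:offset + length].count(marker)


# Made-up extents that leave a single hole matching the region discussed above.
extents = [(24, 52616), (53096, 60000)]
print(find_unallocated(extents, file_size=60000, start=24))
```

Each gap returned this way can then be scanned with count_marker, or carved and parsed like any other array, to see whether it still holds remnants of older commits.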