Transaction Internals: Version Storage

Versioning can support isolation among transactions through snapshotting (i.e., the snapshot isolation level). While one transaction is editing the data, other transactions should see only the original data as it was before the edit. To achieve that, both the original and the new data are kept. That is, we maintain versions of the data.

To keep the naming simple and distinct, we use “data” for the latest version of the data and “version” for the older versions kept because of versioning.

Volatile version store layout

Versioning is applied at the row level. Versions are temporary and can be reclaimed once they are no longer needed. In the context of snapshot isolation, volatile versions are enough, since all open transactions are aborted on a system restart. We may still write versions to disk if their volume is too large to fit in memory, but we don’t need to recover them after a restart.

Versions are often stored separately from data. Otherwise, it would be hard to manage rows that mix persistent and volatile information. For example, while data operations are logged, the generation of versions does not need to be logged, since versions do not have to be preserved across restarts. Skipping the log makes version generation more efficient.
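The separation can be illustrated with a toy sketch. Everything in it is hypothetical and only meant to show the idea: the `Table` class, its in-memory `log` list (standing in for a durable log), and the `restart` method are assumptions for illustration, not the design of any particular engine.

```python
from typing import Dict, List, Tuple


class Table:
    """Toy table: data changes are logged, version generation is not."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}                    # latest data
        self.log: List[Tuple[int, str]] = []              # stand-in for a durable log
        self.version_store: List[Tuple[int, str]] = []    # volatile, never logged

    def write(self, row_id: int, value: str) -> None:
        # Save the old row image as a version; no log record is written for it.
        if row_id in self.rows:
            self.version_store.append((row_id, self.rows[row_id]))
        # The data change itself is logged so it survives a restart.
        self.log.append((row_id, value))
        self.rows[row_id] = value

    def restart(self) -> None:
        # Versions are simply dropped; data is rebuilt by replaying the log.
        self.version_store.clear()
        self.rows = {}
        for row_id, value in self.log:
            self.rows[row_id] = value


table = Table()
table.write(1, "a")
table.write(1, "b")      # saves "a" as a version
table.restart()          # data comes back, versions do not
```

After the restart, the row still holds its latest value, while the version store is empty. That is acceptable because any transaction that could have needed the old version was aborted by the restart.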

Versions are often organized in a different way from data. For example:

  • Versions are written chronologically, in the order of their creation time. This is likely to match the order of version reclamation: a version saved earlier is likely to be reclaimed first. A version store is thus designed to be append-only.

  • A version store is partitioned into chunks (e.g., based on pages), where a chunk holds multiple versions. The reclamation of versions is done lazily. For example, versions are reclaimed in chunks, and a chunk can be reclaimed if and only if all the versions it holds have expired. Generally, managing coarse-grained units is more efficient than tracking each version individually.
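The two points above can be sketched as a small append-only store with chunk-level reclamation. This is a minimal sketch under stated assumptions: the names `Version`, `Chunk`, and `VersionStore`, the `expired` flag, and the fixed `chunk_capacity` are all hypothetical. It also assumes that only full chunks are reclaimed, so the chunk still open for appends is left alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    row_id: int
    payload: str
    expired: bool = False  # assumed to be set once no open transaction needs it


@dataclass
class Chunk:
    capacity: int
    versions: List[Version] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.versions) >= self.capacity

    def is_reclaimable(self) -> bool:
        # A chunk can go only when every version it holds has expired.
        return all(v.expired for v in self.versions)


class VersionStore:
    def __init__(self, chunk_capacity: int = 2) -> None:
        self.chunk_capacity = chunk_capacity
        self.chunks: List[Chunk] = []

    def append(self, version: Version) -> None:
        # Append-only: versions always go to the newest chunk.
        if not self.chunks or self.chunks[-1].is_full():
            self.chunks.append(Chunk(self.chunk_capacity))
        self.chunks[-1].versions.append(version)

    def reclaim(self) -> int:
        """Drop every full chunk whose versions have all expired; return the count."""
        before = len(self.chunks)
        self.chunks = [
            c for c in self.chunks if not (c.is_full() and c.is_reclaimable())
        ]
        return before - len(self.chunks)


store = VersionStore(chunk_capacity=2)
v1, v2, v3 = Version(1, "a"), Version(2, "b"), Version(1, "c")
for v in (v1, v2, v3):
    store.append(v)

v1.expired = True
freed_first = store.reclaim()    # 0: v2 in the same chunk is still needed
v2.expired = True
freed_second = store.reclaim()   # 1: the whole first chunk is now expired
```

Note that `v1` stays in storage after it expires until `v2` expires as well. That delay is the price of lazy, chunk-level reclamation, paid in exchange for not tracking free space per version.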

Version chain

The data may be linked to its previous version, which in turn is linked to an even older version. That is, multiple versions of the same data are organized as a chain. If the data is invisible to a transaction, we walk along the version chain to find the version that is visible to that transaction. We will discuss version visibility in the next article.
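A chain walk might look like the sketch below. Since visibility is left to the next article, the rule used here is only a placeholder assumption: a version counts as visible when its commit timestamp is not later than the reader's snapshot timestamp. The names `RowVersion`, `commit_ts`, `prev`, and `read_visible` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    commit_ts: int                        # hypothetical commit timestamp
    prev: Optional["RowVersion"] = None   # link to the next-older version


def read_visible(head: Optional[RowVersion], snapshot_ts: int) -> Optional[str]:
    """Walk from the data (chain head) toward older versions.

    Placeholder rule (an assumption): the first entry with
    commit_ts <= snapshot_ts is the one the reader sees.
    """
    node = head
    while node is not None:
        if node.commit_ts <= snapshot_ts:
            return node.value
        node = node.prev
    return None  # the row did not exist in this snapshot


oldest = RowVersion("v1", commit_ts=10)
middle = RowVersion("v2", commit_ts=20, prev=oldest)
data = RowVersion("v3", commit_ts=30, prev=middle)   # latest data

latest_read = read_visible(data, snapshot_ts=35)     # "v3", no chain walk needed
older_read = read_visible(data, snapshot_ts=25)      # "v2", one hop down the chain
missing_read = read_visible(data, snapshot_ts=5)     # None, nothing is visible
```

The number of hops grows with the age of the reader's snapshot relative to the update rate of the row, so long-running readers pay the most for chain traversal.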

Costs

Versioning introduces some additional costs:

  1. Versions occupy extra storage space.
  2. Data modifications always bear the overhead of generating and storing versions, even when no other transaction is reading the data.
  3. A read operation can be slow when it needs to traverse the version chain.
