Transaction Internals: Snapshot Isolation With Versioning

The unique benefit of snapshot isolation is that we can introduce great concurrency to the execution of multiple transactions. This is because readers do not block writers and writers do not block readers either. The rationale behind that is the well-designed mechanism of version processing, including the version generation, visibility, and cleanup.

Version generation

We only maintain versions for updates and deletion because they overwrite the original data. For most storage systems (with or without versioning), the deletion does not really overwrite each single bit with null values, because that’s inefficient and unnecessary (unless the data need to be purged for security). Instead, deleting the data simply marks them as deleted (e.g., by updating the metadata). In this sense, the deletion always keeps the original data regardless whether versioning is involved.

In contrast, inserting new data does not introduce versions since there are no old data to begin with. One exception is to insert data over an entry that was just deleted, because as mentioned before, the data are just marked as deleted while the content are still there.

Version visibility

Snapshotting-based isolation lets each transaction see the proper version of data. The visibility of each version is determined through the sequence of transactions and their operations within a system. For example, we want to know when a transaction starts, reads, writes, and commits. We therefore maintain a global counter that is monotonically increasing. The counter does not need to be persisted (e.g., will be reset over a system restart). Upon the counter, we record timestamps. For version visibility, we maintain two timestamps for each transaction:

  • Start timestamp–When a transaction starts, it’s assigned a timestamp from the current counter value. The counter is then increased by one.

  • Minimum start timestamp–When a transaction starts, a snapshot of open transactions in the system is taken. It then selects the open transaction in the snapshot that has the lowest start timestamp, and records that as the minimum start timestamp. Such minimum start timestamp indicates the oldest open transaction in the system at that moment.

We also maintain a timestamp for each data unit inherited from transactions:

  • Update timestamp–the start timestamp of the last transaction that updates this data unit.

Under the snapshot isolation (SI) level, a transaction will access the most recent committed data as of the start of that transaction. The visibility of data to a transaction \(T\) can be determined as below:

  1. If the data unit has an update timestamp less than the minimum timestamp of \(T\), it’s visible to \(T\) because the last transaction that updates that data unit commits before the start of \(T\) (thus it’s included by the snapshot taken as \(T\) starts).

  2. If the data unit has an update timestamp larger than the start timestamp of \(T\), it’s invisible to \(T\) because the last transaction that updates that data unit starts after \(T\) (thus it won’t be included by the snapshot taken as \(T\) starts).

  3. If the data unit has an update timestamp in between the minimum timestamp and start timestamp (both inclusive) of \(T\), we need to further check the status of the last transaction that updates that data unit. The data unit is visible to \(T\) if and only if that transaction has committed.

Version reclamation

For a data unit, old versions can be reclaimed if they are not used by any transactions any more. From another perspective, a data unit can discard an old version if the next newer value is visible to every transaction. In that sense, the problem of version reclamation is equivalent to that of version visibility. Compared against what we discuss in the previous session, two aspects of extension are applied.

  1. The previous session focuses on the version visibility for one specific transaction. Here to determine if a version is visible to all transactions, we further calculate:

    • Minimum useful timestamp–the lowest minimum start timestamp of all current open transactions.
  2. The eligibility of reclaiming a version depends on the visibility of the next newer version. For example, we update a data unit from value \(V_{old}\) to \(V_{new}\). We can remove \(V_{old}\) if \(V_{new}\) becomes visible to all transactions. Recall that we rely on the update timestamp of a data unit to determine its visibility. However, we don’t know about the update timestamp of \(V_{new}\) from \(V_{old}\). We therefore attach the update timestamp of \(V_{new}\) to \(V_{old}\) and name it:

    • Version generation timestamp–the start timestamp of the transaction that updates the data unit to \(V_{new}\) and save \(V_{old}\) to the version store.

The minimum useful timestamp represents the upper bound of data unit versions that can be safely removed. That is, any data unit with a version generation timestamp less than the minimum useful timestamp is no longer needed. For such a data unit version, the transaction that moves it to the version store must have been committed. Otherwise, that transaction must have been included by the minimum useful timestamp (i.e., its start timestamp must be larger or equal to the minimum useful timestamp). Since that transaction is committed, all its updates are visible to all current open transactions (which start from or after the minimum useful timestamp), and thus the old versions are not needed any more.