SQL Server isolation levels are used to define the degree to which one transaction must be isolated from resource or data modifications made by other concurrent transactions. The different Isolation Levels are:
- Read Uncommitted (or nolock)
- Read Committed
- Repeatable Read
Read Committed is the default isolation level.
The write behaviour of all transaction isolation levels is the same. However, they are very different when it comes to reads, particularly repeated reads of the same data within a transaction. Isolation levels define the degree to which a transaction must be isolated from the data modifications made by any other transaction in the database system. A transaction isolation level is defined by the following phenomena –
- Dirty Read – A Dirty read is the situation when a transaction reads a data that has not yet been committed. For example, Let’s say transaction 1 updates a row and leaves it uncommitted, meanwhile, Transaction 2 reads the updated row. If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
- Non Repeatable read – Non Repeatable read occurs when a transaction reads same row twice, and get a different value each time. For example, suppose transaction T1 reads data. Due to concurrency, another transaction T2 updates the same data and commit, Now if transaction T1 rereads the same data, it will retrieve a different value.
- Phantom Read – Phantom Read occurs when two same queries are executed, but the rows retrieved by the two, are different. For example, suppose transaction T1 retrieves a set of rows that satisfy some search criteria. Now, Transaction T2 generates some new rows that match the search criteria for transaction T1. If transaction T1 re-executes the statement that reads the rows, it gets a different set of rows this time.
We need to figure out which of these behaviours our application can handle. If it is an absolute requirement that none of these can occur; than we have to use the Serializable isolation level. However, this is also the level with the least amount of concurrency, as preventing these phenomena requires a great deal of serialization of write processes.
In many situations we can get away with less restrictive isolation levels provided we use good coding practices. For example, in general it should be avoided to read the same record twice within the same transaction. That way we don’t need to worry about non-repeatable reads.
However, sometimes we need to do things like check if a record exist and create it if necessary. This two step process is only safe when using the serializable isolation level as it is the only one that allows range locks and with that gives a way to prevent inserts into a table area that another transaction has “looked at”.
A financial application probably has many situations where one record is read and then based on that one or more records including the original one are altered. Every time we encounter a situation like that, it is more important that we indicate the write intend on the initial read by requesting a write or update lock. If we do that within the same transaction that executes the writes later on; we are not dependent on the SQL Server isolation level at all. We also will reduce the likelihood of deadlocks.
SQL Server has a pessimistic locking model because it was built for an OLTP environment and assumes that updates will occur frequently. For a developer, this is important to understand in terms of concurrency: how many users can access the same data simultaneously. In some cases, we may want concurrency to be high (many people can access the same data at the same time); in other cases we may want concurrency to be low (only one person can access the data at any given time). A developer must know what level of concurrency is needed, the effects that concurrency has in the database (locks) and how concurrency affects user activity and processing. The isolation level for a database connection directly affects the type of lock that is taken for a read operation, and therefore affects concurrency.
The SQL Server maintains consistency by utilizing two different techniques: Locking and Row versioning.
Locking isolation levels
Locking works by the SQL Server issues a shared lock on the tables/rows that it reads, blocking other transactions from updating the data. The same goes for updates, which are issued an exclusive lock, blocking other transactions from reading the data. Locking occurrs in different parts of the database, which I will not go into, but for instance tables, rows, indexes.
READ COMMITTED is using locking to ensure that it only reads committed data. And that no other transactions are updating the data while it’s being read. This means that an update statement on rows that another transaction is currently selecting is being blocked. And, a select statement on rows that another transaction is updating is being blocked until that data is committed.
READ UNCOMMITTED ignores everything, and doesn’t take any locks at all. This means that another transaction can perform an update on rows that the transaction is currently reading without being blocked, but it also have the effect that the transaction will receive data that the other transaction might not yet have committed.
SERIALIZABLE do the opposite, and locks everything. While READ COMMITTED releases the locks when it read the row, or when the statement is completed depending on the lock, SERIALIZABLE releases the locks once the transaction is committed. This means that another transaction that wants to update data that the transaction have read at least once, or another transaction wants to read data that the transaction has updated will be blocked until the transaction is committed.
Row versioned isolation levels
READ COMMITTED SNAPSHOT and SNAPSHOT ISOLATION uses row versioning instead. Row versioning means that every time a row is modified, the SQL Server stores a version of the row, ensuring that it will stay the same when read by another transaction.
SQL Server isolation level SNAPSHOT works in the way that when a read is done on a table, it retrieves the last version of the rows that were committed at the time the transaction started. This provides a consistent snapshot of the data within the transaction. Data modified after the transaction started will not be visible, but at the same time the transaction will not be blocked. To protect against lost updates, if the transaction wants to update some rows that have been changed by another transaction after the transaction begun, it will terminate the transactions due to conflicting data.
READ COMMITTED SNAPSHOT works in the same way as snapshot isolation, but instead of keeping the snapshot the entire transaction, it only keeps it for the duration of the statement. This means that two read statements within one transaction could receive different results. However, when the transaction is doing an update, it is using the actual row instead of a previous row version and does not track if the row has been changed.
Since the SQL Server needs to keep each modified row that could be used by active transactions available, it stores them in tempdb. Due to this, tempdb needs to be large enough to incorporate all the changes. There is a background thread that checks which rows are still needed, and removes the rest, but if there is a long running transaction, it will prevent those rows to be removed. If tempdb will run out of space, no new row versions will be created, and any transactions attempting to access those (non-existing) rows will terminate.
This the most optimistic isolation level in SQL Server, and in order to utilize it, the SET TRANSACTION ISOLATION LEVEL statement must be executed when a connection is made.
In a READ UNCOMMITTED isolation level:
-> If data is being modified by a transaction, any subsequent select statement that occurs before the modification is complete will read the modified copy of the data. This is known as a dirty read.
-> If data is being modified by a transaction, any subsequent select statement that occurs before the modification is complete will wait until the transaction commits or rolls back to read the data.
-> Select statements create shared locks which are released as soon as the data is read. If a shared lock exists, data cannot be modified until the lock is released. Under the right circumstances, this can lead to blocking.
When SQL Server executes a statement at the read committed isolation level, it acquires short lived share locks on a row by row basis. The duration of these share locks is just long enough to read and process each row; the server generally releases each lock before proceeding to the next row. Thus, if we run a simple select statement under read committed and check for locks (e.g., with sys.dm_tran_locks), we will typically see at most a single row lock at a time. The sole purpose of these locks is to ensure that the statement only reads and returns committed data. The locks work because updates always acquire an exclusive lock which blocks any readers trying to acquire a share lock.
In Repeatable Read, statements cannot read data that has been modified but not yet committed by other transactions. No other transaction can modify data that has been read by the current transaction until the current transaction completes.
Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying any rows that have been read by the current transaction. This isolation level prevents the Non-Repeatable Read issue.
Other transactions can insert new rows that match the search conditions of statements issued by the current transaction. If the current transaction then retries the statement it will retrieve the new rows, which results in phantom reads.
In the serializable isolation level, statements cannot read data that has been modified but not yet committed by other transactions. No other transactions can modify data that has been read by the current transaction until the current transaction completes. Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
This isolation level was added in SQL Server 2005, and is considered a good hybrid of READ COMMITTED and READ UNCOMMITTED as it requires that data that is read during a select statement will be the same value that existed when the transaction started. In a SNAPSHOT isolation level:
-> If an insert, update or delete statement does not commit before the select statement begins, then the output from the select will show the original values.
-> Locks are not taken when reading data.
Some Important Lock Modes
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
Update (U) locks prevent a common form of deadlock. In a repeatable read or serializable transaction, the transaction reads data, acquiring a shared (S) lock on the resource (page or row), and then modifies the data, which requires lock conversion to an exclusive (X) lock. If two transactions acquire shared-mode locks on a resource and then attempt to update data concurrently, one transaction attempts the lock conversion to an exclusive (X) lock. The shared-mode-to-exclusive lock conversion must wait because the exclusive lock for one transaction is not compatible with the shared-mode lock of the other transaction; a lock wait occurs. The second transaction attempts to acquire an exclusive (X) lock for its update. Because both transactions are converting to exclusive (X) locks, and they are each waiting for the other transaction to release its shared-mode lock, a deadlock occurs.
To avoid this potential deadlock problem, update (U) locks are used. Only one transaction can obtain an update (U) lock to a resource at a time. If a transaction modifies a resource, the update (U) lock is converted to an exclusive (X) lock.
Exclusive (X) locks prevent access to a resource by concurrent transactions. With an exclusive (X) lock, no other transactions can modify data; read operations can take place only with the use of the NOLOCK hint or read uncommitted isolation level.
Data modification statements, such as INSERT, UPDATE, and DELETE combine both modification and read operations. The statement first performs read operations to acquire data before performing the required modification operations. Data modification statements, therefore, typically request both shared locks and exclusive locks. For example, an UPDATE statement might modify rows in one table based on a join with another table. In this case, the UPDATE statement requests shared locks on the rows read in the join table in addition to requesting exclusive locks on the updated rows.
Repeatable Read vs Serializable Isolation Level
Serializable means that there is some order the transactions can be run in without overlapping and we’ll end up with the same answers and the same state of the database as we get by running the transactions in parallel with serializable isolation level.
Given two transactions, A and B, the only valid states of the system are –
All of A followed by all of B, OR
All of B followed by all of A.
That’s it. If the system can end up in any other state the transactions are not serializable. If we can show that two transactions running in parallel do not correspond to one or other of these states then those transactions are not serializable.
Think about a table with 4 rows, ID values 2, 4, 6 and 8. There are two transactions A and B. A counts the number of rows. B inserts two rows, ID values 3 and 7.
If we run A->B then A returns 4. If we run B->A then A returns 6. Those are the only possible answers if we are to guarantee serialization.
So A starts under Repeatable Read isolation. It will perform a table scan. A reads row 2 and takes a lock, then reads row 4 and takes a lock.
Now B starts in parallel with A. B tries to insert row 3. Nothing prevents it; A has never read a row with ID 3 to take a lock on it. Then B inserts row 7 and commits, releasing its locks.
Transaction A continues reading. It has just finished with 4 so the next row is 6, then comes 7 (B has committed so its lock on 7 has been released already) and finally 8. So A has seen rows 2, 4, 6, 7 & 8 – five rows!
This workload is not serializable.
The “problem” is the phantom rows produced by B. They overlap the range of data to be read by A. But Repeatable Read does not issue range locks; so B is free to do this. Under Serializable isolation B’s insertion locks would have prevented A taking range locks or A’s range locks would have blocked B’s insertions.
SQL Server Isolation Level Snapshot vs Serializable
Both the serializable and snapshot isolation levels provide a read consistent view of the database to all transactions. In either of these isolation levels, a transaction can only read data that has been committed. Moreover, a transaction can read the same data multiple times without ever observing any concurrent transactions making changes to this data. The unexpected read committed and repeatable read results are not possible in serializable or snapshot isolation level.
Notice the phrase “without ever observing any … changes.” This choice of words is deliberate. In serializable isolation level, SQL Server acquires key range locks and holds them until the end of the transaction. A key range lock ensures that, once a transaction reads data, no other transaction can alter that data – not even to insert phantom rows – until the transaction holding the lock completes. In snapshot isolation level, SQL Server does not acquire any locks. Thus, it is possible for a concurrent transaction to modify data that a second transaction has already read. The second transaction simply does not observe the changes and continues to read an old copy of the data.
Serializable isolation level relies on pessimistic concurrency control. It guarantees consistency by assuming that two transactions might try to update the same data and uses locks to ensure that they do not but at a cost of reduced concurrency – one transaction must wait for the other to complete and two transactions can deadlock. Snapshot isolation level relies on optimistic concurrency control. It allows transactions to proceed without locks and with maximum concurrency, but may need to fail and rollback a transaction if two transactions attempt to modify the same data at the same time.
How do serializable and snapshot differ in terms of the transaction isolation that they confer? It is simple to understand serializable. For the outcome of two transactions to be considered serializable, it must be possible to achieve this outcome by running one transaction at a time in some order.
Snapshot does not guarantee this level of isolation.
Imagine that we have a bag containing a mixture of white and black marbles. Suppose that we want to run two transactions. One transaction turns each of the white marbles into black marbles. The second transaction turns each of the black marbles into white marbles. If we run these transactions under serializable isolation, we must run them one at a time. The first transaction will leave a bag with marbles of only one color. After that, the second transaction will change all of these marbles to the other color. There are only two possible outcomes: a bag with only white marbles or a bag with only black marbles.
If we run these transactions under snapshot isolation, there is a third outcome that is not possible under serializable isolation. Each transaction can simultaneously take a snapshot of the bag of marbles as it exists before we make any changes. Now one transaction finds the white marbles and turns them into black marbles. At the same time, the other transactions finds the black marbles – but only those marbles that where black when we took the snapshot – not those marbles that the first transaction changed to black – and turns them into white marbles. In the end, we still have a mixed bag of marbles with some white and some black. In fact, we have precisely switched each marble.
Summary of SQL Server Isolation Level (Sort Of…)
- READ COMMITTED SNAPSHOT is apart from READ UNCOMMITTED the most permissive when it comes to concurrency. It doesn’t block any other DML statements, and keep a consistent view of the data within each statement.
- SNAPSHOT ISOLATION is permissive and maintains better consistency than READ COMMITTED SNAPSHOT, with the drawback that due to its conflict resolution may fail when doing updates. Multiple statements within the same transaction are guaranteed to be consistent with each other.
- READ COMMITTED is permissive when it comes to concurrent reads, not so much for updates. Due to locking, it is also slower than the row versioning levels.
- READ UNCOMMITTED is not recommended to use due to it will read dirty data, if that is no concern then it is best due to no locking and no need to keep old row versions.
- SERIALIZABLE is not recommended since it keeps locks on everything and therefore is slow and not good with concurrency.
Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. A lower isolation level increases the ability of many users to access data at the same time, but increases the number of concurrency effects, such as dirty reads or lost updates etc. Conversely, a higher isolation level reduces the types of concurrency effects that users might encounter, but requires more system resources and increases the chances that one transaction will block another.
The lowest isolation level, read uncommitted, can retrieve data that has been modified but not committed by other transactions. All concurrency side effects can happen in read uncommitted, but there’s no read locking or versioning, so overhead is minimized.
READ COMMITTED is the default isolation level for SQL Server. It prevents dirty reads by specifying that statements cannot read data values that have been modified but not yet committed by other transactions. If the READ_COMMITTED_SNAPSHOT option is set as ON, the Read transactions need not wait and can access the last committed records. Other transactions can modify, insert, or delete data between executions of individual SELECT statements within the current transaction, resulting in non-repeatable reads or phantom rows.
REPEATABLE READ is a more restrictive isolation level than READ COMMITTED. It encompasses READ COMMITTED and additionally specifies that no other transactions can modify or delete data that has been read by the current transaction until the current transaction commits. Concurrency is lower than for READ COMMITTED because shared locks on read data are held for the duration of the transaction instead of being released at the end of each statement. But Other transactions can insert data between executions of individual SELECT statements within the current transaction, resulting in phantom rows.
The highest isolation level, serializable, guarantees that a transaction will retrieve exactly the same data every time it repeats a read operation, but it does this by performing a level of locking that is likely to impact other users in multi-user systems.
SNAPSHOT isolation specifies that data read within a transaction will never reflect changes made by other simultaneous transactions. The transaction uses the data row versions that exist when the transaction begins. No locks are placed on the data when it is read, so SNAPSHOT transactions do not block other transactions from writing data. Transactions that write data do not block snapshot transactions from reading data. If no waiting is acceptable for the SELECT operation but the last committed data is enough to be displayed, this isolation level may be appropriate.