Isolation & Concurrency
A few meetups ago, we had Jim Starkey come and speak. During that evening, he asked "Why is MVCC not serializable?" It was a "go and do your homework" kind of question.
So, in order to answer that question, let's explore the concepts of isolation and concurrency, the "i" and "c" in ACID compliance. For this post, I will refer to "transactions", by which I mean actual transactions, as well as statements where autocommit=0. In other words, every statement is its own transaction, unless it is within an explicit transaction, and for the purpose of this article, those statements are "transactions". Of course, for many of the explanations below, if there's only one statement in the transaction, there's no problem. (As usual, if I make a mistake, please correct me. I will make sure all comments from all sources get to the article at http://www.sheeri.net/123 .)
Isolation levels define how separate the transaction data is from the actual data in the table. So if I begin a transaction, read a row from the table, change the row, read the row from the table, and commit, what do I see at each step? What does someone else who is also reading the same row see?
| Isolation Level: Read Uncommitted | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| # | Step | Thread 1: changing data | Thread 2: reading data | ||||||
| 0 | Thread 1: Read row Thread 2: Read row |
sees data from original row | sees data from original row | ||||||
| 1 | Thread 1: Change Row Thread 2: N/A |
changes data | N/A | ||||||
| 2 | Thread 1: Read row Thread 2: Read row |
sees new data | sees new data | ||||||
| 3 | Thread 1: Commit Thread 2: N/A |
Commits data | N/A | ||||||
| 4 | Thread 1: Read row Thread 2: Read row |
sees new data | sees new data | ||||||
So a read uncommitted is the lowest level of isolation in that there is none. This is how statements that are implicit transactions act -- you see the data right away, even if what you are doing is logically a transaction. This is also called a "dirty read" because information is seen right away. If the transaction rolled back, and 'undid' the change, then the original data would be back for all to see.
More isolated than read uncommitted read is "read committed", which, as you can guess, means that thread 2 can only read committed changes.
The description in bold is the difference between read uncommitted and read commited:
| Isolation Level: Read Committed | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| # | Step | Thread 1: changing data | Thread 2: reading data | ||||||
| 0 | Thread 1: Read row Thread 2: Read row |
sees data from original row | sees data from original row | ||||||
| 1 | Thread 1: Change Row Thread 2: N/A |
changes data | N/A | ||||||
| 2 | Thread 1: Read row Thread 2: Read row |
sees new data | sees data from the original row | ||||||
| 3 | Thread 1: Commit Thread 2: N/A |
Commits data | N/A | ||||||
| 4 | Thread 1: Read row Thread 2: Read row |
sees new data | sees new data | ||||||
However, this means that for the duration of thread 2's transaction, the data may change. This is not necessarily desirable, and is seen as a 'dirty' read because for the duration of thread 2's transaction, the state of the data changes.
So if thread 1 commits a change, thread 2 does not see it until it reads the changed row in a transaction that started after the commit.
The part in bold is the difference between read uncommitted and repeatable read:
| Isolation Level: Repeatable Read | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| # | Step | Thread 1: changing data | Thread 2: reading data | ||||||
| 0 | Thread 1: Read row Thread 2: Read row |
sees data from original row | sees data from original row | ||||||
| 1 | Thread 1: Change Row Thread 2: N/A |
changes data | N/A | ||||||
| 2 | Thread 1: Read row Thread 2: Read row |
sees new data | sees data from the original row | ||||||
| 3 | Thread 1: Commit Thread 2: N/A |
Commits data | N/A | ||||||
| 4 | Thread 1: Read row Thread 2: Read row |
sees new data | sees data from original table | ||||||
| 5 | Thread 1: N/A Thread 2: New transaction |
Thread 2 starts new transaction | N/A | ||||||
| 6 | Thread 1: Read row Thread 2: Read row |
sees new data | sees new data | ||||||
This sounds like a good level of isolation. However, there are many cases where a transaction will want the most recent available data. For instance, if two people who have a joint bank account take out money at the same time, having two systems start with the same balance, update the new balance and then write the new balance to the database will cause the bank to lose a lot of money, as only one of the two withdrawals will be reflected.
The serializable isolation level helps prevent this -- it does not allow a read of data that is being written to. In other words, there is a serial order to transactions, even reads.
The part in bold is the difference between repeatable read and serializable:
| Isolation Level: Serializable | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| # | Step | Thread 1: changing data | Thread 2: reading data | ||||||
| 0 | Thread 1: Read row Thread 2: Read row |
sees data from original row | sees data from original row | ||||||
| 1 | Thread 1: Change Row Thread 2: N/A |
changes data | N/A | ||||||
| 2 | Thread 1: Read row Thread 2: Read row |
sees new data | is not allowed to start the transaction of reading | ||||||
| 3 | Thread 1: Commit Thread 2: N/A |
Commits data | N/A | ||||||
| 4 | Thread 1: Read row Thread 2: Read row |
sees new data | is allowed to start the read transaction, sees new data |
||||||
| 5 | Thread 1: N/A Thread 2: New transaction |
Thread 2 starts new transaction | N/A | ||||||
| 6 | Thread 1: Read row Thread 2: Read row |
sees new data | sees new data | ||||||
I have not yet addressed how these restrictions are enforced, that's what concurrency is.
Concurrency is how the different working environments occur at the same time. Folks who know about version control understand concurrency issues. One way to control concurrency is to use READ and WRITE locks. To achieve a read uncommitted isolation level, there are no locks. For read committed, a read lock on the data allows multiple threads to read the data, but no thread may write to the data; a write lock does not allow reading until the lock is released. For repeatable read, a read lock means that the thread always sees the same data even if it is changed on disk. For serializable, only one lock can be on the data at a time.
Those who deal with version control know that there is a better system than those locks. Imagine not being able to look at a code file while someone was changing it! In many version control systems, each user has a working copy where changes can be made, which is hidden from everyone else until commit. That is multi-version concurrency control (MVCC) -- there are multiple versions, all out there at once concurrently. MySQL employes MVCC.
The serializable isolation level makes more sense using the MVCC model -- before a software developer commits, they ususally update their files to see if other changes have been made. This does not quite carry over to databases.
Within a multi-statement transaction (for storage engines with support), the database gives a snapshot of the data to the transaction to work with, much like checking out a version. The physical database may be changing with other commits, but much like version control in software, you only see the data you 'checked out' for the duration of the transaction.
Therefore, all isolation levels up to repeatable read are easy to enforce, but serializable is more difficult. There is no way to "update" the snapshot of data within a transaction, to use what's currently in the database; you can only look at the data as it was at the beginning of the transaction (repeatable read isolation level).
So that's why MVCC in a database is not serializable.
To be clear: You can set the isolation level to be SERIALIZABLE in MySQL.
See:
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html
and
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html

The She-BA
1. I believe kimseong is
1. I believe kimseong is correct: in an ANSI SQL SERIALIZABLE isolation, the read lock taken by thread 2 in step 0 will block thread 1 from updating (and therefore taking an update lock) in step 1. In SERIALIZABLE mode, writers block readers, and readers block writers. I think this is generic and not specific to any implementation.
2. SERIALIZABLE readers do _not_ block other readers; so I would correct the statement
"For serializable, only one lock can be on the data at a time."
to something like
"For serializable, only one update lock, or any number of read locks, can be on the data at a time."
3. In the paragraphs under "Read Committed", where it says,
"This is not necessarily desirable, and is seen as a ‘dirty’ read because for the duration of thread 2’s transaction, the state of the data changes."
It might be confusing to call "Read Committed" a "dirty" read, because "dirty read" is the informal name for "Read Uncommitted", as you correctly point out in that section.
3. The sentence:
"So if thread 1 commits a change, thread 2 does not see it until it reads the changed row in a transaction that started after the commit."
appears to be the start of a new section on "Repeatable Read"; it appears that an introductory sentence for Repeatable Read is missing.
4. In the discussion of Repeatable Read, it says that two transaction could update one bank account balance, and the bank could thus lose money. I believe that as soon as a transaction does any kind of update (insert, update, or delete) the isolation level automatically changes from RU, RC or RR to Serializable, to avoid just this sort of problem. At least, that is my experience in a couple of implementations (Informix and DB2); I imagine that the SQL standard would say the same, because this is a common concern.
Wikipedia has an article on isolation levels; it points out some of the same issue as you do.
Hope this helps,
- Don
Could you spend sometime and
Could you spend sometime and explain isolation level in scalar aggregation and vector aggregation with or without having? Is there something written in this area that I can read about?
[...] Concurrency and
[...] Concurrency and Isolation Levels http://sheeri.net/index.php?p=123 [...]
The chart is the general
The chart is the general rule for what makes a transaction serializable, not the MySQL-specific implementation. The chart is correct.
You are correct in how MySQL implements SERIALIZABLE.
Your comment is an excellent PS to my post, with the first sentence taken out. I did not address how MySQL implements SERIALIZABLE.
I believe the Serializable
I believe the Serializable chart could be wrong.
In InnoDB, when the isolation level is serializable, every select make an implicit read locks on the selected rows, making it impossible for other transactions to write to those rows. This is similar to having SELECT ... LOCK IN SHARE MODE
If both thread had read the data in Step 0, then the change data in Step 1 of thread 1 will wait until thread 2 release the lock, thread 2 release the lock when it commit or rollback. After that, if thread 2 also try to change the data while thread 1 is waiting, a deadlock occurs.
On the other hand, if thread 1 had read the data and changed the data in step 0 and 1, then thread 2 try to read the data in step 2, then the read in step 2 of thread 2 will wait, as in your chart, this happens only if thread 2 did not make a read in step 0. Only if thread 2 tries to read after thread 1 changed the data, the read in thread 2 will wait as it cannot get a read lock when thread 1 had escalate the read lock into a write lock with the change data in step 1.