In August 2010, I sent an email to pghackers reporting a strange deadlock problem.
I couldn’t understand how it was possible you could get a deadlock by updating
the very same row again later in the same transaction.
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg157869.html
I also made a screencast demonstrating the problem:
http://www.screencast.com/users/joeljacobson/folders/Jing/media/42c31028-80fa-45fe-b21f-9039110c3555
Simon Riggs came up with a very clever solution to this problem.
By introducing a new type of lock, it would be possible to avoid a lot of deadlocks.
Álvaro Herrera has since then been working on the implementation.
It turned out to be a lot of work, unfortunately the feature didn’t make it into 9.2.
I bet this will be one of the most important performance features in 9.3.
It will greatly improve the performance by increasing the concurrency possible if you have a high transaction load.
I thought it would be fun to test my old example from 2010 using the latest version of the patch.
Let’s checkout Álvaro’s latest version of the fklocks patch:
git remote add fklocks git://github.com/alvherre/postgres.git
git fetch fklocks
git checkout fklocks
CREATE TABLE A (
AID integer not null,
Col1 integer,
PRIMARY KEY (AID)
);
CREATE TABLE B (
BID integer not null,
AID integer not null,
Col2 integer,
PRIMARY KEY (BID),
FOREIGN KEY (AID) REFERENCES A(AID)
);
INSERT INTO A (AID) VALUES (1);
INSERT INTO B (BID,AID) VALUES (2,1);
-- Process 1:
BEGIN;
-- Process 2:
BEGIN;
UPDATE A SET Col1 = 1 WHERE AID = 1;
UPDATE B SET Col2 = 2 WHERE BID = 2;
UPDATE B SET Col2 = 3 WHERE BID = 2;
-- Process 1 is waiting
UPDATE B SET Col2 = 4 WHERE BID = 2;
-- Process 2 is allowed to update the same row
COMMIT;
COMMIT;
SELECT * FROM A;
aid | col1
-----+------
1 | 1
(1 row)
SELECT * FROM B;
bid | aid | col2
-----+-----+------
2 | 1 | 3
(1 row)
Both transactions were able to COMMIT successfully, hurray!
Without this patch, you would get a “deadlock detected” on row 30,
when Process 2 tries to update the same row again, BID = 2.
However, I was a bit surprised the final value of Col2 is 3 and not 4,
as Process 2 updated the row BID = 2 before Process 1,
and Process 2 updated the row again after Process 1.
This probably has a logical explanation.
I hope to have time to do some concurrency stress testing using
example code and data extracted from our production database.
Would be interesting to see if my old read-life deadlock scenarios
are fixed by this patch and how it affects concurrency and performance.
Optimistic updates result in the second commit overwriting the first. While the update statement for 4 came later the commit for 4 came before the commit for 3.
Or, to put it a different way: Process 1 blocks when it attempts to update Col2, and cannot progress until process 2 commits. Once process 2 commits, process 1 can continue to run where it left off, updating Col2. It does so, then commits. The update for process 1 actually occurs after the update for process 2, because the update doesn’t occur until process 1 unblocks.
The final value of Col2 being 3 is actually correct. Process 1 blocks until the commit of process 2 when the lock on that row is released.
As J Prevost mentioned, the update in process 1 doesn’t happen till *after* the commit in Process 2.
Anonymous is incorrect: it has nothing to do with the order of the commits. And the “update for 4″ *didn’t* come later — but it was *submitted* later.
To get around this “problem” of “lost updates” you’ll need to do either of two things:
1. use a serializable transaction isolation level for both processes,
see http://www.postgresql.org/docs/current/static/transaction-iso.html
2. rewrite your update statements to include the expected “old” value. Or you can rework your data model and add either of these two columns:
A. changed_on timestamp default now()
B. version integer column default 0
and, again, rewrite your update statements to include the expected “old” changed_on or version value. And update the changed_on = now() or version = version + 1.
BUT
Great find, this inability to “reupdate” is something that should be fixed.