-
Bug Report
-
Resolution: Fixed
-
L3 - Default
-
None
-
None
Environment
- Camunda BPM 7.12.5
- Database: DB2 v11.1.4.4
- Isolation Level: READ_COMMITTED
Description
- Setting the removal time for decisions via the corresponding batch operation results in a deadlock exception
- Deadlock victim query, as well as the other involved query, is:
-- VariableInstanceEntity#updateByteArraysByDecisionInstanceId UPDATE act_ge_bytearray SET removal_time_ = ? WHERE id_ IN (SELECT bytearray_id_ FROM act_hi_dec_in I INNER JOIN act_hi_decinst D ON I.dec_inst_id_ = D.id_ WHERE D.id_ = ?) OR id_ IN (SELECT bytearray_id_ FROM act_hi_dec_out O INNER JOIN act_hi_decinst D ON O.dec_inst_id_ = D.id_ WHERE D.id_ = ?)
Steps to reproduce
- Setup an application server with DB2 v11.1 and Camunda BPM 7.12
- Go to Cockpit and open the batch operation page
- Choose the "Set removal time to decision instances" batch operation, select some decision instances (at least two), and set an absolute removal time
- Execute the batch operation
- Go to the batches page
Observed behavior
The batch operation cannot be successfully completed and has failed jobs due to a deadlock exception: exception_stacktrace.txt
Expected behavior
The batch operation can be completed successfully without failed jobs.
Root cause
- The query optimizer rewrites the SQL query as follows:
(SELECT Q6.BYTEARRAY_ID_ FROM CBPM_INPUT.ACT_HI_DECINST AS Q5, CBPM_INPUT.ACT_HI_DEC_IN AS Q6 WHERE (Q6.DEC_INST_ID_ = 'bc6132f1-79dd-11e8-a021-005056bd9c8c') AND (Q5.ID_ = 'bc6132f1-79dd-11e8-a021-005056bd9c8c') ) AS Q7 RIGHT OUTER JOIN CBPM_INPUT.ACT_GE_BYTEARRAY AS Q8 ON (Q8.ID_ = Q7.BYTEARRAY_ID_)
- The rows of the ACT_GE_BYTEARRAY table are exclusively locked as part of the RIGHT OUTER JOIN table scan:
.... 6) TBSCAN: (Table Scan) .... ROWLOCK: (Row Lock intent) EXCLUSIVE
- When the above-mentioned query is executed in parallel (i.e., more than one removal time batch jobs run in parallel), a deadlock situation might occur as follows:
- Participant 1
- Holds exclusive lock on a row of ACT_GE_BYTEARRAY
- Waits to set an exclusive lock on a row of ACT_GE_BYTEARRAY, however, the lock is held by participant 2
- Participant 2
- Holds exclusive lock on a row of ACT_GE_BYTEARRAY
- Waits to set an exclusive lock on a row of ACT_GE_BYTEARRAY, however, the lock is held by participant 3
- Participant 3
- Holds exclusive lock on a row of ACT_GE_BYTEARRAY
- Waits to set an exclusive lock on a row of ACT_GE_BYTEARRAY, however, the lock is held by participant 1
- Participant 1
Solution idea
- Split up the VariableInstanceEntity#updateByteArraysByDecisionInstanceId query into two queries
- Since the same error pattern is likely to occur for the query VariableInstanceEntity#updateByteArraysByRootDecisionInstanceId (hierarchical flag true), we should split up this query as well
- The solution has already been validated with a customer
- There is already a pull request that fixes the problem: https://github.com/camunda/camunda-bpm-platform/pull/1201
This is the controller panel for Smart Panels app
- is related to
-
CAM-10360 Deadlock exception when set removal time batch is executed
- Closed