-
Bug Report
-
Resolution: Fixed
-
L3 - Default
-
None
What are the steps to reproduce your problem?
- Setup the engine to use an empty, supported SqlServer instance
- Start attached Spring Boot application
- Change History Time To Live to 0 in cockpit
- Create/Start 5000 Instances for cleanup
- Change History Time To Live to 2 in cockpit
- Start constantly 2 running processes (with a dalay of 50ms) + Start Cleanup in cockpit
What is the problem?
- Some running instances will end up failing in the end event due to deadlock exceptions when deleting variables (due to a deadlock with the deletion from byte array table by removal time triggered by the history cleanup)
What would be the expected behavior:
- All process instances finish successfully
Hints (optional):
- When deleting data from ACT_GE_BYTEARRAY during history cleanup (e.g., VariableInstance.xml#deleteByteArraysByRemovalTime), then SQL server must ensure that any foreign key constraints are still satisfied after the deletion, i.e., no other tables reference the byte array rows.
- Currently, the tables ACT_RU_VARIABLE, ACT_RU_JOB and ACT_RU_EXT_TASK have such a constraint.
- In order to ensure that, SQL server joins all the tables together during the delete (left join) and verifies that all rows from ACT_GE_BYTEARRAY have no join partners.
- When there is no or only a small amount of data in the tables, SQL server uses merge joins with index scans because it considers them to be faster than using a loop join with index seek (that is usually used when more data is contained in the tables as this is usually faster then).
- The index scans are locking substantial amounts of data in the tables as opposed to the index seek.
- Different transactions block different tables first and start creating deadlocks for the remaining tables to update.
- Using loop joins with index seeks can be forced with SqlServer by using the query hint OPTION (LOOP JOIN).
- Forcing the use of this hint would mean to create SqlServer-specific mappings in the following mapping files:
- VariableInstance.xml:
<delete id="deleteByteArraysByRemovalTime_mssql" parameterType="org.camunda.bpm.engine.impl.db.ListQueryParameterObject"> <bind name="date" value="'REMOVAL_TIME_'"/> <bind name="reportPeriodUnitName" value="'MINUTE'"/> delete ${limitBeforeWithoutOffset} from ${prefix}ACT_GE_BYTEARRAY where REMOVAL_TIME_ <= #{parameter.removalTime} <include refid="andWhereMinuteInDateBetweenSql"/> ${limitAfterWithoutOffset} OPTION (LOOP JOIN) </delete>
- VariableInstance.xml: