Environment (Required on creation):
All Camunda distributions, Camunda version 7.15.x
Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket):
The following update query performs poorly on MySQL with high cardinalities in the ACT_GE_BYTEARRAY table:
<update id="updateByteArraysByBatchId_mysql" parameterType="java.util.Map"> update ${prefix}ACT_GE_BYTEARRAY set REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP} where ID_ IN ( SELECT JOB_EXCEPTION_STACK_ID_ FROM ${prefix}ACT_HI_JOB_LOG WHERE JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR} AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL ) </update>
Steps to reproduce (Required on creation):
- Add 21 million records to the table ACT_GE_BYTEARRAY.
- Create a batch job that modifies some process instances.
- Wait until the batch job shows 100% progress in the cockpit.
- The batch job does not finish for a long time (i.e., it does not switch from runtime to history).
- Check the running statements in the database.
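One way to carry out the last step is to inspect the currently running statements on MySQL directly (a sketch, assuming sufficient database privileges):

  -- Lists all currently executing statements with their runtime in seconds.
  SHOW FULL PROCESSLIST;

  -- Alternatively, filter out idle connections and sort by runtime:
  SELECT ID, TIME, STATE, INFO
  FROM information_schema.PROCESSLIST
  WHERE COMMAND <> 'Sleep'
  ORDER BY TIME DESC;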
Observed Behavior (Required on creation):
The statement runs for several minutes (~30 minutes in customer environments) up to more than one hour (see the attached support cases).
Expected behavior (Required on creation):
The statement finishes in a reasonable amount of time, ideally sub-second.
Root Cause (Required on prioritization):
The EXPLAIN output shows that MySQL scans the whole ACT_GE_BYTEARRAY table and evaluates the subselect against the ACT_HI_JOB_LOG table once per row of the byte array table. The more rows the byte array table contains, the longer the statement takes to complete.
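The plan can be reproduced with EXPLAIN (a sketch, assuming MySQL 5.6+ where EXPLAIN supports UPDATE statements; 'someBatchId' is a placeholder and the table prefix is omitted). Depending on the MySQL version and statistics, the subselect typically appears as a DEPENDENT SUBQUERY, i.e. it is re-evaluated per row:

  -- Inspect the execution plan of the slow update:
  EXPLAIN
  UPDATE ACT_GE_BYTEARRAY
  SET REMOVAL_TIME_ = NOW()
  WHERE ID_ IN (
    SELECT JOB_EXCEPTION_STACK_ID_
    FROM ACT_HI_JOB_LOG
    WHERE JOB_DEF_CONFIGURATION_ = 'someBatchId'
      AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL
  );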
Solution Ideas (Optional):
Use an INNER JOIN in the update statement instead of a subselect; IN subselects in UPDATE statements are known to perform poorly on MySQL. With the join, the job log table is queried first, and the resulting rows are used to join with the byte array table, which is far more efficient.
The original query shows a runtime of 1,780,000 milliseconds (~30 minutes), see no_join_query.json. The INNER JOIN equivalent shows a runtime of ~11 milliseconds, see join_query.json. Both measurements are based on roughly 23 million records in ACT_GE_BYTEARRAY and 40 million records in ACT_HI_JOB_LOG.
<update id="updateByteArraysByBatchId_mysql" parameterType="java.util.Map"> update ${prefix}ACT_GE_BYTEARRAY BA INNER JOIN ${prefix}ACT_HI_JOB_LOG JL ON BA.ID_ = JL.JOB_EXCEPTION_STACK_ID_ AND JL.JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR} SET BA.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP} </update>
Hints (optional):
- We already use an INNER JOIN on MySQL for updating the removal time of historic incidents:
<update id="updateHistoricIncidentsByBatchId_mysql" parameterType="java.util.Map"> update ${prefix}ACT_HI_INCIDENT as I inner join ${prefix}ACT_HI_INCIDENT as HI on HI.ID_ = I.ID_ inner join ${prefix}ACT_HI_BATCH as HB on HI.JOB_DEF_ID_ in (HB.SEED_JOB_DEF_ID_, HB.MONITOR_JOB_DEF_ID_, HB.BATCH_JOB_DEF_ID_) set I.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP}, I.CREATE_TIME_ = I.CREATE_TIME_ where HB.ID_ = #{batchId} </update>