[CAM-13992] Query for updating the removal time for batches performs poorly on MySQL

Type: Bug Report
Resolution: Fixed
Priority: L3 - Default
Fix Version/s: 7.17.0, 7.15.7, 7.16.1, 7.17.0-alpha1
Affects Version/s: None
Component/s: engine
Labels:
- SUPPORT

Environment (Required on creation):

All Camunda distributions, Camunda version 7.15.x

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket):

The following update query performs poorly on MySQL with high cardinalities in the ACT_GE_BYTEARRAY table:

  <update id="updateByteArraysByBatchId_mysql"
          parameterType="java.util.Map">
    update ${prefix}ACT_GE_BYTEARRAY
      set REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP}
      where ID_ IN (
        SELECT JOB_EXCEPTION_STACK_ID_
        FROM ${prefix}ACT_HI_JOB_LOG
        WHERE JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR}
          AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL
      )
  </update>

Steps to reproduce (Required on creation):

Add 21 million records to the table ACT_GE_BYTEARRAY.
Create a batch job that modifies some process instances.
Wait until the batch job shows 100% progress in Cockpit.
The batch job does not finish for a long time (the switch from runtime to history is delayed).
Check the running statements in the database.

Observed Behavior (Required on creation):

The statement runs anywhere from several minutes (~30 minutes for customers) to more than one hour (see attached Support cases).

Expected behavior (Required on creation):

The statement finishes in a reasonable amount of time, ideally sub-second.

Root Cause (Required on prioritization):

The EXPLAIN output shows that the whole ACT_GE_BYTEARRAY table is read and joined with the ACT_HI_JOB_LOG table once for every row in the byte array table. The higher the cardinality of the byte array table, the longer the statement takes to complete.
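
The plan can be inspected directly on MySQL with EXPLAIN, which reports the execution plan without running the update. The following is a minimal sketch, assuming an empty table prefix; the batch ID 'aBatchId' and the timestamp are placeholder values and do not come from the support cases.

```sql
-- Sketch only: assumes no table prefix; 'aBatchId' and the timestamp are placeholders.
-- EXPLAIN prints the execution plan and does not modify any rows.
EXPLAIN
UPDATE ACT_GE_BYTEARRAY
   SET REMOVAL_TIME_ = '2021-10-08 12:45:00'
 WHERE ID_ IN (
         SELECT JOB_EXCEPTION_STACK_ID_
           FROM ACT_HI_JOB_LOG
          WHERE JOB_DEF_CONFIGURATION_ = 'aBatchId'
            AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL
       );
```

If the plan lists ACT_GE_BYTEARRAY with access type ALL and the job log subselect as a DEPENDENT SUBQUERY, that would be consistent with the behavior described above.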

Solution Ideas (Optional):

Use an INNER JOIN for the update statement instead of a subselect; subselects in update statements are known to perform poorly on MySQL. That way, the job log table is queried first and the resulting rows are joined with the byte array table, which is more efficient.

The original query shows a runtime of 1,780,000 milliseconds (~30 minutes) in no_join_query.json. The INNER JOIN equivalent shows a runtime of ~11 milliseconds in join_query.json. Both measurements are based on roughly 23 million records in ACT_GE_BYTEARRAY and 40 million records in ACT_HI_JOB_LOG.

  <update id="updateByteArraysByBatchId_mysql"
          parameterType="java.util.Map">
    update ${prefix}ACT_GE_BYTEARRAY BA
      INNER JOIN ${prefix}ACT_HI_JOB_LOG JL
        ON BA.ID_ = JL.JOB_EXCEPTION_STACK_ID_
        AND JL.JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR}
      SET BA.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP}
  </update>
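
The join variant omits the explicit JOB_EXCEPTION_STACK_ID_ IS NOT NULL check, since an equality join condition never matches NULL values. One way to gain confidence that both variants target the same rows is to compare row counts on a test database. This is a sketch under the assumption of an empty table prefix; 'aBatchId' is a placeholder value.

```sql
-- Rows targeted by the original subselect variant.
SELECT COUNT(*) AS subselect_rows
  FROM ACT_GE_BYTEARRAY
 WHERE ID_ IN (
         SELECT JOB_EXCEPTION_STACK_ID_
           FROM ACT_HI_JOB_LOG
          WHERE JOB_DEF_CONFIGURATION_ = 'aBatchId'
            AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL
       );

-- Rows targeted by the INNER JOIN variant.
-- DISTINCT guards against several job log rows referencing the same byte array.
SELECT COUNT(DISTINCT BA.ID_) AS join_rows
  FROM ACT_GE_BYTEARRAY BA
 INNER JOIN ACT_HI_JOB_LOG JL
    ON BA.ID_ = JL.JOB_EXCEPTION_STACK_ID_
   AND JL.JOB_DEF_CONFIGURATION_ = 'aBatchId';
```

Both counts should be equal for any given batch ID.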

Hints (Optional):

We already use an INNER JOIN on MySQL to update the removal time of historic incidents:

  <update id="updateHistoricIncidentsByBatchId_mysql"
          parameterType="java.util.Map">
    update ${prefix}ACT_HI_INCIDENT as I
      inner join ${prefix}ACT_HI_INCIDENT as HI
        on HI.ID_ = I.ID_
      inner join ${prefix}ACT_HI_BATCH as HB
        on HI.JOB_DEF_ID_ in
          (HB.SEED_JOB_DEF_ID_,
          HB.MONITOR_JOB_DEF_ID_,
          HB.BATCH_JOB_DEF_ID_)
      set I.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP},
          I.CREATE_TIME_ = I.CREATE_TIME_
      where HB.ID_ = #{batchId}
  </update>

Attachments:
- join_query.json (1 kB, 08/Oct/21 12:42 PM)
- no_join_query.json (1 kB, 08/Oct/21 12:42 PM)


Assignee: Anton von Weltzien
Reporter: Tobias Metzke-Bernstein
DRI: Anton von Weltzien
Reviewer: Tobias Metzke-Bernstein
Votes: 1
Watchers: 5
Created: 08/Oct/21 12:45 PM
Updated: 05/Apr/22 1:38 PM
Resolved: 19/Oct/21 9:59 AM

camunda BPM
