Uploaded image for project: 'camunda BPM'
  1. camunda BPM
  2. CAM-13992

Query for updating the removal time for batches performs poorly on MySQL

XMLWordPrintable

      Environment (Required on creation):

      All Camunda distributions, Camunda version 7.15.x

      Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket):

      The following update query performs poorly on MySQL with high cardinalities in the ACT_GE_BYTEARRAY table:

        <update id="updateByteArraysByBatchId_mysql"
                parameterType="java.util.Map">
          update ${prefix}ACT_GE_BYTEARRAY
            set REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP}
            where ID_ IN (
              SELECT JOB_EXCEPTION_STACK_ID_
              FROM ${prefix}ACT_HI_JOB_LOG
              WHERE JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR}
                AND JOB_EXCEPTION_STACK_ID_ IS NOT NULL
            )
        </update>
      

      Steps to reproduce (Required on creation):

      1. Add 21 million records to the table ACT_GE_BYTEARRAY.
      2. Create a batch job that modifies some process instances.
      3. Wait until the batch job shows 100% progress in the cockpit.
      4. The batch job won't be finished for a long time (switch from runtime to history).
      5. Check the running statements in the database.

      Observed Behavior (Required on creation):

      The statement runs for several minutes (~30 minutes for customers) up to more than one hour (see attached Support cases).

      Expected behavior (Required on creation):

      The statement finishes in a reasonable amount of time, ideally sub-second.

      Root Cause (Required on prioritization):

      The EXPLAIN shows the whole table ACT_GE_BYTEARRAY is joined with the ACT_HI_JOB_LOG table for every row in the byte array table. The higher the cardinalities in the byte array table, the longer the statement will take to complete.

      Solution Ideas (Optional):

      Use an INNER JOIN for the update statement instead of a subselect, which is known to perform poorly on MySQL. That way, the job log table will be queried first and the resulting rows will be used to join with the byte array table which is more efficient.

      The original query shows a runtime of 1,780,000 milliseconds (~30 minutes) in no_join_query.json. The INNER JOIN equivalent shows a runtime of ~11 milliseconds in join_query.json. This is based on roughly ~23 million records in ACT_GE_BYTEARRAY and ~40 million records in ACT_HI_JOB_LOG.

        <update id="updateByteArraysByBatchId_mysql"
                parameterType="java.util.Map">
          update ${prefix}ACT_GE_BYTEARRAY BA
            INNER JOIN ${prefix}ACT_HI_JOB_LOG JL
              ON BA.ID_ = JL.JOB_EXCEPTION_STACK_ID_
              AND JL.JOB_DEF_CONFIGURATION_ = #{batchId, jdbcType=VARCHAR}
            SET BA.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP}
        </update>
      

      Hints (optional):

      • We use the INNER JOIN for MySQL already for updating the removal time of historic incidents in the following way:
          <update id="updateHistoricIncidentsByBatchId_mysql"
                  parameterType="java.util.Map">
            update ${prefix}ACT_HI_INCIDENT as I
              inner join ${prefix}ACT_HI_INCIDENT as HI
                on HI.ID_ = I.ID_
              inner join ${prefix}ACT_HI_BATCH as HB
                on HI.JOB_DEF_ID_ in
                  (HB.SEED_JOB_DEF_ID_,
                  HB.MONITOR_JOB_DEF_ID_,
                  HB.BATCH_JOB_DEF_ID_)
              set I.REMOVAL_TIME_ = #{removalTime, jdbcType=TIMESTAMP},
                  I.CREATE_TIME_ = I.CREATE_TIME_
              where HB.ID_ = #{batchId}
          </update>
        

        This is the controller panel for Smart Panels app

              anton.weltzien Anton von Weltzien
              tobias.metzke Tobias Metzke-Bernstein
              Anton von Weltzien Anton von Weltzien
              Tobias Metzke-Bernstein Tobias Metzke-Bernstein
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: