Uploaded image for project: 'camunda BPM'
  1. camunda BPM
  2. CAM-13683

Prevent Deadlocks on SqlServer for concurrent history cleanup and running instances

    XMLWordPrintable

Details

    Description

      What are the steps to reproduce your problem?

      1. Setup the engine to use an empty, supported SqlServer instance
      2. Start attached Spring Boot application
      3. Change History Time To Live to 0 in cockpit
      4. Create/Start 5000 Instances for cleanup
      5. Change History Time To Live to 2 in cockpit
      6. Start constantly 2 running processes (with a dalay of 50ms) + Start Cleanup in cockpit

      What is the problem?

      • Some running instances will end up failing in the end event due to deadlock exceptions when deleting variables (due to a deadlock with the deletion from byte array table by removal time triggered by the history cleanup)

      What would be the expected behavior:

      • All process instances finish successfully

      Hints (optional):

      • When deleting data from ACT_GE_BYTEARRAY during history cleanup (e.g., VariableInstance.xml#deleteByteArraysByRemovalTime), then SQL server must ensure that any foreign key constraints are still satisfied after the deletion, i.e., no other tables reference the byte array rows.
      • Currently, the tables ACT_RU_VARIABLE, ACT_RU_JOB and ACT_RU_EXT_TASK have such a constraint.
      • In order to ensure that, SQL server joins all the tables together during the delete (left join) and verifies that all rows from ACT_GE_BYTEARRAY have no join partners.
      • When there is no or only a small amount of data in the tables, SQL server uses merge joins with index scans because it considers them to be faster than using a loop join with index seek (that is usually used when more data is contained in the tables as this is usually faster then).
      • The index scans are locking substantial amounts of data in the tables as opposed to the index seek.
      • Different transactions block different tables first and start creating deadlocks for the remaining tables to update.
      • Using loop joins with index seeks can be forced with SqlServer by using the query hint OPTION (LOOP JOIN).
      • Forcing the use of this hint would mean to create SqlServer-specific mappings in the following mapping files:
        • VariableInstance.xml:
            
          <delete id="deleteByteArraysByRemovalTime_mssql"
                    parameterType="org.camunda.bpm.engine.impl.db.ListQueryParameterObject">
              <bind name="date" value="'REMOVAL_TIME_'"/>
              <bind name="reportPeriodUnitName" value="'MINUTE'"/>
              delete ${limitBeforeWithoutOffset} from ${prefix}ACT_GE_BYTEARRAY
              where REMOVAL_TIME_ &lt;= #{parameter.removalTime}
              <include refid="andWhereMinuteInDateBetweenSql"/>
              ${limitAfterWithoutOffset}
              OPTION (LOOP JOIN)
          </delete>
          

      mgm-controller-panel

        This is the controller panel for Smart Panels app

        Attachments

          Activity

            People

              Unassigned Unassigned
              tobias.metzke Tobias Metzke-Bernstein
              Tobias Metzke-Bernstein Tobias Metzke-Bernstein
              Miklas Boskamp Miklas Boskamp
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Salesforce