camunda BPM / CAM-14619

Timed-out jobs can cause an infinite OptimisticLockingException loop


    • Bug Report
    • Resolution: Fixed
    • L3 - Default
    • 7.18.0, 7.17.6
    • None
    • engine

      Environment (Required on creation):

      Any distribution. Tested in spring boot 7.17.0-ee.

      Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket):

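      If a job takes longer to complete than the configured job lock expiration time, the job can be acquired and executed again while the first execution is still running. Timing between the overlapping executions can then lead to a situation where each successive job execution thread fails with an OptimisticLockingException (OLE) caused by the actions of the previous thread.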
      The test case in the example project example-project.7z is designed to reproduce the issue.

      Steps to reproduce (Required on creation):

      Run example-project.7z. It demonstrates the following behavior:

      1. Thread 1 executes job A. This takes longer than the defined job lock expiration time.
      2. Thread 2 picks up job A after the lock has expired. The job acquisition alters job A.
      3. Thread 1 finishes and fails with an OLE since the job acquisition already altered job A.
      4. Thread 1 triggers the job failure listener, which alters job A. The retries are not decremented due to the OLE.
      5. Thread 1 schedules job A again.
      6. Thread 1 picks up job A after the failure listener cleared the lock attributes.
      7. Thread 2 finishes and fails with an OLE since the failure listener of Thread 1 already altered job A.
      8. Steps 4-7 repeat indefinitely with alternating threads.
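
      The interleaving above can be replayed deterministically with a toy model. The sketch below is not engine code: `JobRow`, `StaleRevisionException` and `replay` are hypothetical names, and the revision counter only stands in for whatever optimistic locking check the engine performs. It assumes every write to the job bumps a revision and that a thread may only write if it still holds the revision it read.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported loop (hypothetical names, not engine code). */
final class OleLoopSketch {

    /** Stand-in for the engine's OptimisticLockingException. */
    static final class StaleRevisionException extends RuntimeException {
        StaleRevisionException(String message) { super(message); }
    }

    /** Hypothetical job row: a revision counter plus one lock attribute. */
    static final class JobRow {
        int revision = 1;
        String lockOwner; // null means "not locked"

        /** Writes only if the caller still holds the current revision. */
        int update(int expectedRevision, String newLockOwner) {
            if (expectedRevision != revision) {
                throw new StaleRevisionException(
                        "expected rev " + expectedRevision + ", row is at rev " + revision);
            }
            lockOwner = newLockOwner;
            return ++revision;
        }

        /** Failure listener as described in the report: clears the lock regardless of who holds it. */
        void clearLockUnconditionally() {
            lockOwner = null;
            revision++;
        }
    }

    /** Replays the steps for the given number of rounds and returns one log entry per round. */
    static List<String> replay(int rounds) {
        JobRow job = new JobRow();
        List<String> log = new ArrayList<>();
        String running = "thread-1", next = "thread-2";
        int seenByRunning = job.update(job.revision, running); // step 1: job acquired and locked
        for (int i = 0; i < rounds; i++) {
            // The lock is expired or cleared, so the other thread acquires the job (steps 2 and 6).
            int seenByNext = job.update(job.revision, next);
            try {
                job.update(seenByRunning, null); // the overdue thread tries to complete
                log.add(running + " completed");
                break;
            } catch (StaleRevisionException e) {
                log.add(running + " OLE"); // steps 3 and 7
            }
            job.clearLockUnconditionally(); // step 4: retries stay untouched
            // The failed thread is free again and becomes the next acquirer.
            String previous = running;
            running = next;
            next = previous;
            seenByRunning = seenByNext;
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [thread-1 OLE, thread-2 OLE, thread-1 OLE, thread-2 OLE]
        System.out.println(replay(4));
    }
}
```

      In this model no round ever reaches the "completed" branch, because the job is always re-acquired before the overdue thread writes its result.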

      Observed Behavior (Required on creation):

      • Infinite loop of jobs with OLEs.

      Expected behavior (Required on creation):

      • One OLE for a job execution that exceeds the lock expiration time.
      • A successful job execution for the next job execution thread.

      Root Cause (Required on prioritization):

      • The expired job execution that fails with the OLE still runs the job failure listener, which clears all lock attributes of the job.
      • With the lock attributes cleared, the job can be picked up again and then runs in parallel with the execution that is still in progress.

      Solution Ideas (Optional):

      • The expired job execution could detect that the job has already been picked up again because its lock expired, which is why it failed with an OLE.
      • For example, the expired job execution would not blindly modify the job if the lock attributes have already been altered.
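
      One way to read the second idea is a guard in the failure handling that compares the lock attributes the failing execution saw when it started with the job's current ones. The sketch below is an illustration under that assumption, not the fix that was implemented: `JobRow`, `lockExpirationMillis` and `releaseLockIfUnchanged` are hypothetical names. The expiration timestamp is compared in addition to the owner so that a re-acquisition is still detected when both executions report the same owner id.

```java
import java.util.Objects;

/** Sketch of a guarded lock release (hypothetical names, not the actual fix). */
final class GuardedFailureSketch {

    /** Hypothetical job row holding only the lock attributes. */
    static final class JobRow {
        String lockOwner;
        long lockExpirationMillis;

        JobRow(String lockOwner, long lockExpirationMillis) {
            this.lockOwner = lockOwner;
            this.lockExpirationMillis = lockExpirationMillis;
        }
    }

    /**
     * Clears the lock only if it is still the lock the failing execution started with.
     * Returns true if the lock was cleared, false if the job was left untouched.
     */
    static boolean releaseLockIfUnchanged(JobRow job, String ownerSeen, long expirationSeen) {
        boolean unchanged = Objects.equals(job.lockOwner, ownerSeen)
                && job.lockExpirationMillis == expirationSeen;
        if (!unchanged) {
            return false; // the job was re-acquired in the meantime; leave its lock alone
        }
        job.lockOwner = null;
        job.lockExpirationMillis = 0L;
        return true;
    }

    public static void main(String[] args) {
        JobRow job = new JobRow("executor-1", 1_000L);
        // The lock expired and the job was acquired again with a new expiration.
        job.lockExpirationMillis = 6_000L;
        // The overdue execution fails and asks to release the lock it saw at start.
        System.out.println(releaseLockIfUnchanged(job, "executor-1", 1_000L)); // prints false
        System.out.println(job.lockOwner); // prints executor-1
    }
}
```

      With such a guard the overdue execution would still fail once with an OLE, but the second execution would keep its lock and could complete, which matches the expected behavior described above.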

      Hints (optional):


              Michael Schoettes
              Daniel Ewing
              Daniel Kelemen
              Tassilo Weidner
              Michael Schoettes