Camunda Optimize / OPT-7047

Optimize importer can get stuck permanently after temporary import blockage

Details

    • Bug Report
    • Resolution: Fixed
    • L3 - Default
    • 3.10.4, 3.11.0-alpha5, 3.11.0
    • None
    • backend
    • None
    • Not defined

    Description

      Brief summary of the bug. What is it? Where is it?

      Since Optimize 3.10.0, it has been possible for Optimize to import based on the sequence of the Zeebe record. With this, we build a range query based on the sequence of the previously seen record and the size of the next batch.

      However, we had a scenario where the Optimize importer was stuck on a batch of records and repeatedly attempted the same batch for five days. After this time, the unprocessable Zeebe records were deleted, but Optimize was still unable to move on to the next importable batch of records. The range query uses the last imported record sequence as the lower boundary and this sequence plus the batch size as the upper boundary. Because the newer, importable records have sequences higher than this upper boundary, the Optimize importer could never catch up without manual intervention.

      In short, Optimize needs to handle not only the case where there are no records to import in the given range, but also the case where there could still be records to process beyond the empty result set (in which case the import indexes should still be updated).
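
      To make the boundary problem concrete, here is a minimal Python sketch of the range described above. It is not Optimize's code: the function names and the in-memory list standing in for the Zeebe record index are assumptions for illustration, as is the choice of an exclusive lower and inclusive upper boundary.

```python
def sequence_range(last_imported_sequence, batch_size):
    """Return (lower_exclusive, upper_inclusive) for the next batch.

    Hypothetical helper: the boundaries follow the description in this
    ticket, the exact inclusivity is an assumption.
    """
    return last_imported_sequence, last_imported_sequence + batch_size


def query_by_sequence(records, last_imported_sequence, batch_size):
    """Select the records whose sequence falls inside the next range."""
    lower, upper = sequence_range(last_imported_sequence, batch_size)
    return [r for r in records if lower < r["sequence"] <= upper]


# Records 101..105 were unprocessable and have since been deleted,
# so the next importable record has sequence 106.
records = [{"sequence": s} for s in (98, 99, 100, 106, 107)]

# The range is (100, 105], which contains nothing, although 106 and 107
# are waiting just beyond the upper boundary.
stuck = query_by_sequence(records, last_imported_sequence=100, batch_size=5)
print(stuck)
```

      Because an empty result does not move the last imported sequence, every later round computes the same range and stays empty.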

      Steps to reproduce:

      1. Set the batch size to 1
      2. Block the import with a record that has an unimportable value. In the real scenario, this was a null value for the bpmnElementType field.
      3. Observe that the blocked record is retried repeatedly without being imported
      4. Delete the broken record
      5. Observe that the importer does not catch up
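
      The steps above can be replayed against a toy model. The sketch below is only an illustration under stated assumptions: SequenceImporter, import_round and the record dictionaries are hypothetical, and the real importer is considerably more involved.

```python
class SequenceImporter:
    """Toy importer that only advances when a batch is fully importable."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.last_sequence = 0  # sequence of the last imported record

    def import_round(self, records):
        lower = self.last_sequence
        upper = lower + self.batch_size
        batch = sorted(
            (r for r in records if lower < r["sequence"] <= upper),
            key=lambda r: r["sequence"],
        )
        if not batch:
            return "empty"  # nothing in range, index is not advanced
        if any(r["bpmnElementType"] is None for r in batch):
            return "blocked"  # unimportable value, batch is retried
        self.last_sequence = batch[-1]["sequence"]
        return "imported"


records = [
    {"sequence": 1, "bpmnElementType": "PROCESS"},
    {"sequence": 2, "bpmnElementType": None},  # the broken record
    {"sequence": 3, "bpmnElementType": "START_EVENT"},
]

importer = SequenceImporter(batch_size=1)  # step 1
outcomes = [importer.import_round(records) for _ in range(3)]  # steps 2-3

# Step 4: delete the broken record.
records = [r for r in records if r["bpmnElementType"] is not None]

# Step 5: the range (1, 2] is now empty, so record 3 is never reached.
outcomes += [importer.import_round(records) for _ in range(2)]
print(outcomes, importer.last_sequence)
```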

      Actual result:

      Optimize remains stuck and doesn't import newer records

      Expected result:

      Optimize can always get the next records, no matter how much higher their sequence is than that of the previously imported record

      Notes:

      • This should be fixed on master and also backported to the maintenance/3.10 branch
      • We might be able to revert to the 'position' query if we can detect that the Optimize importer is stuck in this way
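
      One possible shape of the fallback mentioned in the last note is sketched below. This is a hedged illustration, not the actual fix: next_batch and the position and sequence keys on the record dictionaries are hypothetical, and the sketch simply treats an empty sequence result as the signal to try a position-based query.

```python
def next_batch(records, last_sequence, last_position, batch_size):
    """Return (mode, batch), preferring the sequence range query.

    Falls back to a position-based query when the sequence range is
    empty, so that records beyond the upper boundary can still be found.
    """
    upper = last_sequence + batch_size
    by_sequence = sorted(
        (r for r in records if last_sequence < r["sequence"] <= upper),
        key=lambda r: r["sequence"],
    )
    if by_sequence:
        return "sequence", by_sequence

    # Empty range: look for anything newer than the last imported
    # position, regardless of how far ahead its sequence is.
    by_position = sorted(
        (r for r in records if r["position"] > last_position),
        key=lambda r: r["position"],
    )
    return "position", by_position[:batch_size]


records = [
    {"position": 10, "sequence": 100},
    {"position": 20, "sequence": 106},
    {"position": 21, "sequence": 107},
]

mode, batch = next_batch(records, last_sequence=100, last_position=10, batch_size=5)
print(mode, [r["sequence"] for r in batch])
```

      After such a catch-up round, the importer would record the sequence and position of the last record in the batch, so the following round can use the sequence range again.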

      Testing Notes:

      Behaviour to verify before the fix:

      Do all of this in the context of a single partition

      1. Deploy Zeebe process instance data and complete an instance
      2. Modify the sequence of the last data point(s) so that it exceeds the sequence of the previous records by more than the page size. In other words, make it so that only part of the process can be imported and the sequence query misses the last data points. This can also be done via data deletion
      3. Observe in the logs that the importer is working but not importing any data
      4. Check in ES that the instance state is still running

      Behaviour to verify with fix:

      Do all of this in the context of a single partition

      1. Deploy Zeebe process instance data and complete an instance
      2. Modify the sequence of the last data point(s) so that it exceeds the sequence of the previous records by more than the page size. In other words, make it so that only part of the process can be imported and the sequence query misses the last data points. This can also be done via data deletion
      3. Observe in the logs that the importer uses the position-based query to "catch up"
      4. Check in ES that the instance state is completed

            People

              Unassigned Unassigned
              joshua.windels Joshua Windels