- Bug Report
- Resolution: Fixed
- L3 - Default
Brief summary of the bug. What is it? Where is it?
Since Optimize 3.10.0, it has been possible for Optimize to import based on the sequence of the Zeebe record. With this, we build a range query based on the sequence of the previously seen record and the size of the next batch.
However, we had a scenario where the Optimize importer was stuck on a batch of records and retried the same batch repeatedly for five days. After this time, the unprocessable Zeebe records were deleted, but Optimize still could not move on to the next importable batch of records. The range query uses the last imported record sequence as the lower boundary and this sequence plus the batch size as the upper boundary. Because the newer, importable records have a sequence higher than this upper boundary, the Optimize importer could never catch up without manual intervention.
In short, Optimize needs to handle not only the case where there are no records to import in the given range, but also the case where there could still be records to process beyond the empty result set (in which case the import indexes should still be updated).
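The stuck window described above can be reproduced with a small in-memory sketch. This is not Optimize code: the `Record` type, the `fetch_by_sequence` helper, and the choice of an exclusive lower and inclusive upper boundary are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified stand-in for a Zeebe record.
    sequence: int
    position: int


def fetch_by_sequence(records, last_sequence, batch_size):
    """Return records with last_sequence < sequence <= last_sequence + batch_size.

    The boundary inclusivity is an assumption for this sketch.
    """
    upper = last_sequence + batch_size
    hits = [r for r in records if last_sequence < r.sequence <= upper]
    return sorted(hits, key=lambda r: r.sequence)


# The broken record with sequence 11 has been deleted; the next surviving
# record has a sequence far beyond the query window (10, 11].
records = [Record(sequence=10, position=100), Record(sequence=500, position=9000)]

stuck_batch = fetch_by_sequence(records, last_sequence=10, batch_size=1)
print(stuck_batch)
```

Because the result is empty, the last imported sequence never advances, so every subsequent query covers the same window and the record with sequence 500 is never reached.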
Steps to reproduce:
- Set the batch size to 1
- Block the import with a record that has an unimportable value. In the real scenario, this was a null value for the bpmnElementType field.
- Observe that the blocked record is retried repeatedly and never imported
- Delete the broken record
- Observe that the importer does not catch up
Actual result:
Optimize remains stuck and doesn't import newer records
Expected result:
Optimize can always get the next records, no matter how much higher their sequence is than that of the previously imported record
Notes:
- This should be fixed on master and also backported to the maintenance/3.10 branch
- It might be that we can fall back to the 'position' query if we can detect that the Optimize importer is stuck in such a situation
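The fallback idea from the notes could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual fix: `Record`, `by_sequence`, `by_position`, and `next_batch` are hypothetical names, and using an empty sequence-query result as the trigger for the position query is only one possible way to detect a stuck importer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified stand-in for a Zeebe record.
    sequence: int
    position: int


def by_sequence(records, last_sequence, batch_size):
    # Bounded window: (last_sequence, last_sequence + batch_size].
    upper = last_sequence + batch_size
    hits = [r for r in records if last_sequence < r.sequence <= upper]
    return sorted(hits, key=lambda r: r.sequence)


def by_position(records, last_position, batch_size):
    # Open-ended query: anything after last_position, capped at batch_size.
    hits = [r for r in records if r.position > last_position]
    return sorted(hits, key=lambda r: r.position)[:batch_size]


def next_batch(records, last_sequence, last_position, batch_size):
    """Try the sequence query first; fall back to position if it is empty."""
    batch = by_sequence(records, last_sequence, batch_size)
    if batch:
        return batch, "sequence"
    batch = by_position(records, last_position, batch_size)
    return batch, ("position" if batch else "none")


records = [Record(10, 100), Record(500, 9000), Record(501, 9001)]

# The sequence window (10, 11] is empty, so the position query catches up.
batch, mode = next_batch(records, last_sequence=10, last_position=100, batch_size=1)
print(mode, batch)

# Once the indexes are updated from that batch, the sequence query works again.
last = batch[-1]
batch2, mode2 = next_batch(records, last.sequence, last.position, batch_size=1)
print(mode2, batch2)
```

The position query has no upper boundary, so it can skip arbitrarily large gaps; after one catch-up batch the importer can return to the sequence query.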
Testing Notes:
Behaviour to verify before:
Do all of this in the context of a single partition
- Deploy Zeebe process instance data and complete an instance
- Modify the sequence of the last data point(s) so that it exceeds the sequence of the previous records by more than the page size. This ensures that only part of the process can be imported and that the sequence query misses the last data points. The same effect can be achieved by deleting data
- Observe in the logs that the importer is working but not importing any data
- Check in ES that the instance state is still running
Behaviour to verify with fix:
Do all of this in the context of a single partition
- Deploy Zeebe process instance data and complete an instance
- Modify the sequence of the last data point(s) so that it exceeds the sequence of the previous records by more than the page size. This ensures that only part of the process can be imported and that the sequence query misses the last data points. The same effect can be achieved by deleting data
- Observe in the logs that the importer uses the position-based query to "catch up"
- Check in ES that the instance state is completed