[CAM-9934] Process status after interruptive event subprocess

Type: Bug Report
Resolution: Fixed
Priority: L3 - Default
Fix Version/s: 7.12.0, 7.11.5, 7.10.11, 7.12.0-alpha5
Affects Version/s: 7.10.0
Component/s: engine
Labels:
- SUPPORT

Steps to reproduce
See failing test case.

Expected behavior
If the activity instances are canceled by the interrupting event subprocess, ...

- the status of the historical process instance should remain ACTIVE
- no endDate should be set on the historical process instance

Observed behavior
If the activity instances are canceled by the interrupting event subprocess, ...

- the status of the historical process instance is set to INTERNALLY_TERMINATED
- an endDate is set on the historical process instance

Problem 1
Wrong state of the historic process instance is written to the database.

Problem 2
Precondition: The removal time strategy is set to start and a time to live is specified

- The process instance end event in the historic event producer is triggered
  - This writes the data to be updated for the historic process instance (e.g. end time, duration, etc.) to the database entity cache
  - The removal time is null in this case because it hasn't been changed
- All subsequent historic events (e.g. creating a historic task instance) within the same transaction retrieve the null removal time from the database entity cache, and these events are inserted into the database with a removal time of null

Please see the following test case for problem 2:
https://github.com/camunda/camunda-bpm-platform/blob/master/engine/src/test/java/org/camunda/bpm/engine/test/history/HistoricProcessInstanceStateTest.java#L290-L322

example process.png
27 kB
14/Mar/19 10:20 AM

Tassilo Weidner added a comment - 15/Mar/19 5:03 PM

Hi Sébastien,

thanks for reaching out to us with your question.

I will investigate if this is a bug or expected behavior and come back to you as soon as possible.

Cheers,
Tassilo

Sébastien de la Fosse added a comment - 18/Mar/19 12:01 PM

Hello Tassilo,

Thank you for your time and effort.

I noticed that the status has changed to "ready". Is there any information on the issue?

Kind regards,

Sébastien

Tassilo Weidner added a comment - 18/Mar/19 1:10 PM

Hey Sébastien,

the behavior is indeed inconsistent. We are currently investigating what the expected behavior should be in this situation and will come back to you as soon as we have more to share, tomorrow at the latest.

Cheers,
Tassilo

Tassilo Weidner added a comment - 19/Mar/19 5:02 PM

Hey Sébastien,

thank you for your patience.

We came to the conclusion that this behavior is indeed unexpected.

What would be the expected behavior?
If the activity instances are canceled by the interrupting event subprocess, ...

- the status of the historical process instance should remain ACTIVE
- no endDate should be set on the historical process instance

Right now, there are no plans to fix this bug in the future. However, we would appreciate it if you could contribute a fix to our code base by raising a Pull Request on GitHub [1].

Cheers,
Tassilo

[1] https://github.com/camunda/camunda-bpm-platform/pulls

Thorben Lindhauer added a comment - 07/Oct/19 6:15 PM

There are a couple of problems involved here:

Root cause:

- The root cause is that the end execution listener of the process instance is triggered as soon as the interrupting event triggers
- When the interruption is executed, all child executions of the interrupted scope (here the process instance) are deleted via ExecutionEntity#deleteCascade
- In the example process, the process instance has concurrent child executions. #deleteCascade repeatedly navigates to a process instance's leaves and then removes the leaf and a possible concurrent parent. Then the next leaf is removed, etc. If the delete root (i.e. the execution that #deleteCascade is invoked on) is itself concurrent, then we call #deleteCascade on it again, although it has already been removed. This triggers the end execution listeners of the process instance, which in turn generate a history event for the process instance.

Side problem 1 - incomplete history events:

- When the history event is generated, it is only a partial event, i.e. only those properties are set that will be updated in the database. This does not include the removal time of the process instance. So once the process end listener triggers, there is a history event in the entity cache that has a null removal time.
- When the history event producer creates subsequent history events (e.g. for the user task in the event subprocess in the example BPMN), it reads the null removal time from the cache
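
The effect can be reproduced in isolation with a toy model. The following is a minimal sketch, not engine code: `HistoricInstance`, `PartialEntityCacheSketch` and all of their methods are hypothetical names, and the cache is assumed to be a plain map that is consulted before the database.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a historic process instance row; not an engine class.
final class HistoricInstance {
    final String id;
    final Instant endTime;      // written by the process instance end event
    final Instant removalTime;  // left null in a partial event

    HistoricInstance(String id, Instant endTime, Instant removalTime) {
        this.id = id;
        this.endTime = endTime;
        this.removalTime = removalTime;
    }
}

// Toy entity cache in front of a toy database (both are plain maps).
final class PartialEntityCacheSketch {
    private final Map<String, HistoricInstance> database = new HashMap<>();
    private final Map<String, HistoricInstance> cache = new HashMap<>();

    void insertIntoDatabase(HistoricInstance row) {
        database.put(row.id, row);
    }

    // Models the premature end event: only the updated properties are set,
    // so the cached object carries no removal time.
    void cachePartialEndEvent(String id, Instant endTime) {
        cache.put(id, new HistoricInstance(id, endTime, null));
    }

    // Cache first, database second; a database hit is cached for later reads.
    HistoricInstance select(String id) {
        return cache.computeIfAbsent(id, database::get);
    }

    // Models a subsequent history event inheriting the root's removal time.
    Instant removalTimeForNewEvent(String rootId) {
        HistoricInstance root = select(rootId);
        return root == null ? null : root.removalTime;
    }

    public static void main(String[] args) {
        Instant removal = Instant.parse("2019-03-19T10:20:00Z");

        // End listener fired too early: the partial entity shadows the database row.
        PartialEntityCacheSketch early = new PartialEntityCacheSketch();
        early.insertIntoDatabase(new HistoricInstance("pi-1", null, removal));
        early.cachePartialEndEvent("pi-1", Instant.parse("2019-03-14T10:21:00Z"));
        System.out.println(early.removalTimeForNewEvent("pi-1")); // null

        // No premature end event: the row is read from the database.
        PartialEntityCacheSketch onTime = new PartialEntityCacheSketch();
        onTime.insertIntoDatabase(new HistoricInstance("pi-1", null, removal));
        System.out.println(onTime.removalTimeForNewEvent("pi-1")); // 2019-03-19T10:20:00Z
    }
}
```

In the first scenario the database row still holds the correct removal time, but it is never consulted because the partial object is found in the cache first.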

Side problem 2 - reading the removal time from history:

- When the removal time strategy is start, the history event producer reads the historic process instance from the database/cache in order to determine the removal time for new history events
- This assumes that the instance exists in the history, which is generally not a given (and in combination with problem 1 creates the situation that multiple objects represent the same database entity)
- We can argue that history cleanup only makes sense when there is history in the Camunda database tables, so if there is no history, then having no removal time is ok
- With CAM-10825 (storing the start time in the runtime entities), we might be able to avoid selecting history data
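
To illustrate the last point: as far as I understand the start strategy, the removal time is derived from the root process instance's start time plus its history time to live, so a start time kept on the runtime entity would be enough to compute it without a history select. The sketch below is an assumption-laden illustration, not engine API: `RemovalTimeSketch` and `removalTime` are hypothetical names, the time to live is assumed to be given in days, and days are treated as fixed 24-hour periods (time zones are ignored).

```java
import java.time.Duration;
import java.time.Instant;

final class RemovalTimeSketch {

    // Hypothetical helper: removal time = start time of the root instance + time to live.
    // Returns null when either input is missing, i.e. no removal time can be derived.
    static Instant removalTime(Instant rootStartTime, Integer timeToLiveDays) {
        if (rootStartTime == null || timeToLiveDays == null) {
            return null;
        }
        return rootStartTime.plus(Duration.ofDays(timeToLiveDays));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2019-03-14T10:20:00Z");
        System.out.println(removalTime(start, 5));    // 2019-03-19T10:20:00Z
        System.out.println(removalTime(start, null)); // null
    }
}
```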

Bug fix:

- The bug fix addresses the root cause, i.e. the process instance end execution listener will no longer trigger too early
- For subsequent history events, the history event producer will then select the actual historic process instance from the database (since it is not cached yet), which has the correct removal time set
- This is ok as long as we always select the historic process instance from the database before we try to create a partial entity
- In reversed order (1. put partial entity into cache; 2. select root instance from cache), the problem would remain. Avoiding this possibility would require a more complex fix, e.g. a mechanism in the entity cache that would allow updating a partial cached entity with its contents in the database. I tried to program this and it appears complex to build in the cache itself and also raises questions about concurrency (the entity in the database may have been updated in the meantime). I decided to accept this bug potential for now.
