[CAM-9934] Process status after interruptive event subprocess
Project: camunda BPM

      Steps to reproduce
      See failing test case.

      Expected behavior
      If the activity instances are canceled by the interrupting event subprocess, ...

      1. the status of the historical process instance should remain ACTIVE
      2. no endDate should be set on the historical process instance

      Observed behavior
      If the activity instances are canceled by the interrupting event subprocess, ...

      1. the status of the historical process instance changes to INTERNALLY_TERMINATED
      2. an endDate is set on the historical process instance

      Problem 1
      The wrong state of the historic process instance (INTERNALLY_TERMINATED instead of ACTIVE) is written to the database.

      Problem 2
      Precondition: the removal time strategy is set to start and a time to live is specified.

      1. The process instance end event is triggered in the history event producer
        • This writes the data to be updated for the historic process instance (e.g. end time, duration) to the database entity cache
        • In this case, the removal time is null because it has not been changed
      2. All subsequent history events within the same transaction (e.g. creating a historic task instance) read the null removal time from the database entity cache, so these events are inserted into the database with a removal time of null
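The two steps above can be illustrated with a toy model of a transaction-scoped entity cache. All names here (PartialCacheSketch, HistoricInstance, fireProcessEndEvent, removalTimeForNewEvent) are hypothetical and not Camunda APIs, and the lookup order (cache first, database only on a miss) is an assumption based on the description of the history event producer. The first run reproduces the null removal time; the second shows the lookup when no partial entity has been cached, so the row with the correct removal time is read from the simulated database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the entity cache described above.
// None of these classes or methods exist in the Camunda engine.
public class PartialCacheSketch {

    // Minimal stand-in for a historic process instance row.
    static final class HistoricInstance {
        final String id;
        final Long endTime;     // null while the instance is running
        final Long removalTime; // null means "no removal time"

        HistoricInstance(String id, Long endTime, Long removalTime) {
            this.id = id;
            this.endTime = endTime;
            this.removalTime = removalTime;
        }
    }

    // Simulated database table: the row written at process start already has a removal time.
    final Map<String, HistoricInstance> database = new HashMap<>();
    // Simulated entity cache of the current transaction.
    final Map<String, HistoricInstance> cache = new HashMap<>();

    // Step 1: the end event produces a *partial* entity that only carries the properties
    // to be updated (here: the end time). Its removal time is left null.
    void fireProcessEndEvent(String instanceId, long endTime) {
        cache.put(instanceId, new HistoricInstance(instanceId, endTime, null));
    }

    // Step 2: a subsequent history event looks up the root instance to inherit its
    // removal time: cache first, database only on a cache miss (assumed order).
    Long removalTimeForNewEvent(String rootInstanceId) {
        HistoricInstance root = cache.get(rootInstanceId);
        if (root == null) {
            root = database.get(rootInstanceId);
            if (root != null) {
                cache.put(rootInstanceId, root);
            }
        }
        return root == null ? null : root.removalTime;
    }

    public static void main(String[] args) {
        PartialCacheSketch buggy = new PartialCacheSketch();
        buggy.database.put("pi-1", new HistoricInstance("pi-1", null, 1_000L));
        buggy.fireProcessEndEvent("pi-1", 500L);                  // premature end event
        System.out.println(buggy.removalTimeForNewEvent("pi-1")); // null

        PartialCacheSketch fixed = new PartialCacheSketch();
        fixed.database.put("pi-1", new HistoricInstance("pi-1", null, 1_000L));
        System.out.println(fixed.removalTimeForNewEvent("pi-1")); // 1000
    }
}
```

In the sketch, nothing ever merges the partial entity with the database row, which is why the cached null wins as soon as the end event runs first.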

      Please see the following test case for problem 2:
      https://github.com/camunda/camunda-bpm-platform/blob/master/engine/src/test/java/org/camunda/bpm/engine/test/history/HistoricProcessInstanceStateTest.java#L290-L322


            Hi Sébastien,

            thanks for reaching out to us with your question.

            I will investigate whether this is a bug or expected behavior and get back to you as soon as possible.

            Cheers,
            Tassilo

            (Comment by Tassilo Weidner)

            Hello Tassilo,

            Thank you for your time and effort.

            I noticed that the status has changed to "ready". Is there any news on this issue?

            Kind regards,

            Sébastien

            (Comment by Sébastien de la Fosse)

            Hey Sébastien,

            the behavior is indeed inconsistent. We are currently investigating what the expected behavior should be in this situation and will get back to you as soon as we have more to share, tomorrow at the latest.

            Cheers,
            Tassilo

            (Comment by Tassilo Weidner)

            Hey Sébastien,

            thank you for your patience.

            We came to the conclusion that this behavior is indeed unexpected.

            What would be the expected behavior?
            If the activity instances are canceled by the interrupting event subprocess, ...

            1. the status of the historical process instance should remain ACTIVE
            2. no endDate should be set on the historical process instance

            Right now, there are no plans to fix this bug. However, we would appreciate it if you could contribute a fix to our code base by raising a pull request on GitHub [1].

            Cheers,
            Tassilo

            [1] https://github.com/camunda/camunda-bpm-platform/pulls

            (Comment by Tassilo Weidner)

            There are a couple of problems involved here:

            Root cause:

            • The root cause is that the end execution listener of the process instance is triggered as soon as the interrupting event fires
            • When the interruption is executed, all child executions of the interrupted scope (here, the process instance) are deleted via ExecutionEntity#deleteCascade
            • In the example process, the process instance has concurrent child executions. #deleteCascade repeatedly navigates to the leaves below its delete root, removes a leaf together with a possible concurrent parent, then removes the next leaf, and so on. If the delete root (i.e. the execution that #deleteCascade is invoked on) is itself concurrent, we call #deleteCascade on it again, although it has already been removed. This triggers the end execution listeners of the process instance, which in turn generate a history event for the process instance.
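The leaf-first deletion and the redundant second pass on a concurrent delete root can be sketched with a toy execution tree. This is a hypothetical model: Execution, remove, findLeaf and the skipIfRemoved flag are invented for illustration and are not the ExecutionEntity API. The sketch only counts how often the delete root is removed; the fact that, in the engine, this second pass ends up firing the process instance's end listeners is not modeled. The skipIfRemoved guard shows one possible shape of a fix, not necessarily the one that was implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of leaf-first cascading deletion; NOT the Camunda ExecutionEntity code.
public class DeleteCascadeSketch {

    static final class Execution {
        final String id;
        final boolean concurrent;
        final Execution parent;
        final List<Execution> children = new ArrayList<>();
        int removeCount = 0; // how often remove() ran for this execution

        Execution(String id, boolean concurrent, Execution parent) {
            this.id = id;
            this.concurrent = concurrent;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        boolean isRemoved() {
            return removeCount > 0;
        }
    }

    // Detach an execution from its parent. In the engine, end logic such as
    // execution listeners would run at this point.
    static void remove(Execution e) {
        e.removeCount++;
        if (e.parent != null) {
            e.parent.children.remove(e);
        }
    }

    // Walk down from 'start' to an execution without children.
    static Execution findLeaf(Execution start) {
        Execution current = start;
        while (!current.children.isEmpty()) {
            current = current.children.get(0);
        }
        return current;
    }

    static void deleteCascade(Execution deleteRoot, boolean skipIfRemoved) {
        Execution leaf = findLeaf(deleteRoot);
        while (leaf != deleteRoot) {
            Execution parent = leaf.parent;
            remove(leaf);
            // Also remove a concurrent parent that has no children left.
            // If that parent is the delete root itself, it is removed here already.
            if (parent.concurrent && parent.children.isEmpty()) {
                remove(parent);
            }
            leaf = findLeaf(deleteRoot);
        }
        // The delete root is handled last. Without the guard, a concurrent delete root
        // that was already removed inside the loop is removed a second time.
        if (!(skipIfRemoved && deleteRoot.isRemoved())) {
            remove(deleteRoot);
        }
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            Execution processInstance = new Execution("process-instance", false, null);
            Execution concurrent = new Execution("concurrent-1", true, processInstance);
            new Execution("task-1", false, concurrent);
            deleteCascade(concurrent, guard);
            System.out.println("guard=" + guard + ", removals of delete root: " + concurrent.removeCount);
        }
    }
}
```

Running main prints two removals of the delete root without the guard and one with it.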

            Side problem 1 - incomplete history events:

            • When the history event is generated, it is only a partial event, i.e. only those properties are set that will be updated in the database. This does not include the removal time of the process instance. So once the process end listener triggers, there is a history event in the entity cache that has a null removal time.
            • When the history event producer creates subsequent history events (e.g. for the user task in the event subprocess in the example BPMN), it reads the null removal time from the cache

            Side problem 2 - reading the removal time from history:

            • When the removal time strategy is start, the history event producer reads the historic process instance from the database/cache in order to determine the removal time for new history events
            • This assumes that the instance exists in the history, which is generally not a given (and, in combination with side problem 1, creates a situation where multiple objects represent the same database entity)
            • We can argue that history cleanup only makes sense when there is history in the Camunda database tables, so if there is no history, having no removal time is acceptable
            • With CAM-10825 (storing the start time in the runtime entities), we might be able to avoid selecting history data
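A minimal sketch of the start-based removal time computation, assuming the removal time is the root process instance's start time plus the time to live in days. The class name, method name and Optional-based signature are hypothetical, not a Camunda API. The start time is passed in as an Optional so the computation does not depend on whether it was read from history or, as CAM-10825 suggests, from runtime data; an empty result stands for "no history (or no time to live), no removal time".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper illustrating a start-based removal time; not a Camunda API.
public class RemovalTimeSketch {

    // Removal time = root start time + time to live (in days).
    // Empty if no time to live is configured or no root start time is known.
    static Optional<Instant> removalTimeForStartStrategy(Optional<Instant> rootStartTime, Integer ttlDays) {
        if (ttlDays == null) {
            return Optional.empty();
        }
        return rootStartTime.map(start -> start.plus(Duration.ofDays(ttlDays)));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2019-01-01T00:00:00Z");
        // Optional[2019-01-06T00:00:00Z]
        System.out.println(removalTimeForStartStrategy(Optional.of(start), 5));
        // Optional.empty: no historic root instance, so no removal time
        System.out.println(removalTimeForStartStrategy(Optional.empty(), 5));
    }
}
```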

            Bug fix:

            • The bug fix addresses the root cause, i.e. the process instance end execution listener no longer triggers too early
            • For subsequent history events, the history event producer then selects the actual historic process instance, which has the correct removal time, from the database (since it is not cached yet)
            • This is fine as long as we always select the historic process instance from the database before we create a partial entity
            • In the reverse order (1. put the partial entity into the cache; 2. select the root instance from the cache), the problem would remain. Avoiding this would require a more complex fix, e.g. a mechanism in the entity cache that allows updating a partially cached entity with its contents from the database. I tried to implement this; it appears complex to build into the cache itself and also raises concurrency questions (the entity in the database may have been updated in the meantime). I decided to accept this potential bug for now.

            (Comment by Thorben Lindhauer)

              Assignee: Unassigned
              Reporter: Sébastien de la Fosse (sdelafosse)
              Votes: 0
              Watchers: 4