Problem:
When a job is executed within a JTA transaction and an exception is thrown upon commit (i.e. invocation of JtaTransactionInterceptor#doCommit), the job's retries are decremented properly but the job's exception message (and incident message) is not populated.
Reason:
Job retries are implemented via a transaction listener. This listener is handed the exception when it is thrown. However, this is only done by the CommandContextInterceptor, meaning that only exceptions thrown from within a command or on its flush are treated that way. Exceptions thrown from a transacrtion synchronization (transaction listener) cannot be caught this way. A common example is JPA which often implements manged flush of the transaction scoped entity manager as TX-Sync on commit. Exceptions thrown by JPA at this point are not visible in the process engine database. Those kinds of exceptions are then only logged in ExecuteJobsRunnable#run.
Possible solution:
Implement job retry decrements in ExecuteJobsRunnable#run instead of the transaction listener. Then, the catched exception can not only be logged but also set on the job. However, we have to make sure that we do not change the existing behavior other than that. Especially an exception thrown within the job command should not be wrapped in any other exception by a command interceptor. Thus, there has to be some way of handing these "out", for example by setting them on the current command.
This is the controller panel for Smart Panels app
[CAM-3303] Job/incident exception message is not populated when exception is from a transaction synchronization
It seems that the integration tests fail on all jboss, wildfly, websphere and weblogic builds.
I revert all changes for this ticket:
https://github.com/camunda/camunda-bpm-platform/commit/da63f241c7b57fe980249ac5ae3be369648d3782
It seems that the ConcurrentJobExecutorTest does change something related to the job execution and does not clean up afterwards. So in odd hours the test is executed before the RuntimeServiceAsyncOperationsTest which than hangs.
"main" prio=10 tid=0x00007f1270006000 nid=0x1c4b in Object.wait() [0x00007f127670f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000eae3c030> (a org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread) at java.lang.Object.wait(Object.java:485) at org.camunda.bpm.engine.test.concurrency.ControllableThread.returnControlToTestThreadAndWait(ControllableThread.java:57) - locked <0x00000000eae3c030> (a org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:434) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1$1.execute(ConcurrentJobExecutorTest.java:430) at org.camunda.bpm.engine.impl.interceptor.CommandExecutorImpl.execute(CommandExecutorImpl.java:24) at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:104) at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66) at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.test.concurrency.ConcurrentJobExecutorTest$ControllableExecuteJobThread$1.execute(ConcurrentJobExecutorTest.java:427) at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.executeJob(ExecuteJobsCmd.java:120) at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:93) at org.camunda.bpm.engine.impl.interceptor.CommandExecutorImpl.execute(CommandExecutorImpl.java:24) at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:104) at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66) at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30) at org.camunda.bpm.engine.impl.ManagementServiceImpl.executeJob(ManagementServiceImpl.java:109) at org.camunda.bpm.engine.test.api.runtime.RuntimeServiceAsyncOperationsTest.testDeleteProcessInstancesAsyncWithQuery(RuntimeServiceAsyncOperationsTest.java:193) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.camunda.bpm.engine.test.ProcessEngineRule$1.evaluate(ProcessEngineRule.java:165) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Locked ownable synchronizers: - None
Hi philipp.ossler,
does this fix requires more connections to the database? As we see issues in the engine-it suite on oracle and sqlserver where the engine fails to acquire a new connection:
- https://hq2.camunda.com/jenkins/ci/job/7.6-engine-IT-XA-wildfly-100-sqlserver-2014/44/testReport/org.camunda.bpm.integrationtest.functional.classloading.variables/SerializableVariableTest/testResolveClass/
- https://hq2.camunda.com/jenkins/ci/job/7.6-EE-engine-IT-jbosseap-62-oracle-10/lastCompletedBuild/testReport/junit/org.camunda.bpm.integrationtest.functional.ejb.remote/RemoteSingletonBeanInvocationTest/testMultipleInvocations/
- https://hq2.camunda.com/jenkins/ci/job/7.6-EE-engine-IT-jbosseap-63-oracle-10/44/testReport/junit/org.camunda.bpm.integrationtest.functional.ejb.remote/RemoteSFSBInvocationTest/testInvokeBean/
- https://hq2.camunda.com/jenkins/ci/job/7.6-engine-IT-XA-jboss-72-oracle-10/43/testReport/junit/org.camunda.bpm.integrationtest.functional.ejb.remote/RemoteSFSBInvocationTest/testMultipleInvocations/
/cc lipphardt
Also we got an out of memory issue. Could this be related to this issue?
- https://hq2.camunda.com/jenkins/ci/view/Broken/job/7.6-platform-jdks/95/container=h2,jdk=ibm-jdk-6-
latest/testReport/junit/org.camunda.bpm.engine.spring.test.transaction/SpringTransactionIntegrationTest/testBasicActivitiSpringIntegration/
I created a job which will hopefully generate a heap dump on out of memory if I configured everything correctly. Maybe we can see an issue there:
https://hq2.camunda.com/jenkins/ci/job/TEST-ibm-jdk-heapdump/
Hi philipp.ossler, meyer, lipphardt,
I debugged at least the memory issue and found out that the engine-spring test suites creates a lot of engines without killing them. Philipps commit added more test methods to a test class which for whatever reason also leads to a lot more engines. I found 9 engines at the end of the test. I think this is a problem of the engine-spring test suite and the way the engines are created and not probably deleted.
I create CAM-6727 for that issue. Until this the jdk ibm test suite will fail on odd hours.
@meyer could you decided on the importance of CAM-6727?
Cheers,
Sebastian
PS: We think the connection issue is also somehow related to this but I didn't had time to debug it. Also we maybe wait for fixes for this issue before we backport these changes.
Just found a problem here: on the following servers, the MDB* is not invoked anymore: WLS, WAS, GF
- org.camunda.bpm.container.impl.ejb.JobExecutionHandlerMDB
Hi philipp.ossler,
the integrations fail, see for example:
https://hq2.camunda.com/jenkins/ci/view/Broken/job/7.6-engine-IT-wildfly-82-h2/218/testReport/
https://hq2.camunda.com/jenkins/ci/view/Broken/job/7.6-EE-engine-IT-jbosseap-70-h2/221/testReport/
https://hq2.camunda.com/jenkins/ci/view/Broken/job/7.6-EE-engine-IT-weblogic-12R1-h2/279/testReport/