Environment: Any distribution. Tested with the Spring Boot distribution, version 7.17.0-ee.
Description:
If a job takes longer than the configured job lock expiration time to complete, the timing of follow-up job executions can cause every successive job execution thread to fail with an optimistic locking exception (OLE). Each OLE is caused by the actions of the previous thread.
Steps to reproduce:
The test case in the example project example-project.7z reproduces the issue. Running it demonstrates the following behavior:
1. Thread 1 executes job A. The execution takes longer than the configured job lock expiration time.
2. Thread 2 picks up job A after the lock has expired. The job acquisition alters job A (it sets new lock attributes).
3. Thread 1 finishes and fails with an OLE because the job acquisition has already altered job A.
4. Thread 1 triggers the job failure listener, which alters job A. The retries are not decremented because the failure is an OLE.
5. Thread 1 schedules job A again.
6. Thread 1 picks up job A again after the failure listener has cleared the lock attributes.
7. Thread 2 finishes and fails with an OLE because the failure listener of Thread 1 has already altered job A.
8. Steps 4-7 repeat indefinitely, with the two threads alternating roles.
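The reported sequence can be replayed deterministically in a small in-memory model. This is a hypothetical sketch, not engine code. `JobTable`, `JobRow`, `simulate`, and the local `OptimisticLockingException` class are made up for illustration. The revision check stands in for optimistic locking on the job row, and a successful completion is modeled as a plain update of that row. As described above, the failure listener clears the lock attributes of the current row without decrementing the retries, and the failing thread then picks the job up again.

```java
import java.util.ArrayList;
import java.util.List;

public class JobLoopSimulation {

    /** Hypothetical stand-in for the engine's OptimisticLockingException. */
    static class OptimisticLockingException extends RuntimeException {
        OptimisticLockingException(String message) { super(message); }
    }

    /** Snapshot of job A's row; lockOwner == null means the job is unlocked. */
    record JobRow(int revision, String lockOwner, int retries) {}

    /** In-memory job table: an update only succeeds if it is based on the current revision. */
    static class JobTable {
        private JobRow current = new JobRow(1, null, 3);

        JobRow read() { return current; }

        JobRow update(JobRow basedOn, String lockOwner, int retries) {
            if (basedOn.revision() != current.revision()) {
                throw new OptimisticLockingException(
                    "revision " + basedOn.revision() + " is stale, current is " + current.revision());
            }
            current = new JobRow(current.revision() + 1, lockOwner, retries);
            return current;
        }

        /** Job acquisition: lock the current row for the given thread. */
        JobRow acquire(String thread) {
            JobRow row = read();
            return update(row, thread, row.retries());
        }
    }

    /** Replays the reported sequence for a number of rounds and logs each completion attempt. */
    static List<String> simulate(int rounds) {
        JobTable table = new JobTable();
        List<String> log = new ArrayList<>();

        String finishing = "thread-1";
        JobRow finishingSnapshot = table.acquire(finishing); // thread-1 acquires job A
        String running = "thread-2";
        JobRow runningSnapshot = table.acquire(running);     // lock expired: thread-2 re-acquires job A

        for (int round = 0; round < rounds; round++) {
            try {
                // The overdue execution tries to complete based on the row it acquired.
                table.update(finishingSnapshot, null, finishingSnapshot.retries());
                log.add(finishing + ": completed");
                return log;
            } catch (OptimisticLockingException e) {
                log.add(finishing + ": OLE");
                // Failure listener: clears the lock attributes on the fresh row without
                // checking who holds the lock; retries are not decremented for an OLE.
                JobRow fresh = table.read();
                table.update(fresh, null, fresh.retries());
                // The job is unlocked again, so the same thread picks it up once more.
                finishingSnapshot = table.acquire(finishing);
            }
            // Swap roles: the other execution is the next one to finish.
            String nextName = running;
            JobRow nextSnapshot = runningSnapshot;
            running = finishing;
            runningSnapshot = finishingSnapshot;
            finishing = nextName;
            finishingSnapshot = nextSnapshot;
        }
        return log;
    }

    public static void main(String[] args) {
        simulate(6).forEach(System.out::println);
    }
}
```

Running `main` prints six `OLE` lines that alternate between `thread-1` and `thread-2`. The `completed` branch is never reached, and because the model never decrements the retries, nothing ends the loop.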
Observed behavior:
- An infinite loop of job executions that each fail with an OLE.
Expected behavior:
- One OLE for the job execution that exceeds the lock expiration time.
- A successful job execution for the next job execution thread.
Root cause:
- The expired job execution that fails with the OLE runs the job failure listener, which clears all lock attributes of the job.
- The job can therefore be picked up again and runs in parallel with the execution that is still in progress on another thread.
Solution ideas:
- The expired job execution could detect that the job is already being executed again because the original execution was overdue, and that this is why it failed with an OLE.
- For example, the expired job execution should not blindly modify the job if its lock attributes have already been altered.
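The second solution idea could look roughly like the following guard. This is again a hypothetical in-memory model rather than engine code; `JobTable`, `stillOwnsLock`, and `handleOptimisticLockFailure` are made-up names. The failure handling compares the job's current lock attributes with the ones the execution acquired, and leaves the job untouched if they differ. Both acquisitions deliberately use the same lock owner, under the assumption that the owner identifies a job executor rather than a single thread. In that case, the lock expiration time set by each acquisition is what tells the two executions apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class GuardedFailureHandling {

    /** Hypothetical stand-in for the engine's OptimisticLockingException. */
    static class OptimisticLockingException extends RuntimeException {
        OptimisticLockingException(String message) { super(message); }
    }

    /** Job row snapshot; lockOwner == null means unlocked (hypothetical model). */
    record JobRow(int revision, String lockOwner, long lockExpirationTime, int retries) {}

    static class JobTable {
        private JobRow current = new JobRow(1, null, 0L, 3);

        JobRow read() { return current; }

        /** Succeeds only if the caller's snapshot has the current revision. */
        JobRow update(JobRow basedOn, String lockOwner, long lockExpirationTime, int retries) {
            if (basedOn.revision() != current.revision()) {
                throw new OptimisticLockingException("stale revision " + basedOn.revision());
            }
            current = new JobRow(current.revision() + 1, lockOwner, lockExpirationTime, retries);
            return current;
        }

        /** Acquisition: lock the job with a fresh expiration time. */
        JobRow acquire(String lockOwner, long lockExpirationTime) {
            JobRow row = read();
            return update(row, lockOwner, lockExpirationTime, row.retries());
        }
    }

    /** True if the job still carries exactly the lock this execution acquired. */
    static boolean stillOwnsLock(JobRow fresh, JobRow acquired) {
        return Objects.equals(fresh.lockOwner(), acquired.lockOwner())
            && fresh.lockExpirationTime() == acquired.lockExpirationTime();
    }

    /** Failure handling after an OLE: only touches the job if the lock is still this execution's. */
    static void handleOptimisticLockFailure(JobTable table, JobRow acquired, String name, List<String> log) {
        JobRow fresh = table.read();
        if (!stillOwnsLock(fresh, acquired)) {
            log.add(name + ": OLE, job re-acquired elsewhere, left untouched");
            return;
        }
        table.update(fresh, null, 0L, fresh.retries()); // clear the lock so the job can be retried
        log.add(name + ": OLE, lock cleared");
    }

    static List<String> run() {
        JobTable table = new JobTable();
        List<String> log = new ArrayList<>();
        // Same lock owner for both acquisitions: one job executor, two threads.
        JobRow first = table.acquire("executor-1", 1_000L);  // thread-1 acquires job A
        JobRow second = table.acquire("executor-1", 2_000L); // lock expired: thread-2 re-acquires
        try {
            table.update(first, null, 0L, first.retries()); // thread-1 finishes late
            log.add("thread-1: completed");
        } catch (OptimisticLockingException e) {
            handleOptimisticLockFailure(table, first, "thread-1", log);
        }
        try {
            // thread-2's snapshot is still current because thread-1 left the job alone.
            table.update(second, null, 0L, second.retries());
            log.add("thread-2: completed");
        } catch (OptimisticLockingException e) {
            handleOptimisticLockFailure(table, second, "thread-2", log);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running `main` prints a single OLE line for `thread-1`, followed by `thread-2: completed`. This matches the expected behavior: one OLE for the overdue execution and a successful execution for the next thread.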