- Type: Task
- Resolution: Fixed
- Priority: L3 - Default
- Affects Version/s: None
- Component/s: continuous integration
Problem:
The Docker Swarm in our DC has died several times in the last few weeks. Most outages happened at night, leaving Jenkins with very large build queues that were no longer being processed. Each time, the Swarm had to be restarted manually.
AT:
- Create a plan for investigating these outages in a structured way, e.g.
  - harvesting the log files of Consul and Swarm
  - logging the load of the machines and of Jenkins
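
As a starting point for the load-logging item, here is a minimal sketch in Python. It is not an agreed design, and it rests on several assumptions: the hosts are Unix machines (so `os.getloadavg()` is available), the Jenkins queue API at `/queue/api/json` is readable without authentication, and the names `collect`, `load_log.csv`, and the example URL are hypothetical. The length of the Jenkins build queue is used here as a rough stand-in for "load of Jenkins".

```python
import csv
import json
import os
import time
from urllib.request import urlopen

FIELDS = ["timestamp", "load1", "load5", "load15", "jenkins_queue"]


def sample_load():
    """Return one timestamped sample of the machine's load averages (Unix only)."""
    one, five, fifteen = os.getloadavg()
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "load1": one,
        "load5": five,
        "load15": fifteen,
    }


def jenkins_queue_length(base_url, timeout=10):
    """Return the number of items waiting in the Jenkins build queue.

    Assumption: the queue API can be read without authentication.
    """
    url = f"{base_url.rstrip('/')}/queue/api/json"
    with urlopen(url, timeout=timeout) as resp:
        return len(json.load(resp).get("items", []))


def append_sample(path, sample):
    """Append a sample to a CSV file, writing the header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(sample)


def collect(path, samples, interval, jenkins_url=None):
    """Write `samples` rows to `path`, waiting `interval` seconds between rows."""
    for i in range(samples):
        sample = sample_load()
        sample["jenkins_queue"] = ""  # stays empty if Jenkins is not queried
        if jenkins_url:
            try:
                sample["jenkins_queue"] = jenkins_queue_length(jenkins_url)
            except (OSError, ValueError):
                pass  # Jenkins unreachable or bad reply; keep logging the load
        append_sample(path, sample)
        if i < samples - 1:
            time.sleep(interval)
```

A call such as `collect("load_log.csv", samples=1440, interval=60, jenkins_url="http://jenkins.example.local:8080")` would cover roughly one day at one sample per minute. Because the timestamps share a format with most log output, the resulting file can be lined up against the harvested Consul and Swarm logs around the time of an outage.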