Camunda Optimize / OPT-7120

Initialization of Management Dashboards blocks Optimize startup

Details

    • Task
    • Resolution: Fixed
    • L2 - Critical
    • 3.9.5, 3.10.3
    • S

    Description

      What/Where is the issue?

      Issue was discovered during investigation of this incident: https://app.incident.io/camunda/incidents/307

      The sequence is as follows: Optimize starts and the @PostConstruct method of ManagementDashboardService is called. It sends many requests to Elasticsearch, which is overwhelmed and queues them. The main thread is blocked, the Optimize liveness probe does not answer, and Kubernetes kills and restarts the pod. The same sequence then repeats, resulting in a crash loop.

      Analysis of a thread dump from the Optimize importer showed that the main thread is blocked in ManagementDashboardService:

      "main" #1 prio=5 os_prio=0 cpu=9652.67ms elapsed=167.25s tid=0x00007f41d1ee1800 nid=0x1c waiting on condition  [0x00007f41d207a000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(java.base@11.0.18/Native Method)
          at org.camunda.optimize.service.es.writer.ElasticsearchWriterUtil.waitUntilTaskIsFinished(ElasticsearchWriterUtil.java:403)
          at org.camunda.optimize.service.es.writer.ElasticsearchWriterUtil.tryDeleteByQueryRequest(ElasticsearchWriterUtil.java:234)
          at org.camunda.optimize.service.es.writer.DashboardWriter.deleteManagementDashboard(DashboardWriter.java:202)
          at org.camunda.optimize.service.dashboard.ManagementDashboardService.init(ManagementDashboardService.java:70)
          at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.18/Native Method)
          at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.18/NativeMethodAccessorImpl.java:62)
          at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
          at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:389)
          at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:333)
          at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:157)
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
          at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
          at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$311/0x00000001003ac440.getObject(Unknown Source)
          at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
          - locked <0x00000000eb972690> (a java.util.concurrent.ConcurrentHashMap)
          at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
          at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
          at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
          at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:920)
          at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
          - locked <0x00000000eb945570> (a java.lang.Object)
          at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:147)
          at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:731)
          at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:408)
          at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
          at org.camunda.optimize.Main.main(Main.java:30)

      Code analysis shows that the initialization of the management dashboard is annotated with @PostConstruct and fires several requests to Elasticsearch. Because this blocks the main thread, the liveness probe does not respond and Kubernetes kills the pod before it can finish starting.
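
      The blocking pattern visible in the thread dump can be illustrated with a minimal, self-contained sketch. It is not the Optimize code: BlockingInitSketch and its simulated task are hypothetical stand-ins, and only the shape of the loop (poll, then Thread.sleep, on the calling thread) mirrors what the stack trace shows in waitUntilTaskIsFinished.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a long-running Elasticsearch task; not Optimize code.
final class BlockingInitSketch {

  // Polls the simulated task until it reports completion, sleeping between polls.
  // Returns the number of polls that had to wait.
  static int waitUntilTaskIsFinished(AtomicInteger remainingPolls, long sleepMillis) {
    int polls = 0;
    while (remainingPolls.getAndDecrement() > 0) {
      polls++;
      try {
        Thread.sleep(sleepMillis); // the calling thread is parked here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for task", e);
      }
    }
    return polls;
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // Called directly, as an init method would be: nothing after this line
    // (for example, starting to answer health checks) runs until the loop ends.
    int polls = waitUntilTaskIsFinished(new AtomicInteger(5), 100);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("polls=" + polls + ", startup delayed by about " + elapsedMs + " ms");
  }
}
```

      If the wait is long enough to exceed the liveness probe's tolerance, the pod is restarted before the loop can finish, which matches the crash loop described above.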

      Solution Proposal

      1) Make it configurable whether an instance creates the management dashboard. This way the importer can be relieved of that task, leaving it to the webapp. The default should be "true", i.e. by default every instance creates the management dashboard.
      2) Check whether it is really necessary to delete and re-create all management dashboards at every startup. A mechanism akin to how the instant preview dashboards are created could be used instead. For now we will leave this as is; we assume the reason for the recreation is that it is easier than migrating when management entities change. We may evaluate this further in a follow-up.
      3) Make sure the management dashboard creation/deletion runs in a thread other than main. Currently it runs in a method annotated with @PostConstruct. Instead, it should be executed similarly to what is described in OPT-6771.
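
      Points 1) and 3) could be combined roughly as in the sketch below. It is plain Java without Spring and makes no claim about the approach described in OPT-6771 or the one finally implemented. The class AsyncDashboardInitSketch, the thread name, and the Runnable standing in for the dashboard deletion/creation are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: runs the dashboard initialization off the calling thread,
// and only if the (assumed) createOnStartup flag is enabled.
final class AsyncDashboardInitSketch {

  private final boolean createOnStartup;
  private final Runnable dashboardInit;
  private final ExecutorService executor;

  AsyncDashboardInitSketch(boolean createOnStartup, Runnable dashboardInit) {
    this.createOnStartup = createOnStartup;
    this.dashboardInit = dashboardInit;
    // Single daemon thread with a recognizable name, so it shows up clearly in thread dumps.
    this.executor = Executors.newSingleThreadExecutor(runnable -> {
      Thread thread = new Thread(runnable, "management-dashboard-init");
      thread.setDaemon(true);
      return thread;
    });
  }

  // Returns immediately. The future completes once initialization finished or was skipped.
  CompletableFuture<Void> start() {
    if (!createOnStartup) {
      return CompletableFuture.completedFuture(null);
    }
    return CompletableFuture.runAsync(dashboardInit, executor);
  }

  void shutdown() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    AsyncDashboardInitSketch init = new AsyncDashboardInitSketch(
        true, () -> System.out.println("init on " + Thread.currentThread().getName()));
    CompletableFuture<Void> done = init.start();
    System.out.println("startup continues on " + Thread.currentThread().getName());
    done.join(); // only for the demo; a real startup would not wait here
    init.shutdown();
  }
}
```

      With this shape the main thread is free to finish startup and answer probes while the requests to Elasticsearch are still in flight. Error handling and retries are deliberately left out of the sketch.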

      Testing Notes

      Only solution part 1) can be tested easily:

      1) Set up a clean ES
      2) Set the config managementEntities.createOnStartup (env var CAMUNDA_OPTIMIZE_ENTITY_CREATE_ON_STARTUP) to false
      3) Start Optimize and confirm no management entities were created

      and:

      1) Set up a clean ES
      2) Set the config managementEntities.createOnStartup (env var CAMUNDA_OPTIMIZE_ENTITY_CREATE_ON_STARTUP) to true
      3) Start Optimize and confirm management entities were created
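
      The flag semantics exercised by these two scenarios, including the "true" default from the solution proposal, can be sketched as follows. How Optimize actually resolves its configuration is not shown in this ticket, so CreateOnStartupFlagSketch and its parsing rules are assumptions; only the environment variable name is taken from the steps above.

```java
import java.util.Map;

// Hypothetical sketch of resolving the flag from an environment map.
final class CreateOnStartupFlagSketch {

  static final String ENV_VAR = "CAMUNDA_OPTIMIZE_ENTITY_CREATE_ON_STARTUP";

  // An unset or blank variable falls back to true (the proposed default).
  // Any other value is parsed with Boolean.parseBoolean, so only "true"
  // (case-insensitive) enables creation. That parsing choice is an assumption.
  static boolean createOnStartup(Map<String, String> env) {
    String raw = env.get(ENV_VAR);
    if (raw == null || raw.isBlank()) {
      return true;
    }
    return Boolean.parseBoolean(raw.trim());
  }

  public static void main(String[] args) {
    System.out.println("createOnStartup=" + createOnStartup(System.getenv()));
  }
}
```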

              People

                Unassigned
                Giuliano Rodrigues Lima (giuliano.rodrigues-lima)
