-
Bug Report
-
Resolution: Fixed
-
L3 - Default
-
None
-
None
-
None
-
3
-
Not defined
Brief summary of the bug. What is it ? Where is it ?
In a long-running benchmark, the deployment of Optimize's importer starts to crash loop sometimes for a few minutes and sometimes for many hours. So that, the importer starts to lag behind.
The first time the crash loop has been observed was when there were 2.4 million (mostly completed) process instances in the system. There, the pod was crash looping for ~10h, and then suddenly the situation got resolved on its own magically. Typically, the following is logged when the pod is crash looping:
2022-05-30 11:48:03.014 CEST wait-for-it.sh: waiting 60 seconds for elasticsearch:9200 2022-05-30 11:48:03.019 CEST wait-for-it.sh: elasticsearch:9200 is available after 0 seconds 2022-05-30 11:48:03.024 CEST {} 2022-05-30 11:48:03.024 CEST Starting Camunda Optimize 3.8.1... 2022-05-30 11:48:03.024 CEST {} 2022-05-30 11:48:10.339 CEST [main] INFO o.c.o.s.e.OptimizeElasticsearchClientFactory - Initializing Elasticsearch rest client... 2022-05-30 11:48:11.734 CEST [main] INFO o.c.o.s.e.OptimizeElasticsearchClientFactory - Elasticsearch client has successfully been started 2022-05-30 11:48:11.745 CEST [main] WARN org.elasticsearch.client.RestClient - request [HEAD http://elasticsearch:9200/optimize-metadata?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."] 2022-05-30 11:48:12.039 CEST [main] WARN org.elasticsearch.client.RestClient - request [HEAD http://elasticsearch:9200/optimize-alert,optimize-business-key,optimize-collection,optimize-dashboard,optimize-dashboard-share,optimize-decision-definition,optimize-license,optimize-metadata,optimize-onboarding-state,optimize-process-definition?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."] 2022-05-30 11:48:12.046 CEST [main] WARN org.elasticsearch.client.RestClient - request [HEAD http://elasticsearch:9200/optimize-report-share,optimize-settings,optimize-tenant,optimize-terminated-user-session,optimize-variable-update-instance,optimize-event,optimize-event-process-definition,optimize-event-process-mapping,optimize-event-process-publish-state,optimize-import-index?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."] 2022-05-30 11:48:12.050 CEST [main] WARN org.elasticsearch.client.RestClient - request [HEAD http://elasticsearch:9200/optimize-timestamp-based-import-index,optimize-position-based-import-index,optimize-combined-report,optimize-single-decision-report,optimize-single-process-report,optimize-external-process-variable,optimize-variable-label,optimize-process-goals?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."] 2022-05-30 11:48:12.050 CEST [main] INFO o.c.o.s.e.s.ElasticSearchSchemaManager - Updating Optimize schema... 2022-05-30 11:48:12.971 CEST [main] INFO o.c.o.s.e.s.ElasticSearchSchemaManager - Creating or updating template with name optimize-variable-update-instance_v2 2022-05-30 11:48:13.047 CEST [main] WARN org.elasticsearch.client.RestClient - request [PUT http://elasticsearch:9200/_template/optimize-variable-update-instance_v2?master_timeout=30s] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "Legacy index templates are deprecated in favor of composable templates."] 2022-05-30 11:48:13.060 CEST [main] INFO o.c.o.s.e.s.ElasticSearchSchemaManager - Creating or updating template with name optimize-event_v4 2022-05-30 11:48:13.149 CEST [main] WARN org.elasticsearch.client.RestClient - request [PUT http://elasticsearch:9200/_template/optimize-event_v4?master_timeout=30s] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "Legacy index templates are deprecated in favor of composable templates."] 2022-05-30 11:48:13.651 CEST [main] INFO o.c.o.s.e.s.ElasticSearchSchemaManager - Creating or updating template with name optimize-external-process-variable_v2 2022-05-30 11:48:13.661 CEST [main] WARN org.elasticsearch.client.RestClient - request [PUT http://elasticsearch:9200/_template/optimize-external-process-variable_v2?master_timeout=30s] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "Legacy index templates are deprecated in favor of composable templates."] 2022-05-30 11:48:14.161 CEST [main] INFO o.c.o.s.e.s.ElasticSearchSchemaManager - Finished updating Optimize schema. 2022-05-30 11:48:14.994 CEST [main] INFO o.c.o.s.i.CCSaaSUserIdentityCache - Initializing Cloud user identity sync. 2022-05-30 11:48:15.037 CEST [main] INFO o.c.o.s.i.CCSaaSUserIdentityCache - Scheduling Cloud user identity sync 2022-05-30 11:48:15.049 CEST [CCSaaSUserIdentityCache-1] INFO o.c.o.rest.cloud.CCSaaSUserClient - Fetching Cloud users. 2022-05-30 11:48:15.051 CEST [main] INFO o.c.o.s.telemetry.TelemetryScheduler - Initializing TelemetryScheduler 2022-05-30 11:48:15.056 CEST [CCSaaSUserIdentityCache-1] INFO o.c.o.rest.cloud.CCSaaSUserClient - Fetching access token for Cloud user retrieval. 2022-05-30 11:48:15.057 CEST [main] INFO o.c.o.s.telemetry.TelemetryScheduler - Starting telemetry scheduling 2022-05-30 11:48:15.058 CEST [TelemetryScheduler-1] INFO o.c.o.s.telemetry.TelemetryScheduler - Checking whether telemetry data can be sent. 2022-05-30 11:48:15.063 CEST [TelemetryScheduler-1] INFO o.c.o.s.telemetry.TelemetryScheduler - Telemetry disabled. 2022-05-30 11:48:15.132 CEST [CCSaaSUserIdentityCache-1] ERROR o.c.o.s.i.CCSaaSUserIdentityCache - Could not sync Cloud user identities. Will stop retrying as next scheduled sync is approaching 2022-05-30 11:48:15.132 CEST org.camunda.optimize.service.exceptions.OptimizeRuntimeException: There was a problem fetching access token. at org.camunda.optimize.rest.cloud.CCSaaSUserClient.refreshAccessTokenIfRequired(CCSaaSUserClient.java:111) at org.camunda.optimize.rest.cloud.CCSaaSUserClient.performRequest(CCSaaSUserClient.java:87) at org.camunda.optimize.rest.cloud.CCSaaSUserClient.fetchAllCloudUsers(CCSaaSUserClient.java:71) at org.camunda.optimize.service.identity.CCSaaSUserIdentityCache.populateCache(CCSaaSUserIdentityCache.java:56) at org.camunda.optimize.service.identity.AbstractIdentityCache.synchronizeIdentities(AbstractIdentityCache.java:148) at org.camunda.optimize.service.identity.AbstractIdentityCache.syncIdentitiesWithRetry(AbstractIdentityCache.java:114) at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: org.apache.http.client.ClientProtocolException: null at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at org.camunda.optimize.rest.cloud.CCSaaSUserClient.refreshAccessTokenIfRequired(CCSaaSUserClient.java:98) ... 12 common frames omitted Caused by: org.apache.http.ProtocolException: Target host is not specified at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:71) at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:125) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ... 15 common frames omitted 2022-05-30 11:48:15.361 CEST [main] INFO o.c.o.s.i.PlatformUserTaskIdentityCache - Initializing platform assignee/candidateGroup identity sync. 2022-05-30 11:48:15.363 CEST [main] INFO o.c.o.s.i.PlatformUserTaskIdentityCache - Scheduling platform assignee/candidateGroup identity sync 2022-05-30 11:48:15.366 CEST [PlatformUserTaskIdentityCache-1] INFO o.c.o.s.i.PlatformUserTaskIdentityCache - platform assignee/candidateGroup identity sync complete 2022-05-30 11:48:15.734 CEST [main] INFO o.c.o.s.i.i.m.ExternalVariableUpdateEngineImportMediator - Engine 'optimize' could not be found in the configuration 2022-05-30 11:48:18.937 CEST [TerminatedSessionService-1] INFO o.c.o.s.e.w.ElasticsearchWriterUtil - Progress of task (ID:jUpMpfOySbOnMitPVJ4THA:274890) on terminated user sessions with timestamp older than 2022-05-30T08:48:18.731Z: 0% (total: 0, updated: 0, created: 0, deleted: 0). Completed is false 2022-05-30 11:48:19.243 CEST [main] INFO o.c.o.s.cleanup.CleanupScheduler - Initializing OptimizeCleanupScheduler 2022-05-30 11:48:19.253 CEST [main] INFO o.c.o.s.cleanup.CleanupScheduler - Stopping cleanup scheduling 2022-05-30 11:48:23.855 CEST [main] WARN org.elasticsearch.client.RestClient - request [HEAD http://elasticsearch:9200/optimize-process-instance-customer_onboarding_en?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.2-de7261de50d90919ae53b0eff9413fd7e5307301 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."] 2022-05-30 11:52:07.000 CEST Back-off restarting failed container
Please note: The exception org.camunda.optimize.service.exceptions.OptimizeRuntimeException: There was a problem fetching access token. in the logs also happens when the pod got scheduled without any failures. So, to my understanding, this does not cause the crash loop.
It impacts the system in the sense that the data in Optimize is not up-to-date and lagging behind depending on how long the crash loop is ongoing. Luckily, Optimize is able to catch up quickly, for example, with 10h lagging behind, Optimize needed 30 minutes to be up-to-date.
Steps to reproduce:
- Start a long-running test and observe the pod scheduling
Actual result:
- The importer gets into a crash loop for some time
Expected result:
- There is no crash loop anymore.
This is the controller panel for Smart Panels app
- is related to
-
OPT-6693 Adjust onboarding importer to run after application has started
- Done