[OPT-5494] Investigate cause of stage ES crashing

Type: Task
Resolution: Added to backlog
Priority: L2 - Critical
Fix Version/s: 3.7.0-alpha1, 3.7.0
Affects Version/s: None
Component/s: backend
Labels:
None

PM Priority:
2
Effort:
Not defined
Epic Link:
None

Asia recently noticed the stage environment crashing. Preliminary digging suggests that something is causing Elasticsearch (ES) to crash. The ES container is then recreated while Optimize is still running, but the expected indices no longer exist, so Optimize gets stuck in an endless loop of failed imports and background processing. asia.malina could reproduce this with the following steps:

1. Create a report using the Hiring demo with 5 tenants
2. View: Process Instance count
3. Group: Variable - String Var (I think)
4. Open the visualization settings (cog wheel button) and switch on the custom bucketing toggle

So the cause could be related either to variable parsing or to bucket sizing. In the container logs, I found errors such as the following:

 index [optimize-variable-update-instance], type [_doc], id [b8afe60d-e919-4234-811d-bd8663150998], message [ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [date] in document with id 'b8afe60d-e919-4234-811d-bd8663150998'. Preview of field's value: 'correlationValue_18719']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [correlationValue_18719] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];]
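The error says the `value` field of `optimize-variable-update-instance` is mapped as `date`, so a string such as `correlationValue_18719` is rejected because it matches neither `strict_date_optional_time` nor `epoch_millis`. Below is a rough sketch of that check in Python, using a hypothetical helper `would_parse_as_es_date`; the regular expressions only approximate the two formats named in the log and are not Elasticsearch's actual parser.

```python
import re

# Rough approximation (NOT Elasticsearch's real parser) of the two formats in the
# error message. strict_date_optional_time: ISO 8601 date with at least the year,
# an optional 'T' time part and an optional zone. epoch_millis: an integer count.
STRICT_DATE_OPTIONAL_TIME = re.compile(
    r"^\d{4}"                                    # year is mandatory
    r"(-\d{2}(-\d{2}"                            # optional month, then day
    r"(T\d{2}(:\d{2}(:\d{2}([.,]\d{1,9})?)?)?"   # optional time after 'T'
    r"(Z|[+-]\d{2}(:?\d{2})?)?)?)?)?$"           # optional zone offset
)
EPOCH_MILLIS = re.compile(r"^-?\d+$")


def would_parse_as_es_date(value: str) -> bool:
    """Hypothetical helper: approximate [strict_date_optional_time||epoch_millis]."""
    return bool(STRICT_DATE_OPTIONAL_TIME.match(value) or EPOCH_MILLIS.match(value))


if __name__ == "__main__":
    for sample in ["correlationValue_18719", "2021-08-17T17:10:00Z", "1629213000000"]:
        print(sample, would_parse_as_es_date(sample))
```

Any string variable value routed into a `date`-typed field fails this way, which fits the suspicion that the mapping error is a symptom of the recreated indices rather than the root cause.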

However, this could be a result of the indices missing rather than the cause. To debug and solve this issue, we really need to tail the containers' logs and check what is causing ES to be killed.
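One way to narrow down what is killing ES is to count the failure signatures mentioned in this ticket across a chunk of container log output. The sketch below assumes Python; `SIGNATURES` and `triage` are hypothetical names, and the patterns are illustrative substrings that may need adjusting to the stage log format.

```python
import re
import sys
from collections import Counter

# Hypothetical signatures for the failure modes seen in this ticket.
SIGNATURES = {
    "jvm_oom": re.compile(r"OutOfMemoryError"),
    "circuit_breaker": re.compile(r"circuit_breaking_exception|Data too large"),
    "rejected_execution": re.compile(r"es_rejected_execution_exception|Too Many Requests"),
    "mapping_error": re.compile(r"mapper_parsing_exception"),
}


def triage(lines):
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Read log lines from stdin and print the signatures, most frequent first.
    for name, count in triage(sys.stdin).most_common():
        print(f"{name}: {count}")
```

For example, `docker logs --tail 1000 <es-container> 2>&1 | python triage.py`, where `<es-container>` is a placeholder. If the container was killed by the kernel OOM killer rather than by a JVM error, `docker inspect --format '{{.State.OOMKilled}}' <es-container>` reports `true`, and the exit code is typically 137.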

Notes from testing locally (with doubleVar):

Locally, following these steps results in exceptions in the Optimize logs complaining about too many requests and "data too large" (see comment). In my ES logs I can then see OutOfMemory errors when ES tries to create empty buckets for the histogram (the doubleVar has extremely large values and the default baseline is 0). I'll attach the logs for reference.
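To see why a baseline of 0 combined with very large values is dangerous, consider the arithmetic: if empty buckets are materialised from the baseline up to the largest value, the count is floor((max - baseline) / bucketSize) + 1. Below is a minimal Python sketch of a guard that could reject such a configuration before a query is sent. `bucket_count`, `check_bucketing` and `MAX_BUCKETS` are hypothetical and not Optimize code; the cap is illustrative only.

```python
import math

# Hypothetical cap for illustration. Elasticsearch enforces its own limit via the
# search.max_buckets cluster setting, whose default differs between versions.
MAX_BUCKETS = 10_000


def bucket_count(baseline: float, max_value: float, bucket_size: float) -> int:
    """Fixed-width buckets needed to cover [baseline, max_value] when empty
    buckets are also created (dense histogram)."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if max_value < baseline:
        return 0
    return math.floor((max_value - baseline) / bucket_size) + 1


def check_bucketing(baseline, max_value, bucket_size, limit=MAX_BUCKETS):
    """Reject a custom bucketing configuration that would yield too many buckets."""
    n = bucket_count(baseline, max_value, bucket_size)
    if n > limit:
        raise ValueError(f"{n} buckets requested, limit is {limit}")
    return n
```

With a baseline of 0, a bucket size of 10 and a maximum value around 1e12, the histogram would need roughly 1e11 buckets. A guard like this fails fast in the application instead of letting ES attempt the allocation.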


Attachment: elasticsearch.log (37 kB, 17/Aug/21 5:10 PM)

There are no comments yet on this issue.

Assignee:: Unassigned

Reporter:: Joshua Windels

DRI:: Helene Waechtler

Votes:: 0

Watchers:: 2

Created:: 14/Aug/21 12:14 AM

Updated:: 02/Aug/22 9:34 AM

Resolved:: 08/Oct/21 3:39 PM

Camunda Optimize
