  1. Camunda Optimize
  2. OPT-5494

Investigate cause of stage ES crashing


    • Task
    • Resolution: Added to backlog
    • L2 - Critical
    • 3.7.0-alpha1, 3.7.0
    • None
    • backend
    • None
    • 2
    • Not defined

      Asia recently noticed that stage was crashing. Preliminary digging suggests that something is causing Elasticsearch (ES) to crash. The container is then recreated while Optimize keeps running, but the expected indices no longer exist, so Optimize gets stuck in an endless loop of failed imports and background processing. asia.malina could reproduce this with the following steps:

      1. Create Report using Hiring demo 5 tenants
      2. View: Process Instance count
      3. Group: Variable - String Var (I think)
      4. Open the viz cog wheel button - switch on custom bucketing toggle

      So it could be related to either variable parsing or bucket sizing. In the container logs, I found errors such as the following:

       index [optimize-variable-update-instance], type [_doc], id [b8afe60d-e919-4234-811d-bd8663150998], message [ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [date] in document with id 'b8afe60d-e919-4234-811d-bd8663150998'. Preview of field's value: 'correlationValue_18719']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [correlationValue_18719] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];]
      

      However, this could be a result of the missing indices rather than the cause. To debug and solve this issue, we really need to tail the container's logs and check what is causing it to be killed.
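One possible way to check why the container was killed, sketched under the assumption that ES runs as a plain Docker container and that the stopped container is still available to `docker inspect`. The helper name `summarize_container_exit` and the sample JSON are hypothetical and only illustrate the idea. An `OOMKilled` flag of true, or exit code 137 (SIGKILL), would point towards the container running out of memory.

```python
import json


def summarize_container_exit(inspect_output: str) -> dict:
    """Extract exit details from the JSON printed by `docker inspect <container>`.

    `docker inspect` prints a JSON array with one object per container;
    only the first entry is inspected here.
    """
    containers = json.loads(inspect_output)
    state = containers[0].get("State", {})
    return {
        "status": state.get("Status"),
        "exit_code": state.get("ExitCode"),
        "oom_killed": state.get("OOMKilled", False),
    }


# Hypothetical, heavily trimmed sample of `docker inspect` output.
# It is not taken from the stage environment.
sample = (
    '[{"Name": "/elasticsearch", '
    '"State": {"Status": "exited", "ExitCode": 137, "OOMKilled": true}}]'
)

print(summarize_container_exit(sample))
```

If stage runs on a different orchestrator, the same information would have to come from that system's own container status rather than from `docker inspect`.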

       

      Notes from testing locally (with doubleVar):

      Locally, following these steps results in exceptions complaining about too many requests and "data too large" (see comment) in the Optimize logs. In my ES logs I can then see out-of-memory errors when it tries to create empty buckets for the histogram (the doubleVar has extremely large values and the default baseline is 0). I'll attach the logs for reference.
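To illustrate why a baseline of 0 combined with very large doubleVar values can exhaust memory, here is a rough sketch of the bucket arithmetic. It assumes fixed-width buckets that are all materialized, including empty ones, between the baseline and the largest value. The function names, the example values and the `limit` of 10,000 are hypothetical and are not taken from Optimize or from the attached logs.

```python
import math


def estimated_bucket_count(baseline: float, max_value: float, bucket_size: float) -> int:
    """Number of fixed-width buckets needed to cover [baseline, max_value]."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if max_value < baseline:
        return 0
    # Every bucket between baseline and max_value is counted, even if empty.
    return math.floor((max_value - baseline) / bucket_size) + 1


def is_safe(baseline: float, max_value: float, bucket_size: float, limit: int = 10_000) -> bool:
    """Check the estimate against a hypothetical guard limit before querying."""
    return estimated_bucket_count(baseline, max_value, bucket_size) <= limit


# Small values: a handful of buckets.
print(estimated_bucket_count(0, 100, 10))

# Hypothetical "extremely large" value with the same bucket size and baseline 0.
print(estimated_bucket_count(0.0, 1e12, 10.0))
print(is_safe(0.0, 1e12, 10.0))
```

A guard of this kind, applied before the histogram request is sent, might be one way to reject or adjust a custom bucket size that would otherwise produce an unmanageable number of empty buckets.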


              Unassigned
              Joshua Windels (joshua.windels)
              Helene Waechtler