Camunda Optimize / OPT-5494

Investigate cause of stage ES crashing


Details

    • Task
    • Resolution: Added to backlog
    • L2 - Critical
    • 3.7.0-alpha1, 3.7.0
    • None
    • backend
    • None
    • 2
    • Not defined

    Description

      Asia recently noticed the stage environment crashing. Some preliminary digging suggests that something is causing Elasticsearch (ES) to crash. The ES container is then recreated while Optimize is still running, but the expected indices no longer exist, so Optimize gets stuck in an endless loop of failed imports and background processing. asia.malina could reproduce this with the following steps:

      1. Create a report using Hiring demo 5 tenants
      2. View: Process Instance count
      3. Group by: Variable - String Var (I think)
      4. Open the visualization settings (cog wheel button) and switch on the custom bucketing toggle

      So the cause could be related to either variable parsing or bucket sizing. In the container logs, I found errors like the following:

       index [optimize-variable-update-instance], type [_doc], id [b8afe60d-e919-4234-811d-bd8663150998], message [ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [date] in document with id 'b8afe60d-e919-4234-811d-bd8663150998'. Preview of field's value: 'correlationValue_18719']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [correlationValue_18719] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];]
      

      However, this could be a consequence of the missing indices rather than the cause. To debug and solve this issue, we really need to tail the container's logs and check what is causing it to be killed.

       

      Notes from testing locally (with doubleVar):

      Locally, following these steps results in exceptions complaining about too many requests and "data too large" (see comment) in the Optimize logs. In my ES logs I can then see OutOfMemory errors while it is trying to create empty buckets for the histogram (the doubleVar has extremely large values and the default baseline is 0). I'll attach the logs for reference.
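      A rough sketch of why a baseline of 0 hurts here, assuming every fixed-width bucket between the baseline and the largest value is created, including the empty ones. The function name, bucket size and example values are hypothetical and are not taken from Optimize or from the attached logs:

```python
import math


def histogram_bucket_count(baseline: float, max_value: float, bucket_size: float) -> int:
    """Hypothetical helper: number of fixed-width buckets needed to cover
    [baseline, max_value] if every bucket, empty or not, is materialized."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if max_value < baseline:
        return 0
    # Buckets start at the baseline; the last bucket must contain max_value.
    return math.floor((max_value - baseline) / bucket_size) + 1


# Illustrative values only: one very large doubleVar value, bucket size 10.
print(histogram_bucket_count(0, 1e12, 10))                # baseline 0
print(histogram_bucket_count(999_999_000_000, 1e12, 10))  # baseline near the data
```

      With a baseline of 0, a single huge value forces roughly max_value / bucket_size buckets (about 10^11 here), almost all of them empty, which is consistent with the OutOfMemory and "data too large" errors above. Moving the baseline close to the smallest actual value shrinks that to about 10^5 buckets.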


              People

                Unassigned
                Joshua Windels (joshua.windels)
                Helene Waechtler
                Votes: 0
                Watchers: 2