Uploaded image for project: 'Camunda Optimize'
  1. Camunda Optimize
  2. OPT-5468

Confusing error log on clearing an elasticsearch scroll that is already gone

    XMLWordPrintable

Details

    • Bug Report
    • Resolution: Unresolved
    • L3 - Default
    • None
    • None
    • backend
    • 2
    • S

    Description

      Problem:
      The elastic scroll utility tries to delete a scroll via the clear scroll API . Depending on the situation a scroll may already have expired though (default 1 minute), depending on the load on the elasticcluster or Optimize itself. In these situations Elastic returns a 404 on the delete operation. This 404, being an acceptable result for the delete call, is not gracefully handled though by the elastic client and causes an exception.

      The actual cause is a known bug in the elastic client https://github.com/elastic/elasticsearch/pull/58558#issuecomment-833214499 for which no fix is yet merged.

      Steps to reproduce:
      As the cause is a race condition the only way to reproduce this is to
      1. run Optimize in debug and pause it when a scroll is performed from `ElasticsearchReaderUtil.retrieveScrollResultsTillLimit`
      2. manually delete the scroll via the elastic clear scroll API, or wait one minute for it to expire automatically
      3. resume Optimize,

      Actual result :
      the log should contain an exception of which the cause is:

      Caused by: org.elasticsearch.client.ResponseException: method [DELETE], host [<hostname>:9200 ], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
      {"succeeded":true,"num_freed":0}
      	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:318)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:262)
      	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1628)
      	... 22 common frames omitted
      

      Expected result:
      as 404 indicates the scroll is already deleted there should be no error log, at most a debug that the scroll was already deleted

      Hint:

      • There might be one redundant originating request causing this in EventProcessInstanceIndexManager. We fetch the deleted instances at the beginning of the sync and then again for the index cleanup. If we could do all of this with one fetch it would be a nice optimization to make at the same time.

      mgm-controller-panel

        This is the controller panel for Smart Panels app

        Attachments

          Activity

            People

              Unassigned Unassigned
              sebastian.bathke Sebastian Bathke
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Salesforce