Uploaded image for project: 'Camunda Optimize'
  1. Camunda Optimize
  2. OPT-5468

Confusing error log on clearing an elasticsearch scroll that is already gone

    • Icon: Bug Report Bug Report
    • Resolution: Unresolved
    • Icon: L3 - Default L3 - Default
    • None
    • None
    • backend
    • 2
    • S

      Problem:
      The elastic scroll utility tries to delete a scroll via the clear scroll API . Depending on the situation a scroll may already have expired though (default 1 minute), depending on the load on the elasticcluster or Optimize itself. In these situations Elastic returns a 404 on the delete operation. This 404, being an acceptable result for the delete call, is not gracefully handled though by the elastic client and causes an exception.

      The actual cause is a known bug in the elastic client https://github.com/elastic/elasticsearch/pull/58558#issuecomment-833214499 for which no fix is yet merged.

      Steps to reproduce:
      As the cause is a race condition the only way to reproduce this is to
      1. run Optimize in debug and pause it when a scroll is performed from `ElasticsearchReaderUtil.retrieveScrollResultsTillLimit`
      2. manually delete the scroll via the elastic clear scroll API, or wait one minute for it to expire automatically
      3. resume Optimize,

      Actual result :
      the log should contain an exception of which the cause is:

      Caused by: org.elasticsearch.client.ResponseException: method [DELETE], host [<hostname>:9200 ], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
      {"succeeded":true,"num_freed":0}
      	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:318)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:262)
      	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1628)
      	... 22 common frames omitted
      

      Expected result:
      as 404 indicates the scroll is already deleted there should be no error log, at most a debug that the scroll was already deleted

      Hint:

      • There might be one redundant originating request causing this in EventProcessInstanceIndexManager. We fetch the deleted instances at the beginning of the sync and then again for the index cleanup. If we could do all of this with one fetch it would be a nice optimization to make at the same time.

        This is the controller panel for Smart Panels app

            [OPT-5468] Confusing error log on clearing an elasticsearch scroll that is already gone

            Sebastian Bathke created issue -
            Sebastian Bathke made changes -
            Summary Original: Misleading error log on trying to deleta an elasticsearch scroll that is already gone New: Confusing error log on trying to deleta an elasticsearch scroll that is already gone
            Sebastian Bathke made changes -
            Summary Original: Confusing error log on trying to deleta an elasticsearch scroll that is already gone New: Confusing error log on clearing an elasticsearch scroll that is already gone
            Sebastian Bathke made changes -
            Link New: This issue is depended on by SUPPORT-11290 [ SUPPORT-11290 ]
            Sebastian Bathke made changes -
            Component/s New: backend [ 13653 ]
            Sebastian Bathke made changes -
            Effort Original: Not defined [ 11259 ] New: S [ 11254 ]
            Daniel Ewing made changes -
            Labels New: SUPPORT
            Joshua Windels made changes -
            Labels Original: SUPPORT New: SUPPORT next_release
            Joshua Windels made changes -
            Description Original: *Problem:*
            The elastic scroll utility tries to delete a scroll via the clear [scroll API |https://www.elastic.co/guide/en/elasticsearch/reference/current/clear-scroll-api.html]. Depending on the situation a scroll may already have expired though (default 1 minute), depending on the load on the elasticcluster or Optimize itself. In these situations Elastic returns a 404 on the delete operation. This 404, being an acceptable result for the delete call, is not gracefully handled though by the elastic client and causes an exception.

            The actual cause is a known bug in the elastic client https://github.com/elastic/elasticsearch/pull/58558#issuecomment-833214499 for which no fix is yet merged.

            *Steps to reproduce:*
            As the cause is a race condition the only way to reproduce this is to
            1. run Optimize in debug and pause it when a scroll is performed from `ElasticsearchReaderUtil.retrieveScrollResultsTillLimit`
            2. manually delete the scroll via the elastic clear scroll API, or wait one minute for it to expire automatically
            3. resume Optimize,

            *Actual result :*
            the log should contain an exception of which the cause is:
            {code}
            Caused by: org.elasticsearch.client.ResponseException: method [DELETE], host [<hostname>:9200 ], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
            {"succeeded":true,"num_freed":0}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:318)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:262)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1628)
            ... 22 common frames omitted
            {code}

            *Expected result:*
            as 404 indicates the scroll is already deleted there should be no error log, at most a debug that the scroll was already deleted
            New: *Problem:*
             The elastic scroll utility tries to delete a scroll via the clear [scroll API |https://www.elastic.co/guide/en/elasticsearch/reference/current/clear-scroll-api.html]. Depending on the situation a scroll may already have expired though (default 1 minute), depending on the load on the elasticcluster or Optimize itself. In these situations Elastic returns a 404 on the delete operation. This 404, being an acceptable result for the delete call, is not gracefully handled though by the elastic client and causes an exception.

            The actual cause is a known bug in the elastic client [https://github.com/elastic/elasticsearch/pull/58558#issuecomment-833214499] for which no fix is yet merged.

            *Steps to reproduce:*
             As the cause is a race condition the only way to reproduce this is to
             1. run Optimize in debug and pause it when a scroll is performed from `ElasticsearchReaderUtil.retrieveScrollResultsTillLimit`
             2. manually delete the scroll via the elastic clear scroll API, or wait one minute for it to expire automatically
             3. resume Optimize,

            *Actual result :*
             the log should contain an exception of which the cause is:
            {code:java}
            Caused by: org.elasticsearch.client.ResponseException: method [DELETE], host [<hostname>:9200 ], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
            {"succeeded":true,"num_freed":0}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:318)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:262)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1628)
            ... 22 common frames omitted
            {code}
            *Expected result:*
             as 404 indicates the scroll is already deleted there should be no error log, at most a debug that the scroll was already deleted

            *Hint:*

            - There might be one redundant originating request causing this in {{EventProcessInstanceIndexManager}}. We fetch the deleted instances at the beginning of the sync and then again for the index cleanup. If we could do all of this with one fetch it would be a nice optimization to make at the same time.
            Joshua Windels made changes -
            Assignee New: Andromachi Rozaki [ andromachi.rozaki ]

              Unassigned Unassigned
              sebastian.bathke Sebastian Bathke
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: