Camunda Optimize / OPT-4251

Make the Upgrade resilient to concurrent snapshot operations


    • Type: Task
    • Resolution: Fixed
    • Priority: L3 - Default
    • 3.2.0
    • None
    • backend
    • None
    • Not defined

      Context:
      Production Elasticsearch instances may run scheduled snapshot operations very frequently: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

      While a snapshot is running, index delete operations usually fail with a 400 error:

      [HTTP/1.1 400 Bad Request]
      {"error":{"root_cause":[{"type":"snapshot_in_progress_exception","reason":"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot."}],"type":"snapshot_in_progress_exception","reason":"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot."},"status":400}
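
To tell this failure apart from other 400 responses, a caller can inspect the error type in the response body. The following is a minimal sketch in Python, assuming the body is the JSON document quoted above. The helper name `is_snapshot_in_progress` is hypothetical and is not part of Optimize or of any Elasticsearch client.

```python
import json

# Error type reported by Elasticsearch in the response quoted above.
SNAPSHOT_IN_PROGRESS = "snapshot_in_progress_exception"


def is_snapshot_in_progress(status: int, body: str) -> bool:
    """Return True if the error response reports a running snapshot.

    Hypothetical helper: `status` is the HTTP status code and `body`
    is the raw response body as a string.
    """
    if status != 400:
        return False
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Body is not JSON, or the top-level value is not an object.
        return False
    if not isinstance(error, dict):
        return False
    causes = error.get("root_cause")
    if not isinstance(causes, list):
        causes = []
    # Collect the top-level error type and every root cause type.
    types = {error.get("type")}
    types.update(c.get("type") for c in causes if isinstance(c, dict))
    return SNAPSHOT_IN_PROGRESS in types
```

Checking the type rather than matching on the reason text keeps the check independent of the index name and of wording changes in the message.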
      

      AT:

      • Delete operations are made resilient to such failures by blocking and retrying until the delete succeeds or a different error occurs.
        • The failures should still be logged so that it is transparent what is going on.
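
The acceptance criteria above can be sketched as a blocking retry loop. This is an illustration in Python under stated assumptions, not the actual Optimize implementation: `SnapshotInProgressError`, `delete_index_with_retry`, the 30 second default delay, and the injectable `sleep` parameter are all hypothetical names and values chosen for the example.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("upgrade")


class SnapshotInProgressError(Exception):
    """Hypothetical stand-in for a delete rejected with snapshot_in_progress_exception."""


def delete_index_with_retry(
    delete_index: Callable[[], None],
    retry_delay_seconds: float = 30.0,  # assumed value, not taken from the ticket
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until delete_index() succeeds and return the number of failed attempts.

    Only SnapshotInProgressError is retried; any other exception propagates
    to the caller, which matches "until it proceeds or another error occurs".
    """
    failed_attempts = 0
    while True:
        try:
            delete_index()
            return failed_attempts
        except SnapshotInProgressError as error:
            failed_attempts += 1
            # Log every failure so the blocked upgrade stays visible to operators.
            log.warning(
                "Index delete blocked by a running snapshot (failed attempt %d), "
                "retrying in %.0f seconds: %s",
                failed_attempts,
                retry_delay_seconds,
                error,
            )
            sleep(retry_delay_seconds)
```

The loop has no attempt limit because the ticket asks for blocking until the delete proceeds; a caller that needs an upper bound would have to add one. Passing `sleep` as a parameter only makes the sketch testable without real waiting.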


              Assignee: Unassigned
              Reporter: Sebastian Bathke (sebastian.bathke)
