  1. Camunda Optimize
  2. OPT-4251

Make the Upgrade resilient to concurrent snapshot operations

Details

    • Task
    • Resolution: Fixed
    • L3 - Default
    • 3.2.0
    • None
    • backend
    • None
    • Not defined

    Description

      Context:
      Production Elasticsearch instances may run scheduled snapshot operations very frequently, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

      While such a snapshot is running, any delete operation on an index that is being snapshotted usually fails with a 400 error:

      [HTTP/1.1 400 Bad Request]
      {"error":{"root_cause":[{"type":"snapshot_in_progress_exception","reason":"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot."}],"type":"snapshot_in_progress_exception","reason":"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot."},"status":400}
      

      AT:

      • delete operations are made resilient to such failures by blocking and retrying until the delete succeeds or a different error occurs
        • each failure should still be logged, so that it is transparent what is going on
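
      The retry behaviour described above could be sketched roughly as follows. This is only an illustration under assumptions, not the implementation that was merged: the class `SnapshotAwareRetry`, its method names, and the fixed retry delay are hypothetical, and the delete call is passed in as a plain `Callable` so that no specific Elasticsearch client API is assumed. A failure is treated as retryable when the exception message (or that of one of its causes) contains `snapshot_in_progress_exception`, as in the error shown above.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

// Hypothetical helper, not taken from the Optimize code base.
public class SnapshotAwareRetry {
    private static final Logger LOG = Logger.getLogger(SnapshotAwareRetry.class.getName());

    // Error type reported by Elasticsearch in the 400 response quoted in this ticket.
    static final String SNAPSHOT_IN_PROGRESS = "snapshot_in_progress_exception";

    // Returns true if the error or any of its causes mentions the snapshot error type.
    static boolean isSnapshotInProgress(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(SNAPSHOT_IN_PROGRESS)) {
                return true;
            }
        }
        return false;
    }

    // Blocks and retries the given operation while it fails because of a running
    // snapshot. Any other exception is rethrown immediately.
    static <T> T retryWhileSnapshotInProgress(Callable<T> deleteOperation, Duration delay)
            throws Exception {
        while (true) {
            try {
                return deleteOperation.call();
            } catch (Exception e) {
                if (!isSnapshotInProgress(e)) {
                    throw e;
                }
                // Log every failed attempt so the wait is visible to operators.
                LOG.warning("Index delete blocked by a running snapshot, retrying in "
                        + delay.toMillis() + " ms: " + e.getMessage());
                Thread.sleep(delay.toMillis());
            }
        }
    }
}
```

      A caller would wrap its existing delete call, for example `SnapshotAwareRetry.retryWhileSnapshotInProgress(() -> deleteIndex(name), Duration.ofSeconds(30))`, where `deleteIndex` stands in for whatever delete method the upgrade already uses. The loop deliberately has no attempt limit, matching the "until the delete succeeds or a different error occurs" criterion; a real implementation might want a configurable delay or backoff.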

              People

                Assignee: Unassigned
                Reporter: Sebastian Bathke