Make the Upgrade resilient to concurrent snapshot operations


    • Type: Task
    • Resolution: Fixed
    • Priority: L3 - Default
    • 3.2.0
    • Affects Version/s: None
    • Component/s: backend
    • None
    • Not defined

      Context:
      Production Elasticsearch instances may have very frequent scheduled snapshot operations, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

      While such a snapshot is running, any index delete operation usually fails with a 400 error:

      [HTTP/1.1 400 Bad Request]\n\{\"error\":{\"root_cause\":[{\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}\n\t\tat 
      

      AT:

      • Delete operations are made resilient to such failures by blocking and retrying until the delete succeeds or a different error occurs.
        • Each failure should still be logged, so that it is transparent what is going on.
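
      The behaviour described above could be sketched roughly as follows. This is only an illustration in Python, not the actual Upgrade code: `IndexDeleteError`, `delete_with_snapshot_retry`, the `delete_index` callable and the 30 second wait are all hypothetical names and values chosen for the example. Only the 400 status and the `snapshot_in_progress_exception` type are taken from the error shown above.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("upgrade")

# Error type taken from the Elasticsearch response quoted in this ticket.
SNAPSHOT_IN_PROGRESS = "snapshot_in_progress_exception"


class IndexDeleteError(Exception):
    """Hypothetical error carrying the HTTP status and Elasticsearch error type."""

    def __init__(self, status: int, error_type: str, reason: str = ""):
        super().__init__(f"[{status}] {error_type}: {reason}")
        self.status = status
        self.error_type = error_type


def delete_with_snapshot_retry(
    delete_index: Callable[[str], None],  # hypothetical: performs the actual delete call
    index_name: str,
    wait_seconds: float = 30.0,  # assumed wait between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the delete succeeds and return the number of attempts made.

    Only snapshot-in-progress failures are retried. Any other error is
    re-raised to the caller unchanged.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            delete_index(index_name)
            return attempts
        except IndexDeleteError as e:
            if e.status != 400 or e.error_type != SNAPSHOT_IN_PROGRESS:
                raise  # a different error occurred: stop retrying
            # Log every blocked attempt so it is transparent what is going on.
            log.warning(
                "Delete of index %s blocked by a running snapshot "
                "(attempt %d), retrying in %s seconds: %s",
                index_name, attempts, wait_seconds, e,
            )
            sleep(wait_seconds)
```

      The sketch retries without an upper bound, which matches "blocking and retrying until it proceeds or another error occurs". Whether a maximum number of attempts or a backoff is wanted is left open here.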


              Assignee:
              Unassigned
              Reporter:
              Sebastian Bathke
