Make the Upgrade resilient to concurrent snapshot operations

    • Type: Task
    • Resolution: Fixed
    • Priority: L3 - Default
    • 3.2.0
    • Affects Version/s: None
    • Component/s: backend
    • None
    • Not defined

      Context:
      Production Elasticsearch instances may run scheduled snapshot operations very frequently, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

      While a snapshot is running, deleting an index that is being snapshotted usually fails with a 400 error:

      [HTTP/1.1 400 Bad Request]\n\{\"error\":{\"root_cause\":[{\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}\n\t\tat 
      

      AT:

      • delete operations are made resilient to such failures by blocking and retrying until the delete succeeds or a different error occurs
        • the failures should still be logged so that it is transparent what is going on
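
      The retry behaviour described above could look roughly like the following. This is a minimal sketch and not the actual Upgrade implementation: `IndexDeleteError`, `delete_index_with_retry`, the injected `delete_index` callable and the 30 second delay are all hypothetical names and values chosen for illustration. Only the error type `snapshot_in_progress_exception` is taken from the response shown above.

```python
import logging
import time

logger = logging.getLogger("upgrade")

# Error type taken from the Elasticsearch response quoted in this ticket.
SNAPSHOT_IN_PROGRESS = "snapshot_in_progress_exception"


class IndexDeleteError(Exception):
    """Hypothetical error carrying the Elasticsearch error type of a failed delete."""

    def __init__(self, error_type, reason):
        super().__init__(f"{error_type}: {reason}")
        self.error_type = error_type


def delete_index_with_retry(delete_index, index_name, delay_seconds=30.0, sleep=time.sleep):
    """Block until delete_index(index_name) succeeds or fails for an unrelated reason.

    delete_index is a hypothetical callable that performs the delete and raises
    IndexDeleteError on failure. Returns the number of attempts that were needed.
    """
    attempt = 1
    while True:
        try:
            delete_index(index_name)
            return attempt
        except IndexDeleteError as error:
            # Any error other than a running snapshot is propagated unchanged.
            if error.error_type != SNAPSHOT_IN_PROGRESS:
                raise
            # Log every blocked attempt so it is transparent what is going on.
            logger.warning(
                "Delete of index %s blocked by a running snapshot (attempt %d); "
                "retrying in %.0fs",
                index_name, attempt, delay_seconds,
            )
            sleep(delay_seconds)
            attempt += 1
```

      The sketch retries without an upper bound because the AT asks for blocking until the delete proceeds. The `sleep` parameter is injected only so that the wait can be replaced in tests.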

            Assignee:
            Unassigned
            Reporter:
            Sebastian Bathke
            Votes:
            0
            Watchers:
            0