Context:
Production Elasticsearch instances may run scheduled snapshot operations very frequently, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
While such a snapshot is running, any delete operation on an index that is being snapshotted usually fails with a 400 error:
[HTTP/1.1 400 Bad Request]\n\{\"error\":{\"root_cause\":[{\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[optimize-process-instance_v4/m_pJTkSIQ_2WlRsLdLqMrQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}\n\t\tat
AT:
- delete operations are made resilient to such failures by blocking and retrying until the delete succeeds or a different error occurs
- each of these failures is still logged, so that it is transparent what is going on
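
A minimal sketch of the retry behaviour described in the AT, written in Python for illustration only and not taken from the actual codebase. Every name in it is an assumption: `SnapshotInProgressError` is a hypothetical stand-in for a 400 response of type `snapshot_in_progress_exception`, `delete_index` is whatever callable performs the real delete, and the 30 second delay is an arbitrary placeholder.

```python
import logging
import time

logger = logging.getLogger("index_deletion")


class SnapshotInProgressError(Exception):
    """Hypothetical stand-in for a 400 snapshot_in_progress_exception response."""


def delete_index_with_retry(delete_index, index_name,
                            retry_delay_seconds=30.0, sleep=time.sleep):
    """Block until delete_index(index_name) succeeds and return the attempt count.

    Only SnapshotInProgressError is retried; any other exception propagates
    to the caller unchanged.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            delete_index(index_name)
            return attempt
        except SnapshotInProgressError as error:
            # Log every failed attempt so the blocking behaviour is visible.
            logger.warning(
                "Deleting index %s failed on attempt %d because a snapshot "
                "is in progress; retrying in %.0f seconds: %s",
                index_name, attempt, retry_delay_seconds, error,
            )
            sleep(retry_delay_seconds)
```

The loop has no upper bound on attempts because the AT asks for blocking until the delete proceeds. Whether a maximum wait time is also wanted is an open design question that this sketch does not answer.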