Bug Report
Resolution: Fixed
L3 - Default
When the search response is too large (i.e., > 100 MB), the current import round fails with the following exception:
java.io.IOException: entity content is too long [512997841] for the configured buffer limit [104857600]
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:923)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:299)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:287)
    at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2699)
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171)
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2137)
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2105)
    at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1367)
    at org.camunda.optimize.service.es.OptimizeElasticsearchClient.searchWithoutPrefixing(OptimizeElasticsearchClient.java:416)
    at org.camunda.optimize.service.importing.zeebe.fetcher.AbstractZeebeRecordFetcher.getZeebeRecordsForPrefixAndPartitionFrom(AbstractZeebeRecordFetcher.java:73)
    at org.camunda.optimize.service.importing.zeebe.mediator.ZeebeVariableImportMediator.lambda$getVariables$0(ZeebeVariableImportMediator.java:62)
    at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:65)
    at org.camunda.optimize.service.importing.zeebe.mediator.ZeebeVariableImportMediator.getVariables(ZeebeVariableImportMediator.java:62)
    at org.camunda.optimize.service.importing.zeebe.mediator.ZeebeVariableImportMediator.importNextPage(ZeebeVariableImportMediator.java:47)
    at org.camunda.optimize.service.importing.PositionBasedImportMediator.importNextPageWithRetries(PositionBasedImportMediator.java:85)
    at org.camunda.optimize.service.importing.PositionBasedImportMediator.runImport(PositionBasedImportMediator.java:40)
    at org.camunda.optimize.service.importing.AbstractImportScheduler.lambda$executeImportRound$2(AbstractImportScheduler.java:99)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:550)
    at java.base/java.util.stream.AbstractPipeline.evalu…
This means the search returned ~500 MB of data, which the application could not handle because the configured response buffer size is 100 MB by default. Consequently, the importer starts failing in a loop: it retries the same search request, which fails with the same exception.
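For reference, the 100 MB in the message above is the REST client's default heap buffer for a single response; it can be raised per request via RequestOptions. The sketch below is purely illustrative (the host, index pattern, and 500 MB limit are placeholder assumptions) and is not the fix chosen here, since any fixed buffer can still be exceeded by a large enough batch:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class BufferLimitSketch {
  public static void main(String[] args) throws Exception {
    try (RestHighLevelClient client =
        new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

      // Raise the per-response heap buffer from the 100 MB default to 500 MB.
      RequestOptions.Builder options = RequestOptions.DEFAULT.toBuilder();
      options.setHttpAsyncResponseConsumerFactory(
          new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(
              500 * 1024 * 1024));

      // Placeholder index pattern; the actual Zeebe record indices depend on the exporter setup.
      SearchResponse response = client.search(new SearchRequest("zeebe-record*"), options.build());
      System.out.println(response.getHits().getTotalHits());
    }
  }
}

Because a larger buffer only moves the limit, the expected behaviour described below instead reduces the amount of data fetched per import round.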
Steps to reproduce:
- Update a (big) variable ~7k times, e.g., a collection with ~7k items, where each item is itself a collection of 1 to 10 UUIDs (see the reproduction sketch after these steps)
- Try to import the variable updates (with a batch size of 10k)
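A reproduction along these lines could look as follows, assuming a locally running Zeebe broker reachable through the official Zeebe Java client and an already active process instance; the gateway address, instance key, variable name, and collection sizes are illustrative placeholders:

import io.camunda.zeebe.client.ZeebeClient;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class BigVariableUpdater {
  public static void main(String[] args) {
    try (ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .gatewayAddress("localhost:26500")
            .usePlaintext()
            .build()) {

      // Placeholder: key of an active process instance (variables are set on its root scope).
      long processInstanceKey = 2251799813685249L;

      List<List<String>> bigCollection = new ArrayList<>();
      for (int update = 0; update < 7_000; update++) {
        // Each item is itself a small collection of 1 to 10 UUIDs, so the variable keeps growing.
        List<String> item = new ArrayList<>();
        int uuids = 1 + (update % 10);
        for (int i = 0; i < uuids; i++) {
          item.add(UUID.randomUUID().toString());
        }
        bigCollection.add(item);

        client
            .newSetVariablesCommand(processInstanceKey)
            .variables(Map.of("bigVariable", bigCollection))
            .send()
            .join();
      }
    }
  }
}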
Actual result:
- The import round fails with
java.io.IOException: entity content is too long [512997841] for the configured buffer limit [104857600]
- The importer retries the same search request and fails again; from this point on it loops.
Expected result:
- When the exception is caught, a potential approach would be to dynamically decrease the batch size, for example by dividing it by 2 (a minimal sketch of such a backoff follows after this list).
- Repeat this until the import round succeeds.
- Continue with the decreased batch size for another x rounds (e.g., x=10).
- Then slowly increase the batch size back to the configured maximum batch size.
- The dynamically adjusted batch size is transient, i.e., after a restart the importer starts with the configured batch size as usual.
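A minimal sketch of such a batch-size backoff; all names here are hypothetical and not taken from the Optimize code base, so the actual implementation may differ:

public class DynamicBatchSizer {

  private final int configuredBatchSize;            // maximum batch size from the configuration
  private final int successfulRoundsBeforeIncrease; // the "x" rounds, e.g. 10
  private int currentBatchSize;
  private int consecutiveSuccesses = 0;

  public DynamicBatchSizer(int configuredBatchSize, int successfulRoundsBeforeIncrease) {
    this.configuredBatchSize = configuredBatchSize;
    this.successfulRoundsBeforeIncrease = successfulRoundsBeforeIncrease;
    this.currentBatchSize = configuredBatchSize;
  }

  /** Called when the search response exceeded the buffer limit (IOException). */
  public void onResponseTooLarge() {
    // Halve the batch size, but never go below 1.
    currentBatchSize = Math.max(1, currentBatchSize / 2);
    consecutiveSuccesses = 0;
  }

  /** Called after a successful import round. */
  public void onSuccessfulRound() {
    if (currentBatchSize >= configuredBatchSize) {
      return; // already back at the configured maximum
    }
    consecutiveSuccesses++;
    if (consecutiveSuccesses >= successfulRoundsBeforeIncrease) {
      // Slowly grow back towards the configured maximum, e.g. by doubling.
      currentBatchSize = Math.min(configuredBatchSize, currentBatchSize * 2);
      consecutiveSuccesses = 0;
    }
  }

  public int getCurrentBatchSize() {
    return currentBatchSize;
  }
}

On the IOException, the import mediator would call onResponseTooLarge() and retry the round with getCurrentBatchSize(); after enough successful rounds the size grows back towards the configured maximum, and because this state is held only in memory it resets to the configured batch size on restart.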
Notes:
This has been backported to the maintenance branches 3.7-3.10.
Testing Notes
- The fix is implemented as described above
- You can reduce the batch size to a low number to better test this
- You can validate the changing batch size via log messages