-
Bug Report
-
Resolution: Fixed
-
L3 - Default
-
None
Environment
- Oracle 12 and higher
- Default block size of 8,000 byte
Description
Creating the Camunda schema fails at the index ACT_IDX_JOB_HANDLER, if Oracle >= 12 is configured to create extended data types for character columns (including NVARCHAR2). This lifts the historic limitation of varchar fields to be restricted to 4,000 bytes, which depending on the encoding in use could not hold every possible value. With the larger field values, index keys can become too large for Oracle to handle, as one key must fit into a single block.
Steps to reproduce
- Enable extended data types (https://docs.oracle.com/en/database/oracle/oracle-database/18/spuss/enabling-the-new-extended-data-type-capability.html#GUID-88FB7FFD-4392-49C6-843A-45B49F8A1821). It lifts the limit of the size of NVARCHAR2 fields from 4,000 bytes to 32,767 bytes
- Create the Camunda schema, in particular index ACT_IDX_JOB_HANDLER
Observed Behavior
- Index creation fails, explaining that the maximum key length for the index is exceeded (maximum length is 6398)
Expected behavior
- The index and the entire schema can be created without problems
Root Cause
- On Oracle, an index key must fit into one block
- With the default block size of 8,000 bytes, the keys of the index may not exceed 6,398 bytes
- The index is defined on ACT_RU_JOB(HANDLER_TYPE_,HANDLER_CFG_). HANDLER_TYPE_ is NVARCHAR2(255), HANDLER_CFG_(2000)
- For example with UTF-8 encoding, HANDLER_CFG_ will consume up to 8,000 bytes
Solution Ideas
Oracle's recommendations for solutions can be found at https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-INDEX.html -> Creating an Index on an Extended Data Type Column
- Instead of indexing the raw column, we can create a function-based index that hashes the column value on ACT_RU_JOB(HANDLER_TYPE_, STANDARD_HASH(HANDLER_CFG_)). According to Oracle docs, the index can still be used by the query optimizer if HANDLER_CFG_ is used in equality or IN predicates
- Instead of indexing the raw column, we can create an index on a substring of the column so that the index is overall below the mark of 6,398 bytes. In that case, the index can still be used for equality, IN and range predicates.
Option 1 looks preferable. With option 2, we would need to carefully evaluate that the field values never actually exceeds the chosen substring length (this may be true as of now, see hints for what we currently store in the field).
Hints
Function-based indexes
- Function-based indexes cannot index null values. STANDARD_HASH of null returns a non-null value. Also, since the index is composite and HANDLER_TYPE_ is never null, this should probably be ok either way
Query
- HANDLER_CFG_ is used in one query (besides the select by id and update queries): Job.xml selectJobsByConfiguration.
- Usages:
- If there is an async job for the triggering of an event subscription and that event subscription is deleted while jobs exist
- Deleting the timer start jobs when a process definition is deleted via API
- Replacing the timer start event jobs when a new version of a process is deployed
- The HANDLER_TYPE_ and HANDLER_CFG_ fields are only used in equality predicates
- Usages:
- Extended data types are only supported with Oracle >= 12
- Function-based indexes also work on Oracle 11
Usage of HANDLER_CFG_
- Maps to JobEntity#jobHandlerConfigurationRaw
- Written from JobHandlerConfiguration#toCanonicalString
- JobHandlerConfiguration has the following implementations (second part is the content that the configurations write):
- AsyncContinuationConfiguration: <atomic operation>$<transitionId>
- BatchJobConfiguration: <configurationByteArrayId>
- BatchMonitorJobConfiguration: <batchId>
- BatchSeedJobConfiguration: <batchId>
- EventSubscriptionJobConfiguration: <eventSubscriptionId>
- HistoryCleanupJobHandlerConfiguration: {"countEmptyRuns": <int>, "immediatelyDue": <boolean>, "minuteFrom": <int>, "minuteTo": <int>}
- JobDefinitionSuspensionStateConfiguration: {"by": <string: currently max 22 chars>, "jobDefinitionId": <string: id NVARCHAR2(64)>, "processDefinitionId": <string: id NVARCHAR2(64)>, "processDefinitionKey": <string: id NVARCHAR2(255)>, "processDefinitionTenantId": <string: custom id; NVARCHAR2(64)>, "includeJobs": <boolean>}
- ProcessDefinitionSuspensionStateConfiguration {"by": <string: currently max 22 chars>, "processDefinitionId": <string: id, NVARCHAR2(64)>, "processDefinitionKey": <string: id, NVARCHAR2(255)>, "processDefinitionTenantId": <string: custom id, NVARCHAR2(64)>, "includeProcessInstances": <boolean>}
- TimerJobConfiguration <timerElementKey; activity id of any size>$taskListener~<timerElementSecondaryKey; task listener id of any size>$followUpJobCreated
Additional sources
I verified that the index is useful for our case with the following procedure:
Setup
Artificial Test Data:
Set up schema
Current definition of job table plus all indexes except for ACT_IDX_JOB_HANDLER
Generate handler types
CREATE TABLE handler_type (NAME NVARCHAR2(255), primary key (NAME)); INSERT INTO handler_type(NAME) SELECT dbms_random.string('l', 10) FROM dual connect by level <= 9;
Generate job data
Test Query
Where the String literals should be replaced with some valid values of the generated Strings
Evaluated index options:
Query plans generated via
EXPLAIN PLAN FOR <query> SELECT * FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
Option 1: No additional index
Query Plan:
The index ACT_IDX_JOB_HANDLER_TYPE is used and config filtering is applied on the results.
Option 2: With ACT_IDX_JOB_HANDLER
create index ACT_IDX_JOB_HANDLER on ACT_RU_JOB(HANDLER_TYPE_,HANDLER_CFG_);
Query Plan:
ACT_IDX_JOB_HANDLER is used
Option 3: With new hash-based index
create index ACT_IDX_JOB_HANDLER_ALT ON ACT_RU_JOB(HANDLER_TYPE_, STANDARD_HASH(HANDLER_CFG_));
Query Plan:
The new index is used by hashing the handler config first. Out of the results, the true config value must be checked again (due to potential hash collisions).
Double-check: Both indexes
When both indexes are in place, the new ACT_IDX_JOB_HANDLER_ALT was still used. I'm not confident though that ACT_IDX_JOB_HANDLER_ALT is universally better. I used an artificial data set and didn't investigate things like query plan caching.