Loading...

XML

Word

Printable

Steps to reproduce:

Expected behavior:

Observed behavior:

Hint:

This issue is not only limited to the case of having only one topic name in the database. The same behavior can be observed when having multiple topic names with high cardinality (i.e., >= 100.000) for each topic name.
According to the execution plan fetch_and_lock_distinct.dbplan, Postgresql executes the query as follows:
1. Hash Join: To join the tables ACT_RU_EXT_TASK and ACT_RU_EXECUTION. Therefore, both tables are accessed by a sequential scan (i.e., full table scan).
2. Sort Result of the Hash Join: After joining both tables, the result set gets sorted to get a distinct result set. The used sort method is external merge on disk. That makes that operation very expensive.
Basically, to get a distinct result set, Postgresql needs to scan both tables completely, sort the joined result, and get unique rows. Then the limit of 10 can be applied to get only the first 10 rows.
When removing the distinct from the "fetch and lock" query, Postgresql can push the limit of 10 "deep" into the execution plan. Also, the sorting does not happen anymore. (see fetch_and_lock_without_distinct.dbplan)
distinct can only be removed when de-duplication is not required. For example, when the fetch and lock query joins a table that is in a 1:n relationship with the external task table, then the distinct must remain. This is currently the case when we join the authorization table.