Loading...

Type: Bug Report
Resolution: Fixed
Priority: L3 - Default
Fix Version/s: 7.12.2, 7.13.0, 7.11.9, 7.13.0-alpha1
Affects Version/s: None
Component/s: engine
Labels:
- SUPPORT

Given

Postgres database
Perform the task query: taskService.createTaskQuery().taskUnassigned().withoutCandidateGroups().withoutCandidateUsers().count()

Problem

Consider the following query and the query plan:

-- QUERY
select count(distinct RES.ID_) from ACT_RU_TASK RES WHERE (1 = 1 and RES.ASSIGNEE_ IS NULL and
(RES.ASSIGNEE_ is null and RES.ID_ not in ( select TASK_ID_ from ACT_RU_IDENTITYLINK ARIEL WHERE
ARIEL.TYPE_ = 'candidate')))

-- QUERY PLAN
Aggregate  (cost=28712821.87..28712821.88 rows=1 width=8) (actual time=575362.672..575362.672 rows=1 loops=1)
   ->  Gather  (cost=2342.79..28712795.40 rows=10587 width=37) (actual time=293340.304..575364.177 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Bitmap Heap Scan on act_ru_task res  (cost=1342.79..28710736.70 rows=4411 width=37) (actual time=304801.026..575332.787 rows=1 loops=3)
               Recheck Cond: ((assignee_ IS NULL) AND (assignee_ IS NULL))
               Filter: (NOT (SubPlan 1))
               Rows Removed by Filter: 14033
               Heap Blocks: exact=1683
               ->  Bitmap Index Scan on act_idx_task_assignee  (cost=0.00..1340.15 rows=21173 width=0) (actual time=116.300..116.300 rows=42364 loops=1)
                     Index Cond: ((assignee_ IS NULL) AND (assignee_ IS NULL))
               SubPlan 1
                 ->  Materialize  (cost=0.00..6158.60 rows=139634 width=37) (actual time=0.004..27.630 rows=60595 loops=42102)
                       ->  Seq Scan on act_ru_identitylink ariel  (cost=0.00..4369.43 rows=139634 width=37) (actual time=0.003..29.409 rows=60688 loops=22205)
                             Filter: ((type_)::text = 'candidate'::text)
                             Rows Removed by Filter: 1
 Planning time: 0.632 ms
 Execution time: 575366.077 ms

The query performs not well and the costs are with ~ 28,7 Mio. rather high
NOT IN (SELECT ...) does not optimize well
- It uses a plain sub plan which has a bad performance [1]

Solutions
1. Rewrite the query with the help of a correlated subquery: NOT EXISTS (SELECT ...)

-- REWRITTEN QUERY
select count(distinct RES.ID_) 
from ACT_RU_TASK RES
WHERE (1 = 1 and RES.ASSIGNEE_ IS NULL
			 and (RES.ASSIGNEE_ is null
					and NOT EXISTS (
			 			select 1 
			 			from ACT_RU_IDENTITYLINK I 
			 			WHERE I.TYPE_ = 'candidate'
			 			AND   I.TASK_ID_ = RES.ID_
		 			)
			 	 )
		)

-- QUERY PLAN
Aggregate  (cost=13624.60..13624.61 rows=1 width=8) (actual time=138.484..138.484 rows=1 loops=1)
   ->  Hash Anti Join  (cost=7256.58..13615.49 rows=3645 width=37) (actual time=117.066..138.453 rows=3 loops=1)
         Hash Cond: ((res.id_)::text = (i.task_id_)::text)
         ->  Seq Scan on act_ru_task res  (cost=0.00..4569.86 rows=20607 width=37) (actual time=0.007..24.543 rows=42226 loops=1)
               Filter: ((assignee_ IS NULL) AND (assignee_ IS NULL))
               Rows Removed by Filter: 44501
         ->  Hash  (cost=4388.79..4388.79 rows=141183 width=37) (actual time=72.261..72.261 rows=142190 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2911kB
               ->  Seq Scan on act_ru_identitylink i  (cost=0.00..4388.79 rows=141183 width=37) (actual time=0.003..31.867 rows=142190 loops=1)
                     Filter: ((type_)::text = 'candidate'::text)
                     Rows Removed by Filter: 1
 Planning time: 0.473 ms
 Execution time: 138.647 ms

2. Rewrite the query to use a LEFT JOIN with NULL check

-- REWRITTEN QUERY
SELECT COUNT(distinct RES.id_)
FROM ACT_RU_TASK RES
LEFT JOIN ACT_RU_IDENTITYLINK I
ON  I.task_id_ = RES.id_
AND I.type_ = 'candidate'
WHERE   I.id_ IS NULL
AND     RES.assignee_ IS NULL

-- QUERY PLAN
Aggregate  (cost=9856.38..9856.39 rows=1 width=8) (actual time=123.516..123.516 rows=1 loops=1)
   ->  Hash Right Join  (cost=5096.96..9856.38 rows=1 width=37) (actual time=122.414..123.481 rows=3 loops=1)
         Hash Cond: ((link.task_id_)::text = (task.id_)::text)
         Filter: (link.id_ IS NULL)
         Rows Removed by Filter: 57826
         ->  Seq Scan on act_ru_identitylink link  (cost=0.00..4388.79 rows=141183 width=74) (actual time=0.008..31.148 rows=141212 loops=1)
               Filter: ((type_)::text = 'candidate'::text)
               Rows Removed by Filter: 1
         ->  Hash  (cost=4569.86..4569.86 rows=42168 width=37) (actual time=36.236..36.236 rows=42099 loops=1)
               Buckets: 65536  Batches: 1  Memory Usage: 3349kB
               ->  Seq Scan on act_ru_task task  (cost=0.00..4569.86 rows=42168 width=37) (actual time=0.009..22.638 rows=42099 loops=1)
                     Filter: (assignee_ IS NULL)
                     Rows Removed by Filter: 44193
 Planning time: 0.829 ms
 Execution time: 123.567 ms

Thoughts

Both solutions perform compared to the original query way better
- The costs are reduced by the factor of ~ 2.000
Investigate the performance of the solutions for non-postgres databases to avoid performance degradation
The implementation effort of solution 1 might be less compared to solution 2
- In the withoutCandidateUser/Group case a LEFT JOIN is needed
- The query taskService.createTaskQuery().withCandidateUser().withoutCandidateGroup().or().priority(5).dueDate(now).endOr().count() would need an additional LEFT JOIN

[1] https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

This is the controller panel for Smart Panels app

is related to

CAM-12938 Slow performance of historic task instance query

Closed

CAM-11277 Improve Task Query Performance

Closed

CAM-11395 Use left join to test variable value conditions in historic process instance query

Closed

mentioned in: Page Loading...

Details

Description

mgm-controller-panel

This is the controller panel for Smart Panels app

Attachments

Issue Links

Activity

People

Dates

Salesforce