In Camunda Cockpit we get Timeout on count api endpoints when there is nearly 3000 active process instances.
Debugging what is taking so long we got that this COUNT(DISTINCT query is running to slow (Few minutes). This is one example that we got:
select count(distinct RES.ID_) from ACT_RE_PROCDEF RES inner join (select KEY_, TENANT_ID_, max(VERSION_) as MAX_VERSION from ACT_RE_PROCDEF group by TENANT_ID_, KEY_) VER on RES.KEY_ = VER.KEY_ left JOIN ( SELECT A.* FROM ACT_RU_AUTHORIZATION A WHERE A.TYPE_ < 2 AND ( A.USER_ID_ in ( 'sample.user', '*') OR A.GROUP_ID_ IN ( 'Head manager' , 'Admin' ) ) AND ( ( A.RESOURCE_TYPE_ = 6 AND A.PERMS_ &2 = 2 ) ) ) AUTH ON (AUTH.RESOURCE_ID_ in (RES.ID_, RES.KEY_, '*')) WHERE RES.VERSION_ = VER.MAX_VERSION and (RES.TENANT_ID_ = VER.TENANT_ID_ or (RES.TENANT_ID_ is null and VER.TENANT_ID_ is null)) AND AUTH.RESOURCE_ID_ IS NOT NULL
I see that there is already simular bug report which was resolved for PostgreSQL:
https://jira.camunda.com/browse/CAM-10282
but i am not sure if this is also fixed for SQL Server in version 7.12
It would be nice to have some Index workaround until the query is optimized as in example:
https://stackoverflow.com/questions/13924688/performance-using-distinct-count
UPDATE
Adding following index makes count faster but still it takes few seconds to run:
CREATE NONCLUSTERED INDEX [ACT_RE_PROCDEF_KEY_TENANT_ID] ON [dbo].[ACT_RE_PROCDEF] ([KEY_],[TENANT_ID_]) INCLUDE ([VERSION_]) GO
Hi vukvasic,
thanks for bringing this to our attention.
However, root causing this would require high effort on our side, which we currently can not invest. If you are very interested in optimizing this query, I encourage you to raise a pull request and also provide some query analysis that proves the performance advantages of the new solution.
Cheers,
Miklas