Details
- Bug Report
- Resolution: Fixed
- L3 - Default
Description
What are the steps to reproduce your problem?
- Set up SSO Plugin for Optimize
- Create (and save) a new Collection "Demo".
- Copy the Link to the new Collection.
- Wait until a session timeout.
- Log in again as the same user.
Testing Notes
- Create collection and add other user to collection
- Stop Optimize
- Delete user from connected engine
- Disable collection role cleanup in config
- Start Optimize
- Observe that the deleted user still exists as a user in the collection
- Also check that, when cleanup is enabled, the user no longer exists in the collection after startup
What is the problem?
The Collection is not visible anymore.
If you navigate to the link copied in the steps above, you get a "403" error.
What would be the expected behavior?
You have access to the collection after the session timeout.
Hints (optional):
The problem only occurs when an automatic logout occurs after the session timeout (approx. 30 min). If you select the "logout" command manually, the collections are retained. All other objects such as reports and dashboards are not affected by this problem.
Solution Proposal
- Add an option to disable collection role cleanup by configuration
- Update the documentation to recommend that SSO users disable it
- Backport to older versions up to 3.7
- Investigate whether it makes sense to disable this by default in SaaS, where it is possibly not needed
Make it disabled by default, but still configurable. Candidate for feature deprecation.
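The proposed option and the behavior described in the Testing Notes can be sketched as follows. This is only an illustrative model, not Optimize's implementation: `Collection`, `clean_up_collection_roles`, and the `cleanup_enabled` flag are hypothetical names, and the real configuration key is not stated in this ticket.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Collection:
    """Hypothetical stand-in for a collection and the users holding roles in it."""
    name: str
    role_user_ids: Set[str] = field(default_factory=set)


def clean_up_collection_roles(
    collections: List[Collection],
    known_user_ids: Iterable[str],
    cleanup_enabled: bool,
) -> int:
    """Remove roles of users that are missing from the user cache.

    Returns the number of roles removed. When cleanup is disabled
    (the proposed option), no role is touched.
    """
    if not cleanup_enabled:
        return 0
    known = set(known_user_ids)
    removed = 0
    for collection in collections:
        stale = collection.role_user_ids - known
        collection.role_user_ids -= stale
        removed += len(stale)
    return removed


# "bob" was deleted from the engine, so the cache only knows "alice".
demo = Collection("Demo", {"alice", "bob"})
clean_up_collection_roles([demo], {"alice"}, cleanup_enabled=False)
print(sorted(demo.role_user_ids))  # cleanup disabled: both users remain
clean_up_collection_roles([demo], {"alice"}, cleanup_enabled=True)
print(sorted(demo.role_user_ids))  # cleanup enabled: only the known user remains
```

With the flag off, a user that the cache fails to resolve keeps their collection role, which is why disabling cleanup is suggested for SSO setups.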
Developer notes
We were unable to reproduce it, but the messages in the customer's log files are compelling evidence that the error is indeed occurring.
1) There are two types of cache synchronization with the user directory: the synchronization of the user data and the synchronization of metadata (the latter is $.import.data.user-task-worker.metadata.cronTrigger, in case you're curious). Both are run by cron jobs. By default, the user sync runs every 2 hours and the metadata sync runs every 3 hours, so every 6 hours both run in parallel, which can cause race conditions. Since the customer can reproduce this consistently, I find this scenario unlikely, but not impossible. You told the customer to update the cron job so that it effectively no longer runs. I feared that the job you told the customer to change might be the wrong one, but that's not the case: your change will indeed mitigate the problem for the time being. So let's leave it for now.
2) The more likely scenario is that, when reading data from our user cache, we cannot properly identify the user, so Optimize believes the user no longer exists and removes the user from the collection. I cannot confirm this hypothesis because I am unable to create a user ID in the same format as the customer's (FSSO_6300–pmengelt): the engine won't allow it. Analyzing the code, I don't see any obvious place where this check might fail, so to reproduce this properly we'd need to set up an environment similar to the customer's and see whether this type of user ID causes an issue with our caching. I find this the most likely reason for the bug.
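The 6-hour figure in scenario 1 is the least common multiple of the two default intervals. A small sketch of that arithmetic, assuming both jobs fire on whole hours divisible by their interval (the exact cron expressions are not given in this ticket, and the function names are illustrative only):

```python
from math import gcd
from typing import List


def overlap_interval_hours(first_every: int, second_every: int) -> int:
    """Least common multiple of two intervals: how often both jobs coincide."""
    return first_every * second_every // gcd(first_every, second_every)


def coinciding_start_hours(first_every: int, second_every: int, horizon: int = 24) -> List[int]:
    """Hours within the horizon at which both jobs would start together."""
    return [
        hour
        for hour in range(horizon)
        if hour % first_every == 0 and hour % second_every == 0
    ]


print(overlap_interval_hours(2, 3))   # 6
print(coinciding_start_hours(2, 3))   # [0, 6, 12, 18]
```

Under these assumptions the two syncs start together four times a day, so a race between them would be intermittent rather than consistently reproducible.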
The bug doesn't seem to have anything to do with the session expiring; it is just a matter of when the synchronization runs. I will create a bug ticket for this so that we can prioritize it properly, and I will post it here.