On August 4, 2023, Trello users encountered issues accessing their workspaces. This was caused by a processing error during user deletion events involving two users who shared a workspace. The error resulted in unintended workspaces being marked as deleted.
The issue was identified, the deletion process halted, and data restoration initiated. The solution involved marking workspaces as undeleted and implementing a code fix to prevent similar issues in the future.
The impact occurred on August 4, 2023, from the afternoon to the early evening (UTC).
All Trello workspaces created before July 2021 were inaccessible during the incident. As a result, 39% of active workspaces were inaccessible.
The incident was triggered by a race condition in the handling of user deletion events. When the last user in a workspace is deleted, the system automatically marks that workspace as deleted. In this case, two users sharing a workspace were deleted simultaneously. The resulting race condition triggered a code path that generated a query that was not scoped to an individual workspace. Instead, the query systematically marked all workspaces, including unrelated ones, as deleted in our database.
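To illustrate this class of failure, here is a minimal sketch of a guard that rejects any bulk update that is not scoped to a single workspace. It is not Trello's actual code or fix: `WorkspaceStore`, `mark_deleted`, `handle_user_deleted`, and `UnscopedUpdateError` are hypothetical names, and an in-memory dictionary stands in for the database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class UnscopedUpdateError(Exception):
    """Raised when a delete-marking update has no workspace filter."""


@dataclass
class WorkspaceStore:
    # Hypothetical in-memory stand-in for the workspace table:
    # workspace id -> set of member user ids.
    members: dict[str, set[str]] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)

    def mark_deleted(self, criteria: dict) -> int:
        """Mark one workspace as deleted; refuse an unscoped filter."""
        workspace_id = criteria.get("workspace_id")
        if not workspace_id:
            # A filter without an id would match every workspace,
            # so fail loudly instead of running the update.
            raise UnscopedUpdateError("refusing update without workspace_id")
        if workspace_id not in self.members or workspace_id in self.deleted:
            return 0
        self.deleted.add(workspace_id)
        return 1


def handle_user_deleted(store: WorkspaceStore, user_id: str,
                        workspace_id: str | None) -> int:
    """Remove a user; mark the workspace deleted if no members remain."""
    members = store.members.get(workspace_id, set())
    members.discard(user_id)
    if members:
        return 0
    # If a concurrent handler left us without a workspace id,
    # mark_deleted raises rather than touching unrelated workspaces.
    return store.mark_deleted({"workspace_id": workspace_id})
```

In this sketch, a handler that loses its workspace reference (modeled here by passing `None`) fails with an error instead of issuing an update that matches every workspace. Making the write path reject unscoped filters keeps the blast radius of a race limited to a single failed event.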
We know that outages affect your productivity, and we are committed to preventing incidents like this one from occurring. We have already implemented code changes to prevent the specific condition that caused the incident.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support