On May 15, 2025, between 13:55 and 14:18 UTC, Atlassian customers using the Trello product experienced errors or slow loading times when attempting to view their cards and boards. The event was triggered when a query plan expired from the database cache, and the query planning operations that followed caused high resource usage. The affected database shard held data that was required for every card load. The incident was detected within two minutes by the automated monitoring system and mitigated by increasing the resources available to the affected database shard, which returned Atlassian systems to a known good state. The total time to resolution was about 23 minutes.
The impact on the Trello product lasted from 13:55 to 14:18 UTC on May 15, 2025. The incident caused service disruption for all Trello customers.
The issue was caused by a query plan expiring from the database cache, which forced incoming queries through a replanning operation. Several plans could satisfy these queries, and depending on the size of the query, one plan might be significantly more efficient than another. As a result, the query planner performed far more replanning operations than usual, which briefly consumed all of the CPU on the server. Once the CPU was saturated, the planning operations themselves began taking too long, which triggered further replanning in an effort to find more efficient options. This self-reinforcing feedback loop could not be broken without intervention.
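The feedback loop described above can be illustrated with a toy model. The sketch below is not the Trello database or its query planner; the function name `simulate`, its parameters, and every number are hypothetical and chosen only for illustration. It assumes that a plan produced while the CPU is saturated looks too slow to keep, so the next tick has to plan again, and that raising capacity is the intervention that breaks the loop.

```python
def simulate(capacity, queries_per_tick, exec_cost, plan_cost,
             expire_at, ticks, scale_up_at=None, scaled_capacity=None):
    """Return the list of ticks on which CPU demand exceeded capacity.

    Toy model: all values are illustrative CPU units, not measurements.
    """
    plan_cached = True
    saturated_ticks = []
    for step in range(ticks):
        # Hypothetical mitigation: more resources from scale_up_at onward.
        if scale_up_at is not None and step >= scale_up_at:
            capacity = scaled_capacity
        # The cached plan expires once, at a fixed tick.
        if step == expire_at:
            plan_cached = False
        planning = not plan_cached
        # Without a cached plan, every query pays the planning cost too.
        per_query = exec_cost + (plan_cost if planning else 0)
        demand = queries_per_tick * per_query
        saturated = demand > capacity
        if saturated:
            saturated_ticks.append(step)
        # Assumption: a plan made on a saturated server is judged too slow
        # and is not kept, so the next tick replans as well.
        plan_cached = (not planning) or (not saturated)
    return saturated_ticks


# No intervention: saturation persists from the expiry to the end of the run.
print(simulate(100, 10, 5, 10, expire_at=3, ticks=10))
# prints [3, 4, 5, 6, 7, 8, 9]

# Capacity doubled at tick 6: planning completes and the loop ends.
print(simulate(100, 10, 5, 10, expire_at=3, ticks=10,
               scale_up_at=6, scaled_capacity=200))
# prints [3, 4, 5]
```

In this model the server handles steady-state load comfortably (demand of 50 against a capacity of 100), yet a single plan expiry pushes demand to 150 and keeps it there, because planning never finishes under healthy conditions. Adding capacity lets one round of planning succeed, after which demand falls back to its steady-state level.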
We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because it would only occur under very distinct conditions, including the amount of load and the order of database queries.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
Improve query planner performance by:
Furthermore, we are prioritizing the following additional measures to reduce the impact of any future incidents:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support