On Thursday, March 24, 2022, between 15:56 and 16:12 UTC, Trello customers were unable to use the product. The event was triggered by a deployment misconfiguration. As the faulty deploy propagated across our server fleet, each server failed to start, and the entire fleet became unresponsive. Our automated monitoring detected the incident within 1 minute. We resolved it by correcting the misconfiguration and redeploying the fleet, which restored Trello for all users. The total time to resolution was about 16 minutes.
The impact lasted from Mar 24, 2022, 15:56 UTC to Mar 24, 2022, 16:12 UTC. For those 16 minutes, Trello was completely unavailable to all customers and to anything that depends on it.
The issue was caused by an error made while manually setting configuration information for Trello's deployments. Because of the resulting misconfiguration, Trello's servers could not fully deploy a new version of the code and failed to start.
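To illustrate the failure mode described above, here is a minimal sketch of a rollout that checks a manually supplied configuration before deploying and stops as soon as a server fails to start. It is not Trello's actual deployment tooling: the names `REQUIRED_KEYS`, `validate_config`, `staged_rollout`, and `start_server`, and the configuration keys themselves, are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical settings a deploy needs; the real keys are not given in this report.
REQUIRED_KEYS = {"version", "port"}


def validate_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    if "port" in config and not config["port"].isdigit():
        problems.append(f"port is not a number: {config['port']!r}")
    return problems


def staged_rollout(
    servers: List[str],
    config: Dict[str, str],
    start_server: Callable[[str, Dict[str, str]], bool],
) -> List[str]:
    """Deploy one server at a time and halt at the first server that fails to start.

    Returns the servers that were updated successfully. `start_server` is an
    assumed callback that returns True when the server comes up healthy.
    """
    problems = validate_config(config)
    if problems:
        # Reject a bad manual configuration before it reaches any server.
        raise ValueError("; ".join(problems))

    updated: List[str] = []
    for server in servers:
        if not start_server(server, config):
            # Stop here so the remaining servers keep running the old version.
            return updated
        updated.append(server)
    return updated
```

In this sketch, a configuration that is missing a required key is rejected before any server is touched, and a configuration that passes validation but still prevents startup affects only the first server that tries it.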
We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because the error occurred during an optional, manual step in our deployment process.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize to Trello customers who were impacted by this incident. We are taking immediate steps to improve Trello's availability as a result.
Thanks,
Atlassian Customer Support