On 12-07-2020 between 14:20 UTC and 16:08 UTC, Atlassian customers using Trello may have experienced service interruptions.
Trello became unavailable because our database could not handle a high level of load coming from our API. That load was triggered when our websockets service hit a hard limit on the number of allowed network connections. As a result, clients tried to reconnect, and the reconnection attempts caused a surge in API traffic. We provisioned a new database node at twice the size to take over as primary and absorb the load, which resolved the incident.
We are taking a number of actions to prevent these types of incidents in the future, including:
While we work on those long-term improvements, we have put the following short-term measures in place to improve reliability:
We understand that outages negatively impact your productivity and we apologize for the inconvenience this has caused.