On 2-10-2020 between 14:25 UTC and 15:18 UTC, Atlassian customers using Trello may have experienced service interruptions due to an incident.
This incident was caused by CPU saturation on our production database. Two related factors contributed: 1) a sudden surge in connections from a limited set of IPs, and 2) an abnormally high number of TCP connections to the database, which began failing and timing out, generating still more TCP connections and further increasing load. Our incident response team was alerted, and we recovered by cutting off traffic to Trello. This reduced CPU load and allowed TCP connections to complete successfully. We then slowly allowed traffic back into Trello.
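As an illustration only, the gradual return of traffic described above can be sketched as a linear ramp. This report does not describe the mechanism that was actually used, so the function names, the ramp duration, and the hash-based client bucketing below are all hypothetical.

```python
import hashlib


def admitted_fraction(elapsed_s: float, ramp_s: float) -> float:
    """Fraction of traffic to admit, rising linearly from 0 to 1 over ramp_s seconds."""
    if ramp_s <= 0:
        return 1.0
    return max(0.0, min(1.0, elapsed_s / ramp_s))


def should_admit(client_key: str, fraction: float) -> bool:
    """Admit a client if its stable hash bucket in [0, 1) falls below fraction.

    Hashing the key (hypothetically a client or session identifier) keeps the
    decision stable: a client admitted at a lower fraction stays admitted as
    the fraction grows.
    """
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < fraction


if __name__ == "__main__":
    # Hypothetical 10-minute ramp, sampled every two minutes.
    for elapsed in range(0, 601, 120):
        fraction = admitted_fraction(elapsed, ramp_s=600)
        print(elapsed, fraction, should_admit("client-42", fraction))
```

Ramping by stable buckets rather than by random sampling means the set of admitted clients only grows, so load on the database rises in small steps instead of returning all at once.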
We are prioritizing the following actions to avoid repeating this type of incident:
We understand that outages negatively impact your productivity, and we apologize for the inconvenience this has caused.