Trello Connectivity Problems
Incident Report for Trello
Postmortem

On December 7, 2020, between 14:20 UTC and 16:08 UTC, Atlassian customers using Trello may have experienced service interruptions.

Trello became unavailable because our database could not handle a high level of load coming from our API. The load was triggered when our websockets service hit a hard limit on the number of network connections allowed; clients that lost their connections then reconnected, which increased API traffic. We provisioned a new database node at twice the size to take over as primary and absorb the load, which resolved the incident.
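
The report does not describe how Trello clients reconnect. As a general illustration of why many clients reconnecting at once can produce a traffic spike, and of one common way to spread that traffic out, here is a minimal sketch of exponential backoff with full jitter. The function name and the `base`, `cap`, and `rng` parameters are hypothetical and are not taken from Trello's clients.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return how many seconds to wait before reconnect attempt `attempt` (0-based).

    Hypothetical helper for illustration; not Trello's actual client logic.
    """
    # Bound the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 30)
    # The ceiling doubles on each attempt but never exceeds `cap`.
    ceiling = min(cap, base * (2 ** exponent))
    # Full jitter: choose uniformly in [0, ceiling] so that clients which
    # disconnected at the same moment do not all retry at the same moment.
    return rng.uniform(0.0, ceiling)
```

Without the random component, every client dropped at the same instant would retry on the same schedule, so the reconnect traffic would arrive in synchronized waves.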

We are taking a number of actions to prevent these types of incidents in the future, including:

  • Strategies for rapidly scaling up our database nodes, as needed in an emergency
  • Improved load testing for services
  • Additional monitoring/alerting for potential operating system networking limits
  • Tooling to assist us with bringing Trello back online more quickly, including improved load shedding capabilities
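
The report does not say which operating system limit the websockets service hit. As one example of the kind of check that monitoring for such limits might perform, the sketch below compares a usage count against a limit and flags when headroom is running low. It uses the per-process open file descriptor limit (sockets count against it) purely as an assumed example; the function names and the 0.8 threshold are hypothetical.

```python
import resource


def headroom_alert(in_use, limit, warn_ratio=0.8):
    """Return True when usage has reached `warn_ratio` of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return in_use / limit >= warn_ratio


def open_file_soft_limit():
    """Return this process's soft limit on open file descriptors.

    Returns None when the limit is unlimited. Unix-only, because it relies
    on the standard-library `resource` module.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft
```

A monitoring job could feed the current connection count and the value from `open_file_soft_limit()` into `headroom_alert` and page someone well before the hard limit is reached.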

While we work on those long-term improvements, we have put the following immediate short-term measures in place to improve reliability:

  • Increased resource provisioning across many parts of our infrastructure
  • Increased dashboard monitoring during and post-release, across all clients
  • Enhanced capabilities to control which clients can generate traffic and how much
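
The report does not describe how the traffic controls are implemented. One common way to control which clients can generate traffic and how much is a per-client token bucket, sketched below under that assumption. The class names, the client name `"web"`, and the rates are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # shed this request


class ClientLimiter:
    """Keep one bucket per known client; clients without an entry are rejected."""

    def __init__(self, limits, clock=time.monotonic):
        # limits maps client name -> (rate per second, burst capacity)
        self.buckets = {
            name: TokenBucket(rate, capacity, clock)
            for name, (rate, capacity) in limits.items()
        }

    def allow(self, client):
        bucket = self.buckets.get(client)
        return bucket is not None and bucket.allow()
```

For example, `ClientLimiter({"web": (100.0, 200)})` would admit bursts of up to 200 requests from the `"web"` client, sustain 100 per second, and reject any client that is not listed.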

We understand that outages negatively impact your productivity and we apologize for the inconvenience this has caused.

Posted Dec 18, 2020 - 21:37 UTC

Resolved
A fix has been implemented - we are continuing to monitor.
Posted Dec 07, 2020 - 16:41 UTC
Identified
The Trello service is available again and we are continuing to work on bringing the API back as soon as possible.
Posted Dec 07, 2020 - 16:25 UTC
Update
The engineering team is still actively investigating and working to bring Trello back up as quickly as possible.
Posted Dec 07, 2020 - 15:32 UTC
Update
We are continuing to investigate this issue.
Posted Dec 07, 2020 - 14:36 UTC
Investigating
We're receiving reports of connectivity problems while accessing Trello. You may see a red box in the top right corner of your window notifying you that you've been disconnected from the server.

We're currently investigating the cause, and will post updates here as we determine it.
Posted Dec 07, 2020 - 14:36 UTC
This incident affected: Trello.com and API.