On November 5, 2020, between 06:57 and 07:50 UTC, some Trello customers may have experienced partial or complete outages in both the web and mobile apps.
This outage was triggered by a change to Trello’s load balancing infrastructure, through which customers access Trello. The change incorrectly removed resources required to deploy new infrastructure, which prevented us from scaling up to handle the increasing daily load.
The incident was detected by our automated monitoring, which alerted engineers to failures within two minutes, as customer demand started to exceed available capacity. The incident was mitigated by deploying a new release of our load balancing infrastructure, which restored access to the missing resources.
We know that outages affect your productivity. During the post-incident review, we identified process improvements that will allow us to remove the manual operator steps that resulted in the incorrect removal of required resources. By replacing these manual steps with an automated process, we will prevent this class of incident from recurring.
We apologize for any inconvenience this may have caused.