Trello performance is degraded

Incident Report for Trello

Postmortem

Summary

On Feb 9, 2026, between 07:12 UTC and 08:05 UTC, Atlassian customers were unable to access Trello. The event was triggered when Trello servers reached their memory limits and subsequently failed to scale automatically to meet the traffic. The incident was detected within 5 minutes by automated monitoring systems, which engaged Trello teams for resolution. Two parallel mitigation efforts were undertaken to restore access: 1) manually scaling to address high utilization of hosts, and 2) temporarily limiting traffic from free customers. This intervention returned Trello systems to a known good state, with a total time to resolution of ~53 minutes.

IMPACT

The impact occurred on Feb 9, 2026 between 07:12 UTC and 08:05 UTC and caused a service disruption for all Trello customers, making Trello inaccessible during that time. Access was restored for paid users approximately 43 minutes after onset, with full service restored for free users 10 minutes later.

ROOT CAUSE

The issue was caused by increased memory requirements during the transition from the weekend to EU business hours, combined with Auto Scaling group rules that scale on CPU utilization alone. Memory usage on our instances outpaced CPU usage, so processes hit Out-Of-Memory (OOM) errors and crashed before reaching the CPU thresholds that would have triggered the auto scaling policies. As a result, Trello went down and users received HTTP 502 errors until the incident was resolved.
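For illustration, a scaling configuration with the gap described above could look like the following. This is a minimal sketch assuming an AWS Auto Scaling group managed with boto3; the group name, policy name, and target value are hypothetical placeholders, not Trello's actual settings.

```python
# Illustrative sketch only -- not Atlassian's actual configuration.
# A CPU-only target tracking policy like this never fires when memory,
# not CPU, is the constrained resource: workers hit OOM errors and
# crash before average CPU approaches the target value.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="trello-web",   # hypothetical group name
    PolicyName="cpu-target-tracking",    # hypothetical policy name
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        # Scale out when average CPU exceeds 60%; memory pressure
        # is invisible to this policy.
        "TargetValue": 60.0,
    },
)
```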

REMEDIAL ACTION PLAN & NEXT STEPS

We know that outages impact your productivity. While we have a suite of automated infrastructure management tools in place, this specific issue required manual intervention to restore Trello access for users. To avoid repeating this type of incident, we are prioritizing the following remedial action items:

  • Pre-scale capacity before EU morning traffic - This change has already been introduced to prevent further incidents while we implement additional safeguards.
  • Adjust Trello’s Auto Scaling group settings - This change will ensure that unhealthy hosts are replaced more rapidly and that scaling policies consider memory usage in addition to CPU; a sketch of this change and the pre-scaling above follows this list.
  • Refine Trello host memory commitments - This change will decrease the likelihood of memory overcommitment on hosts and associated OOM errors.
  • Increase isolation for OOM errors - This change will improve hosts’ ability to recover when a single worker experiences an OOM error.
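To make the pre-scaling and memory-aware scaling items above concrete, here is a minimal sketch assuming an AWS Auto Scaling group managed with boto3, with the CloudWatch agent publishing memory metrics to the CWAgent namespace. All names, thresholds, schedules, and sizes are hypothetical placeholders, not Trello's actual configuration.

```python
# Illustrative sketch only -- names and numbers are hypothetical.
# Assumes the CloudWatch agent publishes memory metrics (CWAgent
# namespace) for the instances in the Auto Scaling group.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# 1) Scale on memory as well as CPU: a step scaling policy driven
#    by a CloudWatch alarm on average memory utilization.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="trello-web",        # hypothetical
    PolicyName="scale-out-on-memory",         # hypothetical
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[
        # Add two instances whenever the alarm threshold is breached.
        {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 2},
    ],
)

cloudwatch.put_metric_alarm(
    AlarmName="trello-web-high-memory",       # hypothetical
    Namespace="CWAgent",
    MetricName="mem_used_percent",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "trello-web"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=75.0,                           # hypothetical threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)

# 2) Pre-scale ahead of EU business hours instead of reacting to the
#    spike: a scheduled action raising the group's minimum size each
#    weekday morning (recurrence is a UTC cron expression).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="trello-web",
    ScheduledActionName="eu-morning-prescale",  # hypothetical
    Recurrence="0 6 * * 1-5",
    MinSize=20,                                 # hypothetical floor
)
```

A memory-driven policy like this complements CPU-based scaling rather than replacing it, and the scheduled action raises the capacity floor before the EU morning spike rather than reacting to it.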

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support

Posted Feb 18, 2026 - 03:37 EST

Resolved

Trello users experienced performance degradation. The issue has now been resolved, and the service is operating normally for all affected customers.
Posted Feb 09, 2026 - 03:36 EST

Monitoring

The performance degradation affecting Trello has been resolved, and all services are now operating normally for all affected customers.
We'll continue to monitor performance closely to confirm stability.
Posted Feb 09, 2026 - 03:20 EST
This incident affected: Trello.com, API, Atlassian Support - Support Portal, Atlassian Support Ticketing, and Atlassian Support Knowledge Base.