Identified: Ongoing attacks on our DNS Provider
Incident Report for imgix
Postmortem

DNS Post-Mortem

What Happened

On May 16th, 2016, a distributed denial of service (DDoS) attack began, targeting our DNS partner’s network. The attack started around 10:00 EDT and continued at least into the 18th. It began as a DNS amplification attack and, while quite severe, was initially filtered properly and kept from impacting service. As the attack progressed, it expanded to include random label attacks and direct attacks against providers further upstream.

In response, our provider shifted traffic to an anti-DDoS network. This successfully handled the ingress traffic, but it caused an uneven distribution of legitimate DNS traffic, resulting in name resolution timeouts. To accommodate this concentrated traffic, our provider brought up significant additional capacity, at which point availability began to be restored.

Once the attack pattern was identified, proper filtering was established and traffic was transferred back to the standard network. However, the continued scale and duration of the attack led some Tier-1 providers to null-route traffic despite sizable dedicated capacity. This caused a second wave of partial unavailability. Traffic was then rerouted to other carriers, and service resumed.

Preventing Similar Incidents

To address these issues more permanently, our DNS partner has made changes to ensure proper traffic distribution when operating behind the anti-DDoS provider network. This will enable a seamless experience when an even larger attack inevitably takes place.

At imgix, we are working to establish a second DNS network. Critically, it must provide the same capabilities as our primary provider. Because of the simple way resolvers select name servers, the secondary network effectively becomes an active part of our service rather than a standby. This presents some trade-offs, and it means the second network must be at least as resilient as our primary provider.
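
To illustrate why a secondary DNS network ends up serving live traffic, here is a minimal Python sketch. It assumes resolvers pick uniformly at random among the name servers listed for a zone; real resolvers vary and often weight their choice by measured latency, so the exact split will differ. The host names are hypothetical placeholders and do not reflect imgix's actual delegation.

```python
import random
from collections import Counter


def simulate_ns_selection(nameservers, queries, seed=0):
    """Count queries per name server, assuming uniform random selection."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return Counter(rng.choice(nameservers) for _ in range(queries))


# Hypothetical delegation: two name servers on each DNS network.
nameservers = [
    "ns1.primary.example",
    "ns2.primary.example",
    "ns1.secondary.example",
    "ns2.secondary.example",
]

total_queries = 100_000
counts = simulate_ns_selection(nameservers, total_queries)

# Share of all queries that land on the secondary network.
secondary_share = sum(
    n for ns, n in counts.items() if "secondary" in ns
) / total_queries
print(f"secondary share: {secondary_share:.2%}")
```

Under this assumption, roughly half of all queries reach the secondary network, so an outage or capacity shortfall there would affect a comparable share of lookups. This is why the second network cannot be treated as a passive backup.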

Images are a critical part of operating a site or app, and imgix knows this well. We do not take this incident lightly, and we sincerely apologize to our customers for any problems it may have caused.
If you would like to discuss this outage or the post-mortem further, we are happy to answer any questions at support@imgix.com.

Posted May 23, 2016 - 09:48 PDT

Resolved
Our DNS provider and internal tests continue to indicate full recovery.
Posted May 17, 2016 - 05:58 PDT
Monitoring
Our DNS provider and internal tests indicate recovery. We are both continuing to monitor the situation.

"At this time we are seeing full recovery following another attack in the European region. We are continuing to monitor the situation." http://nsone.statuspage.io/incidents/g9fkrhqr7wnv
Posted May 17, 2016 - 05:43 PDT
Identified
Our DNS provider is in the process of mitigating an attack on their European datacenters. We are monitoring the issue on our end.

"We are observing a resurgence of attack traffic in the Europe region and are actively working to mitigate." http://nsone.statuspage.io/incidents/g9fkrhqr7wnv
Posted May 17, 2016 - 05:15 PDT