Partial DNS Outage - What happened?

Posted by: 
Dejan Grofelnik Pelzel
March 19th, 2019

Yesterday at around 10 PM GMT+1 we began noticing a slow but steady drop in traffic on our system. Our monitoring did not alert at first, and apart from a small dip in traffic, which happens regularly, everything seemed to be in order.

A while later, we started receiving reports of connectivity issues from our users, although nothing looked wrong on our side. After further investigation it became apparent that DNS resolution was failing at the root level. Our team found that the registrar we used for the domain had removed the NS records from our domain without any prior notice. As resolvers' cached copies of those records expired, DNS resolution gradually began to fail. At the lowest point, we experienced a roughly 30% drop in connectivity worldwide.
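A failure like this is easy to miss because the servers themselves stay healthy. As a minimal sketch of one way to catch it (not a description of our actual monitoring), a periodic check could compare the NS records published for a domain against the set we expect. The names `check_delegation`, `lookup_ns` and `example.net` below are hypothetical; `lookup_ns` is assumed to be any function that returns the currently published name servers for a domain, and to return an empty list when none are found.

```python
from typing import Callable, Iterable, Set


def normalize(name: str) -> str:
    """Lower-case a host name and drop the trailing dot so names compare equal."""
    return name.strip().lower().rstrip(".")


def check_delegation(
    domain: str,
    expected_ns: Iterable[str],
    lookup_ns: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Return the expected name servers that are missing from the delegation.

    An empty set means every expected NS record is still published.
    `lookup_ns` is a hypothetical, caller-supplied lookup function; it is
    assumed to return an empty iterable when the domain has no NS records.
    """
    published = {normalize(ns) for ns in lookup_ns(domain)}
    return {normalize(ns) for ns in expected_ns} - published


if __name__ == "__main__":
    # Stand-in lookup that simulates a registrar having removed all NS records.
    def fake_lookup(domain: str) -> list:
        return []

    missing = check_delegation(
        "example.net",
        ["ns1.example.net.", "ns2.example.net."],
        fake_lookup,
    )
    if missing:
        print("ALERT: missing NS records:", sorted(missing))
```

For the check to be useful, the lookup would need to ask the parent zone's servers rather than a caching resolver, since a cache can keep serving the old records for a while after they have been removed.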

It appears this happened because of a single abuse report, which was sent at the same time as they began removing the NS records. Without a second thought, they dropped tens of thousands of websites from the internet. While I understand that acting on abuse reports helps keep the internet clean, I feel this was highly irresponsible on their part, considering the many abuse contact options that we provide and handle almost immediately.

We run tight access control on our system to ensure that no breach can happen, so access to the domain itself is very limited for security reasons. Unfortunately, in this case it also meant that only a very small number of people had access to the account hosting the domain. Approximately 2.5 hours after the beginning of the incident, we finally managed to establish contact, find out what exactly went wrong, and restore the configuration of our domain. Connectivity recovered quickly afterwards, at approximately 1 AM GMT+1.

We have been with the registrar for many years without any problems. However, the incredibly poor handling of the abuse report on their part has made us strongly consider switching to a different company to prevent any similar issues in the future. We will be evaluating different options to find a registrar that we and our users can rely on. Another item already on our roadmap is moving the status page to an externally hosted domain, so that in a case like this we can keep our users up to date with what's going on.

I want to personally apologize to all of our users for the issue and thank everyone for their patience. We have had an almost perfect uptime record over the past 12 months, and we have multiple systems in place to prevent connectivity issues from happening. We designed a system with automatic healing and monitoring for most cases, but this one was unfortunately out of our control. That being said, we share your frustrations and of course accept full responsibility. If you have been affected by the outage, please open a ticket and request SLA compensation, and we will be more than happy to help.

As always, we want to be fully transparent, so if you have any questions, please let us know. Finally, I want to thank everyone again for your patience and understanding. We will take steps to prevent any similar issues in the future as far as we can.

Dejan - Founder of BunnyCDN