Friday, June 28, 2013

DNS vs DDoS - When 3rd Party Externalities Affect Our Service

For the past week or so, Zerigo, our managed DNS provider, has suffered a number of DDoS attacks against its network, disrupting its name servers and disabling service for many of our customers.

Although we at VM Farms have designed and built our environment to deliver an uptime of 99.99%  (or better), we realize that issues affecting 3rd parties will still creep into our world and trickle down to affect our customers as well.  Not cool.

In keeping with our militant approach to maintaining a fault-tolerant infrastructure, our operations team sprang into action to move customer DNS records over to our secondary DNS provider, Route53.  All that remained was for customers to update their name server information with their registrar and access would be restored.  Piece of cake, right?  Well, not quite.

The Problem

To achieve this task, our team set out to use Zerigo's API to retrieve the zone files for each of our customers, convert them, and push them up to Amazon's Route53 service using its API.  The problem that our team quickly encountered was that Zerigo's API is reached through a domain name that is itself managed by Zerigo's DNS service, whose name servers were down.
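For the curious, the "convert" step in that plan might look roughly like the Python sketch below.  This is not our actual script.  The flat input record format (name, type, ttl, data) is a hypothetical stand-in for whatever a zone export returns, and the data values are assumed to already be in the form Route53 expects (for example, an MX value that includes its priority).

```python
from collections import defaultdict


def to_change_batch(records, comment="Migrated from primary DNS"):
    """Group flat records by (name, type) and build a Route53-style ChangeBatch.

    `records` is a list of dicts with hypothetical keys:
    name, type, ttl, data.
    """
    grouped = defaultdict(list)
    for rec in records:
        # Route53 works with fully qualified names, so ensure a trailing dot.
        name = rec["name"] if rec["name"].endswith(".") else rec["name"] + "."
        grouped[(name, rec["type"].upper())].append(rec)

    changes = []
    for (name, rtype), group in sorted(grouped.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": rtype,
                # A record set carries one TTL; take the lowest in the group.
                "TTL": min(r["ttl"] for r in group),
                "ResourceRecords": [{"Value": r["data"]} for r in group],
            },
        })
    return {"Comment": comment, "Changes": changes}
```

The returned dictionary has the shape of the ChangeBatch argument that boto3's Route53 client takes in change_resource_record_sets, so the "push" step would be one such call per hosted zone.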

"@#$%!" - Q*Bert

The Solution

Upon hitting this catch-22, we launched an internet-wide scavenger hunt to find a caching name server that still held the IP address for Zerigo's API.  After searching high and low (and stopping only to look at amusing pictures of cats), we finally located one in what we can only imagine was deep under a mountain in Colorado.  Success!  With the IP address in hand, our team was able to access the API and write an ad hoc script to convert all zone files and move them into Route53.  Access could now be restored.
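Once you have a known-good IP address, there are several ways to reach a host whose DNS is down; an entry in /etc/hosts is the classic one.  The Python sketch below shows another option, assuming you only want the override inside a single script: temporarily pin a hostname to an IP by wrapping socket.getaddrinfo.  The hostname and address used here are placeholders, not Zerigo's real values, and this is an illustration rather than what our team actually ran.

```python
import socket
from contextlib import contextmanager


@contextmanager
def pinned_dns(overrides):
    """Temporarily resolve the given hostnames to fixed IP addresses.

    `overrides` maps hostname -> IP address string.
    """
    original = socket.getaddrinfo

    def patched(host, *args, **kwargs):
        # Swap in the pinned IP if we have one; otherwise resolve normally.
        return original(overrides.get(host, host), *args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original


if __name__ == "__main__":
    # Placeholder hostname and documentation-range IP, for illustration only.
    with pinned_dns({"api.example.invalid": "192.0.2.10"}):
        print(socket.getaddrinfo("api.example.invalid", 443,
                                 type=socket.SOCK_STREAM))
```

Because the client code still asks for the hostname, HTTP libraries built on the standard socket module keep sending the right Host header and checking the TLS certificate against the name rather than the bare IP.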

The Impact

A funny footnote to the story I just told.  As we were drafting instructions for our customers to update their name servers with their respective registrars, Zerigo announced that the DDoS attack had been mitigated, and that service had been mostly restored.  As awesome as that was for everyone, we were still on our high - we wanted to be heroes.

But it wasn't all for naught.  Although we set out to work around a 3rd party outage on behalf of our customers, what we ended up with was a redundant DNS system that is now built into our platform, and an overall improvement to our fault-tolerance as a managed service provider.

With that said, we are pleased to announce that the DNS management service included with our platform is now redundant across multiple providers, at no additional cost to our customers!
