An earlier post explained why maximizing the responsiveness and availability of caching DNS is so important for a good user experience, and focused on the benefits of using Anycast. Several other measures are worth considering for caching DNS:
- Place recursive DNS servers close to customer access networks – this minimizes network latency for DNS queries and reduces the response times users perceive when loading websites. Complex web pages can generate 20 or more DNS queries, often sequentially, so fast DNS responses have a significant impact on overall page load times.
- Allow plenty of capacity. A maximum CPU load at peak times of 20% for the DNS process and 30% overall (including management agents and anything else that may be running) is a good target to ensure ample headroom. Remember that monitoring systems such as Cacti report a short-term average, usually over several minutes, so brief DNS traffic spikes may be hidden. Plenty of headroom means these spikes can be absorbed, and it gives you "breathing space" if an attack starts or something on the Internet causes an unusual surge in traffic.
- Block UDP and TCP port 53 access to your servers from outside your network. There is no need to provide DNS service to the rest of the Internet (and all the DDoS sources out there!). Optionally, with the distributed Anycast system described in the earlier post, each DNS server only needs to be exposed to "local" users. If you do this, however, make sure other servers remain reachable in the event that the local one fails.
- If you co-locate multiple DNS servers at one site, use a suitable load balancer. Anycast routing cannot accurately balance load between servers at the same site. However, most load balancers and switches can advertise Anycast routes and withdraw them when available server capacity drops below a defined threshold. Remember that DNS traffic is mostly UDP, so choose a load balancer that handles UDP well. Load balancers are very important tools and can help rate-limit traffic during an attack, but they do introduce another potential point of failure.
- Consider hosted services (Nominum offers Skye Resolution) to provide additional live or backup capacity and maintain caching DNS service if you are attacked. Confirm that the vendor has a globally distributed network, to minimize the likelihood that a DDoS attack against your DNS servers affects the hosted service at the same time. Hosted services should also be actively monitored for unusual traffic.
- Before a DDoS attack happens, make sure your monitoring and alarms are up to the job. You need to know very quickly when server load increases. Have trace tools ready to examine query source addresses for unusual patterns. Nominum iView is a valuable ally here, as it is specifically designed to monitor and control DNS systems.
- Make sure your organization and response procedures are well understood. When a major outage or attack occurs, the biggest cause of delayed response is often confusion: who do you call, and what steps do you take? In many cases, the attack is over before the targeted organization has even started analyzing it. At a minimum, be ready to capture relevant data during an attack so you have something to analyze after the event.
- Document what you learn after the event. Even if you cannot analyze an attack while it is happening, a post-mortem investigation is still very valuable, as it will help you defend better and respond more quickly next time.
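
To put rough numbers on the point about placing resolvers close to access networks, here is a back-of-the-envelope sketch in Python. It assumes, for illustration only, that every lookup is issued sequentially and answered from the resolver's cache, so each one costs a single client-to-resolver round trip. The 5 ms and 40 ms round-trip times are hypothetical values, not measurements.

```python
# Rough estimate of the DNS contribution to page load time.
# Assumptions (illustrative only): lookups are strictly sequential and each
# is answered from the resolver's cache, so the client-to-resolver round-trip
# time (RTT) is the whole cost of each lookup.

def sequential_dns_delay_ms(num_queries: int, rtt_ms: float) -> float:
    """Total time spent waiting on DNS when queries are issued one after another."""
    return num_queries * rtt_ms


if __name__ == "__main__":
    queries = 20  # the "20 or more" lookups a complex page can generate
    for label, rtt in [("resolver near access network", 5.0),
                       ("distant resolver", 40.0)]:
        print(f"{label}: {sequential_dns_delay_ms(queries, rtt):.0f} ms")
```

Under these assumptions, 20 sequential lookups add about 100 ms with a nearby resolver and about 800 ms with a distant one. Real pages overlap some lookups, but the gap shows why resolver placement matters.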
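
The warning that multi-minute averages can hide DNS traffic spikes is easy to demonstrate. The sketch below uses a synthetic, hypothetical load profile and a five-minute window chosen for illustration: a 30-second burst at 90% CPU on top of a 12% baseline.

```python
# Illustration of how a multi-minute average can hide a short CPU spike.
# The load profile is synthetic and hypothetical, not real monitoring data.

def build_profile(window_s: int, base: float, spike: float, spike_s: int) -> list[float]:
    """Per-second CPU load: `spike` percent for `spike_s` seconds, `base` percent otherwise."""
    return [spike] * spike_s + [base] * (window_s - spike_s)


def window_average(samples: list[float]) -> float:
    """Mean over the window, i.e. the single value an averaging poller would report."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # Five-minute window: 12% baseline with a 30-second burst at 90%.
    profile = build_profile(window_s=300, base=12.0, spike=90.0, spike_s=30)
    print(f"peak: {max(profile):.0f}%  window average: {window_average(profile):.1f}%")
```

The window average comes out at 19.8%, apparently inside a 20% target, even though the server spent half a minute at 90%.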
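
The advice to block port 53 from outside your network amounts to an "own address space only" policy. Here is a minimal Python sketch of that check, assuming hypothetical customer prefixes (the documentation ranges 192.0.2.0/24, 198.51.100.0/24 and 2001:db8::/32). In practice the rule would be enforced in firewalls and in the resolver's own access controls rather than in application code.

```python
# Sketch of an "own users only" access policy for a caching resolver.
# The prefixes are hypothetical documentation ranges standing in for a
# provider's real customer address space.
import ipaddress

CUSTOMER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def allow_query(source: str) -> bool:
    """Return True if a port 53 query (UDP or TCP) from `source` should be served."""
    addr = ipaddress.ip_address(source)
    # Membership between IPv4 and IPv6 objects is simply False, so mixed lists are safe.
    return any(addr in net for net in CUSTOMER_PREFIXES)


if __name__ == "__main__":
    for src in ["192.0.2.53", "203.0.113.7", "2001:db8::1"]:
        print(src, "allow" if allow_query(src) else "drop")
```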
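
The load balancer behavior described above, advertising the site's Anycast route only while enough server capacity is available, can be sketched as a simple threshold check. The server names, per-server capacities and the 50% threshold below are hypothetical. Real load balancers implement this through their own health checks and routing configuration.

```python
# Sketch of the capacity check deciding whether a site keeps advertising its
# Anycast route. Names, capacities and the threshold are hypothetical.
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity_qps: int  # queries per second the server can handle at target load
    healthy: bool


def available_fraction(servers: list[Server]) -> float:
    """Fraction of the site's total capacity provided by healthy servers."""
    total = sum(s.capacity_qps for s in servers)
    up = sum(s.capacity_qps for s in servers if s.healthy)
    return up / total if total else 0.0


def should_advertise(servers: list[Server], threshold: float = 0.5) -> bool:
    """Advertise the Anycast route only while enough healthy capacity remains."""
    return available_fraction(servers) >= threshold


if __name__ == "__main__":
    site = [Server("dns1", 50_000, True),
            Server("dns2", 50_000, False),
            Server("dns3", 50_000, False)]
    print("advertise" if should_advertise(site) else "withdraw")
```

With two of the three servers down, only a third of the site's capacity remains, so this sketch withdraws the route and Anycast routing sends users to another site instead.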
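
For the point about having trace tools ready, one simple pattern check is to count queries per source address and flag any source responsible for an outsized share. Below is a minimal Python sketch, assuming the source addresses have already been extracted from a packet capture or query log. The 10% threshold is an arbitrary illustration, not a recommendation.

```python
# Sketch of a quick "top talkers" check on DNS query source addresses.
# Input: an iterable of source IP strings taken from a capture or query log.
from collections import Counter
from collections.abc import Iterable


def top_talkers(sources: Iterable[str], share_threshold: float = 0.10,
                limit: int = 10) -> list[tuple[str, int, float]]:
    """Return (address, count, share) for the busiest sources above `share_threshold`."""
    counts = Counter(sources)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(addr, n, n / total)
            for addr, n in counts.most_common(limit)
            if n / total >= share_threshold]


if __name__ == "__main__":
    # Hypothetical traffic: one source sends half of all queries.
    sample = (["192.0.2.1"] * 50 + ["192.0.2.2"] * 5
              + [f"198.51.100.{i}" for i in range(45)])
    for addr, n, share in top_talkers(sample):
        print(f"{addr}: {n} queries ({share:.0%})")
```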