After making decisions about scale, latency targets, and the additional DNS-based features that will be supported, it's time to define the next level of detail.
Choose a suitable hardware platform:
- Fast Intel/AMD-based processor
- 2 GB of RAM; 8 GB or higher if you plan to take advantage of additional DNS-based features like redirection or extensive statistics
- Gigabit network interface
- Assign a separate IP address for query source
- Memory cache value: the right value depends on your environment; the number of subscribers accessing the server drives the setting. A value between 500 MB and 1500 MB is a reasonable starting point.
- Recursive contexts: keep this value low. A recursive context is a thread used for recursion; the fewer recursive contexts in use, the better your cache hit rate. The two values go hand in hand when optimizing your caching server.
- Negative cache TTL: the default value usually works in most environments, but since most global entities rely on very low TTLs for critical RRs, it is better to keep this value low, between 15 and 45 minutes.
- Don’t deploy a firewall in front of caching DNS servers (deploy an Intrusion Prevention System or similar instead, if needed)
- Define an ACL list that matches the address ranges of the subscribers who can access the server
- Use the maximum number of ports (16) for UDP Source Port Randomization to maximize protection against spoofed queries.
- Take advantage of query case randomization, also known as “0x20”. Be sure the server can re-query (over TCP) without randomization to cover the small percentage of authoritative nameservers that don’t mirror query case.
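To make the 0x20 technique concrete, here is a minimal sketch of the two halves of the check: randomizing the case of a query name before sending it, and accepting a response only if the echoed question mirrors that exact case. The function names and structure are illustrative, not taken from any particular resolver implementation.

```python
import random

def randomize_case(qname, seed=None):
    """Randomly flip the case of each letter in a query name ("0x20" encoding).
    A well-behaved authoritative server mirrors the exact case back in its
    response, which a spoofer would have to guess."""
    rng = random.Random(seed)
    return "".join(
        c.upper() if c.isalpha() and rng.random() < 0.5 else c.lower()
        for c in qname
    )

def case_matches(sent_qname, echoed_qname):
    """Accept the response only if the echoed question section preserves the
    exact mixed case we sent; otherwise the resolver should retry without
    randomization, typically over TCP."""
    return sent_qname == echoed_qname
```

Because case randomization only flips letter case, the randomized name still resolves identically; the entropy comes from the number of letters in the name.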
Redundancy: Redundancy is critical when building a reliable caching infrastructure. Some common practices are:
- Two logical caching servers at a minimum
- Ideally on different networks
- In different datacenters
- At least one server close to the subscriber (< 20 ms delay)
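One way to spot-check the proximity target above is to time a single query/response round trip to each candidate server. The sketch below builds a minimal DNS A query by hand and measures the UDP round trip; the query name and helper names are placeholders, and a production check would of course average multiple samples.

```python
import socket
import struct
import time

def build_query(qname, txid=0x1234):
    # Minimal DNS query: 12-byte header (RD=1, QDCOUNT=1), then QNAME,
    # then QTYPE=A, QCLASS=IN.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname_wire = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname_wire + struct.pack("!HH", 1, 1)

def measure_rtt_ms(server_ip, qname="www.example.com", timeout=2.0):
    """Time one UDP query/response round trip to a caching server, in ms.
    Hypothetical helper for verifying the < 20 ms target."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(build_query(qname), (server_ip, 53))
        s.recvfrom(512)
        return (time.monotonic() - start) * 1000.0
```

A measured RTT under 20 ms from a subscriber vantage point satisfies the guideline; remember to measure from where subscribers actually sit, not from the datacenter LAN.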
Availability: The caching infrastructure always has to be available for the best subscriber Internet experience. There are several ways to deploy; the best solution will be guided by the network topology, the desired subscriber experience, and cost.
- Load balanced: Horizontally scalable in large environments. It also allows additional control of ACLs on a hardware device if needed. The overhead is that it requires load-balancer expertise to manage the environment.
- Anycast: This is a very common deployment model. This allows for individual servers to present themselves as DNS nodes on a network. DNS traffic is routed to the closest server on the network.
- Hybrid (Anycasting via a load-balanced configuration): This configuration is in use in large environments. This provides the flexibility of scaling a node to multiple servers based on subscriber density and traffic flows.
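In the anycast and hybrid models, each node advertises the same shared service address so that routing delivers queries to the nearest server. As a hedged sketch only, this might look like the following on a Linux resolver node running an FRR-style routing daemon; the service address (192.0.2.53) and AS number (64512) are placeholders, not values from this document:

```
# Bind the shared anycast service address to loopback (iproute2):
ip addr add 192.0.2.53/32 dev lo

# FRR bgpd fragment: advertise the /32 so routing draws queries
# to the nearest healthy node.
router bgp 64512
 address-family ipv4 unicast
  network 192.0.2.53/32
 exit-address-family
```

The key operational detail is withdrawing the route when the DNS service on a node fails its health check, so traffic shifts to the next-closest node.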
Capacity: A scalable infrastructure should be capable of handling failures in the network and/or hardware. Always provide enough headroom on a server for:
- Loss of a site
- Loss of a server
- Site maintenance
- Server maintenance
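A simple way to translate the headroom bullets above into a number is to size each server for the worst surviving-capacity case. The sketch below assumes the worst case is losing an entire site plus one more server (e.g. maintenance during a site outage); the function and figures are illustrative, not from the original text.

```python
def required_per_server_qps(peak_qps, sites, servers_per_site):
    """Per-server capacity needed so peak load still fits after losing an
    entire site AND one additional server (site outage during maintenance).
    Assumes sites are equally sized and load is evenly distributed."""
    surviving = (sites - 1) * servers_per_site - 1
    if surviving < 1:
        raise ValueError("not enough servers to survive the failure case")
    return peak_qps / surviving
```

For example, with 2 sites of 3 servers each and a 100,000 QPS peak, only 2 servers survive the worst case, so each server must comfortably handle 50,000 QPS.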
Set thresholds: This will allow your network operations center to be proactive about potential problems well before they become serious.
- CPU utilization > 40%, 50%, 60% (escalating severity)
- Recursive contexts should run at around a 20% sustained rate; 50% should trigger a notification, and 75% requires attention to find out what’s going on.
- QPS per client
- Rate-limit queries per IP
- Distribute servers
- Use best-of-breed caching servers
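The CPU and recursive-context thresholds above can be encoded directly in monitoring logic. The sketch below uses the 40/50/60% CPU steps and the 20/50/75% recursive-context guidance from the list; the alert-level names ("minor", "major", etc.) are assumptions for illustration, not terms from the original text.

```python
def cpu_alert_level(cpu_pct):
    """Map CPU utilization to an escalating alert level using the
    > 40% / 50% / 60% thresholds from the checklist."""
    for level, threshold in (("critical", 60), ("major", 50), ("minor", 40)):
        if cpu_pct > threshold:
            return level
    return "ok"

def recursive_context_alert(util_pct):
    """Around 20% sustained is normal; above 50% triggers a notification;
    above 75% requires investigation."""
    if util_pct > 75:
        return "investigate"
    if util_pct > 50:
        return "notify"
    return "ok"
```

Wiring these into the NOC's alerting pipeline gives operators the early warning the section calls for, rather than discovering saturation from subscriber complaints.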
Simplify the deployment process so that patches and software upgrades can be rolled out quickly.
- Be able to quickly rebuild the OS
- Be able to quickly deploy a patch
- Be able to quickly upgrade the DNS software