Excerpted from Nominum Spring 2017 Security Report
In late April, we released the Nominum Spring 2017 Security Report, the latest report on our security research team’s DNS and HTTP analysis, which provides a comprehensive view of the current cyberthreat landscape. In the report, we take a look at “new core domains” and how they help us identify “zero-day attacks” so we can take steps to mitigate them.
Nominum Data Science defines a “new core domain” as a domain that has not received any traffic before. In our experience (and that of others), the majority of new domains tend to be malicious, or at least to be created with malicious intent; in most cases they are not created to serve legitimate purposes. The following is the process Nominum Data Science applies to detect suspicious and malicious core domains.
An excerpt from the report follows. You can access and read the full report here.
Nominum Data Science has a recipe to classify new domains. First, we filter new domains into a quarantine list or “gray list.” Then, additional classification algorithms are used to make the distinction between gray-area domains and legitimate domains.
Next, we separate the gray-listed domains into two groups: domains whose DNS queries resolved and domains whose queries did not. This is important for our threat classification: new, unresolved domains are usually associated with botnet C&Cs. New, resolved domains are associated with phishing, adware, malvertising and other types of attacks, which must be registered and resolvable to perform their intended malicious function.
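The first two steps (gray-listing new domains, then splitting them by resolution status) can be illustrated with a minimal Python sketch. This is not Nominum's implementation: the `Observation` type, the `seen_before` set, the function name and the `.test` domains are all hypothetical, and the rule that a domain counts as resolved if any of its queries resolved is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    domain: str     # core domain seen in a DNS query (hypothetical record type)
    resolved: bool  # True if the query returned an answer


def split_new_domains(observations, seen_before):
    """Gray-list domains never seen before and split them by resolution status.

    Assumption: a domain counts as resolved if any of its queries resolved.
    """
    gray_resolved, gray_unresolved = set(), set()
    for obs in observations:
        if obs.domain in seen_before:
            continue  # not a new core domain, so it is not gray-listed
        if obs.resolved:
            gray_resolved.add(obs.domain)
            gray_unresolved.discard(obs.domain)
        elif obs.domain not in gray_resolved:
            gray_unresolved.add(obs.domain)
    return gray_resolved, gray_unresolved


if __name__ == "__main__":
    seen = {"example.com"}
    stream = [
        Observation("example.com", True),
        Observation("new-shop.test", True),
        Observation("xkqjz.test", False),
    ]
    resolved, unresolved = split_new_domains(stream, seen)
    print("resolved:", resolved)
    print("unresolved:", unresolved)
```

In this sketch the two returned sets play the role of the two groups that later stages would cluster separately.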
The Nominum Zero-day Dashboard provides a real-time, inside view into the process of detecting new malicious core domains. We begin with one million queries processed per second, then filter for new core domains only (usually 50-60 per second). Then, Nominum machine learning algorithms are applied, along with filtering and clustering, to identify malicious domains. On average, four to five percent of domains reach the end of the funnel and are relayed to our streaming threat intelligence.
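To put the funnel figures in perspective, the short Python calculation below converts them into a rough per-second and per-day volume. It assumes that the four to five percent applies to the 50-60 new core domains seen per second, which the text implies but does not state outright; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def funnel_output(new_per_second, flagged_percent):
    """Return (domains per second, domains per day) reaching the end of the funnel.

    Assumption: flagged_percent is a share of new core domains, not of all queries.
    """
    per_second = new_per_second * flagged_percent / 100
    return per_second, per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    low = funnel_output(50, 4)   # lower bound of both ranges
    high = funnel_output(60, 5)  # upper bound of both ranges
    print(f"per second: {low[0]:.0f} to {high[0]:.0f}")
    print(f"per day:    {low[1]:,.0f} to {high[1]:,.0f}")
```

Under that assumption, roughly two to three domains per second, or about 172,800 to 259,200 per day, would be relayed to the threat intelligence stream.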
Once the domains are classified into these two mega-groups (resolved and unresolved), Nominum Data Science applies proprietary (unsupervised) machine learning algorithms to build smaller clusters of domains, identifying subtle relationships between each cluster’s members that glue them together. Now that we have our new core domain clusters, we move to final classification. We want to determine whether each cluster is “known/named-malicious” or “unknown/unnamed-malicious.”
We match our clusters with up-to-date third-party cyber-intelligence data. If even a single domain in a cluster is mapped to one of the “known” malicious domains, this elevates the maliciousness level of the entire cluster (what we call “guilt by association”). The more domains in a cluster we can map to “known” malicious domains, the higher our confidence in the maliciousness of the cluster.
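The “guilt by association” rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nominum's scoring: the function name, the use of the matched fraction as a confidence value and the `.test` domains are all hypothetical.

```python
def guilt_by_association(cluster, known_malicious):
    """Return (elevated, confidence, newly_suspect) for one cluster.

    elevated:      True if at least one member matches known-malicious intelligence.
    confidence:    fraction of members that match (an assumed, simplistic score).
    newly_suspect: members with no match whose risk level is raised by association.
    """
    if not cluster:
        return False, 0.0, set()
    matched = cluster & known_malicious
    elevated = bool(matched)
    confidence = len(matched) / len(cluster)
    newly_suspect = cluster - matched if elevated else set()
    return elevated, confidence, newly_suspect


if __name__ == "__main__":
    cluster = {"a1.test", "b2.test", "c3.test", "d4.test"}
    intel = {"b2.test", "unrelated.test"}
    elevated, confidence, suspects = guilt_by_association(cluster, intel)
    print(elevated, confidence, sorted(suspects))
```

A single match is enough to elevate the whole cluster, while the matched fraction grows as more members are mapped to known threats, mirroring the rising confidence described above.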
Next are the “unknown/unnamed-malicious” clusters. In this category, we consider the clusters that do not match any known threat but still have enough bad characteristics to indicate maliciousness. For example, a cluster of unresolved domains with a similar string length is very likely malicious, even though the security industry has not yet identified and named it.
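As a rough illustration of the string-length signal, the Python sketch below groups unresolved domains by the length of their leftmost label and keeps only groups above a minimum size. It is a deliberately simple stand-in for the proprietary clustering mentioned earlier: the function name, the `min_size` threshold, the exact-length grouping (rather than "similar" length) and the `.test` domains are all assumptions.

```python
from collections import defaultdict


def group_by_label_length(unresolved_domains, min_size=3):
    """Group domains by the length of their leftmost label.

    Only groups with at least min_size members are returned, since a lone
    domain of a given length carries no clustering signal on its own.
    """
    groups = defaultdict(list)
    for domain in unresolved_domains:
        label = domain.split(".")[0]
        groups[len(label)].append(domain)
    return {
        length: sorted(members)
        for length, members in groups.items()
        if len(members) >= min_size
    }


if __name__ == "__main__":
    unresolved = ["qwxzkptr.test", "mzlqvbnd.test", "pkrtwxzq.test", "shop.test"]
    for length, members in group_by_label_length(unresolved).items():
        print(length, members)
```

A real system would likely combine length with other features (character distribution, query timing, shared clients) before treating a group as suspicious; length alone is used here only because it is the one characteristic the text names.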
As mentioned earlier, the enormous growth in the number of cyberthreats, powered by the commercialization of malware production, tests the limits of security firms. How can they survive the cat-and-mouse game against attackers without exhausting resources? Rather than hiring thousands more researchers and analysts (a good, yet expensive idea), Nominum takes a leap of faith into the threat-agnostic approach, where threats are detected and blocked based on anomalous behavior, without prior knowledge about them. This is accomplished with the unknown/unnamed-malicious cluster classification.
The video below shows the extent to which our Zero-day Quarantine method helps stretch existing cyberthreat intelligence. All data is based on a single day of analysis.
In the chart below, you see quarantine Cluster 25, which was created through the Zero-day Quarantine clustering process on March 19. It includes seven core domains. When matched against third-party cyber-intelligence, one of the domains in the cluster was found to be related to recent C&C activity of a specific threat type. Based on this information, we use the “guilt by association” approach, elevating the risk level of the other six (never-seen-before) domains.
quarantine Cluster 25