The Benchmark Method

The benchmark method was developed in collaboration with TU Delft. Providers are ranked on how they respond to abuse reports concerning hacked servers, phishing sites, command-and-control servers, spam servers, and exploit kits on their networks. The method takes the characteristics of each provider into account. The abuse feeds come from various sources, such as the Shadowserver Foundation, Spamhaus DBL, and Google Safe Browsing.

Objective

Almost all online service providers deal with abuse. This is also true in the hosting market, where it takes the form of hacked servers, phishing sites, malicious redirects, command-and-control servers, spam servers, exploit kits, etc.

As a hosting company trying to combat abuse, how do you know if you’re effective?

If you have ten incidents in a year, say hacked servers or spamming customers, is that a lot? A little? How many incidents do other providers in the hosting market actually have? How can you compare these numbers between companies of different sizes and with different types of services? Without answers to these questions, it’s difficult to know if you’re doing well.

Over the past few years, a team at TU Delft has been working on an abuse benchmark for the hosting sector. We are now sharing the most recent version with the sector. This doesn't mean the benchmark is perfect and without limitations; there is still uncertainty and noise. What we do know is that it contains valuable information for any hosting company that wants to learn how to combat abuse even better. We've tested it extensively, and it turns out that the benchmark predicts to a high degree how many incidents will occur in a hosting network. The benchmark is not intended for 'naming and shaming', but to give each individual company insight into where it stands compared to other providers.

Model

The ins and outs of the benchmark, and the tests we’ve conducted, have been published in a peer-reviewed scientific article [1]. In a nutshell, it works as follows:

  1. We define a hosting provider as the entity responsible for IP ranges with hosting services according to WHOIS data. We do not take Autonomous Systems (ASes) as a starting point: in an earlier study, we discovered that, on average, seven providers are active per AS. This first version of the benchmark only includes providers that are members of NBIP or Dutch Cloud Community and for which we could retrieve the WHOIS information: 129 providers in total.
  2. We then take a number of abuse feeds [2]. For each feed, we count the number of incidents seen at each provider in the period January-August 2018 (a minimal counting sketch follows after this list).
  3. Large providers have more incidents than small ones because they have more customers and more infrastructure. That doesn’t mean they’re less secure, of course. The type of services also makes a difference. Therefore, we collect some characteristics of the providers, such as how many IP addresses they advertise, how many domains they host, and how much shared hosting they have.
  4. We put the abuse and provider data into a statistical model. The model looks at the number of incidents while taking into account the size of the provider and, to some extent, the type of services in the network. A full explanation of the model is beyond the scope of this page, but it works essentially like an IQ test or a standardized test. Such a model estimates how good someone is at math based on how their points are distributed over the questions of the test. In the benchmark model, each feed is like a test question on which the provider scores a number of points. The model then estimates which underlying skill best explains that point distribution relative to the other 'students'. It also gives an uncertainty margin around that score: some scores are fairly robust, others have a large margin (a simplified model sketch follows after this list).
  5. The benchmark is a number that expresses where the provider stands within the total group of 129 providers according to the model. We express this number as a percentile. A provider with a score of 20 is in the 20th percentile: 20% of all providers have more abuse than this provider and 80% have less, taking size and type of services into account. A score of 20 is therefore a poor score in terms of abuse prevention; the provider belongs to the worst 20% of the market. We use the following simple designations to communicate these results: a score of 1-20 is poor, 20-80 is average, and 80-100 is good (a percentile sketch follows after this list).
  6. Finally, in addition to the abuse benchmark, we've also calculated a vulnerability benchmark. It follows the same steps, but with vulnerability data instead of abuse data [3]. This benchmark uses the same designations: poor (1-20), average (20-80), and good (80-100). It is therefore possible for a provider to perform well in terms of abuse and poorly in terms of vulnerabilities.
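
To make step 2 concrete, here is a minimal counting sketch in Python. It assumes the raw feed data has already been merged into a single table; the file name and column names (`abuse_events.csv`, `provider`, `feed`, `timestamp`) are hypothetical and not part of the actual benchmark pipeline.

```python
import pandas as pd

# One row per observed incident: which feed reported it, at which
# provider, and when. The file and columns are illustrative.
events = pd.read_csv("abuse_events.csv", parse_dates=["timestamp"])

# Restrict to the benchmark window: January through August 2018.
window = events[
    (events["timestamp"] >= "2018-01-01") & (events["timestamp"] < "2018-09-01")
]

# Count incidents per provider per feed; combinations without any
# observed incident become zero.
counts = (
    window.groupby(["provider", "feed"])
    .size()
    .unstack(fill_value=0)  # rows: providers, columns: feeds
)
```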
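
The actual model behind step 4 is the latent trait model described in [1]. Purely as an intuition for what "taking size into account" means, the sketch below fits, per feed, a Poisson regression with the provider's advertised address space as an exposure offset, and treats the standardized residuals as per-feed 'test scores'. This is a deliberate simplification, not the published model; `provider_characteristics.csv` and `ip_addresses` are hypothetical names.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# `counts` as built in the previous sketch; provider characteristics are
# assumed to live in a separate, illustrative file.
providers = pd.read_csv("provider_characteristics.csv", index_col="provider")
size = providers.loc[counts.index, "ip_addresses"].astype(float)

scores = {}
for feed in counts.columns:
    y = counts[feed].to_numpy()
    # Intercept-only Poisson model with log(size) as an exposure offset,
    # so a provider with twice the address space is "allowed" roughly
    # twice the incidents before standing out.
    X = np.ones((len(y), 1))
    fit = sm.GLM(y, X, family=sm.families.Poisson(),
                 offset=np.log(size.to_numpy())).fit()
    # Pearson residuals: positive means more abuse than size alone predicts.
    scores[feed] = pd.Series(fit.resid_pearson, index=counts.index)

# Averaging the per-feed scores is a crude stand-in for the latent skill
# that the real model estimates jointly, with uncertainty margins.
latent = pd.DataFrame(scores).mean(axis=1)
```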
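
Step 5 then reduces to ranking within the group and attaching the plain-language labels. The sketch below follows the convention in the text: a higher benchmark means less abuse, and a provider's percentile is the share of providers with more abuse. The exact boundary handling is an assumption, since the text uses overlapping ranges (1-20, 20-80, 80-100).

```python
import pandas as pd

# `latent` as computed in the previous sketch (higher = more abuse).
# Negate so that less abuse ranks higher, then express the rank as a
# percentile within the group of providers.
percentile = (-latent).rank(pct=True).mul(100).round().astype(int)

def label(p: int) -> str:
    # Designations from the text: 1-20 poor, 20-80 average, 80-100 good.
    if p <= 20:
        return "poor"
    if p <= 80:
        return "average"
    return "good"

benchmark = pd.DataFrame({
    "percentile": percentile,
    "designation": percentile.map(label),
})
print(benchmark.sort_values("percentile"))
```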

The abuse and vulnerability data used in the benchmark covers January through August 2018. This data is only partially available in the AbuseIO environment of abuseplatform.nl: that environment does not yet contain all feeds and holds only the most recent data. The intention is that in the next iterations of the benchmark, the data will align with the platform; in other words, that the benchmark will be based on the incidents and vulnerabilities that providers can see in their own AbuseIO account on abuseplatform.nl.

More Information

If you have questions about the benchmark or the underlying data, you can contact Carlos Gañán from TU Delft:

C.HernandezGanan@tudelft.nl
+31 15 27 82216

References

[1] Publication with a scientific description of the benchmark

Arman Noroozian, Michael Ciere, Maciej Korczynski, Samaneh Tajalizadehkhoob & Michel van Eeten (2017), Inferring Security Performance of Providers from Noisy and Heterogenous Abuse Datasets, Workshop on the Economics of Information Security (WEIS2017), La Jolla, CA.
http://weis2017.econinfosec.org/wp-content/uploads/sites/3/2017/05/WEIS_2017_paper_60.pdf

[2] List of used abuse feeds

  • Spamhaus DBL (split into C&C, phishing, malware, spam)
  • Shadowserver Compromised website report
  • Shadowserver Command and control report
  • APWG
  • Phishtank
  • Google Safe Browsing

[3] List of used vulnerability feeds

  • Shadowserver Drone Report
  • Shadowserver CHARGEN report
  • Shadowserver Open Memcached server report
  • Shadowserver Open Resolvers Report
  • Shadowserver SSL Freak Vulnerable Report
  • Shadowserver SSL Poodle Report