The 95th percentile

The term megabit should be familiar. Most ADSL lines at home have a fixed speed, expressed in Mbit per second. Mine has 4 Mbit down, meaning that I can download at most 4 Mbit per second. In one hour I can download 4 Mbit/s X 3600 seconds / 8 bits per byte = 1800 MegaByte, or 1.8 GigaByte per hour. Because the speed is capped and the price stays the same however much I download, this is called 'flat fee' or 'unmetered'.

In our data center, even if you buy 4 Mbit, we do not limit you to this speed. With our servers you will be able to download the same 1.8 GigaByte in about 15 seconds! You are allowed to use the full 1000 Mbit available on our switches, even if you only buy 4 Mbit. This is called 'bursting'. To keep a sane balance between what you buy and what you burst, we, like most data centers, charge traffic to our bigger accounts in Mbit using the '95th percentile' method.
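The arithmetic in the two paragraphs above can be checked with a short sketch. It assumes decimal units (1 GigaByte = 1000 MegaByte) and 8 bits per byte; the function names are made up for this illustration.

```python
def gigabytes_at_rate(rate_mbit_s, seconds):
    """Volume in GigaByte moved at a constant rate (decimal units)."""
    megabytes = rate_mbit_s * seconds / 8   # 8 bits per byte
    return megabytes / 1000


def seconds_to_transfer(gigabytes, rate_mbit_s):
    """Time needed to move a volume at a constant rate."""
    megabits = gigabytes * 1000 * 8
    return megabits / rate_mbit_s


# One hour on a 4 Mbit/s home line.
hourly_gb = gigabytes_at_rate(4, 3600)

# The same volume when bursting at the full 1000 Mbit/s of the switch.
burst_seconds = seconds_to_transfer(hourly_gb, 1000)

print(f"{hourly_gb:.1f} GigaByte per hour at 4 Mbit/s")
print(f"{burst_seconds:.1f} seconds at 1000 Mbit/s")
```

The second figure comes out a little under the "about 15 seconds" quoted above, since it ignores any protocol overhead.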

How and why

Our switches and routers are polled every 5 minutes to read the traffic counter on your uplink port. Over a month (say 30 days), that gives 30 X 24 X 12 = 8640 measurements, which are stored in a database. These measurements create the following graph of your traffic usage:

• The peak traffic is about 5.9 Mbit. This means that during those 5 minutes you transferred 221 MegaBytes (221 MB X 8 bits / 300 seconds = 5.9 Mbit/s).
• The average transfer speed is 4.3 Mbit. This means that during the entire month you have used 4.3 Mbit/s X 30 X 24 X 3600 seconds / 8 bits per byte = 1.39 TeraByte.
• The 95th percentile is calculated by sorting all measurements and removing the top 5% of them. We charge you for the highest measurement that is left. In the example the 95th percentile is 4.6 Mbit. You pay for 4.6 Mbit even though your average was 4.3 Mbit, because you were able to peak up to 5.9 Mbit.
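The calculation described in the bullets can be sketched in a few lines of Python. This is an illustration, not our billing code: the function names are invented, the number of discarded samples is assumed to be rounded down, and the month of samples is synthetic rather than taken from the graph.

```python
def rate_mbit_s(bytes_transferred, interval_s=300):
    """Average rate over one polling interval (1 Mbit = 1,000,000 bits)."""
    return bytes_transferred * 8 / 1_000_000 / interval_s


def percentile_95(samples):
    """Sort the samples, drop the top 5%, return the highest one left."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Assumption: the count of discarded samples is rounded down.
    discard = len(ordered) * 5 // 100
    kept = ordered[:len(ordered) - discard]
    return kept[-1]


# The peak bullet: 221 MegaBytes moved in one 5-minute interval.
peak_rate = rate_mbit_s(221_000_000)

# Synthetic month of 8640 samples in Mbit/s (made-up values).
month = [4.0] * 8000 + [4.6] * 300 + [5.9] * 340
billable = percentile_95(month)

print(f"peak sample: {peak_rate:.1f} Mbit/s")
print(f"samples: {len(month)}, billable rate: {billable} Mbit/s")
```

In this synthetic month the 432 highest samples are discarded. That removes every 5.9 Mbit sample, so the highest sample left is one of the 4.6 Mbit ones.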

So why do data centers calculate cost to bigger accounts like this?

For a data center to let you peak, so your website does not slow down at busy times, it has to rent a fiber line that exceeds this maximum peak. In the above example the data center would probably need to rent a 10 Mbit fiber connection (absurd example, I know). Now, the data center pays for that line, not for the amount of traffic that passes through it. Digging the fiber is expensive, and it does not get cheaper by not using it. Once they have a line that can 'do' 10 Mbit, nobody cares whether they use it 24/7 or for just 1 second a month; the line is there anyway.

Back to the 95th percentile

If you are responsible for using 59% (5.9 Mbit) of their entire 10 Mbit line for just 5 minutes a month, they will forgive you, because chances are that no other customer is going to need more than the remaining 41% during that exact same time. In fact, the 95th percentile means that data centers forgive your peak usage for about 36 hours each month (5% of 30 X 24 = 720 hours). But if you start using 59% of their line for more than 5% of the time, the other customers will notice: the chance that another customer, or at least the sum of the other customers, will want more than the remaining 41% of the line at the same time becomes pretty big. If this situation arises, the data center will need to expand its lines just for you, usually on a one-year contract with its own providers (there is always a bigger fish).
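The 36-hour figure follows directly from the 5-minute sampling. The sketch below works it out for any month length; the function name is invented, and the number of discarded samples is again assumed to be rounded down.

```python
SAMPLE_MINUTES = 5


def forgiven_hours(days, percent=5):
    """Hours of peak usage that fall into the discarded top samples."""
    samples = days * 24 * 60 // SAMPLE_MINUTES
    discarded = samples * percent // 100   # assumption: rounded down
    return discarded * SAMPLE_MINUTES / 60


print(forgiven_hours(30))            # 432 discarded samples of 5 minutes each
print(round(forgiven_hours(31), 1))  # slightly more in a 31-day month
```

Note that the forgiven time does not have to be one continuous block. Any 432 five-minute intervals in a 30-day month count, wherever they fall.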

So why just for 'big' accounts?

Smaller customers peak at different times, and their peaks are not very high compared to the overall usage of a data center, so the spikes tend to cancel each other out and the combined traffic takes a very predictable shape. That lets the data center calculate a pretty good Mbit to GigaByte ratio. So, to make life simple for the smaller users, traffic may as well be charged in an easy-to-understand format.
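A toy simulation can illustrate why many small, independent spikes smooth out. Everything in it is an assumption made for the illustration: 200 customers, a 1 Mbit baseline, 10 Mbit spikes in roughly 5% of the 5-minute slots, and a fixed random seed. It models no real traffic.

```python
import random

SLOTS = 288            # one day of 5-minute samples
CUSTOMERS = 200        # hypothetical number of small accounts
BASE_MBIT = 1.0        # hypothetical baseline usage
SPIKE_MBIT = 10.0      # hypothetical spike height
SPIKE_CHANCE = 0.05    # chance that a given slot is a spike


def simulate_customer(rng):
    """One customer's day: mostly baseline, occasionally a spike."""
    return [SPIKE_MBIT if rng.random() < SPIKE_CHANCE else BASE_MBIT
            for _ in range(SLOTS)]


def peak_to_average(samples):
    """How far the highest sample sits above the mean."""
    return max(samples) / (sum(samples) / len(samples))


rng = random.Random(42)  # fixed seed so repeated runs give the same result
customers = [simulate_customer(rng) for _ in range(CUSTOMERS)]

# Total traffic per slot across all customers.
aggregate = [sum(slot) for slot in zip(*customers)]

single_ratio = peak_to_average(customers[0])
aggregate_ratio = peak_to_average(aggregate)

print(f"one customer, peak/average: {single_ratio:.2f}")
print(f"all customers, peak/average: {aggregate_ratio:.2f}")
```

The single customer peaks several times above its own average, while the combined traffic stays much closer to its average. That predictability is what makes a volume-based price workable for small accounts.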

On the other hand, a big account will have a distinct impact on the shape of a data center's traffic. Worse, this shape may not be predictable at all. That is why, for the larger accounts, billing is based on the degree to which you 'fill the pipeline' instead of how many gigabytes you manage to push through that pipeline. This aligns the interests of you, the customer, and us, the data center.