The 95th percentile

The term megabit should be familiar. Most ADSL lines at home have a fixed speed in Mbit. Mine has 4 Mbit down, meaning that I can download at most 4 Mbit per second. In one hour I can download 4 Mbit/s X 3600 seconds / 8 bits per byte = 1800 MegaByte, or 1.8 GigaByte. Paying a fixed price for a line with a fixed maximum speed is called 'flat fee' or 'unmetered'.

In our data center, even if you buy 4 Mbit, we do not limit you to this speed. With our servers you will be able to download the same 1.8 GigaByte in about 15 seconds! You are allowed to use the full 1000 Mbit available on our switches, even if you only buy 4 Mbit. This is called 'bursting'. To keep a sane balance between what you buy and what you burst, we, like most data centers, charge traffic to our bigger accounts in Mbit using the '95th percentile' method.
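
The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes decimal units (1 GigaByte = 1000 MegaByte), as the figures in the text do.

```python
BITS_PER_BYTE = 8


def megabytes_transferred(rate_mbit_s, seconds):
    """MegaBytes moved when transferring at rate_mbit_s for the given time."""
    return rate_mbit_s * seconds / BITS_PER_BYTE


def seconds_to_transfer(megabytes, rate_mbit_s):
    """Seconds needed to move the given number of MegaBytes at rate_mbit_s."""
    return megabytes * BITS_PER_BYTE / rate_mbit_s


# One hour on a 4 Mbit line: 1800 MegaByte, i.e. 1.8 GigaByte.
hourly_mb = megabytes_transferred(4, 3600)
print(hourly_mb / 1000, "GigaByte per hour")

# The same volume when bursting at 1000 Mbit: 14.4 seconds.
print(seconds_to_transfer(hourly_mb, 1000), "seconds at 1000 Mbit")
```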

How and why

Our switches and routers are read every 5 minutes to measure the traffic counter on your uplink port. During a month (say 30 days), 30 X 24 X 12 = 8640 measurements are stored in a database. These measurements make up the graph of your traffic usage. In our example graph:

  • The peak traffic is about 5.9 Mbit. This means that during those 5 minutes you transferred 221 MegaBytes (221 MB X 8 bits / 300 seconds = 5.9 Mbit/s).
  • The average transfer speed is 4.3 Mbit. This means that during the entire month you used 4.3 Mbit/s X 30 X 24 X 3600 seconds / 8 bits = 1.39 TeraByte.
  • The 95th percentile is calculated by sorting all measurements and removing the top 5% of them. We charge you for the highest measurement that is left. In the example the 95th percentile is 4.6 Mbit. You pay for 4.6 Mbit even though your average was 4.3 Mbit, because you were able to peak as high as 5.9 Mbit.
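
The calculation described in the last bullet can be sketched in Python as follows. The function name and the sample data are made up for illustration, and rounding the number of discarded measurements down is an assumption; the exact rounding rule is not stated here.

```python
def billable_rate(samples_mbit, top_percent=5):
    """Highest measurement left after discarding the top `top_percent`."""
    if not samples_mbit:
        raise ValueError("need at least one measurement")
    ordered = sorted(samples_mbit)
    # Assumption: the number of discarded measurements is rounded down.
    discard = len(ordered) * top_percent // 100
    return ordered[len(ordered) - discard - 1]


# Synthetic month of 30 X 24 X 12 = 8640 five-minute measurements (not real
# data): exactly 5% of them (432) are peaks of 5.9 Mbit, the rest are 4.0 Mbit.
month = [4.0] * 8208 + [5.9] * 432
print(max(month))            # peak: 5.9
print(billable_rate(month))  # 4.0, all 432 peak measurements are discarded

# One more peak measurement no longer fits inside the discarded 5%.
month_busier = [4.0] * 8207 + [5.9] * 433
print(billable_rate(month_busier))  # 5.9
```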

So why do data centers bill bigger accounts like this?

In order for a data center to allow you to peak, so your website does not slow down at busy times, it has to rent a fiber line that exceeds this maximum peak. In the above example the data center would probably need to rent a 10 Mbit fiber connection (absurd example, I know). Now, the data center pays for that line, not for the amount of traffic that passes through it. Laying fiber is expensive, and it does not get cheaper by not using it. Once they have a line that can 'do' 10 Mbit, nobody cares whether they use it 24/7 or for just 1 second a month; the line is there anyway.

Back to the 95th percentile

If you are responsible for using 59% (5.9 Mbit) of their entire 10 Mbit line for just 5 minutes a month, they will forgive you, because chances are that no other customer is going to use more than the remaining 41% during that exact same time. In fact, the 95th percentile means that data centers forgive you your peak usage for about 36 hours each month (5% of 30 days). But if you start using 59% of their line for more than 5% of the time, the other customers will notice. The chance that another customer, or at least the sum of customers, will want to use more than the remaining 41% of the line at the same time becomes pretty big. If this situation arises, the data center will need to expand its lines just for you, usually at a contract length of one year, with their respective providers (there is always a bigger fish).
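
Where the 36 hours come from can be shown with a small sketch. The function name is hypothetical, and rounding the number of discarded measurements down is an assumption.

```python
SAMPLE_MINUTES = 5  # one measurement every 5 minutes


def forgiven_hours(days_in_month, top_percent=5):
    """Hours of peak usage per month that fall inside the discarded top 5%."""
    samples = days_in_month * 24 * 60 // SAMPLE_MINUTES
    # Assumption: the number of discarded measurements is rounded down.
    discarded = samples * top_percent // 100
    return discarded * SAMPLE_MINUTES / 60


# 30 days: 8640 measurements, 432 discarded, 432 X 5 minutes = 36 hours.
print(forgiven_hours(30))
```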

So why just for 'big' accounts?

Because many smaller customers peak at different times and they do not peak very high, compared to the overall usage of a data center, the spikes tend to cancel each other out in a very predictable shape. The data center can calculate a pretty good Mbit to GigaByte ratio. So in order to make life simple for the smaller users, traffic may as well be charged in an easy to understand format.

On the other hand, a big account will have a distinct impact on the shape of the traffic for a data center. And worse, this shape may not at all be predictable. That is why for the larger accounts the billing is done based on the degree to which you 'fill the pipeline' instead of how many gigabytes you manage to push through that pipeline, unifying the interests of you, the customer and us, the data center.

Let's say that you rent a server in our data center that does nothing but copy a large amount of backup data each night. The 'other side' is capable of sending at a speed of 50 Mbit. For two hours each night your server downloads data at that rate of 50 Mbit per second, and the rest of the day it does nothing. We must make sure that your 50 Mbit does not interfere with the rest of the customers, so we have to reserve an extra 50 Mbit just for you. We will have to pay for this extra 'pipeline', so we charge you for the 50 Mbit. A much better approach would be to set up a speed limit for the download. If you download the data in 8 hours instead of 2, you need only buy 12.5 Mbit to transfer the same amount of data, and if you spread it over the entire 24 hours, at a constant rate, you will only need about 4.2 Mbit (50 Mbit X 2 hours / 24 hours).
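
The effect of spreading the same backup over a longer window can be sketched as follows. The function name is made up for illustration, and the sketch assumes the server carries no other traffic and transfers at a constant rate inside the window.

```python
def billed_rate_mbit(burst_rate_mbit, burst_hours, window_hours, hours_per_day=24):
    """Rate billed when the same daily volume is spread over window_hours."""
    # Constant rate needed to move the same volume inside the window.
    rate = burst_rate_mbit * burst_hours / window_hours
    busy_fraction = window_hours / hours_per_day
    # If the server is busy for at most 5% of the time, the whole transfer
    # falls inside the discarded top 5% and nothing is left to bill.
    return rate if busy_fraction > 0.05 else 0.0


for window in (2, 8, 24):
    print(window, "hours:", round(billed_rate_mbit(50, 2, window), 2), "Mbit")
# 2 hours: 50.0 Mbit, 8 hours: 12.5 Mbit, 24 hours: 4.17 Mbit
```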

See how this unifies our interests? If you set up a flat, predictable traffic shape, we do not have to reserve a large 'pipeline' and you do not have to pay for one. You need only pay for about 4.2 Mbit (50 Mbit X 2 hours spread over 24 hours), while still being able to burst to gigabit if you need it, for instance to quickly restore the backup at the maximum speed. It is not our choice to make, it is yours. You can choose to download it in 2 hours or in 24. Using the 95th percentile method we give you the full benefit of our maximum capacity if you want it and leave the choice up to you.

You may argue here that we make it entirely your problem, but that is not true. The top 5% of the peaks is actually the part with the most risk. Let's imagine you are attacked for 5 hours. Instead of your usual 4 Mbit, you get 150 Mbit! We do not charge you for this attack, as it lasts less than 36 hours, but we still have to reserve the space in our lines to handle the traffic.
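
The attack example can be simulated with synthetic data. This is a sketch under assumptions: the helper names are hypothetical, the month has 30 days, traffic is a flat 4 Mbit outside the attack, and the number of discarded measurements is rounded down.

```python
def billable_rate(samples_mbit, top_percent=5):
    """Highest measurement left after discarding the top `top_percent`."""
    ordered = sorted(samples_mbit)
    discard = len(ordered) * top_percent // 100  # assumption: rounded down
    return ordered[len(ordered) - discard - 1]


def month_with_attack(attack_hours, usual_mbit=4.0, attack_mbit=150.0, days=30):
    """Synthetic month of 5-minute measurements containing one attack."""
    total = days * 24 * 12
    attack_samples = attack_hours * 12
    return [usual_mbit] * (total - attack_samples) + [attack_mbit] * attack_samples


print(billable_rate(month_with_attack(5)))   # 4.0, the attack is not billed
print(billable_rate(month_with_attack(40)))  # 150.0, longer than 36 hours
```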

In practice

For most applications you can get about 200 to 250 GB per month out of the theoretical maximum of roughly 330 GB per Mbit.
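
The theoretical maximum follows from running 1 Mbit flat out for a whole month. A minimal sketch, with a made-up function name and decimal units (1 GB = 1000 MB) assumed:

```python
def max_gb_per_mbit(days_in_month):
    """GB moved in a month by a line that runs at 1 Mbit/s without pause."""
    seconds = days_in_month * 24 * 3600
    megabytes = 1 * seconds / 8  # 1 Mbit/s, 8 bits per byte
    return megabytes / 1000


print(max_gb_per_mbit(30))  # 324.0
print(max_gb_per_mbit(31))  # 334.8

# 200 to 250 GB per Mbit is roughly 62% to 77% of the 30-day maximum.
for practical_gb in (200, 250):
    print(practical_gb, "GB:", round(100 * practical_gb / max_gb_per_mbit(30)), "%")
```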

If you have a website that 'peaks' during the day and does little or nothing during the night, the best time to do a backup is during the night. Your day usage is what makes up the 95th percentile and determines how much of the pipeline you need and pay for. And you can 'fill up' the unused part of your part of the pipeline with backups at night.

If you want to get the most out of your Mbits, aim for a flat and predictable traffic shape. And most importantly, if you expect traffic to increase for a longer period of time, let us know in advance. Traffic bought in advance is much cheaper than traffic that is unexpected and charged afterwards.