Tuesday, November 22, 2011

Concepts in Disk Sizing

Here are the basics of what you need to know about disk sizing for the Exchange server. This article may not be comprehensive, but it should be enough to:
  1. Determine if you have sufficient disks 
  2. Detect if you have a disk bottleneck 
  3. Calculate the number of disk I/Os per second per user (also known as IOPS/user) 
  4. Estimate how many disks you need for a new server, based on past user behavior. 

"The amount of IO" is the number of reads and writes to a drive. The actual bytes that are read or written are less interesting than the number of times the disk head has to move to a location. There is often confusion between the size of a disk (the number of bytes that can be stored on it) and its throughput (the number of IOs per second that can be read and written). Throughput is usually measured in IOPS (IOs per second, or IO/sec). It's important to know the maximum throughput (the maximum number of IOs your disks can sustain), because if you exceed that maximum... Hello, Outlook popup! RPC latencies quickly go through the roof once the maximum disk throughput is exceeded. When someone refers to a disk bottleneck, they mean a throughput bottleneck, not a limitation of disk space.

Also, for this discussion, when I say IO, I'm usually referring to the physical disk\disk transfers per second to the database drives, but the basic principles apply to sizing the rest of the drives as well. The reason I focus on the database drives is that Exchange server makes heavy use of the disks that house the databases. For comparison, the store sends only about 1/10 as many IOs to the log drive as it does to the database drives. Even though I focus on database drives, be aware that SMTP queue drives and Exchange temp drives can also be heavy consumers of IO, depending on your company's email users, and you will want to make sure you aren't exceeding the maximum throughput of those drives either.

Determining if you have enough disks

How do you know if your drives are healthy? The simplest way to check is to measure how long it takes for a read and write (referred to as the read and write latency). Take a look at the physical disk\sec per read and sec per write counters for your database drives. The server reports this in seconds, but we generally talk about it in ms. Are the latencies under 20 ms? If so, excellent. Your users are probably happy (or at least, complaining about something other than email responsiveness). If the latencies are larger than 20 ms, it's time to take a look at your disk usage. Do the physical disk\disk transfers/sec counters exceed the maximum throughput of your drives? Aah, now you ask, how do I determine the maximum throughput of my drives?

The best way to determine maximum throughput is to measure it. The Jetstress tool is an excellent way to measure the maximum throughput of your disks. The documentation explains how to do this, so I'll skip that here. However, to use Jetstress, you have to test your disks in a lab (not in a production environment). So what do you do if you already have a server in production and suspect you have exceeded the maximum throughput? The best thing you can do is make an estimate. Here's how I make estimates (there are many tricks, but these are fairly simple):
  1. Most disks can do between 130 and 180 IOPS. 
  2. Exchange typically has a Read-to-Write (R:W) ratio of 3:1 or 2:1. 
  3. We recommend that you plan to use less than 80% disk utilization at peak load. 
RAID 0 (striping) has the same cost as no RAID. Reads and writes happen once.

RAID 0+1 requires two disk IOs for every write (the mirrored data is written twice).

RAID 5 requires four disk IOs for every write (two reads and two writes, to calculate and write parity).

I'm skipping the math unless someone asks for it. Essentially, this translates into the values in the tables below. These are the values you should use when you estimate how much disk throughput is available for users during peak load.

Tables to look up the recommended maximum throughput per disk:

Table 1. Estimated maximum disk throughput for no RAID or RAID 0

R:W ratio \ Disk speed    130 IOs per second    180 IOs per second
3:1                       104 IOPS              144 IOPS
2:1                       104 IOPS              144 IOPS


Table 2. Estimated maximum disk throughput for RAID 0+1 (or RAID 10)

R:W ratio \ Disk speed    130 IOs per second    180 IOs per second
3:1                       83 IOPS               115 IOPS
2:1                       78 IOPS               108 IOPS


Table 3. Estimated maximum disk throughput for RAID 5

R:W ratio \ Disk speed    130 IOs per second    180 IOs per second
3:1                       59 IOPS               82 IOPS
2:1                       52 IOPS               72 IOPS
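
For readers who do want the math, here is a minimal Python sketch that appears to reproduce the table values. It is a reconstruction, not the author's own derivation: it assumes each read costs one disk IO, each write costs the number of disk IOs listed for its RAID level, and the disk is held to 80% utilization. The function and dictionary names are made up for illustration.

```python
# Disk IOs generated by one write, per the RAID costs listed earlier.
# (Assumption: every read costs exactly one disk IO.)
WRITE_COST = {"raid0": 1, "raid10": 2, "raid5": 4}

def usable_iops_per_disk(raw_iops, reads, writes, raid, utilization=0.8):
    """Estimate the user-visible IOPS one disk can serve at a given R:W ratio."""
    host_ios = reads + writes                     # IOs as Exchange sees them
    disk_ios = reads + writes * WRITE_COST[raid]  # IOs the disks actually perform
    return raw_iops * utilization * host_ios / disk_ios

for raid in ("raid0", "raid10", "raid5"):
    for reads, writes in ((3, 1), (2, 1)):
        row = [round(usable_iops_per_disk(raw, reads, writes, raid))
               for raw in (130, 180)]
        print(raid, f"{reads}:{writes}", row)
```

For example, a 180 IOPS disk in RAID 0+1 at a 2:1 ratio performs 4 disk IOs for every 3 user IOs, so 180 * 0.8 * 3/4 = 108 IOPS.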


I'm too lazy to even use tables, so I take the conservative approach and assume I can safely get a throughput of 80 IOs per second from most disks in a RAID 0+1 configuration (RAID 0+1 is generally recommended for most database drives).

If you have multiple drives (or "spindles") connected in a RAID configuration, multiply the throughput by the number of drives. Thus, 10 disks in RAID 0+1 will safely support a load of 800 IOPS. I spoke with one customer who had recently changed disks. The company had previously had 6 small disks and replaced them with 3 large disks. Since then, users had been seeing a lot of Outlook popups while waiting to open messages, change folders, and so on. The 3 larger disks were unable to deliver the IO throughput that the 6 smaller disks had delivered. The disks were bottlenecked; IO latency went up, and so did RPC latency as a consequence. Solution? Put more disks in there! I want to stress that this is not an uncommon scenario. It seems perfectly reasonable to move to fewer, larger disks... but you can see here how it can get your server into trouble.

What if your disks' throughput is below the maximum, but the latencies are still high? Sometimes the problem is a configuration problem (e.g., max queue depth), or it occurs because you are sharing SAN drives with another application that is consuming a lot of IO bandwidth. When Exchange is competing with another application for IO, user experience suffers. If you are seeing poor latency and think the throughput is well below the disk maximum, you will have to go back to your disk guru and start troubleshooting. In general, we don't recommend sharing database SAN spindles with other applications. And never, never share log drives with any other application (this significantly reduces the throughput).

Detecting disk bottlenecks

It's pretty simple to tell if you have a disk bottleneck. If the latencies to your disk drives are greater than 20 ms (0.02 seconds, as measured by physical disk\disk seconds per read and disk seconds per write), then disks are starting to be an issue. You can survive on disks with 50 ms latencies, but the user experience improves significantly if they are reduced. On our internal Exchange servers, we keep the latency to 10 ms for read IOs, and around 1 ms for write IOs (write latencies can be very low if you have a battery-backed write-back cache).

You should be able to confirm the cause of your bottleneck (exceeding maximum disk throughput) by measuring the physical disk\disk transfers per second and comparing with your estimated maximum throughput.
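
The check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the inputs are assumed to be averages you have already collected from the performance counters, and the defaults (80 IOPS per disk, 20 ms) are the rule-of-thumb figures from this article rather than measured values.

```python
def estimated_max_throughput(disk_count, iops_per_disk=80):
    """Rule-of-thumb maximum for an array: per-disk estimate times spindle count."""
    return disk_count * iops_per_disk

def diagnose(read_latency_s, write_latency_s, transfers_per_sec, disk_count,
             iops_per_disk=80, latency_limit_s=0.020):
    """Classify a database drive from averaged counter readings (latency in seconds)."""
    limit = estimated_max_throughput(disk_count, iops_per_disk)
    slow = max(read_latency_s, write_latency_s) > latency_limit_s
    if not slow:
        return "ok"
    if transfers_per_sec > limit:
        return "throughput bottleneck"
    # Latency is high but the load is under the estimate: look at
    # configuration or at other applications sharing the spindles.
    return "slow but under estimated maximum"

# 400 transfers/sec with 45 ms reads on 3 disks (estimated maximum 240 IOPS)
print(diagnose(0.045, 0.002, 400, disk_count=3))
# The same load on 6 disks (estimated maximum 480 IOPS)
print(diagnose(0.045, 0.002, 400, disk_count=6))
```

The first call reports a throughput bottleneck; the second does not, which points toward configuration or shared spindles instead.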

Calculate your IOPS/user

If you've been reading some of the whitepapers or attending talks on Exchange server, you've probably seen references to IOPS per user. Generally, this refers to the number of IO read and write requests to the database drive, divided by the number of users.

Measure the physical disk\disk transfers per second for all database drives for between 20 minutes and 2 hours during your most active time (for example, this is from 9-11 AM on a Monday here at Microsoft). During this time, also measure the number of active users (MSExchangeIS\Active User Count). Take the average of each counter over that period. Sum the average disk transfers/sec across all database drives, divide that sum by the average active user count, and... Voila! You have just calculated the number of IOPS per user.
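
As a rough sketch of that calculation in Python: the function name, drive letters, and sample readings below are invented for illustration, and the samples are assumed to have been exported from the counters beforehand.

```python
from statistics import mean

def iops_per_user(transfers_by_drive, active_user_samples):
    """Sum the average transfers/sec of every database drive,
    then divide by the average active user count."""
    total_iops = sum(mean(samples) for samples in transfers_by_drive.values())
    return total_iops / mean(active_user_samples)

# Made-up samples taken during the busiest period of the day.
transfers = {
    "E:": [410, 390, 400],   # disk transfers/sec, database drive 1
    "F:": [405, 395, 400],   # disk transfers/sec, database drive 2
}
active_users = [1990, 2010, 2000]

print(iops_per_user(transfers, active_users))  # 800 / 2000 = 0.4
```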

Keep in mind that the number of IOPS/user is determined by how active your users are. You may find that this differs from server to server (and database to database). Don't sweat it. These numbers are used as guidelines, but accurate numbers aren't always necessary…as long as you build in a little overhead when planning & populating your servers. However, you can use these numbers to help decide when you want to move users from a busy server to another server.

(Note that, as a general practice, it's a good idea to always measure the server when it's at peak load. When you are sizing your servers, you always need to plan for maximum usage… and then leave a little buffer overhead for those extra special days…like when all the users return from Christmas break).

Estimate how many disks you need for a new server, based on past user behavior

Now that you know how to measure (via Jetstress) or estimate (from above) maximum disk throughput, and you know the IOPS/user, it's a simple task to plan how many disks you'll need for a new server.

Assuming the new users have a similar email usage profile (are using the same clients, have the same percentage of plugins, send about the same mail), then here's how you go about it:

Calculate the throughput you will need. (multiply the number of users on the new server by the number of IOPS/user)

Divide the throughput by the maximum throughput of the disks you are using (use the numbers from the tables above, or the result from Jetstress * 0.8; the numbers in the tables already include the 80% maximum usage to build in some overhead). Round up. This will give you the minimum number of disks that you will need for the server. Next, divide by the number of databases, and round up. This will give you the number of disks you need per database (or repeat with storage groups if your databases share the same physical drive).

That's it! Oh, I suppose it's always a good idea to do an example:

Ok, suppose I am hiring 5000 people (growth is good!), and I want to figure out how to size my server. My current users require 0.4 IOPS per user, and I expect the new guys to be just as hard working as my current employees. I will need a total of 2000 IOPS.

I'm going to buy fast disks capable of 180 IOPS, which I'm going to configure in RAID 0+1. From Table 2 (using the 2:1 row), I can expect to get around 108 IOs per second from each disk. 2000 IOPS/108 IOPS per disk = 18.5. This implies that I would need 19 disks if all IOs were going to the same place. But they aren't, of course (backup times would be unwieldy!!!). I'm planning to have 20 databases spread across 4 storage groups. The databases in the same storage group will share the same disks. So each storage group will need to support 2000/4 = 500 IOPS. That means each storage group will need 500/108 = 4.6 disks. Rounding up shows that I will need 5 disks for each storage group. So the total number of disks I will need is 5*4 = 20 disks.
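
The arithmetic above can be sketched as a small Python helper. The function name and parameters are hypothetical, and the sketch assumes the load is spread evenly across the storage groups.

```python
import math

def disks_needed(users, iops_per_user, usable_iops_per_disk, storage_groups=1):
    """Round up the disk count per storage group, then total across groups."""
    total_iops = users * iops_per_user
    per_group_iops = total_iops / storage_groups   # assumes an even spread
    per_group_disks = math.ceil(per_group_iops / usable_iops_per_disk)
    return per_group_disks * storage_groups

print(disks_needed(5000, 0.4, 108))                    # 19, one big array
print(disks_needed(5000, 0.4, 108, storage_groups=4))  # 20, i.e. 5 per group
```

Rounding up per storage group rather than once overall is why the total moves from 19 to 20.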

Suppose after buying my disks, I test them in the lab and my Jetstress tests show only 120 IOs per second per disk, which gives me 96 IOPS to play with after I've multiplied by 0.8 to give me a 20% safety buffer. I redo my calculations: 500/96 = 5.2, which rounds up to 6 disks for each storage group, or 6*4 = 24 disks in total. That is 4 more than my first estimate, so I add the extra disks before I build out my server and add the new users.

(Note I haven't calculated the amount of disk capacity I'll need... I'll leave this for the readers, unless I get specific requests. In many cases, disk capacity is less of an issue because the capacity of individual disks has risen significantly. For most Exchange customers, the real issue is disk throughput.)

Thanks to you (the reader) for taking the time to read this - I hope you have found the content interesting :)
