I recently tested the performance of an application that relied heavily on tempdb. Because the tempdb files reside on a dedicated LUN (F:\ in this case), it was easy to examine the performance of this drive through Perfmon. Surprisingly, I noticed an average read latency of 9 ms, much higher than the sub-millisecond latency I usually see from a Kaminario K2 high-performance storage appliance.
LogicalDisk F:
Avg. Disk Bytes/Read        110,295.106
Avg. Disk Bytes/Write             0.000
Avg. Disk sec/Read                0.009
Avg. Disk sec/Write               0.000
Disk Read Bytes/sec     165,151,292.596
Disk Reads/sec                1,621.429
Disk Write Bytes/sec    106,420,027.524
Disk Writes/sec               1,623.386
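Perfmon reports latency in seconds, so 0.009 means 9 ms. Combining the read counters with Little's law (average I/Os in flight = IOPS x average latency) gives a rough sense of how deep the read queue was. This sketch uses only the values in the table; Little's law assumes a steady state, which a bursty workload like this one only approximates, so treat the result as an order-of-magnitude estimate.

```python
def avg_outstanding_ios(iops, avg_latency_s):
    """Little's law: average number of I/Os in flight = arrival rate x time in system."""
    return iops * avg_latency_s


# Read-side counters from the LogicalDisk F: sample above.
reads_per_sec = 1621.429
avg_sec_per_read = 0.009

print(f"Avg read latency:    {avg_sec_per_read * 1000:.1f} ms")
print(f"Avg reads in flight: {avg_outstanding_ios(reads_per_sec, avg_sec_per_read):.1f}")
```

Roughly 15 reads outstanding at any moment hints that requests were waiting somewhere along the path; on its own, it does not say where.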
I/O bottlenecks on tempdb are a common performance issue, so I wanted to understand why Perfmon reported such a high latency number.
First, I looked at the load on tempdb. I identified several queries that shared a common plan: reading a large amount of data from two tables and performing a merge join. As a result, tempdb was used to sort the intermediate data for the merge join.
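The sort step can be made concrete with a minimal sort-merge join in Python. Both inputs have to be ordered on the join key before the merge phase can walk them in lockstep; in SQL Server, a Sort operator that does not fit in its memory grant spills to tempdb. Here an in-memory sorted() call stands in for that step, and the table and column names are hypothetical.

```python
def sort_merge_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key` using sort-merge."""
    # Sort phase: a merge join needs both inputs ordered on the join key.
    # (This is the step that SQL Server may spill to tempdb for large inputs.)
    left_sorted = sorted(left, key=lambda row: row[key])
    right_sorted = sorted(right, key=lambda row: row[key])

    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][key]
        rk = right_sorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left_sorted) and left_sorted[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == rk:
                j_end += 1
            for l_row in left_sorted[i:i_end]:
                for r_row in right_sorted[j:j_end]:
                    result.append({**l_row, **r_row})
            i, j = i_end, j_end
    return result


# Hypothetical rows: two orders for customer 2, none for customer 3.
orders = [{"cust": 2, "order": "b"}, {"cust": 1, "order": "a"}, {"cust": 2, "order": "c"}]
customers = [{"cust": 1, "name": "x"}, {"cust": 2, "name": "y"}, {"cust": 3, "name": "z"}]
for row in sort_merge_join(orders, customers, "cust"):
    print(row)
```

The merge phase itself is a single linear pass; the cost that matters for this story is the sorting of large inputs beforehand.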
Here, the feature that helped me most was the ability to view the internal latency of the K2 device, which the K2 management UI analyzes in depth. As expected, this display showed sub-millisecond latency. So why did Perfmon report a 9 ms average?
Looking at the workload (another powerful feature of the K2 UI), I saw the following, which the SQL load explained quite easily:
1. The workload contained 2-3 seconds of 64k writes followed by a read burst where the majority of reads were 512k.
2. The latency of the writes was fine.
3. The latency of the reads was very high because the read bursts were causing queuing on the host. The Fibre Channel (FC) connectivity was the bottleneck: a single 512k transfer takes around 600 µs on an 8 Gb FC link (and over 1 ms on a 4 Gb FC link), so when many requests are issued simultaneously, each one waits behind the transfers ahead of it and the latency increases significantly.
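The arithmetic behind this point can be sketched in a few lines of Python. The sketch assumes nominal per-direction data rates of about 400 MB/s for 4 Gb FC and 800 MB/s for 8 Gb FC, ignores protocol overhead and the array's own service time, and models a burst as N reads issued at once and spread round-robin over identical links, each sending one transfer at a time. The 16-read burst is a hypothetical figure, not a measurement from this system.

```python
# Assumed nominal per-direction data rates in bytes/s (8b/10b-encoded FC).
FC_BYTES_PER_SEC = {"4GFC": 400e6, "8GFC": 800e6}


def transfer_time_us(size_bytes, link_type):
    """Time to serialize one transfer onto a single FC link, in microseconds."""
    return size_bytes / FC_BYTES_PER_SEC[link_type] * 1e6


def burst_avg_latency_us(n_requests, size_bytes, link_type, n_links=1):
    """Average wire-level latency for n_requests issued at the same instant.

    Simplified model: requests are assigned round-robin to n_links identical
    links, and each link sends one transfer at a time in FIFO order. Request i
    therefore completes after (i // n_links + 1) transfer times.
    """
    t = transfer_time_us(size_bytes, link_type)
    total = sum((i // n_links + 1) * t for i in range(n_requests))
    return total / n_requests


READ_SIZE = 512 * 1024  # one 512k read
BURST = 16              # hypothetical number of reads issued together

for link_type in ("8GFC", "4GFC"):
    print(f"{link_type}: one 512k transfer takes {transfer_time_us(READ_SIZE, link_type):.0f} us")

for links in (1, 2, 4):
    avg = burst_avg_latency_us(BURST, READ_SIZE, "8GFC", links)
    print(f"8GFC x{links}: average burst latency {avg / 1000:.2f} ms")
```

Under these assumptions a single 512k read occupies an 8 Gb link for roughly 0.65 ms, yet the average latency across the burst is several milliseconds because most requests wait for earlier transfers. Doubling the number of links roughly halves that queuing delay.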
We therefore added two more FC links and saw a linear reduction in the read latency. What at first seemed to be a storage system issue turned out to be a storage connectivity problem.
Three points to learn from this exercise:
1. The ability to examine storage internal latencies is important when diagnosing performance problems. It can shorten the time to identify the source and cause of the problem.
2. When examining OS performance counters (such as those reported by Perfmon or iostat), remember that they reflect the entire stack: server -> link -> storage. The bottleneck can be anywhere along that path.
3. When architecting FC connectivity, it is common to calculate the expected throughput and choose the number and speed of FC links accordingly. In our case, the average throughput was only 265 MB/s, and even the peak throughput never reached the available bandwidth. However, if the application workload includes bursts of I/O, more host/target ports will help reduce host latencies.
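Why can average throughput look comfortable while bursts still queue? Throughput counters average over their sampling interval, so a short burst is diluted by the idle time around it. This sketch bins I/O events (time in seconds, size in bytes) into windows of a chosen length; the event list is invented for illustration: 100 reads of 512k issued within the first 10 ms of a one-second interval.

```python
def throughput_profile(events, window_s, duration_s):
    """Bin (time_s, size_bytes) I/O events into fixed windows; return bytes/s per window."""
    n_windows = int(round(duration_s / window_s))
    totals = [0] * n_windows
    for time_s, size_bytes in events:
        # Clamp so an event exactly at duration_s falls into the last window.
        idx = min(int(time_s / window_s), n_windows - 1)
        totals[idx] += size_bytes
    return [total / window_s for total in totals]


# Hypothetical burst: 100 reads of 512k, one every 0.1 ms, starting at t = 0.
events = [(i * 0.0001, 512 * 1024) for i in range(100)]

coarse = throughput_profile(events, window_s=1.0, duration_s=1.0)
fine = throughput_profile(events, window_s=0.01, duration_s=1.0)

print(f"1 s average:    {coarse[0] / 1e6:.0f} MB/s")
print(f"peak 10 ms bin: {max(fine) / 1e6:.0f} MB/s")
```

With these made-up numbers the one-second average is about 52 MB/s, far below an 8 Gb link's roughly 800 MB/s, while the 10 ms window containing the burst asks for several GB/s. Real arrivals are rarely this clean, but the same dilution applies to any counter sampled over seconds.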
There is still a valid question to answer: should you invest in additional ports to reduce latency? The answer depends heavily on the application and on many factors, such as how many ports you already have and at what speed, how critical the application is to the business, and so on. Either way, you should analyze the application to understand how much of its time is spent in I/O wait, so you can predict the improvement that more ports would bring, and make the FC infrastructure investment only if the I/O wait is significant. For example, I can easily create a batch process that runs for 2 hours with a workload profile similar to the one described above. With additional ports, which reduce the latencies significantly, this batch would complete in 1 hour and 58 minutes, a saving of only 2 minutes. That is because the batch's I/O wait is low and most of its time goes to CPU processing, which additional ports will not improve. In this case, it does not make sense to invest in more ports for this job.
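The I/O-wait analysis described above amounts to an Amdahl's-law-style estimate: only the portion of the run spent waiting on I/O shrinks when latency drops. A minimal sketch, assuming the I/O wait scales in proportion to latency and everything else stays the same; the 240-second and 5,400-second I/O wait figures are hypothetical.

```python
def projected_runtime_s(total_s, io_wait_s, latency_ratio):
    """Estimate runtime after an I/O latency change (Amdahl's-law style).

    Assumes only the I/O-wait portion scales with latency_ratio
    (new latency / old latency); CPU and other time are unchanged.
    """
    if not 0 <= io_wait_s <= total_s:
        raise ValueError("io_wait_s must be between 0 and total_s")
    return (total_s - io_wait_s) + io_wait_s * latency_ratio


# Hypothetical 2-hour batch whose I/O latency is halved by extra ports.
mostly_cpu = projected_runtime_s(7200, io_wait_s=240, latency_ratio=0.5)
mostly_io = projected_runtime_s(7200, io_wait_s=5400, latency_ratio=0.5)
print(f"low I/O wait:  {7200 - mostly_cpu:.0f} s saved")
print(f"high I/O wait: {7200 - mostly_io:.0f} s saved")
```

For a 2-hour (7,200 s) batch with only 240 s of I/O wait, halving latency saves 120 s; the same improvement on a batch that spends 5,400 s in I/O wait saves 2,700 s, which is where extra ports start to pay off.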