Optimize iSCSI random access performance

As a company made up mostly of software R&D staff, we spend our days coding and computing. Using Virtual Machines (VMs) is one of the most efficient ways for our team to access multiple environments without multiplying the physical hardware and increasing our IT overhead. In my team, everyone ran at least two VMs to accommodate different development environments. All of the VMs ran on our team’s server: two ESXi hosts in a cluster connected to a DS3612xs as a shared storage pool. Usually everything was fine when we ran Windows VMs. However, whenever a team member switched to a Linux VM and started to compile code, which was quite resource-intensive, all of the VMs became slow.


Performance challenges in virtualization environments

When choosing a storage solution for your company, your decision should be based on the system’s performance and I/O requirements, which are quite “predictable” in most situations. However, in a virtualized environment, where each iSCSI LUN hosts multiple Virtual Machines (VMs), the situation becomes more complicated.

Here we faced an issue known as the “I/O blender effect”, which causes your servers to become slow and inefficient when processing multiple requests. Even though access is sequential from the perspective of each individual VM, the fact that multiple VMs share the same LUN makes the combined requests look like a large number of random accesses.
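To make the blender effect concrete, here is a minimal Python sketch. It is a toy model, not a measurement: the number of VMs, the block layout, and the round-robin interleaving are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def sequential_stream(start_block, count):
    """Blocks one VM reads, in order, from its own region of the LUN."""
    return [start_block + i for i in range(count)]


def blend(streams):
    """Interleave the per-VM streams round-robin (an assumed arrival order)."""
    blended = []
    for requests in zip(*streams):
        blended.extend(requests)
    return blended


def sequential_fraction(blocks):
    """Fraction of requests that directly follow the previously requested block."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)


# Assumed layout: 4 VMs, each reading 1000 consecutive blocks from its own region.
streams = [sequential_stream(vm * 1_000_000, 1000) for vm in range(4)]
single = sequential_fraction(streams[0])
mixed = sequential_fraction(blend(streams))

print(f"one VM alone:     {single:.0%} sequential")
print(f"four VMs blended: {mixed:.0%} sequential")
```

In this toy model each VM is perfectly sequential on its own, yet once the four streams are interleaved, no request follows the block before it. Real arrival orders are messier than round-robin, but the trend is the same.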

Random access performance: identifying the problem

We ran a test to illustrate that random access is a problem for IOPS: on a 100 GB advanced file-based LUN running on a DS412+, IOPS was very high when both workers used sequential access. But as soon as we changed one or both workers to random access, performance plummeted.

Test setup: DS412+, WD Red 2 TB x 4, SHR-1, advanced LUN 100 GB
IOmeter: 4K read, 100% sequential or 100% random I/O, 2 workers, 8 outstanding commands

Imagine a file-based LUN as a reception desk and each VM as a customer. When there’s only one customer at the desk, the request is handled quickly. But when there are 20 customers in line, each with a different request to fulfill, the receptionist will need more time to finish the work. Even if each VM focuses on one task only, its efficiency and performance depend on the total number of requests from all VMs. If we can reduce the average waiting time when multiple VMs send requests, performance will improve.
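The reception desk analogy can be put into numbers with a small sketch. It assumes every customer arrives at the same moment, a single receptionist serves them in order, and each request takes the same time. The 10 ms service time is a made-up figure, not a measured one.

```python
def average_wait_ms(customers, service_ms):
    """Average time a customer spends waiting before being served,
    with one receptionist handling requests one at a time."""
    # Customer k (0-based) waits for the k customers ahead of them.
    waits = [k * service_ms for k in range(customers)]
    return sum(waits) / customers


# Assumed service time of 10 ms per request (illustrative only).
alone = average_wait_ms(1, 10.0)
crowded = average_wait_ms(20, 10.0)

print(f"1 customer:   average wait {alone} ms")
print(f"20 customers: average wait {crowded} ms")
```

With these assumptions a lone customer does not wait at all, while each of 20 customers waits 95 ms on average, even though no individual request became any harder to serve.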

An ideal solution is to replace typical hard drives with SSDs, which are not tied down by the same physical limitations as regular hard drives. Since there are no moving mechanical parts in an SSD, random access is much faster. But if budget is a primary concern and you can’t afford a large SSD cache, you’ll need to reduce the number of simultaneous accesses.

Boosting iSCSI random read performance

Since last year, we have continued to improve iSCSI random read performance by allowing the file-based LUN to process requests in parallel. It’s like increasing the number of receptionists at the desk without increasing the number of lanes, so the tasks handled by one receptionist are independent of the others. With this improvement, VMs run much more smoothly, and system boot-up time has improved as well.
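As a rough illustration of why more receptionists help, the sketch below models a queue of equal-sized requests served by a configurable number of handlers. It is not how DSM implements parallel requests. The handler count, the 10 ms service time, and the assumption that the underlying disks can actually serve several requests at once are all hypothetical.

```python
import heapq


def total_time_ms(num_requests, service_ms, handlers):
    """Time until the last request finishes when up to `handlers`
    requests can be in flight at the same time."""
    free_at = [0.0] * handlers  # when each handler next becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_requests):
        start = heapq.heappop(free_at)  # earliest available handler
        end = start + service_ms
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Assumed workload: 20 requests of 10 ms each (illustrative only).
serial = total_time_ms(20, 10.0, handlers=1)
parallel = total_time_ms(20, 10.0, handlers=4)

print(f"1 handler:  {serial} ms")
print(f"4 handlers: {parallel} ms")
```

In this model the same 20 requests finish in 50 ms with four handlers instead of 200 ms with one. On real hardware the gain depends on how much concurrency the drives and the RAID layout can absorb.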

The following charts compare a newer DSM version (v4377), which benefits from our work in this area, with an earlier version. As you can see, the improvement is pretty impressive for both read and mixed read/write workloads.

Test setup: DS1813+, WD Red 2 TB x 8, SHR-1, advanced LUN 100 GB
IOmeter: 4K read and write, 2 workers, 8 outstanding commands

Many factors affect performance in virtualization environments, but it all comes down to providing higher throughput and IOPS. Optimizing random read performance is just the start. I’ll share more about performance improvements next time.