iSCSI target for Oracle RAC cluster with SSD and 10 Gb Ethernet

This is already an old story 🙂 More than a year ago we were ready to migrate from SATA to new SSD storage. The problem was that our infrastructure wasn’t ready: when I ran a local performance test it showed 180K IOPS (in 8192-byte blocks), whilst the same test run remotely over iSCSI showed only 18K IOPS. A tenfold drop is altogether bad.

After some investigation we found that the bottleneck was the iSCSI target. And since IET is no longer being developed, we decided to move to SCST. Below is our experience of getting the maximum performance out of SCST as an iSCSI target.

1. About the measurements.

We used ORION for testing IO throughput. How to do this has been skilfully explained by Alex Gorbachev.

I ran a bunch of experiments, but a single test turned out to be enough:


orion -run advanced -duration 60 -matrix row -num_large 0 -num_disks 300 -simulate raid0 -type rand

It runs for about 96 minutes and gradually (in reality logarithmically, from 1 to 1500) increases the number of parallel workers, so we can see:
1. What the throughput of a single session is
2. What the maximum throughput is
3. How the system behaves: how many parallel sessions are needed to reach the peak, what happens after the peak, how quickly throughput changes, and so on

Other tests might of course be very useful too, but in my case they added nothing new.

2. System-wide settings

Migrating from IET to SCST increased throughput from 18K IOPS to 30K IOPS. Good, but not enough. The next step was tuning the OS and the hardware devices. Something similar is described in “Tuning 10Gb network cards on Linux”.

Below are the results of my experiments:

img1

On the picture:

  • Y-axis is the overall throughput in 8K blocks;
  • X-axis is the number of parallel read processes;
  • Green line – disk performance from one host without any changes, roughly 30K IOPS;
  • Red and blue lines – disk performance after changing the queue and scheduler settings (see for example tuning the noop scheduler) for the SSD devices on the target nodes, about 45K IOPS;
  • Yellow line – the same as red, plus changed ixgbe driver settings: more parallel processing queues, RSS and VMDQ (see the README of the latest Intel ixgbe driver for details). As you can see, peak performance is the same, but with a small number of parallel processes (fewer than 20) performance is 5-10K IOPS higher, so we can say that the system is more responsive;
  • Brown line – the same as yellow, plus NUMA affinity settings (see for example IRQ affinity and CPU affinity). As you can see, performance is only slightly better, practically the same 45K IOPS;
  • Sky Blue – the total performance from 2 initiator hosts and 1 target host. Peak performance is 80K IOPS, but it periodically drops to 60K IOPS;
  • Black line – “raw” performance from the target host, without iSCSI, just for comparison.
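To make the queue and scheduler change from the list above a bit more concrete, here is a minimal sketch of writing such settings through sysfs. The file names (`scheduler`, `rotational`, `nr_requests`) are standard entries under `/sys/block/<device>/queue/`, but the values are my assumptions, since the exact settings behind the red and blue lines are not listed here. The function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path

# Assumed values: the post does not say which exact settings were applied.
ASSUMED_QUEUE_SETTINGS = {
    "scheduler": "noop",    # plain FIFO; an SSD gains nothing from seek-oriented reordering
    "rotational": "0",      # tell the block layer the device is not a spinning disk
    "nr_requests": "1024",  # hypothetical request queue size, tune per device
}


def apply_queue_settings(device, settings=None, sysfs_root="/sys"):
    """Write each setting to <sysfs_root>/block/<device>/queue/<name>.

    Returns a dict of the values that were written.
    """
    if settings is None:
        settings = ASSUMED_QUEUE_SETTINGS
    queue_dir = Path(sysfs_root) / "block" / device / "queue"
    applied = {}
    for name, value in settings.items():
        # sysfs attributes accept the new value as a short text write
        (queue_dir / name).write_text(value + "\n")
        applied[name] = value
    return applied
```

On a real target node this would be called as root, for example `apply_queue_settings("sdb")`, where `sdb` is only a placeholder device name. Newer multi-queue kernels call the equivalent scheduler `none` instead of `noop`, so the value may need adjusting.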

3. Tuning iSCSI initiator

The difference between the raw and iSCSI devices is almost twofold, which is really huge. After a series of experiments I ran the test on the target itself, with the host acting as both initiator and target, and got the same results. That was strange (the network wasn’t used, yet the degradation was the same). I was confused and began checking all the settings of each component. At that moment I found a post by Bart Van Assche about parallel sessions, and everything fell into place. My problem was not in the target but in the initiator. According to the open-iscsi site, the maximum performance for one session is 50K IOPS. I had read that but didn’t relate it to my configuration. As you can see in the previous graph I even reached 60K IOPS, but that was the theoretical limit for one open-iSCSI session. The cure is simple: I just changed my SCST config to use multiple targets (and thus multiple sessions, 3 disks per session):


TARGET iqn.2001-04.com.yandex:targettest01e_1.yandex.ru {
LUN 0 T01L200
LUN 1 T01L201
LUN 2 T01L202
}

TARGET iqn.2001-04.com.yandex:targettest01e_2.yandex.ru {
LUN 0 T01L203
LUN 1 T01L204
LUN 2 T01L205
}
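With more disks, writing these blocks by hand gets tedious. Here is a small sketch that generates the same layout, 3 disks per target, from a list of device names. The function and its parameters are hypothetical helpers, not part of SCST; the output format simply mirrors the two blocks above.

```python
def build_targets(devices, iqn_prefix, host_suffix, disks_per_target=3):
    """Group devices into TARGET blocks, numbering LUNs from 0 inside each block."""
    blocks = []
    for start in range(0, len(devices), disks_per_target):
        group = devices[start:start + disks_per_target]
        index = start // disks_per_target + 1  # targets are numbered _1, _2, ...
        lines = [f"TARGET {iqn_prefix}_{index}.{host_suffix} {{"]
        lines += [f"LUN {lun} {device}" for lun, device in enumerate(group)]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Device names T01L200..T01L205, as in the config above
devices = [f"T01L{n}" for n in range(200, 206)]
config = build_targets(devices, "iqn.2001-04.com.yandex:targettest01e", "yandex.ru")
print(config)
```

Each generated target becomes a separate session on the initiator side, which is the whole point: the per-session limit then applies to 3 disks instead of to all of them.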

Splitting the disks across multiple targets doubled throughput, to 100K IOPS from one host. Below is the new series of tests (sorry for the unnecessary lines, I made this graph for internal use).

img2

On the graph:

  • Y-axis is the overall throughput in 8K blocks;
  • X-axis is the number of parallel read processes;
  • Blue and Red lines – “raw” performance from the target host, without iSCSI, just for comparison;
  • Green and Yellow lines – multiple sessions from one host, about 80K IOPS.

4. Changing the Linux kernel

One last strange thing needs mentioning: the overall instability. During the measurements I saw network throughput vary between 4 and 8 Gb/s from time to time, and the same can be seen on the graph above (Green and Yellow lines). Surprisingly, this was solved as easily as the iSCSI initiator problem. I just changed the kernel from UEK R2 (2.6.39-400) to UEK R3 (3.8.13), and you can see:

  • Sky Blue – performance with UEK R3, about 100K IOPS.

To sum up, we got 100K IOPS from one host over iSCSI. That’s 8 Gb/s, which I think is very close to the best result for 10 Gb Ethernet.
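For anyone who wants to redo the IOPS-to-bandwidth conversion, here is a small sketch. It counts only the data payload in 8192-byte blocks and assumes decimal gigabits (10^9 bits); the function names are made up for illustration. Protocol headers (iSCSI, TCP, IP, Ethernet) are not counted, so figures measured on the wire will come out higher than these.

```python
def payload_gbit_per_s(iops, block_bytes=8192):
    """Data payload rate in Gb/s (10^9 bits) for a given IOPS figure."""
    return iops * block_bytes * 8 / 1e9


def max_iops_at_line_rate(line_rate_gbit=10, block_bytes=8192):
    """Upper bound on IOPS if the whole link carried nothing but payload."""
    return line_rate_gbit * 1e9 / (block_bytes * 8)


# IOPS figures quoted in this post
for label, iops in [("IET", 18_000), ("SCST, untuned", 30_000),
                    ("one session, tuned", 45_000), ("final result", 100_000)]:
    print(f"{label}: {payload_gbit_per_s(iops):.2f} Gb/s of payload")

print(f"10 GbE ceiling: {max_iops_at_line_rate():.0f} IOPS of 8K blocks")
```

The same conversion shows that the 180K IOPS measured locally could not fit into a single 10 Gb link in any case, so one host over one link will always stay below the raw device figure.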
