Performance testing the pmsApp with filebench – RAID1

Following on from my previous post, I’ve decided to use filebench (version 1.4.9.1) for a more structured testing approach. Filebench is easy to work with: it ships with several predefined application workload profiles and is easy to install. Workloads can also be defined easily (using WML scripting) to mimic almost any sort of application load on the I/O system. I cannot comment on how accurate these workloads are, but they make testing easy :).

The method I used was very simple: three runs of 60 seconds each, with the default settings for the workload profiles, except for the randomwrite and randomread workloads, where I had to reduce the file size to 100 MB. I disabled randomization (echo 0 > /proc/sys/kernel/randomize_va_space) as recommended by filebench. Filebench allocated 170 MB of shared memory on all the runs (this appears to be the default).
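
For reference, each run looked roughly like this (my reconstruction; the target directory is illustrative, the variable names are those used in the stock workload files, and the $filesize override only applied to the random read/write workloads):

  # Disable address space randomization, as filebench recommends
  echo 0 > /proc/sys/kernel/randomize_va_space

  # Inside filebench's interactive shell: load a canned workload and run it for 60 s
  filebench
  filebench> load randomwrite
  filebench> set $dir=/fbtest
  filebench> set $filesize=100m
  filebench> run 60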

The results below are the average of the three runs, except for the randomread workload, where I only did two runs as the results were very similar. I did not need to repeat any tests, as there was far less variation in the results than in my earlier manual tests. The figures presented are filebench's IO Summary throughput, rather than the per-operation results.

So without further ado here are the results:

Workload          varmail      webserver    fileserver   randomwrite   randomread   webproxy
PMSTest (RAID1)   1.24 MB/s    6.83 MB/s    4.20 MB/s    13.63 MB/s    89.1 MB/s    1.50 MB/s
PMSControl        5.02 MB/s    3.73 MB/s    4.93 MB/s    39.83 MB/s    88.9 MB/s    2.33 MB/s

The webserver results are somewhat mystifying. The workload consists of opening, reading and closing files, so the results should be fairly similar, but for some reason the pmsApp is faster!

The rest of the results are as expected: writes are significantly slower, due to the nature of the pmsApp (namely a RAID array across a network link), and reads are comparable.
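
To put the write penalty in context: in a mirror, every write has to complete on both halves, and here one half sits on the far side of the network link. Purely as an illustration (device names are hypothetical, and this is not necessarily how the pmsApp builds its array), mirroring a local disk with a network block device using Linux software RAID looks like this:

  # Hypothetical devices: /dev/sdb is local, /dev/nbd0 is served over the network
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/nbd0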

It is also worth noting that, regardless of the results, the test prep phase took longer, sometimes a lot longer, when running off the pmsApp, i.e. on PMSTest.

Ver 0.2 – Finally ready for download

I finally managed to create an OVA that can be deployed on ESXi4, VirtualBox and VMware Workstation.

I had to create an OVF with VirtualBox, edit it to change the hardware type from virtualbox-2.2 to vmx-7, and then use VMware's ovftool to convert the OVF to an OVA.
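
In shell terms the conversion was along these lines (VM and file names are illustrative):

  # Export the VM from VirtualBox as an OVF
  VBoxManage export pmsApp -o pmsApp.ovf

  # Change the declared hardware family so VMware products will accept it
  sed -i 's/virtualbox-2.2/vmx-7/' pmsApp.ovf

  # Repackage as a single OVA with VMware's ovftool
  # (if ovftool objects to the now-stale manifest checksum, delete the .mf file first)
  ovftool pmsApp.ovf pmsApp.ova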

When deployed on ESXi4 it will complain about the guest OS not being recognized, but you can still deploy it and change the guest OS setting later.

Happy downloading.

Performance testing the pmsApp

So a few weeks ago a colleague pointed out Jimmy’s website about the pmsApp and, seeing as we have a few underused blades in our development environment, I offered Jimmy my help.

He’s already published my findings regarding failover testing of the pmsApp, and today I thought I would post some performance metrics. The first set of tests uses the pmsApp configured with a software RAID1 array.

In order to run the tests, I created two guests:

One, PMSTest, running off the pmsApp, and the other, PMSControl, running locally on the blade's single disk.

I installed CentOS 6.2 (minimal install) on both of them, plus the openssh-clients package to allow me to use scp.
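
The minimal install does not ship scp, so on both guests this meant something like:

  yum install -y openssh-clients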

Each guest has 512 MB of RAM, a single NIC (DHCP-configured, with a 36-hour lease) and an 8 GB hard drive.

Generally speaking, I ran three instances of each test, unless there was a great disparity between the results, in which case I ran extra tests (normally two, but sometimes three) and discarded the top and bottom results.

Generally speaking, the results of the guest running off the pmsApp were considerably less stable.

Reboot Time: This measures the time taken from typing init 6 and pressing Enter to the logon screen appearing. I found this test to be more reliable than pressing reset in the vSphere client or even using PowerCLI.

PMSControl    PMSTEST (RAID1)
34.66 s           30.44 s
33.02 s           27.75 s
34.07 s           28.63 s

This was a clear win for the pmsApp.

The remaining tests are file-copying tests. The source or destination server (depending on the test) was a third blade server running RHEL 6.0. In theory, traffic bound for another blade never leaves the enclosure, but I'm not so sure. I ran these tests out of hours, so there would have been very little traffic, if any, on the network. I did repeat some tests during the working day for good measure, and the results were comparable.

SCP Write – Copy the VMware ESXi 5.0 ISO (297 MB) from the blade server to the guest.
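
Each copy was timed with the shell's time builtin, roughly like this (my reconstruction; the hostname and ISO filename are illustrative):

  # Copy from the RHEL blade to the guest and time the transfer
  time scp VMware-ESXi-5.0.iso root@pmstest:/root/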

PMSControl    PMSTEST (RAID1)
24.675 s          45.195 s
30.980 s          45.197 s
28.505 s          56.486 s

The pmsApp is clearly slower.

SCP Read – Copy the VMware ESXi 5.0 ISO (297 MB) from the guest to the blade server.

PMSControl    PMSTEST (RAID1)
7.764 s           7.762 s
8.211 s            7.711 s
7.674 s           8.017 s

Reading is a tie.

SCP Write – Copy a directory containing multiple files (203 MB in total) from the blade server to the guest. There is an 87 MB file and a 16 MB file, but the rest are sub-1 MB files.

PMSControl    PMSTEST (RAID1)
12.717 s         20.572 s
13.092 s         18.772 s
11.752 s          23.380 s

Another writing test and the pmsApp is slower again.

SCP Read – Copy a directory containing multiple files (203 MB in total) from the guest to the blade server. There is an 87 MB file and a 16 MB file, but the rest are sub-1 MB files.

PMSControl    PMSTEST (RAID1)
11.531 s          11.623 s
10.972 s          11.082 s
11.294 s          11.245 s

Both the pmsApp guest and the direct guest are faster copying the directory of smaller files than the one big file, which is odd, as I would've expected one big file to be quicker. The results are consistent on both guests though, so I'm not too worried. I'd be happy to hear an explanation.

This very small set of tests has shown that read performance is comparable to hosting a guest directly on a disk. Unfortunately, write performance does suffer quite a bit: it's roughly 40% slower and not very consistent.

In my next post I intend to use filebench to carry out some more tests using a more standardized approach.

Failover testing

A friend of the pmsApp project, A.Z., has done some failover testing. This is what he found.

As long as there is no I/O taking place while the cluster is failing over, all is OK. If there is I/O, the results are mixed:

  • SCP stopped and would not restart.
  • A script I wrote (writing the current date to a file every second; see the sketch below) is unable to write to the file while the cluster is down, but when the cluster comes back up, it continues on its merry way.
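
A minimal sketch of that kind of script (the file path is illustrative):

  #!/bin/bash
  # Append the current date to a file once a second
  while true; do
      date >> /root/failover_probe.log
      sleep 1
  done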

The problem seems to be that an ongoing session is not moved from the failed node to the node that takes over; a new session needs to be established. I am looking into the possibility of having sessions transferred from the failing node to the new active node. So far, all I have found is an article on how to create a failover NFS cluster. That article is based on DRBD, however, but that should not matter much.

A big thank you to A.Z. for testing the pmsApp. Looking forward to your performance tests.