Performance testing the pmsApp

So a few weeks ago a colleague pointed me to Jimmy’s website about the pmsApp, and seeing as we have a few underused blades in our development environment, I offered Jimmy my help.

He has already published my findings on failover testing of the pmsApp, and today I thought I would post some performance metrics. This first set of tests uses the pmsApp configured with a software RAID1 array.

To run the tests, I created two guests:

PMSTest, which runs off the pmsApp, and PMSControl, which runs locally on the blade’s disk (a single disk).

I did a CentOS 6.2 minimal install on both of them and added the openssh-clients package so that I could use scp.
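
For reference, the only extra package needed on top of the minimal install was openssh-clients; on CentOS 6 that is simply:

    # on each guest, after the CentOS 6.2 minimal install
    yum install -y openssh-clients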

Each guest has 512 MB of RAM, a single NIC (DHCP-configured, with a 36-hour lease) and an 8 GB hard drive.

Generally speaking, I ran each test three times, unless there was a great disparity between the results, in which case I ran extra runs (normally two but sometimes three) and discarded the top and bottom results.

Overall, the results from the guest running off the pmsApp were considerably less stable.

Reboot Time: This measures the time taken from typing init 6 and pressing Enter to the logon screen appearing. I found this test to be more reliable than pressing reset in the vSphere client or even using PowerCLI.
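
For anyone who wants to script a similar measurement, a rough sketch run from a third machine is below. It times until SSH answers again rather than until the logon screen appears, so it is only an approximation of what I measured, and the hostname is a placeholder:

    #!/bin/bash
    # Time a guest reboot from another host: trigger "init 6" over SSH,
    # wait for the SSH port to go down, then wait for it to answer again.
    HOST=pmstest                                   # placeholder hostname
    START=$(date +%s)
    ssh root@$HOST 'init 6' >/dev/null 2>&1        # connection drops as the guest goes down
    while nc -z $HOST 22 >/dev/null 2>&1; do sleep 1; done    # wait for shutdown
    until nc -z $HOST 22 >/dev/null 2>&1; do sleep 1; done    # wait for it to come back
    echo "Approximate reboot time: $(( $(date +%s) - START )) seconds"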

PMSControl    PMSTEST (RAID1)
34.66 s           30.44 s
33.02 s           27.75 s
34.07 s           28.63 s

This was a clear win for the pmsApp.

The remaining tests are file copy tests. The source or destination server (depending on the test) was a third blade server running RHEL 6.0. In theory, traffic bound for another blade never leaves the enclosure, but I’m not so sure. I ran these tests out of hours, so there would have been very little traffic, if any, on the network. I did repeat some tests during the working day for good measure, and the results were comparable.

SCP Write – Copy the VMware ESXi 5.0 ISO (297 MB) from the blade server to the guest
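
For reference, a single run looks roughly like this; the ISO filename and guest hostname are placeholders:

    # run on the blade server: push the ISO to the guest and time the copy
    time scp esxi-5.0.iso root@pmstest:/tmp/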

PMSControl    PMSTEST (RAID1)
24.675 s          45.195 s
30.980 s          45.197 s
28.505 s          56.486 s

The pmsApp is clearly slower.

SCP Read – Copy the VMware ESXi 5.0 ISO (297 MB) from the guest to the blade server.

PMSControl    PMSTEST (RAID1)
7.764 s           7.762 s
8.211 s            7.711 s
7.674 s           8.017 s

Reading is a tie.

SCP Write – Copy a directory containing multiple files (203 MB total) from the blade server to the guest. The directory contains an 87 MB file and a 16 MB file, but the rest are sub-1 MB files.
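
The directory tests are the same idea with a recursive copy; the directory name is again a placeholder:

    # run on the blade server: push the whole test directory to the guest
    time scp -r testfiles/ root@pmstest:/tmp/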

PMSControl    PMSTEST (RAID1)
12.717 s         20.572 s
13.092 s         18.772 s
11.752 s          23.380 s

Another write test, and the pmsApp is slower again.

SCP Read – Copy the same directory (203 MB total; an 87 MB file, a 16 MB file and the rest sub-1 MB files) from the guest to the blade server.

PMSControl    PMSTEST (RAID1)
11.531 s          11.623 s
10.972 s          11.082 s
11.294 s          11.245 s

Both the pmsApp guest and the direct guest copy the directory of multiple smaller files faster than the single big file, which is odd, as I would’ve expected one big file to be quicker. The results are consistent on both guests, though, so I’m not too worried, but I’d be happy to hear an explanation.

A very small set of tests has shown that read performance is comparable to hosting a guest directly on a local disk. Unfortunately, write performance does suffer quite a bit: averaging the results above, the pmsApp guest achieves only around 60% of the control guest’s write throughput (roughly 40% slower), and the runs are noticeably less consistent.

In my next post I intend to use filebench to carry out some more tests using a more standardized approach.

Failover testing

A friend of the pmsApp project, A.Z., has done some failover testing. This is what he found.

As long as there is no I/O taking place while the cluster is failing over, all is OK. If there is I/O, the results are mixed:

  • SCP stopped and would not restart.
  • A script I wrote (which writes the current date to a file every second; sketched below) was unable to write to the file, but when the cluster came back up, it continued on its merry way.
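
For reference, a minimal sketch of that kind of once-a-second writer; the filename is just an example:

    #!/bin/bash
    # Write the current date to a file once a second; gaps in the timestamps
    # show when writes stalled during the failover and when they resumed.
    LOGFILE=heartbeat.log              # example filename
    while true; do
        date >> "$LOGFILE"
        sleep 1
    done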

The problem seems to be that an ongoing session is not moved from the failed node to the node that takes over; a new session needs to be established. I am looking into the possibility of having sessions transferred from the failing node to the new active node. So far all I have found is an article on how to create a failover NFS cluster. That article uses DRBD, but that should not matter much.

A big thank you to A.Z. for testing the pmsApp. Looking forward to your performance tests.