So long, and thanks for all the fish

This is it. I have not been able to find the time or the energy to keep this project updated. The domain pmsapp.org will expire on the 2nd of March this year, and I do not plan on renewing it. I am sorry to see the project die, but lack of interest and a strained schedule on my part force me to stop.

If you are interested in this project or need assistance in setting up your own storage appliance, you can reach me at pmsapp at dansbo.dk.

An update, for lack of news

Just wanted to let everyone know that this project is still alive. I am continuing work on creating a web interface to ease deployment of a pmsApp storage cluster. Unfortunately it will probably be some time before version 0.3 with the new web interface is released.

In the meantime I am still hoping for a nice logo for the pmsApp project; with a bit of luck, something will have been submitted before the next version is released.

I would like to thank manyrootsofallevil for his great work performance testing the pmsApp, and at the same time ask for input on how to increase performance. If you have any suggestions or tips on how to improve the performance of a software RAID array built from iSCSI targets, please let us know.
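
To give an idea of the kind of tuning we mean, here is a sketch of a couple of generic Linux knobs for md arrays; these are common starting points only, they have not been validated on the pmsApp, and /dev/md0 is a placeholder for the array device.

    # Generic starting points, not validated on the pmsApp; /dev/md0 is a placeholder
    cat /sys/block/md0/md/stripe_cache_size        # current stripe cache (RAID5/6 arrays only)
    echo 4096 > /sys/block/md0/md/stripe_cache_size

    blockdev --getra /dev/md0                      # current read-ahead (in 512-byte sectors)
    blockdev --setra 4096 /dev/md0                 # increase read-ahead on the assembled array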

Performance testing the pmsApp with filebench – RAID5

I finally managed to get the 3 node pmsApp running, although there are some odd issues relating to failover that I will need to investigate further.

I altered the methodology slightly this time: I decided to use a single guest and move it between the various storage devices.

In addition to the pmsApp, I tested against our SAN array (an M2000 StorageWorks configured as RAID10) and directly against an ESX host disk. The direct to host results are better than last time, particularly for the fileserver workload. The results for the 2 node pmsApp (RAID1) are shown for convenience, although they are the same as in the last post.

              varmail      webserver    fileserver   randomwrite     randomread      webproxy
                                                     (100 MB file)   (100 MB file)
SAN           25.8 MB/s    25.45 MB/s   25.85 MB/s   76.1 MB/s       90.0 MB/s       15.3 MB/s
Direct        6.15 MB/s    3.9 MB/s     9.1 MB/s     48.12 MB/s      88.9 MB/s       3.2 MB/s
PMSARAID1     1.24 MB/s    6.83 MB/s    4.20 MB/s    13.63 MB/s      89.1 MB/s       1.5 MB/s
PMSARAID5     1.56 MB/s    12.86 MB/s   4.6 MB/s     15.2 MB/s       89.3 MB/s       4.41 MB/s

As might be expected, the SAN array is miles faster than anything else, so I won't say any more about it; I simply wanted to provide a point of comparison.

The direct to host tests show a significant speed-up on the fileserver workload over the results in the last post. I don't really have an explanation for this. The results were consistent, with a standard deviation of only 0.7 MB/s. The other workloads are also faster, but not by as much. This is a better control test, though, as it used exactly the same guest.

The mystery of the webserver workload remains, and it has now been amplified by the 3 node pmsApp, which triples the performance of the direct to host configuration.

The good news is that the 3 node pmsApp is faster than the 2 node one. This shouldn't be all that surprising, as now only the parity data needs to be written across the network rather than all of the data, and the results show a nice improvement throughout. Consistency has also improved, with a standard deviation of around 5%, except for the webserver workload.

I would love to compare the performance of the pmsApp to the actual VMware appliance, but I don't have the hardware to do it. I am also not sure whether that appliance will work with single disks.

Performance testing the pmsApp with filebench – RAID1

Following on from my previous post, I've decided to use filebench (version 1.4.9.1) for a more structured testing approach. Filebench is easy to work with, as it comes with several predefined application workload profiles and is easy to install. Workloads can also be defined easily (using WML scripting) to mimic any sort of application load on the IO system. I cannot comment on how accurate these workloads are, but they make testing easy :).

The method I’ve used was very simple. Three runs of 60 seconds each, with the default settings for the workload profiles, except for the randomwrite and randomread workloads where I had to reduce the file size to 100 MB. I disabled randomization ( echo 0 > /proc/sys/kernel/randomize_va_space ) as recommended by filebench. Filebench allocated 170 MB of shared memory on all the runs (this appears to be the default).

The results are the averages of the three runs, except for the randomread workload, where I only did two runs because the results were very similar. I did not need to repeat any tests, as there was a lot less variation in the results than with the earlier scp tests. The figures presented below are the IO summary results rather than the per-operation results.

So without further ado here are the results:

Server             varmail      webserver    fileserver   randomwrite   randomread   webproxy
PMSTest (RAID1)    1.24 MB/s    6.83 MB/s    4.20 MB/s    13.63 MB/s    89.1 MB/s    1.50 MB/s
PMSControl         5.02 MB/s    3.73 MB/s    4.93 MB/s    39.83 MB/s    88.9 MB/s    2.33 MB/s

The webserver results are somewhat mystifying. The workload consists of opening, reading and closing files, so the results should be fairly similar, but for some reason the pmsApp is faster!

The rest of the results are as expected: writes are significantly slower due to the nature of the pmsApp (namely, a RAID array across a network link) and reads are comparable.

It is also worth noting that, regardless of the results, the test prep phase took longer, sometimes a lot longer, when running off the pmsApp, i.e. on PMSTest.

Ver 0.2 – Finally ready for download

I finally managed to create an OVA that can be deployed on ESXi4, VirtualBox and VMware Workstation.

I had to create an OVF with VirtualBox, edit it to change the hardware type from virtualbox-2.2 to vmx-7, and then use VMware's ovftool to convert the OVF to an OVA.
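
For anyone wanting to reproduce this, here is a rough sketch of the steps; the VM and file names are placeholders, and the handling of the manifest may differ depending on your VirtualBox version.

    # Export the VM from VirtualBox as an OVF (VM and file names are placeholders)
    VBoxManage export pmsApp -o pmsapp.ovf

    # Change the declared hardware type so ESXi/Workstation will accept it
    sed -i 's/virtualbox-2.2/vmx-7/' pmsapp.ovf

    # The exported manifest (if any) holds a checksum of the original OVF,
    # so remove it before converting the edited OVF
    rm -f pmsapp.mf

    # Convert the edited OVF to an OVA with VMware's ovftool
    ovftool pmsapp.ovf pmsapp.ova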

When deployed on ESXi4, it will complain about the guest OS not being recognized, but you can still deploy it and change the guest OS setting afterwards.

Happy downloading.

Performance testing the pmsApp

So a few weeks ago a colleague pointed out Jimmy's website about the pmsApp, and seeing as we have a few underused blades in our development environment, I offered Jimmy my help.

He has already published my findings regarding failover testing of the pmsApp, and today I thought I would post some performance metrics. The first set of tests uses the pmsApp configured with a software RAID1 array.

In order to run the tests, I created two guests: one running off the pmsApp, PMSTest, and the other running locally on the blade's single disk, PMSControl.

I did a CentOS 6.2 minimal install on both of them and added the openssh-clients package to allow me to use scp.
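
For completeness, the only addition on top of the minimal install was the scp client; roughly:

    # On both PMSTest and PMSControl, after the CentOS 6.2 minimal install
    yum -y install openssh-clients   # provides scp for the copy tests below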

Each guest has 512 MB of RAM, a single NIC (dynamically configured, with a 36 hour lease) and an 8 GB hard drive.

Generally speaking, I ran three instances of each test, unless there was a great disparity between the results, in which case I ran extra tests (normally two but sometimes three) and discarded the top and bottom results.

Across the board, the results of the guest running off the pmsApp were considerably less stable.

Reboot time: this measures the time taken from typing init 6 and pressing enter to the logon screen appearing. I found this test to be more reliable than pressing reset in the vSphere client or even using PowerCLI.

PMSControl    PMSTest (RAID1)
34.66 s           30.44 s
33.02 s           27.75 s
34.07 s           28.63 s

This was a clear win for the pmsApp.

The remaining tests are file copying tests. The source or destination server (depending on the test) was a third blade server running RHEL 6.0. In theory, traffic bound for another blade never leaves the enclosure, but I'm not so sure. I ran these tests out of hours, so there would have been very little traffic, if any, on the network. I did repeat some tests during the working day for good measure, and the results were comparable.
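
The copies themselves were plain scp transfers, presumably wrapped in time or similar; the commands below are a sketch with placeholder host and file names.

    # Write test: push the ISO from the blade server to the guest
    time scp esxi-5.0.iso root@pmstest:/root/

    # Read test: pull it back from the guest to the blade server
    time scp root@pmstest:/root/esxi-5.0.iso /tmp/

    # Directory tests: the same, with scp -r on the multi-file directory
    time scp -r testdir root@pmstest:/root/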

SCP Write – Copy the VMware ESXi 5.0 ISO (297 MB) from the blade server to the guest.

PMSControl    PMSTest (RAID1)
24.675 s          45.195 s
30.980 s          45.197 s
28.505 s          56.486 s

The pmsApp is clearly slower.

SCP Read – Copy the VMware ESXi 5.0 ISO (297 MB) from the guest to the blade server.

PMSControl    PMSTest (RAID1)
7.764 s           7.762 s
8.211 s            7.711 s
7.674 s           8.017 s

Reading is a tie.

SCP Write – Copy a directory containing multiple files (total size 203 MB) from the blade server to the guest. There is an 87 MB file and a 16 MB file, but the rest are sub-1 MB files.

PMSControl    PMSTest (RAID1)
12.717 s         20.572 s
13.092 s         18.772 s
11.752 s          23.380 s

Another write test, and the pmsApp is slower again.

SCP Read – Copy a directory containing multiple files (total size 203 MB) from the guest to the blade server. There is an 87 MB file and a 16 MB file, but the rest are sub-1 MB files.

PMSControl    PMSTest (RAID1)
11.531 s          11.623 s
10.972 s          11.082 s
11.294 s          11.245 s

Both the pmsApp and the direct guest are faster copying multiple small files than one big one. This is odd, as I would have expected one big file to be quicker, but the results are consistent on both guests, so I'm not too worried. I'd be happy to hear an explanation, though.

A very small set of tests has shown that read performance is comparable to hosting a guest directly on a disk. Unfortunately, write performance does suffer quite a bit: it is roughly 40% slower and not very consistent.

In my next post I intend to use filebench to carry out some more tests using a more standardized approach.

Failover testing

A friend of the pmsApp project, A.Z., has done some failover testing. This is what he found.

As long as there is no I/O taking place while the cluster is failing over, all is OK. If there is I/O, the results are mixed:

  • SCP stopped and would not restart
  • A script I wrote (which writes the current date to a file every second) was unable to write to the file during the failover, but when the cluster came back up, it continued on its merry way (a sketch of the script follows below).
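
(The script in question is essentially a one-second loop along these lines; the file path is a placeholder.)

    #!/bin/bash
    # Append the current date to a file on pmsApp-backed storage once per second
    while true; do
        date >> /mnt/pms/heartbeat.log   # placeholder path on the pmsApp storage
        sleep 1
    done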

The problem seems to be that an ongoing session is not moved from the failed node to the node that takes over; a new session needs to be established. I am looking into the possibility of having sessions transferred from the failing node to the new active node. So far all I have found is an article on how to create a failover NFS cluster; it is based on DRBD, but that should not matter much.

A big thank you to A.Z. for testing the pmsApp. Looking forward to your performance tests.