Failover testing

A friend of the pmsApp project – A.Z. has done some failover testing. This is what he has found.

As long as there is no I/O taking place while the cluster is failing over, all is ok. If there is I/O then the results are mixed.

  • SCP stopped and would not restart
  • A script I wrote (writing the current date to a file every second) is unable to write to the file, but when the cluster comes back up, it continues on its merry way.

The problem seems to be that an ongoing session is not moved from the failed node to the node that takes over. A new session needs to be established. I am looking in to the possibility of having the sessions transferred from the failing node to the new active node. So far all I have found, is an article on how to create a failover nfs cluster. This is however with drbd, but that should not matter much.

A big thank you to A.Z. for testing the pmsApp. Looking forward to your performance tests.

Leave a Reply