Creating an OVA

Posted on October 17, 2015 by Jimmy

While working on creating the next version of the pmsApp OVA, I ran into some issues, so I thought I would write a small guide to creating an OVA here, mostly for my own benefit.

I installed version 0.2 of the pmsApp and updated it to get the latest software versions. When I then tried to create a new OVA, it turned out to be quite a lot larger than the original, so I needed to make the image smaller.

I started out by clearing the yum cache and uninstalling the old kernel, which shrank the used space to around the same as before the update. I also found this guide on how to prepare a VM to be a template, which freed up a bit more space.

Using VirtualBox, I created an OVA from the VM. The problem is that an OVA created with VirtualBox cannot be imported into ESXi. Fortunately, an OVA is just a tar archive, so I was able to extract the .ovf file and the .vmdk files without issues. After that, it is a question of editing the .ovf file and changing the hardware version from virtualbox-2.2 to vmx-Y, where Y is the desired VMware hardware version. I plan on going with vmx-8.

The next problem is assembling the OVA again. To do that, I used VMware's ovftool, but it wants a .mf (manifest) file to ensure that the files are not corrupted. I found this article explaining how to create the .mf file.

To have everything in one place, here is what I have done to the VM to make it into an OVA:

First, we work “inside” the VM to reduce the space used.

Stop logging:

service rsyslog stop
service auditd stop

Remove old kernels:

# Show the running kernel (do not remove this one)
uname -a
# List installed kernel packages
rpm -qa | grep kernel
# Remove old kernel versions with: yum remove <package>
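Picking the old kernels out by hand works, but the filtering can also be scripted. This is a minimal sketch, assuming the kernel packages are named `kernel-<version>` the way `rpm -q kernel` prints them and that only the running kernel (`uname -r`) should be kept; `filter_old_kernels` is a made-up name for illustration. On CentOS, `package-cleanup --oldkernels --count=1` from yum-utils does a similar job.

```shell
#!/bin/bash
# filter_old_kernels (hypothetical helper): read kernel package names on
# stdin and print every one that does not belong to the running kernel.
filter_old_kernels() {
    local running="$1"
    # -x -F: compare whole lines as fixed strings (versions contain dots)
    # -v: keep only the packages that are NOT the running kernel
    grep -v -x -F -- "kernel-$running" || true
}

# Usage on the VM (review the list before removing anything):
#   rpm -q kernel | filter_old_kernels "$(uname -r)"
#   rpm -q kernel | filter_old_kernels "$(uname -r)" | xargs -r yum -y remove
```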

Clean yum cache:

yum clean all

Empty logs:

logrotate -f /etc/logrotate.conf
rm -f /var/log/*-???????? /var/log/*.gz /var/log/*.old /var/log/anac*
cat /dev/null > /var/log/audit/audit.log
cat /dev/null > /var/log/wtmp
cat /dev/null > /var/log/lastlog
# There might be other log files to empty out with cat /dev/null;
# I will update this next time I go through the process

Clean tmp space:

rm -rf /tmp/*
rm -rf /var/tmp/*

Zero out freespace to ensure disk images can be made as small as possible:

# dd stops with "No space left on device" when the disk is full; that is expected
dd if=/dev/zero of=/boot/zerofile bs=1M
sync
rm -f /boot/zerofile

dd if=/dev/zero of=/zerofile bs=1M
sync
rm -f /zerofile
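Because `dd` is expected to fail once the disk is full, a script run with `set -e` would abort before the zero file is deleted and leave the filesystem full. A minimal sketch of a wrapper that always cleans up, assuming bash; `zero_free_space` and its optional size cap are hypothetical additions, the cap being useful only for a quick dry run.

```shell
#!/bin/bash
# zero_free_space (hypothetical helper): fill the filesystem holding DIR
# with zeros, then delete the file again so the freed blocks read as zero.
# An optional second argument caps the size in MiB (handy for a dry run).
zero_free_space() {
    local dir="$1" max_mb="${2:-}"
    local zerofile="$dir/zerofile"
    if [ -n "$max_mb" ]; then
        dd if=/dev/zero of="$zerofile" bs=1M count="$max_mb" 2>/dev/null
    else
        # Without a cap dd ends with "No space left on device" and a
        # non-zero status; "|| true" keeps set -e scripts going
        dd if=/dev/zero of="$zerofile" bs=1M 2>/dev/null || true
    fi
    sync
    rm -f "$zerofile"
}

# Usage inside the VM, once per filesystem:
#   zero_free_space /boot
#   zero_free_space /
```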

Finally, shut down the VM:

shutdown -h now

Using VirtualBox, I exported the VM as pmsApp-0.3.ova. Next are the steps taken to ensure that the OVA is as small as possible and can be imported by ESXi.

Extract OVA:

mv pmsApp-0.3.ova pmsApp-0.3.tar
tar xf pmsApp-0.3.tar

The above step can be skipped if the VM is exported as .ovf instead of .ova.
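Since an OVA is a plain tar archive, GNU tar reads it under its own name, so the rename is optional. A minimal sketch, assuming GNU tar; `extract_ova` is a hypothetical helper that unpacks into a target directory and prints the descriptor it finds.

```shell
#!/bin/bash
# extract_ova (hypothetical helper): unpack OVA into DIR and print the
# path of the .ovf descriptor found inside it.
extract_ova() {
    local ova="$1" dir="$2"
    mkdir -p "$dir"
    tar -xf "$ova" -C "$dir" || return 1
    # An OVA normally holds a single descriptor; print the first match
    find "$dir" -maxdepth 1 -name '*.ovf' | head -n 1
}

# Usage:
#   extract_ova pmsApp-0.3.ova pmsApp-0.3
```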

Replace the system type to ensure that the OVA can be imported by ESXi:

sed 's/virtualbox-2.2/vmx-8/g' pmsApp-0.3.ovf > pmsApp-tmp.ovf
mv pmsApp-tmp.ovf pmsApp-0.3.ovf
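The same substitution can be done in place and checked afterwards. A minimal sketch, assuming GNU sed (for `-i`) and that the descriptor contains the literal string `virtualbox-2.2`; `set_vmx_version` is a hypothetical name.

```shell
#!/bin/bash
# set_vmx_version (hypothetical helper): replace the VirtualBox system
# type in an OVF descriptor with a VMware hardware version, in place.
set_vmx_version() {
    local ovf="$1" version="$2"
    # The dot is escaped so it matches literally; -i edits the file in place
    sed -i "s/virtualbox-2\.2/vmx-${version}/g" "$ovf" || return 1
    # Succeed only if the new hardware version is now in the file
    grep -q "vmx-${version}" "$ovf"
}

# Usage:
#   set_vmx_version pmsApp-0.3.ovf 8
```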

Ensure that the disk image is as small as possible:

qemu-img convert -p -O vmdk pmsApp-0.3-disk1.vmdk thindisk.vmdk
mv thindisk.vmdk pmsApp-0.3-disk1.vmdk

Create a manifest file to ensure data consistency in OVA:

openssl sha1 *.ovf *.vmdk > pmsApp-0.3.mf
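`openssl sha1` prints one `SHA1(file)= digest` line per file, which is the layout the manifest uses. If openssl is not at hand, coreutils' `sha1sum` can produce the same lines; this is a minimal sketch, and `write_manifest` is a hypothetical helper.

```shell
#!/bin/bash
# write_manifest (hypothetical helper): write an OVF manifest with one
# "SHA1(file)= digest" line per file, the same layout openssl sha1 prints.
write_manifest() {
    local mf="$1"; shift
    local f
    : > "$mf"
    for f in "$@"; do
        # sha1sum prints "digest  filename"; keep only the digest
        printf 'SHA1(%s)= %s\n' "$f" "$(sha1sum "$f" | cut -d' ' -f1)" >> "$mf"
    done
}

# Usage (run from the directory holding the files, so the names in the
# manifest carry no path):
#   write_manifest pmsApp-0.3.mf pmsApp-0.3.ovf pmsApp-0.3-disk1.vmdk
```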

Create the OVA:

ovftool pmsApp-0.3.ovf pmsApp-0.3.ova

A new version of the pmsApp OVA

Posted on September 22, 2015 by Jimmy

For some time now, I have been contemplating updating the pmsApp OVA to make it even easier to deploy the Poor Mans Storage Appliance.

Some of the things I am thinking of updating are:

Set a password for the ricci user (probably going to be pmsapp)
Make it possible to create the shared storage from web interface
Include Luci so that the cluster can be created via web browser
Create a guide to configuring the cluster using Luci
Possibly update the web interface to provide cluster and storage information.

This list is mostly for my own benefit, but stay tuned for updates.

Expand a zpool with a single supporting device on linux

Posted on June 24, 2015 by Jimmy

Initial status:

Because I have had trouble with the way ZFS on Linux (ZOL) handles failing drives, I have decided to create a software RAID set of my drives before creating my zpool.

This means that my zpool is created with a single device (/dev/md0). If I remember correctly, it was created like this:

zpool create pool0 md0

Expansion:

By default, pools are not set to autoexpand, so autoexpand needs to be turned on:

zpool set autoexpand=on pool0

Before I could expand the pool, I needed to grow my RAID set. This is done by first adding a device to the existing RAID set and then “growing” it to the new number of devices.

mdadm /dev/md0 --add /dev/sdd
mdadm --grow /dev/md0 --raid-devices=4
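The reshape can take hours, and the pool cannot grow before it is done. `mdadm --wait /dev/md0` blocks until resync, recovery or reshape activity has finished; alternatively, a script can poll `/proc/mdstat`. A minimal sketch of the polling approach, assuming the usual mdstat layout where an array's status lines follow its `mdX : ...` line up to a blank line; `md_is_reshaping` is hypothetical.

```shell
#!/bin/bash
# md_is_reshaping (hypothetical helper): succeed while /proc/mdstat (or
# the file given as $2) reports a reshape in progress for array $1.
md_is_reshaping() {
    local md="$1" mdstat="${2:-/proc/mdstat}"
    # Only look for "reshape" inside the stanza that belongs to $md
    awk -v md="$md" '
        $1 == md { inside = 1; next }
        inside && NF == 0 { inside = 0 }
        inside && /reshape/ { found = 1 }
        END { exit !found }
    ' "$mdstat"
}

# Usage: poll once a minute until md0 has finished reshaping
#   while md_is_reshaping md0; do sleep 60; done
```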

After the RAID set has reshaped, the extra space is available for use. According to all the information I could find, it should be possible to expand the pool either by exporting and importing it

zpool export pool0
zpool import pool0

or by expanding the online pool

zpool online -e pool0 md0

I tried both several times and in different orders, but to no avail. My pool was stuck at its original size.

Reboot:

Finally, I decided to reboot to see if that would change anything. Remember to save the new mdadm configuration before rebooting.

# Remove existing array configuration from mdadm.conf file
grep -v ARRAY /etc/mdadm/mdadm.conf > /tmp/mdadm.conf
# Add new array configuration to mdadm.conf
cat /tmp/mdadm.conf > /etc/mdadm/mdadm.conf
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
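The same rewrite can be wrapped in a function so the temporary file is always cleaned up. A minimal sketch, assuming each ARRAY definition fits on one line; `replace_array_lines` is hypothetical. On Debian and Ubuntu, the initramfs carries its own copy of mdadm.conf, so `update-initramfs -u` is usually run afterwards.

```shell
#!/bin/bash
# replace_array_lines (hypothetical helper): keep everything in CONF
# except existing ARRAY definitions, then append the ARRAY lines read
# from stdin (for example the output of "mdadm --examine --scan").
replace_array_lines() {
    local conf="$1" tmp
    tmp="$(mktemp)" || return 1
    grep -v '^ARRAY' "$conf" > "$tmp"
    cat >> "$tmp"
    # Overwrite in place so the original file's owner and mode are kept
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage:
#   mdadm --examine --scan | replace_array_lines /etc/mdadm/mdadm.conf
```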

After the reboot, the pool was still the original size, but this time the online expansion actually worked.

zpool online -e pool0 md0

And that solved the problem. Unfortunately I do not know why I could not perform the online expansion without a reboot.

General ramblings on fencing

Posted on December 11, 2014 by Jimmy

I have been playing around with pmsApp v0.2 in my new VMware environment, and I got to thinking that maybe it was possible to fence through VMware… And it is.

A bit of googling gave me this result (https://fedorahosted.org/cluster/wiki/VMware_FencingConfig). It turns out that the fence_vmware agent depends on the VI Perl Toolkit, so I googled a bit more and found an installation guide from VMware: https://www.vmware.com/support/pubs/beta/viperltoolkit/doc/perl_toolkit_install.html

As I am working on CentOS, the Perl dependencies can be installed as RPMs. The following list of RPMs seems to satisfy the dependencies:

yum install perl-Crypt-SSLeay \
perl-XML-LibXML \
perl-libwww-perl \
perl-Class-MethodMaker \
perl-devel \
perl-Test-Simple

I followed VMware's guide to build and install the VI Perl Toolkit, but afterwards it still did not work.

I got an error message saying:

fence_vmware_helper returned Undefined subroutine &Opts::add_options called at /usr/sbin/fence_vmware_helper line 82.

Googling for that error gave me this result: http://www.redhat.com/archives/linux-cluster/2011-January/msg00134.html

Basically, I needed to edit /usr/local/share/perl5/VMware/VIRuntime.pm and add:

use VMware::VILib;

above the line:

use VMware::VIM2Runtime;
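If the toolkit gets reinstalled, the edit has to be redone, so it can help to script it. A minimal sketch, assuming GNU sed (for `-i` and the one-line form of the `i` command) and that the `use VMware::VIM2Runtime;` line starts at the beginning of the line; `add_vilib_import` is a hypothetical helper that is safe to run more than once.

```shell
#!/bin/bash
# add_vilib_import (hypothetical helper): insert "use VMware::VILib;"
# directly above "use VMware::VIM2Runtime;" unless it is already there.
add_vilib_import() {
    local pm="${1:-/usr/local/share/perl5/VMware/VIRuntime.pm}"
    # Running the helper twice must not add a second copy
    if grep -q '^use VMware::VILib;' "$pm"; then
        return 0
    fi
    # GNU sed: "i TEXT" inserts TEXT before every line matching the address
    sed -i '/^use VMware::VIM2Runtime;/i use VMware::VILib;' "$pm"
}
```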

Now I am able to check the status of a VM with fence_vmware:

fence_vmware -o status -a vcenterserver -l [username] -p [password] -n [vmname]

Shared storage with compression using drbd and zfs

Posted on June 8, 2014 by Jimmy

This is not an update of the original Poor Mans Storage Appliance, but rather a new take on it. This time it is built with DRBD and ZFS. DRBD is designed for mirroring over the network, and the ZFS filesystem can do compression and deduplication, although deduplication is not recommended unless you have lots and lots of memory.

I started out with Debian 7.5 (wheezy), but it comes stock with drbd 8.3, and most guides I could find on configuring and using drbd are for drbd 8.4. By using backports, I was able to get drbd up to version 8.4, but then I had trouble getting zfs to work correctly. After some trial and error, I decided that it was too much hassle and opted for Ubuntu instead. Ubuntu 14.04 LTS comes with drbd 8.4, and zfs installs without any issues.

I started with a VM on each of my two physical hosts. Each VM has a virtual disk for the OS and a virtual disk for the shared storage. The following resources were used as reference:

After installing Ubuntu Server, configuring static IP and ensuring host names resolve (either through your own DNS server or /etc/hosts files on both VMs), I started installing software (everything needs to be done on both VMs).

# Update repository list and upgrade packages to latest version
sudo apt-get update && sudo apt-get -y dist-upgrade

To install ZFS on Linux, an extra repository needs to be added:

# Add zfs on linux repository
sudo add-apt-repository ppa:zfs-native/stable

Next, install the software: drbd to distribute data between the nodes, ntp to keep time synchronized, zfs to create a filesystem capable of compression and deduplication, nfs to export the distributed storage, and heartbeat to manage the two nodes as a cluster.

# Update repository list and install software
sudo apt-get update && sudo apt-get -y install \
drbd8-utils ntp ntpdate ubuntu-zfs nfs-kernel-server heartbeat

Reboot to ensure that new versions are being used.

# Reboot to use new kernel and software
sudo reboot

After the VMs have restarted, we start by configuring the drbd device. This is done by creating a resource description file in /etc/drbd.d:

sudo nano /etc/drbd.d/myredundantstorage.res

The file should look something like this:

resource myredundantstorage {
 protocol C;
 startup { wfc-timeout 5; degr-wfc-timeout 15; }

 disk { on-io-error detach; }

 syncer { rate 10M; }

 on node1.mydomain.local {
 device /dev/drbd0;
 disk /dev/vdb;
 meta-disk internal;
 address 192.168.0.41:7788;
 }

 on node2.mydomain.local {
 device /dev/drbd0;
 disk /dev/vdb;
 meta-disk internal;
 address 192.168.0.42:7788;
 }
}

Make sure that the VMs' host names match the host names specified in the above configuration file. The names in the file should be the same as the names returned by:

uname -n
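A quick way to catch a mismatch before running drbdadm is to look for the local host name among the `on <host> {` lines of the resource file. A minimal sketch, assuming the `on` keyword and the host name are the first two words on their line, as in the example above; `check_drbd_hostname` is hypothetical.

```shell
#!/bin/bash
# check_drbd_hostname (hypothetical helper): succeed if NAME (default:
# the output of "uname -n") appears in an "on <host> {" line of RES.
check_drbd_hostname() {
    local res="$1" name="${2:-$(uname -n)}"
    awk -v name="$name" '
        $1 == "on" && $2 == name { found = 1 }
        END { exit !found }
    ' "$res"
}

# Usage on each node:
#   check_drbd_hostname /etc/drbd.d/myredundantstorage.res || echo "host name mismatch"
```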

Now that the resource file is created on both nodes, describing how the resource should be handled, we can create the actual resource.

# Create and enable storage on both nodes
sudo drbdadm create-md myredundantstorage
sudo drbdadm up myredundantstorage

You should now be able to see that both nodes are up although they are both secondary.

# Show status of our drbd device
cat /proc/drbd
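In scripts, it is often only the local role that matters. `drbdadm role myredundantstorage` prints it directly (for example `Secondary/Secondary`); the sketch below instead reads it from `/proc/drbd`, assuming the DRBD 8.4 layout where a device line looks like ` 0: cs:Connected ro:Primary/Secondary ...`. `drbd_local_role` is a hypothetical helper.

```shell
#!/bin/bash
# drbd_local_role (hypothetical helper): print the local role (Primary,
# Secondary, ...) of DRBD minor N as shown in /proc/drbd or in FILE.
drbd_local_role() {
    local minor="$1" file="${2:-/proc/drbd}"
    # Find the line for this minor, then the "ro:local/peer" field
    awk -v dev="${minor}:" '
        $1 == dev {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ro:/) { split(substr($i, 4), r, "/"); print r[1] }
        }
    ' "$file"
}

# Usage:
#   drbd_local_role 0
```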

On the node that you want to be primary, you can run either of these two commands:

# Create primary node without syncing devices (should only
# be done if you are sure that the disks do not contain any
# data)
sudo drbdadm -- \
--clear-bitmap new-current-uuid myredundantstorage
# Skipping the sync does not promote the node by itself
sudo drbdadm primary myredundantstorage

or:

# Create primary node by syncing the contents of local disk
# to the remote disk (can be very time-consuming); the DRBD 8.4
# documentation writes this as: drbdadm primary --force myredundantstorage
sudo drbdadm -- \
--overwrite-data-of-peer primary myredundantstorage

Depending on which command was used, the device is either ready or syncing, but either way one node should now be primary and the other secondary (check with cat /proc/drbd). We now have a device (/dev/drbd0) on which we can create a filesystem. To be able to choose compression and deduplication, we will use zfs. The filesystem is only created on the primary node.

# Create zfs filesystem on the redundant device
sudo zpool create myredundantfilesystem /dev/drbd0

The zfs documentation says that after the zpool is created, a zfs filesystem should be created inside that pool. I have opted not to do that for three reasons: the zpool itself functions as a filesystem, I do not need more than one filesystem in my pool, and if a zfs filesystem is created inside the zpool, zfs will try to mount it during boot.

If you want compression enabled on the filesystem:

sudo zfs set compression=on myredundantfilesystem

If you want deduplication enabled on the filesystem:

sudo zfs set dedup=on myredundantfilesystem

Now we need to ensure that the mountpoint for the zfs filesystem exists on both nodes:

# Create directory for mounting zfs if it does not exist
if [ ! -d /myredundantfilesystem ]; then
 sudo mkdir /myredundantfilesystem
fi

On the primary node, the directory should already exist and the zfs filesystem should be mounted. Now we can edit /etc/exports on both nodes so that they can export the zfs filesystem to clients. The export should look something like this:

/myredundantfilesystem 192.168.0.0/24(rw,async,no_root_squash,no_subtree_check,fsid=1)

Most options are “normal” for nfs exports, but fsid is there to tell the clients that it is the same filesystem on both nodes.

Finally, we need clustering to maintain a failover relationship between the nodes. We have already installed heartbeat; now we need to configure it (on both nodes). Before we start configuring heartbeat, we need to stop the nfs server from starting at boot, since heartbeat will start it:

sudo update-rc.d -f nfs-kernel-server remove

First, we will create the /etc/ha.d/ha.cf file. It should look something like this:

autojoin none
auto_failback off
keepalive 1
warntime 3
deadtime 5
initdead 20
bcast eth0
node node1.mydomain.local
node node2.mydomain.local
logfile /var/log/heartbeat-log
debugfile /var/log/heartbeat-debug
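The timing directives in ha.cf depend on each other: keepalive should be shorter than warntime, warntime shorter than deadtime, and the sample ha.cf shipped with Heartbeat recommends an initdead of at least twice deadtime. A minimal sketch of a sanity check for those relations, assuming the values are plain seconds and all four directives are present (it does not know Heartbeat's defaults); `check_ha_timing` is hypothetical.

```shell
#!/bin/bash
# check_ha_timing (hypothetical helper): read keepalive, warntime,
# deadtime and initdead from an ha.cf file and check that
# keepalive < warntime < deadtime and initdead >= 2 * deadtime.
check_ha_timing() {
    local cf="${1:-/etc/ha.d/ha.cf}"
    awk '
        $1 ~ /^(keepalive|warntime|deadtime|initdead)$/ { v[$1] = $2 + 0 }
        END {
            ok = ("keepalive" in v) && ("warntime" in v) &&
                 ("deadtime" in v) && ("initdead" in v) &&
                 v["keepalive"] < v["warntime"] &&
                 v["warntime"] < v["deadtime"] &&
                 v["initdead"] >= 2 * v["deadtime"]
            exit !ok
        }
    ' "$cf"
}

# Usage:
#   check_ha_timing /etc/ha.d/ha.cf || echo "check the heartbeat timings"
```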

The deadtime parameter tells Heartbeat to declare the other node dead after this many seconds, and Heartbeat sends a heartbeat every keepalive seconds. Next, we will protect the heartbeat communication by editing /etc/ha.d/authkeys. It should look something like this:

auth 3
3 md5 my_secret_key

Set permissions on the file:

sudo chmod 600 /etc/ha.d/authkeys
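Rather than typing a key by hand, the authkeys file can be generated with a random key. A minimal sketch, assuming GNU od and keeping the `auth 3` / `3 md5 <key>` layout from above; `write_authkeys` is hypothetical. Both nodes need the same key, so generate the file on one node and copy it to the other.

```shell
#!/bin/bash
# write_authkeys (hypothetical helper): write a Heartbeat authkeys file
# in the "auth 3 / 3 md5 <key>" layout with a random key, mode 600.
write_authkeys() {
    local file="${1:-/etc/ha.d/authkeys}" key
    # 32 random bytes, hex-encoded (64 characters, no spaces)
    key="$(od -An -N32 -tx1 /dev/urandom | tr -d ' \n')"
    # umask 077 keeps a newly created file private from the start
    ( umask 077 && printf 'auth 3\n3 md5 %s\n' "$key" > "$file" )
    # chmod covers the case where the file already existed
    chmod 600 "$file"
}

# Usage (on one node, then copy the file to the other):
#   write_authkeys /etc/ha.d/authkeys
```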

Next we need to tell heartbeat about the resources we want it to manage. It is done in the file /etc/ha.d/haresources and it should look something like this:

node1.mydomain.local \
IPaddr::192.168.0.45/24/eth0 \
drbddisk::myredundantstorage \
zfsmount \
nfs-kernel-server

In this example, node1 will be primary, but if it fails, the other node will take over. We need a “floating” IP address because our clients need a fixed IP to connect to. Heartbeat also needs to ensure that the drbd device is active on the primary node. Next comes zfsmount, and lastly heartbeat should start the nfs server.

zfsmount is a script I have created myself to ensure that both the drbd device and the zfs filesystem get activated on the secondary node in the event of primary node failure. The script should be located in /etc/init.d and accept at least the start and stop parameters. My script looks like this:

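#!/bin/bash
# /etc/init.d/zfsmount: called by heartbeat with start or stop
EXITCODE=0
case "$1" in
 start)
  # Try to make this node primary for the drbd
  drbdadm primary myredundantstorage
  # Ensure that zfs knows about our pool (fails harmlessly if it
  # is already imported)
  zpool import -f myredundantfilesystem
  # Mount the zfs filesystem unless the import already mounted it
  if [ "$(zfs get -H -o value mounted myredundantfilesystem 2>/dev/null)" != "yes" ]; then
   zfs mount myredundantfilesystem
   EXITCODE=$?
  fi
 ;;
 stop)
  # If the pool is imported, export it: this unmounts the filesystem
  # and releases /dev/drbd0 so that drbddisk can demote this node
  if zpool list myredundantfilesystem > /dev/null 2>&1; then
   zpool export myredundantfilesystem
   EXITCODE=$?
  else
   EXITCODE=0
  fi
 ;;
 *)
  echo "Usage: $0 {start|stop}"
  EXITCODE=1
 ;;
esac
exit $EXITCODE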

Some trial and error, as well as a lot of looking through the /var/log/heartbeat-debug log file, helped me create this script. The script must exit with success if it is called with stop even though the resource has already been stopped.

Initially, I thought that it would be sufficient to make the node primary (drbdadm primary myredundantstorage) during a failover, but it seems that the zfs driver does not necessarily recognize that there is a zfs filesystem just because the device becomes available. That is why the import is done. The import will fail if the pool is already imported, but that does not matter in this script, which carries on and makes sure the filesystem is mounted.

I hope this will be helpful for other people as well. It took me a long time to find anything dealing with drbd and zfs at the same time. If you have any questions, don’t hesitate to contact me. ( jimmy at dansbo dot dk )

Btw. I can say that this setup performs a hell of a lot better than zfs and gluster on the same hardware.

So long, and thanks for all the fish

Posted on January 28, 2013 by Jimmy

This is it. I have not been able to find the time or the energy to keep this project updated. The domain pmsapp.org will expire on March 2 this year, and I do not plan on renewing it. I am sorry to see the project die, but lack of interest and a strained schedule on my part force me to stop.

If you are interested in this project or need assistance in setting up your own storage appliance, you can reach me at pmsapp at dansbo.dk.

Update in lack of news

Posted on May 20, 2012 by Jimmy

Just wanted to let everyone know that this project is still alive. I am continuing work on creating a web interface to ease deployment of a pmsApp storage cluster. Unfortunately it will probably be some time before version 0.3 with the new web interface is released.

In the meantime I am still hoping for a nice logo for the pmsApp project, with a bit of luck something will have been submitted before the release of the next version.

I would like to thank manyrootsofallevil for his great work in performance testing the pmsApp and at the same time ask for input on how to increase performance. If you have any suggestions or tips on how to increase performance on software RAID created from iSCSI targets, please let us know.

Ver 0.2 – Finally ready for download

Posted on March 17, 2012 by Jimmy

I finally managed to create an OVA that can be deployed on ESXi4, VirtualBox and VMware Workstation.

I had to create an OVF with VirtualBox, edit it to change the hardware type from virtualbox-2.2 to vmx-7, and then use VMware's ovftool to convert the OVF to an OVA.

When deployed on ESXi4, it will complain about the guest OS not being recognized, but you can still deploy it and change the guest OS setting later.

Happy downloading.

Now with bling

Posted on March 16, 2012 by Jimmy

A new version (0.2) has just been uploaded. This version adds a web interface to help do the initial network configuration. It also allows NTP and iSCSI configuration without having to dig around configuration files. Have a look at the guide and get the new appliance from the download page.

Failover testing

Posted on March 10, 2012 by Jimmy

A friend of the pmsApp project, A.Z., has done some failover testing. This is what he found:

As long as there is no I/O taking place while the cluster is failing over, all is ok. If there is I/O then the results are mixed.

SCP stopped and would not restart
A script I wrote (writing the current date to a file every second) is unable to write to the file, but when the cluster comes back up, it continues on its merry way.

The problem seems to be that an ongoing session is not moved from the failed node to the node that takes over, so a new session needs to be established. I am looking into the possibility of having the sessions transferred from the failing node to the new active node. So far, all I have found is an article on how to create a failover nfs cluster. It uses drbd, but that should not matter much.

A big thank you to A.Z. for testing the pmsApp. Looking forward to your performance tests.
