Archive for the ‘File Systems’ Category

New updates: RapidDisk and RapidCache Stable release 2.0.1b and more.

March 28th, 2012 Comments off

Late last week I released 2.0.1b, with a bug fix (bug #5) in the rxadm admin tool and a few cleanup changes in some of the module print messages. More details can be read here:

I have also been working on an ARM-based Linux distribution called RapidDisk LX, which exposes RapidDisk functionality as a SCSI target. More details of it can be read here:

This distribution has already been added to the waiting list at DistroWatch.

To stay updated on this project, please follow the Google+ page.

Some updates: New RapidDisk website, release, patches and more.

February 24th, 2012 Comments off

Well, it has been quite some time since I posted an update here. A lot of new and exciting things have been happening with the RapidDisk project. For instance, there is now a new dedicated project website. The new website uses the RapidDisk Google+ page as the venue for the latest project updates.

Also, in recent weeks, I released a stable version of 1.4 alongside a patch to build for older 2.6.18 kernels; that is, for those still using Red Hat or CentOS 5.x.

I have been spending most of my free time focusing on the 2.0 beta release. I have made a lot of progress and continue to do so. I should have something available in the next two weeks. RapidDisk 2.0 introduces a new module alongside rxdsk.ko called RapidCache, or rxcache.ko. The purpose of RapidCache is to use an rxdsk volume as a caching volume for any local or remote physical block device via the device-mapper framework. The only supported caching modes are write-through and read caching (the safest choice for data integrity). I need to find a better way to gauge my performance benchmarks, because in some cases I see a 17% performance boost over the standalone SATA drive while in other cases I notice slightly worse performance. Obviously, this is primarily intended for certain environments, and I will provide all details on the Wiki as soon as I (1) test on other Linux kernels, (2) add support for it in the rxadm administration utility, and (3) capture some cleaner performance data points.
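To make the write-through idea concrete, here is a minimal user-space sketch. It is not RapidCache code and does not touch device-mapper; the arrays, sizes and the names wt_write and wt_read are all made up for illustration. It only shows the rule that every write reaches the backing device before it returns, while reads are served from the RAM cache whenever possible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   512   /* bytes per block (assumed) */
#define DISK_BLOCKS  64    /* size of the toy backing device */
#define CACHE_SLOTS  8     /* size of the toy RAM cache */

/* Toy backing device standing in for the slow physical disk. */
static uint8_t disk[DISK_BLOCKS][BLOCK_SIZE];
static unsigned disk_reads;   /* counts how often the slow read path is taken */

/* One direct-mapped cache slot standing in for the RAM disk volume. */
struct cache_slot {
    int      valid;
    uint32_t block;
    uint8_t  data[BLOCK_SIZE];
};
static struct cache_slot cache[CACHE_SLOTS];

/* Write-through: the backing device is always updated before returning. */
static void wt_write(uint32_t block, const uint8_t *buf)
{
    assert(block < DISK_BLOCKS);
    struct cache_slot *slot = &cache[block % CACHE_SLOTS];

    memcpy(disk[block], buf, BLOCK_SIZE);   /* persistent copy first */
    memcpy(slot->data, buf, BLOCK_SIZE);    /* then refresh the cache */
    slot->block = block;
    slot->valid = 1;
}

/* Read: serve from the cache on a hit, otherwise fetch and populate. */
static void wt_read(uint32_t block, uint8_t *buf)
{
    assert(block < DISK_BLOCKS);
    struct cache_slot *slot = &cache[block % CACHE_SLOTS];

    if (!slot->valid || slot->block != block) {
        memcpy(slot->data, disk[block], BLOCK_SIZE);   /* miss: go to disk */
        slot->block = block;
        slot->valid = 1;
        disk_reads++;
    }
    memcpy(buf, slot->data, BLOCK_SIZE);
}
```

Because the disk copy is never stale, losing the cache (a RAM disk does not survive a reboot) costs performance but not data, which is why this mode is the conservative one. It may also hint at why results vary: writes gain nothing, and reads only gain when they hit.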

Once I release it into beta and make sure it is stable, I will then try to see about getting it into the Linux kernel’s staging tree.

Reiser4? Really? How much of a demand is there for it?

February 1st, 2011 1 comment

Yesterday Phoronix posted an article, An Update On Reiser4 For The Mainline Linux Kernel. Truth be told, I kind of forgot about the file system. It seemed to fall into the far background of Linux file system development as more light was shed upon Ext4 and the upcoming Btrfs. I am kind of surprised that it still hasn’t been included in the mainline Linux kernel. But I was also surprised to see Phoronix report on it. They must have had nothing else to report on. ;-)

So, who is really waiting for Reiser4? And even if it does boast interesting features that Ext4 does not, how can it compete with something like Btrfs? I mean, c’mon, volume creation, snapshots, data compression, etc. Don’t get me wrong. The approach taken with the Dancing B*-trees is an interesting concept and does aid the file system’s ability to retain atomicity and stability. But it is difficult to see a future for Reiser4 when (as the article mentions) no one is supporting it. ReiserFS v3, by contrast, got a major boost from SUSE.

Categories: File Systems, Linux, Storage Tags:

ZFS as a Linux kernel module? What is the point?

August 27th, 2010 5 comments

This morning, I came across two Phoronix articles (here and here) generating some hype about a ZFS port for the Linux kernel, with headlines such as Native ZFS Is Coming To Linux Next Month. My response to this, as I will explain below, is: Why?

First of all (as the first article cites), there has always been the ZFS over FUSE project. That project was initiated primarily because of a licensing conflict: ZFS is CDDL’d while Linux is GPL’d, and the one is incompatible with the other. I will not spend any time on the details, but you can read some of my opinions on the licensing here. Using any file system over FUSE has its drawbacks, one of which is performance, but you still had ZFS functionality. In a worst case scenario, you can always deploy Solaris, (at one time) OpenSolaris, Nexenta, FreeBSD, or one of a few lesser known operating systems that use ZFS.

Despite the reasons given in the articles for porting ZFS to the Linux kernel, the effort becomes almost irrelevant given that (a) it will never be integrated into the mainline kernel (unless Oracle were to re-license it, but then it would become incompatible with Solaris 11+ and Solaris 11+ Express), and (b) it will never obtain any real commercial support, because if it does, Oracle will most likely step in to destroy everything in its path for infringement of patents and other intellectual property. Also, (c) you will never be able to commercially redistribute the code, binaries and modules in a custom Linux enabled appliance, because if Oracle does not get you for IP infringement, other companies such as NetApp will. Who will protect you then? Finally, (d) it will be a long while before this ZFS port is considered stable for Linux, especially when it will not be in the mainline kernel to receive additional exposure. These reasons alone are enough for a storage company to take no interest in this port.

At the end of the day, all of this excitement means nothing. Who cares if this ZFS port for the Linux kernel comes out next month? I don’t. Don’t misunderstand me. I love ZFS, as I use it all the time on Solaris and OpenSolaris. It is a wonderful file system. This Linux port is also extremely limited in functionality. Today, the GPL’d Btrfs, a competitor to ZFS, is considered “generally stable”. Why not use that instead, without all the concerns listed above?

Categories: BSD, File Systems, Linux, OpenSolaris, Solaris, Storage Tags:

Article: ZFS data integrity testing and more random ZFS thoughts.

May 15th, 2010 Comments off

Earlier this week I came across a blog posting about data integrity testing on ZFS, titled ZFS data integrity tested. It was a few months old, from Robin Harris’ blog Storage Bits. I guess the most exciting part was the validation of Sun Microsystems’ claims that ZFS can correct data corruption even with error injection to both the disk and memory. ZFS continues to prove its worth on enterprise class systems and applications.

My only frustrations with ZFS are that cluster support is currently not available (at least until Lustre 3.0 is out, whenever that will be) and that it is hard to write an application that works directly with a zpool. For instance, there is no simple method to send a zpool a generic ioctl() such as DKIOCGGEOM to obtain the size of the volume. In most cases I don’t care about the actual number of cylinders, heads and sectors, because in the end I only use them to calculate the total block and/or byte count of the volume. Those values could therefore be generic and made up.
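The arithmetic in question is small enough to sketch. The struct below is a hypothetical stand-in for a geometry record, not the actual Solaris structure, and the function names are mine; the point is only that the product of the three fields is what matters, so a driver could report made-up values as long as they multiply out to the right block count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a disk geometry record; not the Solaris struct. */
struct geometry {
    uint32_t cylinders;
    uint32_t heads;
    uint32_t sectors_per_track;
};

/* Total addressable blocks: cylinders x heads x sectors per track. */
static uint64_t geometry_blocks(const struct geometry *g)
{
    assert(g != NULL);
    /* Widen before multiplying so large disks do not overflow 32 bits. */
    return (uint64_t)g->cylinders * g->heads * g->sectors_per_track;
}

/* Total size in bytes for a given block size (512 is a common value). */
static uint64_t geometry_bytes(const struct geometry *g, uint32_t block_size)
{
    return geometry_blocks(g) * block_size;
}
```

A classic-looking geometry of 1024 cylinders, 255 heads and 63 sectors gives 16,450,560 blocks, and so does a made-up geometry of 1 cylinder, 1 head and 16,450,560 sectors. An application that only wants the size cannot tell the difference.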

In the early stages of my discovering this, I posted a simple question on the OpenSolaris Forums:

“As I was navigating through the source code for the ZFS file system I saw that in zvol.c, where the ioctls are defined, if a program sends a DKIOCGGEOM or DKIOCGVTOC, an ENOTSUP (Error Not Supported) is returned.

You can view this here:

My question is, what is an appropriate method to obtain the zpool’s volume size from a C coded application?”

After posting my question, I immediately went to view the source code of the zpool/zfs utilities to observe how zpool reports the pool’s capacity back to user space. Unfortunately it uses a cryptic method that is nowhere near as straightforward as sending a simple ioctl() to the desired volume. This was a bit frustrating, as it is such an ugly approach just to obtain the size of the volume.

I was grateful to have a response confirming my fear of having to choose the ugly route; but it also made me realize the true value of open source. What if I simply patched in a supported ioctl() definition to return the total accessible “block” count of a zpool? It would be similar to the Linux BLKGETSIZE/BLKGETSIZE64. This would be the most realistic and proper method: add a new ioctl() and then modify all storage modules to accommodate it. For instance, in the usr/src/uts/common/sys/dkio.h file we would need to define:


And then go back to the zvol.c file and add the extra ioctl() to handle this:

/* Copy the volume size out to the caller; report EFAULT on a bad address. */
uint64_t vs = zv->zv_volsize;
if (ddi_copyout(&vs, (void *)arg, sizeof (uint64_t), flag))
        error = EFAULT;
return (error);

To give a level of consistency across all storage devices, we will need to add the ioctl() definition to the following modules:


Although we do not necessarily have to support it and can instead interpret it as such:

return ENOTSUP;
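For comparison, this is roughly what the Linux side looks like from user space with the BLKGETSIZE64 ioctl, the 64-bit size query alluded to earlier. It is a minimal sketch for Linux only; the helper name block_device_bytes is mine, and any device path passed to it is the caller's assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Returns 0 and stores the device size in bytes, or -1 on any failure. */
static int block_device_bytes(const char *path, uint64_t *bytes)
{
    assert(path != NULL && bytes != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The kernel fills in the size of the block device in bytes. */
    int rc = ioctl(fd, BLKGETSIZE64, bytes);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

One open(), one ioctl(), one close(): that is the level of simplicity I would like to have for a ZFS volume. The call fails (here reported as -1) if the path cannot be opened or is not a block device.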

Who knows, one of these days I may get around to patching this myself, and if the OpenSolaris community doesn’t accept it I can always make it available on one of my websites. I will most definitely post about it.

Categories: File Systems, OpenSolaris, Solaris Tags:

OpenSolaris and ZFS: The beauty of snapshots.

March 20th, 2010 Comments off

Two days ago, I ran a long-needed image update to the OpenSolaris 2010.03 preview. I was updating through the pkg update manager from build 129 to build 134, so when I say it was much needed, I am not kidding. Anyway, after over 1 GB of updates had completed, a new boot environment (BE) was created with the native ZFS snapshot feature and I shut down the PC for the night.

The next day I booted the PC into the latest boot environment to find that my gnome-terminal was giving me problems. The obvious symptom was that certain characters were not being echoed and there was misalignment with every entry and output displayed within the terminal.

petros@opensolaris:~$ ls
            .    ..    Desktop Documents    [ ... more results ... ]

After some research I came across OpenSolaris bug 12380: image-update loses /dev/ptmx from /etc/minor_perm. The fix (workaround) was simple: boot into the previous boot environment, mount the newest boot environment and copy the clone entries of /etc/minor_perm from the one to the other. The steps are as follows:

[reboot into previous BE]
$ pfexec beadm mount [newest BE] /mnt
$ pfexec sh -c "grep ^clone: /etc/minor_perm >> /mnt/etc/minor_perm"
$ pfexec touch /mnt/reconfigure
$ pfexec bootadm update-archive -R /mnt
$ pfexec beadm unmount [newest BE]
[reboot newest BE]

And the problem was fixed. It was quick and easy thanks to ZFS.

Categories: File Systems, OpenSolaris, Solaris, Storage Tags:

Revisited: ZFS, Btrfs and Oracle.

March 19th, 2010 5 comments

This entry is a continuation of one published in May of 2009. In fact, it relates to a comment made earlier today, to which I responded only briefly. I am now taking the time to offer my viewpoint on the licensing of ZFS under the CDDL and the reasoning for it.

It wasn’t until I started working with the OpenSolaris kernel (and by working I mean modifying code and going through the build process) that I finally realized why OpenSolaris was licensed under the Common Development and Distribution License (CDDL). A lot of other people and companies have claims to code used within Solaris. That includes copyrighted code which Sun does not have the authority to publish under an open source license. This is why they needed to work with a weak copyleft license such as the Mozilla Public License and modify it to their expectations. The CDDL was eventually approved by the Open Source Initiative (OSI) as a valid open source license and Sun Microsystems was then able to release code under its limitations.

Now before I continue I wish to describe 3 different open source licensing models: (1) the strong copyleft license, (2) the weak copyleft license and (3) the non-copyleft license.

The strong copyleft license is a project-based license which requires that any code derived from the original project remain under the original license. This method of licensing makes it nearly impossible to link with code under a non-strong copyleft license. As a result of this approach, strong copyleft licenses are often referred to as viral licenses. The most popular of these licenses is the General Public License (GPL), with 3 available versions. The Linux kernel is licensed under this and its success and growth can be attributed to it.

The weak copyleft license is similar to the strong copyleft license except that it is file-based instead of project based. This means that if there are any modifications to a file, the original license must apply; but that file can be combined in a project with code under a different license. This method makes the type of licensing non-viral. The CDDL and the MPL are categorized as weak copyleft licenses.

The third type is the non-copyleft license, which places no requirement on derived works to stay under the original license. In fact, there is also no requirement for derived code to be released under any open source license. This makes it simple for someone to take an open source project and use it as a basis for a proprietary product. The best-known example is the BSD license, as seen in Apple’s adoption of FreeBSD kernel code in their XNU kernel or NetApp’s use of FreeBSD in their customized storage appliances.

Continuing where I left off, it would not have been possible to open source the Solaris kernel for the OpenSolaris project if it weren’t for the CDDL license. In turn, ZFS would have been incompatible with the CDDL license if it were licensed under the GPL; although it has no conflict with non-copyleft licenses such as the BSD license. Because of this and now because of Oracle’s admitted support and commitment to Solaris, I doubt this licensing will change; especially to merge it into the Linux kernel. That is why we should be grateful that: (a) ZFS is available under an open source license making it impossible for it to disappear and (b) that Oracle has been committed to Btrfs and bringing an enterprise class solution into the Linux kernel.

This is why we have choices. If you want ZFS functionality, use OpenSolaris or Solaris. If you don’t necessarily need ZFS and are more comfortable with Linux, you have a lot more distributions to choose from. Or if you want ZFS and a familiar Linux environment, there is also Nexenta.

Categories: File Systems, Linux, OpenSolaris, Solaris Tags:

AMD RAID-on-Chip: A valid technology? Or is it just too late in the game?

March 5th, 2010 Comments off

Back in December I came across this article about an AMD RoC (RAID-on-Chip) that will be embedded into servers to provide uninterrupted RAID functionality. A quick question came to mind as I was reading it: “Considering today’s storage capabilities and low cost equipment, who will be using this?” And honestly, I was not able to come up with an answer.

In an earlier blog post I had mentioned the rise in usage of software RAID. Small to medium-sized businesses (SMBs) have been running to these low cost solutions. And why not? You are able to get more bang for your buck. For instance, by running OpenSolaris, one is able to use the redundancy of the ZFS file system (with single/double parity or mirrored RAID), file system level snapshots, data deduplication, and more. On top of that, there is checksumming to ensure that data corruption (noisy or silent) is never a threat. Take these ZFS pools and share them via NFS/CIFS or over ftp/http, or even map them over the iSCSI, Fibre Channel, AoE or FCoE protocols. The operating system (with all bells and whistles) is freely distributed under the CDDL license. The only costs will be the hardware equipment (a server or two and, if external storage is needed, a JBOD) and the storage administrator. For years, servers have been equipped with LSI Logic (or other) RAID controllers that have proven to be just as efficient as anything else at handling local storage. Now when you look at larger enterprise scale companies, they are not going to want a server to manage their RAID. Instead they will keep the external storage managed externally, with special purpose RAID controllers managing hundreds of terabytes to petabytes of data storage, apart from all the nodes in a cluster accessing that equipment.

But going back to the server, how practical is it to have an implemented RoC? With today’s level of high speed computing, does it make that much of a difference if the RAID is accomplished on the chipset as opposed to the operating system? If so, how easy is it to recover from data corruption or any other error? Unless you are setting up a small home or small business server, what if you wanted additional functionality such as snapshots, data deduplication and checksum validation? You still have to go to the operating system and have some sort of volume manager on top of the RoC grouped volumes. No offense to Dot Hill, even though they were a direct competitor to one of my previous employers (Xyratex). According to their numbers posted on Google Finance, they have been struggling financially for at least the past 5 to 6 years, and this is a great opportunity for them. In my opinion, though, this would have been a valid technology back in 2001, not in 2010.

Categories: File Systems, RAID, SCSI, Storage Tags:

Linux Magazine Article: Three Simple Tweaks for Better SSD Performance

November 26th, 2009 Comments off

Earlier today I came across this interesting article on tuning your SSD to achieve greater performance. It is worth noting that the article is written for Linux; when it mentions setting the noatime option in your file system’s mount options, that advice is relevant to any file system that supports such an option.

I would also take the time to read the comments. There are some distribution specific responses to the author’s notes.

Categories: File Systems, Linux, Red Hat, Storage, Ubuntu Tags:

Recently integrated into ZFS: Data Deduplication

November 3rd, 2009 Comments off

I just stumbled onto this blog entry on the implementation of data deduplication in Sun Microsystems’ ZFS file system. It is implemented in such a nice and clean way that I am looking forward to testing it. For instance, just like any other feature of the ZFS file system, data dedup can be enabled or disabled at any path from the ZFS root mount point. Examples taken from Jeff Bonwick’s blog post cited above:

zfs set dedup=on tank
zfs set dedup=off tank/home
zfs set dedup=on tank/vm
zfs set dedup=on tank/src

It is that simple (man 1 zfs).

Categories: File Systems, OpenSolaris, Solaris, Storage, UNIX Tags: