Filesystems

Note: I’m planning to write several pages about software relevant to clusters and HPC. I started with an overview of relevant filesystems and some cluster software. This is a work in progress; if you have any advice, wishes, or general feedback, contact me directly via email or use the comment function at the bottom of this page.
— Alex.

Usual

Ext3

Ext3 is a journaling filesystem developed by Stephen Tweedie. It is compatible with ext2 filesystems; in fact, you can think of it as an ext2 filesystem with a journal file. The journaling capability means no more waiting for long fsck runs after an unclean shutdown and less worry about metadata corruption. Most notably, you can switch back and forth between ext2 and ext3 on a cleanly unmounted partition without any problem: it is just a matter of giving the mount command the right filesystem type.
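To see which of the two types a partition is currently mounted as, you can look at the kernel's mount table. The following is a minimal Python sketch that parses text in the /proc/mounts layout (device, mount point, filesystem type, options, ...). The function name and the sample table with its device names are made up for illustration.

```python
def mounted_fs_types(mounts_text):
    """Map each mount point to its filesystem type.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    types = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        types[mount_point] = fs_type
    return types


# Hypothetical sample table; on a real Linux system you would pass
# open("/proc/mounts").read() instead.
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sda2 /home ext2 rw 0 0
"""

if __name__ == "__main__":
    for mount_point, fs_type in mounted_fs_types(SAMPLE).items():
        print(mount_point, fs_type)
```

In this sample, / is mounted with the journal in use (ext3), while /home is mounted as plain ext2.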

Ext3 FAQ

JFS

The Journaled File System (JFS) provides a log-based, byte-level file system that was developed for transaction-oriented, high performance systems. Scalable and robust, its advantage over non-journaled file systems is its quick restart capability: JFS can restore a file system to a consistent state in a matter of seconds or minutes.

While tailored primarily for the high throughput and reliability requirements of servers (from single-processor systems to advanced multi-processor and clustered systems), JFS is also applicable to client configurations where performance and reliability are desired.
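The quick restart mentioned above comes from replaying a log instead of scanning the whole filesystem. Here is a toy Python sketch of that general idea, not of JFS's actual on-disk log format: the record layout, the key names, and the function name are all assumptions chosen for illustration. Only writes that belong to a committed transaction are applied during recovery.

```python
def replay_journal(journal, state):
    """Return a copy of state with all committed journal writes applied."""
    # First pass: find out which transactions reached their commit record.
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    recovered = dict(state)
    # Second pass: redo writes in log order, skipping uncommitted ones.
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _txid, key, value = rec
            recovered[key] = value
    return recovered


# Hypothetical log: transaction 1 committed, transaction 2 was cut off
# by a crash before its commit record was written.
JOURNAL = [
    ("write", 1, "inode7.size", 4096),
    ("commit", 1),
    ("write", 2, "inode9.size", 8192),
]

if __name__ == "__main__":
    before = {"inode7.size": 0, "inode9.size": 0}
    print(replay_journal(JOURNAL, before))
```

Because recovery only has to walk the log, its cost depends on the size of the log rather than on the size of the filesystem.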

JFS is IBM software.

JFS Overview at IBM Developer Works

ZFS

ZFS is a new kind of file system that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability.
ZFS includes a pooled storage model, is always consistent on disk, offers protection from data corruption, does live data scrubbing, makes instantaneous snapshots and clones, has fast native backup and restore capabilities, is highly scalable, and has built-in compression and a simplified administration model.

ZFS is shipped with OpenSolaris.

OpenSolaris Community: ZFS

XFS

XFS is a high-performance journaling filesystem. XFS combines advanced journaling technology with full 64-bit addressing and scalable structures and algorithms. According to SGI, this combination delivers the most scalable high-performance filesystem ever conceived.

XFS is SGI software.

XFS homepage at SGI Developer Central

VxFS

VxFS is the Veritas File System. It is distributed by Symantec, which acquired Veritas in 2005.

Symantec Homepage

Remote

NFS

The Network File System (NFS) is available on virtually every UNIX-like OS.

Wikipedia about NFS

Open AFS

AFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for file sharing, providing location independence, scalability, security, and transparent migration capabilities for data.

IBM branched the source of the AFS product, and made a copy of the source available for community development and maintenance. They called the release OpenAFS.

OpenAFS Homepage

CIFS/SAMBA

Samba is an open-source implementation of Microsoft’s Common Internet File System (CIFS).
CIFS itself is part of all Microsoft Windows versions.
Samba support for IRIX can be purchased directly from SGI.

Samba Homepage
CIFS or Public SMB Information on Common Internet File System
Samba for IRIX FAQ

Distributed

Lustre

Lustre is a scalable, secure, robust, highly available cluster file system. It is designed, developed, and maintained by Cluster File Systems, Inc.

The central goal is the development of a next-generation cluster file system that can serve clusters with tens of thousands of nodes and petabytes of storage, and move hundreds of GB/s, with state-of-the-art security and management infrastructure.

Lustre runs today on many of the largest Linux clusters in the world and is included by CFS’s partners as a core component of their cluster offerings (examples include HP StorageWorks SFS and the Cray XT3 and XD1 supercomputers). Today’s users have also demonstrated that Lustre scales down as well as it scales up, running it in production on clusters as small as 4 and as large as 15,000 nodes.

The latest version of Lustre is always available from Cluster File Systems, Inc. Public open-source releases of Lustre are made under the GNU General Public License and are used in production supercomputing environments worldwide.

You may subscribe to the lustre-announce mailing list to be informed of releases.

Lustre development would not have been possible without funding and guidance from many organizations, including several US National Laboratories, early adopters, and product partners.

Lustre Homepage

GPFS

The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system that can provide fast, reliable data access from all nodes in a homogeneous or heterogeneous cluster of IBM UNIX® servers running either the AIX 5L or the Linux operating system.

GPFS allows parallel applications simultaneous access to a set of files (or even a single file) from any node that has the GPFS file system mounted while providing a high level of control over all file system operations.

GPFS is IBM software.

GPFS Homepage at IBM

GFS

GFS (Global File System) is a cluster file system. It allows a cluster of computers to simultaneously use a block device that is shared between them (via FC, iSCSI, NBD, etc.). GFS reads and writes to the block device like a local filesystem, but also uses a lock module to allow the computers to coordinate their I/O so that filesystem consistency is maintained. One of the nifty features of GFS is perfect consistency: changes made to the filesystem on one machine show up immediately on all other machines in the cluster.
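The role of the lock module can be illustrated with a toy Python sketch. This is not GFS's API: threads stand in for cluster machines, a dictionary stands in for the shared block device, and the class and function names are invented for the example. Each "node" takes the lock for a block before it reads and rewrites that block.

```python
import threading


class LockModule:
    """Toy stand-in for a cluster lock manager: one lock per block number."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, block_no):
        # Create the per-block lock on first use, protected by a guard lock.
        with self._guard:
            return self._locks.setdefault(block_no, threading.Lock())


def run_nodes(node_count=4, updates_per_node=1000):
    """Each 'node' (a thread here) increments a counter stored in block 0."""
    device = {0: 0}  # shared "block device": block number -> value
    locks = LockModule()

    def node():
        for _ in range(updates_per_node):
            with locks.lock_for(0):  # coordinate before touching the block
                device[0] = device[0] + 1

    threads = [threading.Thread(target=node) for _ in range(node_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return device[0]


if __name__ == "__main__":
    print(run_nodes())
```

With the lock held around every read-modify-write, no update is lost, so the final value equals the number of nodes times the number of updates per node.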

GFS Homepage at Red Hat

PVFS

The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research and a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs.

PVFS supports the UNIX I/O interface and allows existing UNIX I/O programs to use PVFS files without recompiling. The familiar UNIX file tools (ls, cp, rm, etc.) will all operate on PVFS files and directories as well. This is accomplished via a Linux kernel module which is provided as a separate package.

PVFS stripes file data across multiple disks in different nodes in a cluster. By spreading out file data in this manner, larger files can be created, potential bandwidth is increased, and network bottlenecks are minimized. A 64-bit interface is implemented as well, allowing large files (more than 2 GB) to be created and accessed.
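Striping can be pictured as a simple mapping from a file offset to a node and an offset on that node. The Python sketch below shows plain round-robin striping; it is an illustration of the general technique, not PVFS's actual layout code, and the stripe size, the server count, and the function name are assumptions.

```python
def locate(offset, stripe_size, server_count):
    """Return (server index, offset within that server's local data)."""
    stripe_no = offset // stripe_size            # which stripe unit overall
    server = stripe_no % server_count            # round-robin over servers
    local_stripe = stripe_no // server_count     # stripe index on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


STRIPE = 64 * 1024  # hypothetical 64 KiB stripe unit
SERVERS = 4         # hypothetical number of I/O nodes

if __name__ == "__main__":
    # Offsets around the 2 GB mark need more than 31 bits, which is why
    # a 64-bit interface matters for large files.
    for offset in (0, STRIPE, 4 * STRIPE + 10, 2**31):
        print(offset, locate(offset, STRIPE, SERVERS))
```

Consecutive stripe units land on different servers, so a large sequential read or write is spread over all of them instead of hitting a single disk.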

The Parallel Virtual Filesystem Project
PVFS 2

3 Responses to Filesystems

  1. darkfader says:

    *blink* how about CXFS? Used to be the fastest clustered filesystem some day, and surely still is the sexiest.

  2. […] The Google File System (GFS) and Google’s BigTable. These projects represent the current cutting-edge in data storage for scalable web applications. But if you want to use them, you’ll have to join Google or use one of these projects that offer some (but not all) of their features: Amazon’s S3 and SimpleDB (see below), MogileFS, Global File System, and Hadoop’s HDFS file system with HBase acting as BigTable, but HBase may not be ready for prime-time just yet. There are, of course, other solutions of varying complexity as well. […]

  3. Steve says:

    Symantec’s Cluster File System, based on VxFS, is also a very high performance distributed file system, plus it supports Veritas Cluster Server.
