Generally, file systems are not considered "sexy." When a young programmer wants to do something really cool, his or her first thought is generally not "Dude, two words... file system." However, I am what is politely termed "different." I find file systems very interesting, and they have seldom been more so than they are right now. Hans Reiser is working on getting Reiser4 integrated into the Linux kernel, the BSDs are working on getting a journaled file system together, and Sun Microsystems just recently released a beta of ZFS into OpenSolaris. This is an excellent time to be me.
Looking around on the net, you get the idea that for every strange, complicated, and niche subject there is an author who is utterly convinced that people need to know about it. Add me to the list. This article (or actually three articles, because I am much more delusional than most of these authors) will be on file system design, and specifically on the new file systems that are just appearing. We will start with the basics, UFS/FFS, and then proceed to our first modern file system, XFS.
Background - Discs and FFS
To understand file systems, you have to have some idea what exactly a hard disc is doing on the inside. Let’s assume it’s not a Maxtor hard drive (HD), because I have had bad experiences. Think fear of clowns... except with Maxtor. So, a HD is made of several spinning discs, called platters, each with its own read/write head. The platters are stacked around a central spindle and the heads are attached to each other, so they move as one. This is important because it is faster to read or write from one platter to the next than from one section of a platter to another section of the same platter: the heads don’t have to move when you go from one platter to the next. The other important thing to understand is that you can’t just write any number of bits you want to the disc. There is a minimum size you can read from or write to the disc. This minimum size is called a "sector," and is usually 512 bytes. So, unless all of your files happen to be exact multiples of 512 bytes, a file’s last sector will be only partly full, and a file system that doesn’t deal with this will end up either wasting space or cutting off the end of the file.
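The sector arithmetic is easy to try for yourself. Here is a minimal sketch, assuming a 512-byte sector; the function names are made up for this illustration and do not come from any real file system.

```python
SECTOR_SIZE = 512  # bytes; assumed here, real drives may differ


def sectors_needed(file_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """Whole sectors required to hold file_size bytes (rounding up)."""
    return (file_size + sector_size - 1) // sector_size


def wasted_bytes(file_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """Bytes allocated on the disc that the file does not actually use."""
    return sectors_needed(file_size, sector_size) * sector_size - file_size


if __name__ == "__main__":
    for size in (100, 512, 513, 4000):
        print(size, sectors_needed(size), wasted_bytes(size))
```

A 513-byte file is the worst case in this model: it needs two sectors and leaves all but one byte of the second sector unused.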
[6]
Note: My iBook has a "30 gig" drive. This is bullshit and I'll tell you why: your operating system measures storage using the binary definitions of kilo, mega, and giga. For example, a kilobyte is not 1000 bytes, but actually 1024 bytes. However, your HD manufacturer uses the metric definitions, all the way up to gigabytes, so a "30 gig" drive holds 30 billion bytes, which the OS reports as roughly 27.9 gigabytes. Now I can see you thinking... "But wait, Mr. Mad Penguin Person... that's patently ridiculous and means they are lying on the box." Yah... [3]
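If you want to check the box against what your OS reports, the conversion is one line of arithmetic. This is a rough sketch, assuming the label means 10^9 bytes per gigabyte and the OS counts in units of 2^30 bytes; the function name is made up for the example.

```python
GB_METRIC = 10**9   # bytes per gigabyte as printed on the box (assumed)
GB_BINARY = 2**30   # bytes per gigabyte as the OS counts them (assumed)


def advertised_to_reported(advertised_gb: float) -> float:
    """Convert a metric 'box' capacity into binary gigabytes."""
    return advertised_gb * GB_METRIC / GB_BINARY


if __name__ == "__main__":
    reported = advertised_to_reported(30)
    print(f"A 30 gig drive shows up as about {reported:.2f} gigabytes")
```

The gap grows with the prefix: about 2.4% at kilobytes, about 7.4% at gigabytes.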
Berkeley And The Golden FFS
When researching file systems, one quickly discovers that there is an elephant in the room. Almost every research paper, almost every design, almost every new file system, and even most file system developers at some point reference FFS, the artist formerly known as the Fast File System. FFS was developed at Berkeley for their UNIX system, which, if you read Slashdot, was apparently stillborn. The FS before this was called the "File System." However, it had a problem (besides an incredibly unimaginative name): it was not designed for large discs and performed exceptionally poorly on them. To understand the problem with their first file system, we have to define some terms:
I-Node: An inode is quite simply a data structure that stores links to the file's data blocks along with the file's metadata (such as file size, ownership, and timestamps). The file name itself is not in the inode; it lives in a directory entry that points at the inode. These inodes are what define the structure of the drive. Have you ever used a card catalog in a library? This is pretty much the same idea. Actually, an inode is almost exactly like a card in the catalog.
Block: A block is the software-level abstraction of a sector. Basically, if a disc has a sector size of 512 bytes and yet the OS wants to work in 1024-byte units, it groups two sectors together and calls the result a block. The words block and sector are used interchangeably in a lot of documentation, and you can mostly treat them as interchangeable in your mind as well.
Superblock: The superblock is usually located at the front of the file system and defines it. This block contains various statistics about the file system, such as size, a magic number that supposedly uniquely identifies the file system type and, in the original Berkeley file system, a list of the partition’s inodes.
Data Blocks: This is the actual data, strangely unimportant in file system design (and in this article).
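The terms above can be put into a small sketch. This is a toy model with only enough fields to show how the pieces relate, assuming 512-byte sectors and 1024-byte blocks; it is not the on-disc layout of any real file system, and every name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

SECTOR_SIZE = 512   # bytes, fixed by the hardware (assumed)
BLOCK_SIZE = 1024   # bytes, chosen by the OS (assumed)


@dataclass
class Inode:
    size: int                  # file size in bytes
    modified: int              # a timestamp, seconds since the epoch
    block_numbers: List[int] = field(default_factory=list)  # links to the data


@dataclass
class Superblock:
    magic: int          # identifies the file system type
    total_blocks: int   # size of the file system, in blocks


def block_to_sectors(block_number: int) -> List[int]:
    """Return the hardware sector numbers that make up one block."""
    per_block = BLOCK_SIZE // SECTOR_SIZE
    first = block_number * per_block
    return list(range(first, first + per_block))


if __name__ == "__main__":
    # A 1500-byte file needs two 1024-byte blocks; here they are blocks 3 and 4.
    inode = Inode(size=1500, modified=0, block_numbers=[3, 4])
    for block in inode.block_numbers:
        print(block, block_to_sectors(block))
```

The point of the sketch is the chain of indirection: the inode names blocks, and each block maps onto a run of sectors.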
So, what was the problem with the Berkeley file system? Simple. All the inode information was stored in one place at the front of the drive, with the superblock. This works fine for small drives, but picture in your mind a large drive. The read head would have to go to the beginning of the drive to read an inode, then out to the data blocks to get the file data, then back to the inode section, then back to the data, and so on. This is called thrashing and is about as inefficient as file systems get. The good people at Berkeley set out to solve this problem by creating a new, faster file system. They were exceptionally creative about the name and ended up calling it the Fast File System.
The main difference between FS and FFS is the creation of cylinder groups. These groups serve two purposes:
They associate those sectors of the drive that are "parallel" on the platters to take advantage of the speed increase of writing from platter to platter. This is no longer very valuable, as most drives do this in their own firmware.
More importantly, they split up the drive into sections and hold the inode information for those sections, as well as a backup of the superblock. The superblock then holds information not about the inodes, but about the cylinder groups.
This greatly reduces "thrashing" on large discs and turns out to be a design that lasts. Almost every file system includes an idea similar to cylinder groups to solve the thrashing problem. [2]
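To get a feel for why keeping inodes near their data helps, here is a toy simulation. It is only a sketch under loud assumptions: the disc is a flat line of numbered blocks, the cost of a read is just the distance the head travels, and the disc size and group count are invented. Real drives also have rotation, multiple platters, and caches, none of which are modeled.

```python
import random

DISC_BLOCKS = 100_000  # hypothetical disc size, in blocks
GROUPS = 100           # hypothetical number of cylinder groups


def head_travel(data_positions, inode_position_for):
    """Total distance the head moves reading each file's inode, then its data."""
    head = 0
    travelled = 0
    for data_pos in data_positions:
        for target in (inode_position_for(data_pos), data_pos):
            travelled += abs(target - head)
            head = target
    return travelled


def inode_at_front(data_pos):
    """Old layout: every inode lives at the very start of the disc."""
    return 0


def inode_in_group(data_pos):
    """Cylinder-group layout: the inode sits at the start of the data's group."""
    group_size = DISC_BLOCKS // GROUPS
    return (data_pos // group_size) * group_size


rng = random.Random(42)  # fixed seed so runs are repeatable
files = [rng.randrange(DISC_BLOCKS) for _ in range(1000)]
old_travel = head_travel(files, inode_at_front)
new_travel = head_travel(files, inode_in_group)

if __name__ == "__main__":
    print("inodes at front: ", old_travel)
    print("inodes in groups:", new_travel)
```

In this model the grouped layout can never do worse than the front-loaded one, because the head no longer makes a round trip to block 0 between files.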