Creating ZFS drives is not difficult. However, there are important details that require attention, particularly for external drives. Once created, ZFS drives are remarkably easy to manage. This tutorial covers creating an external OpenZFS drive with Linux Mint 20. The steps shown generally apply to any Linux distribution with OpenZFS 0.8.3 or above, including Ubuntu 20.04 LTS.
Advantages of ZFS
ZFS is a combined file system and volume manager originally developed by Sun Microsystems for Solaris, a proprietary Unix operating system then owned by Sun (now Oracle). Thanks to OpenZFS, ZFS is available on Linux, where it is rock solid on non-system drives. ZFS offers the following desirable features.
- ZFS provides enhanced data integrity. ZFS uses a “copy-on-write” system, which writes new data to new blocks instead of overwriting old data. Every block is checksummed, and the checksum is verified when the block is read. Because existing data is never left half-overwritten, running fsck after a system crash or power interruption is not needed.
- ZFS can create a file system that spans across several drives in a pool. ZFS provides mirroring and other RAID configurations with automatic error correction. There is no need for a RAID controller or third party software.
- Drives of any size can be used, and any number of drives can be combined to function as one device.
- It is easy to organize your data by creating datasets and managing them independently.
- ZFS offers reliable and efficient encryption and compression. Compression is so efficient that it can actually increase the read/write speed of a drive, because less data has to be transferred to and from the disk.
- Using ZFS for system drives (drives or partitions containing the Linux operating system) is, at the time of writing, still experimental. This feature is in active development.
Installing ZFS
If you are using Linux Mint 20 or later there is nothing to install. ZFS is natively available. If you are using Ubuntu 20.04 or later and ZFS was configured during installation, there is also nothing to install.
To check whether ZFS is included with your Linux distribution, enter the following command in the terminal (bash). You need ZFS version 0.8.3 or above.
zfs --version
For Ubuntu 20.04, if ZFS was not configured during installation, it is easy to install ZFS as described below. On Linux Mint versions earlier than Mint 20, use the same commands, then confirm that the reported version is 0.8.3 or above, since older releases may install an older version of ZFS.
sudo apt update
sudo apt install zfsutils-linux
zfs --version
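If you want a script to stop on an older release, the version check can be automated. The following is a minimal sketch, assuming the first line of `zfs --version` has the form zfs-0.8.3-1ubuntu12 and that GNU `sort -V` is available. The function names are hypothetical.

```shell
# Hypothetical helpers: check that OpenZFS is at least 0.8.3.

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Turn "zfs-0.8.3-1ubuntu12" (first line of `zfs --version`) into "0.8.3".
parse_zfs_version() {
    local line
    line="$(printf '%s\n' "$1" | head -n 1)"   # keep the userland line only
    line="${line#zfs-}"                         # drop the "zfs-" prefix
    printf '%s\n' "${line%%-*}"                 # drop the packaging suffix
}

check_zfs_version() {
    local required="0.8.3" installed
    installed="$(parse_zfs_version "$(zfs --version 2>/dev/null)")"
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "OpenZFS $installed found (>= $required)"
    else
        echo "OpenZFS $required or newer is required (found: ${installed:-none})" >&2
        return 1
    fi
}
```

After loading the functions into your shell, `check_zfs_version` prints the detected version, or returns a non-zero status if ZFS is missing or too old.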
Preparing the External Drive
Warning: the following procedure will destroy all files on the external drive. Make sure you know which drive is your external drive.
Use the Linux program Gparted to locate your external drive. The drive should not be partitioned and all space should be unallocated. Remove partitions if necessary. If you are starting with a new unused drive, you need to create a GPT partition table. This can be done using Gparted: Device > Create Partition Table.
Good record keeping is important. It is best to name each drive and write the name on the drive with a sharpie being careful not to cover up the serial number. Keep a list of all your drives with a description of the data they contain and the ZFS commands to manage them.
Creating the Pool
Start by defining a pool that encompasses your storage area, typically one or several drives. [Later on we will create datasets inside the zpool to hold your data.] In ZFS lingo, the zpool is made up of one or more vdevs. Each vdev is usually a group of drives that work together (a RAID group, for example). For this tutorial, yourpool contains one vdev consisting of a single drive, which is the simplest way to create a backup disk.
Use the Disks program, Gparted, or other disk utility program to find the name of the external drive that you want to format with ZFS. In this tutorial we will format sdb.
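Before running the destructive command below, it can help to list only the drives attached over USB, together with their size and serial number, and compare the serial with the label on the drive. This sketch assumes the drive is in a USB enclosure and that `lsblk` (from util-linux) is installed; the function name is hypothetical. Some USB enclosures may report the bridge's serial number rather than the drive's.

```shell
# Hypothetical helper: filter `lsblk -dn -o NAME,TRAN,SIZE,SERIAL` output
# (-d: whole disks only, -n: no header line) down to USB-attached disks.
list_usb_disks() {
    awk '$2 == "usb" { printf "/dev/%s size=%s serial=%s\n", $1, $3, $4 }'
}

# Usage:
#   lsblk -dn -o NAME,TRAN,SIZE,SERIAL | list_usb_disks
```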
Warning: when you create the zpool using the next command, it will wipe all existing data off this drive. In the following command you must replace sdb with the name of your external drive.
sudo zpool create yourpool /dev/sdb
In the above, I named the pool “yourpool”. You can use any name you want. Whatever name you choose, it is good practice to make the name end in “pool”. This will remind you that it is a pool name, for example “jennypool”.
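Use the Disks program or Gparted to view your drive. You will see that creating the pool made two partitions on your drive. The first partition holds the ZFS pool and takes up almost the entire drive, in our case sdb1. The second is a small partition (on Linux usually sdb9) marked “Solaris Reserved”, which ZFS sets aside and which you should leave alone. For now, we will refer to the pool as being on sdb1.
Recharacterizing the Pool
Kernel names such as sdb1 are assigned when the drive is detected and can change after a reboot or on another machine. The names in /dev/disk/by-id are built from the drive's model and serial number, so they stay the same. Find the by-id name of the pool partition: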
cd /dev/disk/by-id
ls -alF
The output includes a line similar to this one:
ata-WDC_WD101EFAX-68LDBN0_VCG88VNN-part1 -> ../../sdb1
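When /dev/disk/by-id holds many entries, a small helper can pick out the ones that point at a given partition. This is a sketch: the function name is hypothetical, it relies on `readlink -f` from GNU coreutils, and it may print more than one name (for example an extra wwn- link). Any of them identifies the same partition; this tutorial uses the ata- name.

```shell
# Hypothetical helper: print every symlink in a by-id directory that resolves
# to the kernel device named in $1 (e.g. "sdb1"). The directory defaults to
# /dev/disk/by-id but can be overridden for testing.
by_id_names() {
    local target="$1" dir="${2:-/dev/disk/by-id}" link
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # skip anything that is not a symlink
        # readlink -f turns "../../sdb1" into an absolute path such as /dev/sdb1
        if [ "$(basename "$(readlink -f "$link")")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Usage:
#   by_id_names sdb1
```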
Now we will recharacterize the pool partition with its by-id name. The trick is to export the pool and then import it again by its by-id name. This only has to be done once.
Export the pool so that it will no longer be recognized by your machine.
sudo zpool export yourpool
zpool status
Import the pool so that the machine will once again recognize it. Use the by-id name of your pool partition. The -N option imports the pool without mounting its datasets.
sudo zpool import -d /dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG88VNN-part1 yourpool -N
zpool status
The zpool status command should show that the pool is now referred to by its by-id name. Always use an import command like the one above when you want to import a pool. Never use a kernel name like sdb1 again.
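To check this without reading the whole status report, the vdev names can be filtered for short kernel names. This is a sketch that assumes the usual `zpool status` layout, where each vdev name is the first word on its line; it only recognizes sdX-style names, and the function name is hypothetical.

```shell
# Hypothetical helper: read `zpool status` output on stdin and report any vdev
# still listed by a kernel name such as sdb or sdb1. Exit status is 0 when
# none is found, 1 otherwise.
check_vdev_names() {
    awk '$1 ~ /^sd[a-z][a-z]?[0-9]*$/ { print "kernel name in use: " $1; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Usage:
#   zpool status yourpool | check_vdev_names && echo "pool uses by-id names"
```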
Adding Compression to the Pool
Turn on compression for the pool. Datasets created in the pool later will inherit this setting.
sudo zfs set compression=on yourpool
The only case where you may not want to use compression is when most of your data is photos or video. These formats are already compressed, so they compress poorly on any file system, including ZFS. In this case, compression will not reduce the data size significantly.
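To see whether compression is paying off on your data, you can convert the pool's compressratio property into space saved: a ratio r means the data occupies 1/r of its uncompressed size, so the saving is (1 - 1/r) × 100%. For example, 1.35x saves about 26%, while photos and video stay close to 1.00x. A small sketch of that arithmetic, assuming the value comes from `zfs get -H -o value compressratio yourpool` (the function name is hypothetical):

```shell
# Hypothetical helper: convert a compressratio value such as "1.35x" into the
# percentage of disk space saved, using saved = (1 - 1/ratio) * 100.
space_saved_pct() {
    printf '%s\n' "$1" | awk '{ sub(/x$/, ""); printf "%.1f\n", (1 - 1 / $1) * 100 }'
}

# Usage:
#   space_saved_pct "$(zfs get -H -o value compressratio yourpool)"
```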
Planning to Mount the Datasets
A dataset is a space where you put your data. Datasets are flexible in size and are located inside your pool. In this tutorial we chose “yourdataset” for the name of your dataset.
It is important to mount your datasets in a good location. ZFS will allow you to mount your datasets anywhere you choose, so choose wisely. The default is the directory /yourpool/yourdataset. In most cases this is NOT the best place to mount your ZFS dataset, because the default mountpoint can cause catastrophic problems for people who use Timeshift or other backup programs.
Timeshift is a program that automatically takes snapshots of your system (OS), so that the system can be rolled back if an upgrade (or new package install) damages it. When Timeshift takes a snapshot, it automatically excludes the /mnt and /media directories, which are common mountpoints for external drives. It is best practice to mount your ZFS datasets in one of these directories so that Timeshift excludes them too. In most cases you do not want a Timeshift snapshot of an external drive, particularly if the external drive is a backup drive. In particular, the suggested mountpoint prevents Timeshift from writing the entire contents of your external drive to a snapshot stored on an internal drive, possibly filling that drive beyond its capacity. This can cause your system to crash.
The easiest way to set the mountpoint of your datasets is to set the mountpoint of the pool to which the datasets will belong. Set the pool’s mountpoint immediately after the pool is created. All datasets created later in this pool will “inherit” that mountpoint.
sudo zfs set mountpoint=/mnt/yourpool yourpool
In our case, yourdataset will be mounted on /mnt/yourpool/yourdataset.
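To confirm that every dataset in the pool ended up somewhere Timeshift skips, you can list the mountpoints and flag any outside /mnt and /media. A sketch, assuming tab-separated input from `zfs list -H -o name,mountpoint -r yourpool` and mountpoints without spaces; the function names are hypothetical. Values such as none or legacy are also flagged, since they are not under /mnt or /media.

```shell
# Hypothetical helper: succeed if the path in $1 lies under /mnt or /media.
is_excluded_mountpoint() {
    case "$1" in
        /mnt/*|/media/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read "name<TAB>mountpoint" lines and report each dataset.
report_mountpoints() {
    local name mp
    while read -r name mp; do
        if is_excluded_mountpoint "$mp"; then
            echo "OK: $name -> $mp"
        else
            echo "WARN: $name -> $mp (not under /mnt or /media)"
        fi
    done
}

# Usage:
#   zfs list -H -o name,mountpoint -r yourpool | report_mountpoints
```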
Creating Encrypted Datasets
Datasets can be used to help you organize your data. For example, consider the following scenarios.
- You need to control access to two different types of data, for example accounting data and medical data. You can put the data in separate datasets and encrypt them with different passwords.
- You want to mount different types of data in different directories. Create several datasets, each with a different mountpoint.
- You use “Back In Time” to back up your home directory to an external drive. You want to encrypt your data, but do not want to use the default EncFS encryption method, which is known to have security issues (see https://backintime.readthedocs.io/en/latest/settings.html#local-encrypted). Instead, back up to a ZFS dataset encrypted with AES-256.
To create an encrypted dataset, run the following command. ZFS will ask you for a passphrase.
sudo zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase yourpool/yourdataset
The mountpoint of yourdataset is /mnt/yourpool/yourdataset (the mountpoint was inherited from yourpool). Verify this.
zfs get mountpoint
As discussed above, this directory is a good safe mountpoint. However, if it does not suit your needs, you can change it with the following command. [Do not bother to create the directory of your new mountpoint prior to running this command. ZFS will create the directory for you.]
sudo zfs set mountpoint=/any/path/you/want yourpool/yourdataset
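The scenarios listed earlier (separate passphrases, separate mountpoints) come down to repeating the create command with different names. A minimal sketch: the dataset names accounting and medical and their mountpoints are hypothetical, `run` is a helper defined here (not a ZFS command), and setting DRY_RUN=1 prints each command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create an encrypted dataset with its own passphrase and mountpoint.
# zfs prompts for the passphrase because keylocation=prompt.
create_encrypted_dataset() {
    local dataset="$1" mountpoint="$2"
    run sudo zfs create -o encryption=on -o keylocation=prompt \
        -o keyformat=passphrase -o mountpoint="$mountpoint" "$dataset"
}

# Usage (hypothetical names):
#   create_encrypted_dataset yourpool/accounting /mnt/yourpool/accounting
#   create_encrypted_dataset yourpool/medical    /mnt/yourpool/medical
```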
Mounting Datasets
ZFS makes mounting/unmounting easy. You do not need to manage ZFS entries in your /etc/fstab file. Each ZFS dataset is mounted according to its own properties, such as its mountpoint. Before mounting a dataset, check whether the pool has been imported.
zpool status
If the machine has been restarted (or the drive has been moved to another machine) you need to import the pool.
sudo zpool import -d /dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG88VNN-part1 yourpool -N
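Then load the encryption key and mount the dataset. The -l option prompts for the passphrase before mounting.
sudo zfs mount -l yourpool/yourdataset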
Now you can read and write to your dataset. When a dataset is first mounted it is owned by root, so you may want to change its ownership or permissions.
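The steps in this section can be wrapped in a small function that imports the pool only when it is not already imported. This is a sketch: POOL, DATASET and DEVICE hold the tutorial's example names (replace DEVICE with your own by-id path), the function names are hypothetical, and DRY_RUN=1 prints the commands instead of running them. It relies on `zpool list` returning a non-zero status for a pool that is not imported.

```shell
POOL=yourpool
DATASET=yourpool/yourdataset
DEVICE=/dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG88VNN-part1   # replace with your own

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Succeed if the pool is currently imported; `zpool list` fails for unknown pools.
pool_imported() {
    zpool list -H -o name "$1" >/dev/null 2>&1
}

mount_dataset() {
    if ! pool_imported "$POOL"; then
        run sudo zpool import -d "$DEVICE" "$POOL" -N || return 1
    fi
    # -l loads the encryption key (prompting for the passphrase) before mounting
    run sudo zfs mount -l "$DATASET"
}
```

After mounting, `sudo chown "$USER": /mnt/yourpool/yourdataset` is one way to give your own user ownership of the dataset's top directory.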
Using the External Drive on Any Machine
Now that the external drive has been properly formatted with ZFS, it can be moved and used on a different machine. All you have to do is connect the drive, import the pool, and mount the dataset (providing a password if required).
sudo zpool import -d /dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG88VNN-part1 yourpool -N
sudo zfs mount -l yourpool/yourdataset
Unmounting and Powering Down
Flush memory and write all data to your drives.
sync
Unmount the dataset.
sudo zfs unmount yourpool/yourdataset
Export the pool.
sudo zpool export yourpool
The command above is mandatory if you are going to move the external drive to a different machine.
To power down the drive, open the Disks program, select the drive, and click the power button icon at the top of the window.
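The whole sequence can also be run from the terminal. A sketch, assuming the example names used in this tutorial, that /dev/sdb is the external drive (check yours first), and that `udisksctl` from udisks2 (the service the Disks program uses) is installed; its power-off command stands in for the power button. Each step stops the sequence if it fails, so the drive is not powered off while the pool is still imported. DRY_RUN=1 prints the commands instead of running them.

```shell
POOL=yourpool
DATASET=yourpool/yourdataset
DISK=/dev/sdb   # example kernel name of the external drive

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

detach_drive() {
    run sync || return 1                           # flush cached writes to disk
    run sudo zfs unmount "$DATASET" || return 1    # unmount the dataset
    run sudo zpool export "$POOL" || return 1      # required before moving the drive
    run udisksctl power-off -b "$DISK"             # power down the drive
}
```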
The steps above can be considered “best practice”. However, if you omit some of them, chances are that you will get away with it. ZFS is very robust. I was once in the middle of writing a large file to a ZFS dataset when my system crashed and I had to do an emergency reboot. This was a worst-case scenario in the sense that the pool was online, the dataset was mounted, and unwritten data was in memory. I had no issues after rebooting. The pool imported and the dataset mounted with no problems. I lost only the file that was being written at the time of the crash.
Checking Data Integrity
SMART Monitoring
There are many programs that will extract and display SMART data. For Linux (or Windows), I recommend GSmartControl, a graphical front end for smartmontools. SMART reports a long list of drive attributes. These are the most critical attributes:
- ID 5 – Reallocated_Sector_Count
- ID 187 – Reported_Uncorrectable_Errors
- ID 196 – Reallocation_Event_Count
- ID 197 – Current_Pending_Sector_Count
- ID 198 – Offline_Uncorrectable
Interpreting SMART data is a bit subjective, but most companies will replace drives that have a non-zero count in ID 187 or ID 198 (expressed as a non-zero Raw Value). If this has occurred, the drive's hardware ECC has been unable to fix an error by repeatedly trying to read the data and relocate it to a good sector. In other words, there is bad data on the drive that the drive’s internal correction mechanism cannot fix. The other attributes listed above measure the number of times data has been or will be reallocated to a healthy sector. Consider replacing the drive if these numbers are high or rapidly increasing.
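With smartmontools installed, the same attributes can be read from the terminal with `smartctl -A`. The sketch below picks out the five IDs listed above and flags non-zero raw values. It assumes the usual attribute table, where the ID is the first column and RAW_VALUE the tenth; smartctl's own attribute names differ slightly from the names in the list above, and drives in some USB enclosures may need `smartctl -d sat`. The function name is hypothetical.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# critical attributes (IDs 5, 187, 196, 197, 198), marking non-zero raw values.
critical_smart() {
    awk '$1 ~ /^(5|187|196|197|198)$/ {
             flag = ($10 + 0 > 0) ? " <-- non-zero" : ""
             printf "ID %s %s raw=%s%s\n", $1, $2, $10, flag
         }'
}

# Usage:
#   sudo smartctl -A /dev/sdb | critical_smart
```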
ZFS Monitoring with Scrub
A scrub reads all data in the pool and verifies it against its checksums. Start a scrub with this command.
sudo zpool scrub yourpool
If you want to stop scrubbing, use this command.
sudo zpool scrub -s yourpool
If you start a scrub again after stopping it, it will not resume where the previous scrub left off; the scrub starts over from the beginning.
When you check the status of your pool, you will see the results of your last scrub.
sudo zpool status -v yourpool
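If you check the pool from a script, the two lines that matter most are the scan: line with the result of the last scrub and the final errors: line. This sketch prints both and returns a non-zero status unless the errors line reads No known data errors. The wording of these lines can vary between OpenZFS versions, so treat the match as an assumption; the function name is hypothetical.

```shell
# Hypothetical helper: read `zpool status` output on stdin, print the scan:
# and errors: lines, and exit 0 only if no data errors are reported.
scrub_summary() {
    awk '$1 == "scan:" || $1 == "errors:" { sub(/^[ \t]+/, ""); print }
         /errors: No known data errors/ { clean = 1 }
         END { exit (clean ? 0 : 1) }'
}

# Usage:
#   sudo zpool status -v yourpool | scrub_summary || echo "check the pool"
```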
For users who want to test how well ZFS can detect and repair errors, there is a ZFS utility called zinject, which creates artificial problems in a ZFS pool by simulating data corruption or device failures. This program is very dangerous and should not be used with pools containing valuable information.
I would like to have a set of three external drives synced via ZFS replication with compression and encryption. A working drive, a local backup and an offsite backup. I am thinking of syncing to the local backup and then swap it for the offsite backup and sync that when it arrives.
Any suggestions or caveats?
You have a good plan. When you are through with this project, consider backing up to the cloud using zfs. There is one cloud storage provider offering native zfs send and receive. The price is a bit high but customer support appears to be excellent.
https://www.rsync.net/products/zfsintro.html
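A minimal sketch of the replication step, assuming one dataset copied to a pool on the backup drive. The names backuppool, snap1 and snap2 are hypothetical. `zfs send -w` sends the encrypted blocks as-is (a raw send, available since OpenZFS 0.8), so the backup drive never needs the key. For an incremental send, the previous snapshot must still exist on both pools, and the backup copy should not be modified between runs. DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# replicate SRC DST PREV NEW: snapshot SRC as @NEW and send it to DST.
# With PREV empty, a full stream is sent; otherwise only changes since @PREV.
replicate() {
    local src="$1" dst="$2" prev="$3" new="$4"
    run sudo zfs snapshot "$src@$new" || return 1
    if [ -n "${DRY_RUN:-}" ]; then
        echo "sudo zfs send -w ${prev:+-i @$prev }$src@$new | sudo zfs receive $dst"
        return 0
    fi
    if [ -n "$prev" ]; then
        sudo zfs send -w -i "@$prev" "$src@$new" | sudo zfs receive "$dst"
    else
        sudo zfs send -w "$src@$new" | sudo zfs receive "$dst"
    fi
}

# Usage (hypothetical names):
#   replicate yourpool/yourdataset backuppool/yourdataset "" snap1      # first, full copy
#   replicate yourpool/yourdataset backuppool/yourdataset snap1 snap2   # later, incremental
```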
Hello Jenny, could you also write another article on setting up two drives in mirrored mode? I have an enclosure with 2 drives in jbod and would like to use a USB stick to mount them and backup my NAS. Thank you.
Using two drives in one enclosure works if the controller truly provides jbod. Not all of them do. It does no harm to try it. If it does not work, use a separate enclosure for each drive. Even if you do this you can still run into trouble if the make and model of the two enclosures are the same. I know these two work well together:
OWC Mercury Elite Pro
Yottamaster Pro
Setting up two mirrored drives is very similar to setting up a single drive. To create the pool (use the sdx designation of your drives):
sudo zpool create yourpool mirror /dev/sda /dev/sdb
To import (use the id of your drive partitions):
yourdrive01=/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0ZN6T2S-part1
yourdrive02=/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K3ULXS27-part1
sudo zpool import -d "$yourdrive01" -d "$yourdrive02" yourpool -N
This is a great article! Thank you.
When I tried to mount the encrypted drive on another system I had to first run `sudo zfs load-key yourpool` before I could run `sudo zfs mount yourpool/yourdataset` But I also set up the zpool as a mirror with two partitions on the backup drive, so I’m not sure if that changed something? Oh, you know what, it’s probably because I encrypted the entire pool and not a particular dataset!
Thanks for the feedback! Good for you in successfully setting the zpool up as a mirror on two different partitions on the same drive! This will not protect you from a catastrophic drive failure, but it will certainly help protect the integrity of your data from surface defects on the drive. Well worth doing. You have chosen to encrypt the pool. There is nothing wrong with that and it should work perfectly fine. For most people it is probably better to encrypt datasets. This gives you more flexibility in the sense that you can put your data in different buckets and control access to each bucket (each bucket being a different dataset). For example, I usually leave one dataset unencrypted which contains a file explaining the purpose of the drive and its contents. This file can also contain a password hint. However, this may be unnecessary in your case if this is a backup disk and you will always want all data on the drive encrypted. Well done!
Excellent write up on formatting ext drives with ZFS and usage methodology. Keep up the great work.
I am glad you found the tutorial useful. If you run into any problems or have any suggestions, please let me know.
Great tutorial, thank you! It worked really well! If I have a ZFS pool on my server, do I need to unmount and export it before restarting for updates?
I am glad you liked it! Updating your software, including updating your version of ZFS, should not affect a currently running ZFS drive. However, you should always unmount/export before restarting your operating system.
Jen,
brilliant explanation – helped me to successfully setup zfs on portable external hard drive
thanks
Jonno
Thank you Jonno. I am glad everything worked for you.