Back when I was even less experienced in self-hosting, I set up my media/backup server using a RAIDZ1 array of 3 x 8TB disks. It’s been running well for a while, and I haven’t had any problems or disk errors.
But today I read a post about ‘pool design rules’ stating that RAIDZ1 configurations should not have drives over 1TB because the chances of errors occurring during re-silvering are high. I wish I had known this sooner.
What can I do about this? I send ZFS snapshots to 2 single large (18TB) hard drives for cold backups, so I have the capacity to migrate to a new pool layout. But which layout? The same article I referenced above says not to use RAIDZ2 or RAIDZ3 with fewer than 6 drives… I don’t want to buy 3 more drives. Do I buy one additional 8TB drive (for a total of 4 x 8TB) and stripe across two sets of mirrors? Does that make any sense?
Thank you!
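For reference, here’s rough capacity-vs-redundancy math for the layouts in question; a minimal Python sketch that ignores ZFS metadata, slop space, and RAIDZ padding, so real usable figures will run a bit lower:

```python
DISK_TB = 8

# name: (total disks, disks' worth of redundancy, worst-case failures survived)
layouts = {
    "3-disk RAIDZ1":       (3, 1, 1),
    "4-disk RAIDZ2":       (4, 2, 2),
    "2x2 striped mirrors": (4, 2, 1),  # survives 2 only if they hit different mirrors
}

for name, (disks, redundancy, worst_case) in layouts.items():
    usable = (disks - redundancy) * DISK_TB
    print(f"{name}: {usable} TB usable, survives {worst_case} failure(s) worst-case")
```

All three land at the same 16 TB usable; the striped mirrors buy faster resilvers and better write performance, while the 4-disk RAIDZ2 is the only layout guaranteed to survive any two simultaneous failures.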
Honestly, if you’re doing regular backups and your ZFS system isn’t being used for business, you’re probably fine. Yes, you’re at increased risk of a second disk failure during a resilver, but even if that happens you’re just forced to restore from your backups; it’s not the complete destruction of your data.
You can also mitigate the risk of disk failure during a resilver somewhat by ensuring that your disks are of different ages. Much of the increased risk comes from the fact that disks of the same brand and age, or from the same batch/factory, tend to die around the same time, so when one disk fails the others might soon follow, especially during the relatively intense process of resilvering.
Otherwise, with the number of disks you have, you’re likely better off just going with mirrors rather than RAIDZ at all. You’ll see increased performance, especially on writes, and you’re not losing any space with a 3-way mirror versus a 3-disk RAIDZ2 array anyway.
The ZFS pool design guidelines are very conservative, which is a good thing because data loss can be catastrophic, but those guidelines were developed for pools much larger than yours and with fundamentally irreplaceable data in mind, such as user-generated data for a business rather than a personal media server.
Also, in general, backups are more important than redundancy, so it’s good you’re doing that already. RAID is about maintaining uptime; data security is all about backups. Personally, I’d focus first on a solid 3-2-1 backup plan rather than worrying too much about mitigating a catastrophic failure of your current array.
I think it’s worth pointing out that this article is 11 years old, so that 1TB rule-of-thumb probably needs to be adjusted for modern disks.
If you have 2 full backups of the array (18TB drives being more than sufficient), especially if one of those is offsite, then I’d say you’re really not at a high enough risk of losing data during a rebuild to justify proactively rebuilding the array until you have at least 2 more disks to add.
Let’s do the math:
The error rate of modern hard disks is usually on the order of one undetectable error per 1E15 bits read; see for example the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so when reading the whole disk you have roughly a 1 in 16 chance of an unrecoverable read error. That’s OK with ZFS while all disks are working: the error correction will detect and fix it. But during a resilver it can be a big problem.
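As a sanity check of that 1-in-16 figure, here’s the arithmetic in Python (assuming independent errors at the datasheet’s 1-per-1E15-bits limit):

```python
import math

# Probability of at least one unrecoverable read error (URE) when reading
# an 8 TB disk end to end, at the datasheet's 1-per-1e15-bits upper bound.
bits = 8e12 * 8    # 8 TB in bits = 6.4e13
rate = 1e-15       # UREs per bit read (spec limit, not a measured rate)

# 1 - (1 - rate)**bits, computed stably for tiny per-bit rates
p = -math.expm1(bits * math.log1p(-rate))
print(f"P(at least one URE) = {p:.3f}")  # 0.062, i.e. roughly 1 in 16
```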
If the actual error rate were anywhere near that high, modern enterprise hard drives wouldn’t be usable as a storage medium at all.
A 65% filled array of 10 x 20TB drives would average at least one bit error on every single scrub (which is a full read of all data present in the array), but that doesn’t actually happen with any real degree of regularity.
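The expected-error arithmetic behind that claim, using the same per-bit spec limit as above:

```python
# Expected UREs per full scrub of a 65%-full array of 10 x 20 TB drives,
# taking the 1-per-1e15-bits datasheet figure at face value.
bits_read = 10 * 20e12 * 8 * 0.65   # bits read per scrub, about 1.04e15
rate = 1e-15                        # UREs per bit read
print(f"Expected UREs per scrub = {bits_read * rate:.2f}")  # about 1.04
```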
Then why do you think manufacturers still list these failure rates (to be sure, it is marked as a limit, not an actual rate)? I’m not being sarcastic or facetious, but genuinely curious. Do you know for certain that it doesn’t happen regularly? During a scrub, these are the kinds of errors that are quietly corrected (although the scrub log would list them), as they are during normal operation (also logged).
My theory is that they are being cautious and/or perhaps don’t have any high-confidence data that is more recent.
Over the years I’ve read many, many discussions about why manufacturers would list such a pessimistic number on their datasheets, and I haven’t really come any closer to understanding why. You can trivially show how pessimistic it is by repeatedly running badblocks on a dozen large (20TB+) enterprise drives: nearly all of them will dutifully accept hundreds of TBs written and read with no issues, when the quoted URE rate suggests that would produce a dozen UREs on average.
I conjecture, without any specific evidence, that it might be an accurate value for some inherent physical property of the platters themselves, one that manufacturers can and do measure and that hasn’t improved considerably, but that has long been abstracted away by increased redundancy and error correction at the sector level, which yield much more reliable effective performance. The raw quantity may still be quoted for internal historical or comparative reasons rather than being replaced by the effective value that matters more directly to users.
Those failure rates are nonsense based on theoretical limits. In practice: a few weeks ago I resilvered a 4-disk RAIDZ1 array of 8TB disks at 85% capacity in less than 24 hours. I scrub the pool every month, and the resilver didn’t seem any more taxing than a scrub.
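That timing is consistent with simple throughput math. A rough estimate (the 150 MB/s sustained average is an assumption for illustration; real resilver speed varies with fragmentation and pool load):

```python
# Rough resilver-time estimate: a healthy resilver reads/writes roughly
# the occupied data, so time scales with capacity * fill / throughput.
capacity_tb = 8
fill = 0.85
throughput_mb_s = 150                # assumed sustained average (illustrative)

data_mb = capacity_tb * 1e6 * fill   # occupied data in MB
hours = data_mb / throughput_mb_s / 3600
print(f"~{hours:.1f} hours")         # about 12.6 h, comfortably under 24 h
```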
Personally I’d just upgrade to RAIDZ2 and add as many disks to it as reasonably practical. To be honest, I fail to see any downside to using four disks for this other than the storage inefficiency.
It’s fine. RAID is not a backup. I’ve been running simple mirrors for many years and never lost data because I have multiple backups. Focus on offsite and resilient backups, not how many drives can fail in your primary storage device.
That article is from 2013, so I’m a bit skeptical about its under-1 TB rule. It was probably reasonable advice back when 1 TB capacities were sorta cutting edge. Now we have 20+ TB hard drives; nobody’s gonna be making arrays of 750 GB drives.
I have two 4TB drives in a simple mirror configuration and have resilvered it a few times due to oopsies and it’s been fine, even with my shitty SATA ports.
The main concern is that bigger drives take longer to resilver because, well, there’s much more data to shuffle around. So logically, if you have 3 drives of the same age that have seen the same amount of activity and usage, when one gives up the other 2 are likely getting close as well. If you only have 1 drive of redundancy, that can be bad: you temporarily have no redundancy, so one more drive failure and the zpool is gone. If you’re concerned about them all failing at the same time, the best defense is either different drive brands or different drive ages.
But you do have backups, so if that pool dies it’s not the end of the world. You can pull it all back from your 18TB backup drives. And those are different drives, so they’re unlikely to fail at the same time as your 3 x 8TB drives, let alone 2 more of them. In your particular case, 4 drives have to give up in total before your data is truly gone. That’s not that bad.
It’s a risk management question. How much risk do you tolerate? What are your uptime requirements? For my use case, I deemed a simple 2-drive mirror sufficient for my needs, and I have a good offsite backup on a single USB external drive, plus an encrypted cloud copy of the things that are really critical and that I can’t possibly lose, like my KeePass database.
Bit error rates have barely improved since then, so the probability of an error when reading a substantial fraction of a disk is now higher than it was in 2013.
But as others have pointed out, RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. If that is critical to your enterprise, these things need to be taken into account, and it may turn out that RAIDZ1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret, but I would make frequent backups.
This article was not about total disk failure, but about the much more insidious undetected bit error.