Safety Engineer, Dad, Husband, Pilot, Musician. Not necessarily in that order.

Ingenieur für funktionale Sicherheit, Vater, Ehemann, Pilot, Musiker. Nicht notwendigerweise in dieser Reihenfolge.

  • 1 Post
  • 3 Comments
Joined 1 year ago
cake
Cake day: June 11th, 2023

help-circle
  • Then why do you think manufacturers still list these failure rates (to be sure, it is marked as a limit, not an actual rate)? I’m not being sarcastic or facetious, but genuinely curious. Do you know for certain that it doesn’t happen regularly? During a scrub, these are the kinds of errors that are quietly corrected (althouhg the scrub log would list them), as they are during normal operation (also logged).

    My theory is that they are being cautious and/or perhaps don’t have any high-confidence data that is more recent.


  • Hopfgeist@feddit.detoSelfhosted@lemmy.worldHow to fix my ZFS pool mistakes
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    1
    ·
    7 months ago

    Bit error rates have barely improved since then. So the probability of an error whenr reading a substantial fraction of a disk is now higher than it was in 2013.

    But as others have pointed out. RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. And if that is critical to your enterprise, these things need to be taken into account, and it may turn out that raidz1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret. but make frequent backups.

    This article was not about total disk failure, but about the much more insidious undetected bit error.


  • Let’s do the math:

    The error-reate of modern hard disks is usually on the order of one undetectable error per 1E15 bits read, see for example the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so when reading the whole disk you have roughly a 1 in 16 chance of an unrecoverable read error. Which is ok with zfs if all disks are working. The error-correction will detect and correct it. But during a resilver it can be a big problem.