
Storage Why I Always Say RAID Should Be Made of Differing Disks

Discussion in 'Hardware' started by Gareth Halfacree, 26 Nov 2019.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Y'know how when anyone talks about RAID on here, the second thing I say - after "RAID is not a backup" - is that they should build their arrays out of disks from different manufacturers - or, at least, of different models.

    "But why," people ask. "Surely having everything match is better - like matched memory sticks?"

    I talk about the bathtub curve. I talk about how a firmware bug made some Seagate drives disappear from the system, all data lost. "But that's really rare," people say. "That never happens these days."

    Well... Yes, sadly, it does. That's HPE, there, advising customers to update their SSDs 'cos a firmware flaw makes 'em die with complete loss of all data after 32,768 hours of operating time. You build your array out of nothing but HPE SAS SSDs, that's the entire array gone all at once.
     
    adidan and Bloody_Pete like this.
  2. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    This still seems like more a reason to have (and test) a backup plan for data you don't want to lose than to go through the pains of assembling an array of disparate disks.

    Matched is better not for any technical/performance reason, but because it's easier to assemble and allows you to shop to a price point. All that going out of your way to assemble an array from mismatched models/vendors is going to achieve is give a false sense of security IMO.

    It doesn't happen all that often at least. Last notable industry-wide incident AFAIK was the Seagate Moose drives circa 2008.
     
    silk186 likes this.
  3. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    JBOD isn't a form of RAID at all - it's multiple disks presenting as one logical volume with data simply spanning from one disk to the next. The only advantage with JBOD is one of ease of use - instead of many volumes, you have one. No performance benefits, no reliability/durability benefits. The disadvantage is that if you lose one disk, it's still likely that you will lose the whole volume. There is a greater likelihood of being able to recover data on the good disks, but it's not 100%.
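To make the "data simply spanning from one disk to the next" point concrete, here is a minimal sketch of a spanned volume as plain concatenation. The function name and the block-count sizes are made up for illustration, and real spanning implementations keep extra metadata that this ignores.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_block(disk_sizes, logical_block):
    """Map a logical block number to (disk index, block offset on that disk).

    disk_sizes is a list of per-disk capacities in blocks (hypothetical
    values); the volume is modelled as the disks laid end to end.
    """
    ends = list(accumulate(disk_sizes))  # cumulative end block of each disk
    if not ends or not 0 <= logical_block < ends[-1]:
        raise ValueError("block outside the volume")
    disk = bisect_right(ends, logical_block)  # first disk whose end is past the block
    start = ends[disk] - disk_sizes[disk]     # first logical block held by that disk
    return disk, logical_block - start


if __name__ == "__main__":
    sizes = [100, 200, 50]
    for block in (0, 99, 100, 349):
        print(block, locate_block(sizes, block))
```

Each block lives on exactly one disk with no copy or parity anywhere else, which is why losing a disk takes its share of the data with it.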
     
  4. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    "3PAR, Nimble and Primera arrays are not affected."

    So us big boys don't really give a toss... lol
     
  5. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Assuming we're not talking about buying a prefilled box, here, how is putting 1x Vendor A and 1x Vendor B in your basket any more difficult than 2x Vendor A?

    I'm not saying that a 12-drive array should use 12 totally distinct drives, but I am saying that using 12 totally identical drives isn't a great idea.
     
  6. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    For two disks, sure, split vendors, why not.

    For an 8+ disk array, how many vendors are you supposed to pick? Using two vendors isn't going to help you if you're using a dual-parity RAID; you'll need at least 4 different vendors to "protect" from a failure like this. Given there are only really two vendors in the disk market (and not many more in terms of "real" SSD vendors), and they do a semi-decent job of differentiating models, what do you do then?

    So I perhaps want to use WD RED because they seem to be everyone's favourite, but then use the Seagate Iron Wolf equivalent as well to mix it up a bit. For my other two vendors what then - I can use Red Pro and Iron Wolf Pro at a significantly increased cost, or I could use Green/Cuda but they're not necessarily appropriate for NAS use. If I need more than 4 models, I could be into enterprise disks at a huge price uptick.

    And when it comes down to it, there are still a million and one ways you can lose the entire array anyway. If you're already backing up important data anyway as you should be, who cares?
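The "at least 4 different vendors" figure for an 8-disk dual-parity array can be sketched as below. This assumes the worst case where every drive from one vendor dies together, and that the array survives only if no more drives are lost than it has parity for. The function names and vendor labels are illustrative only.

```python
from collections import Counter
from math import ceil


def min_vendors(n_disks, tolerated_failures):
    """Fewest vendors such that no single vendor supplies more drives
    than the array can afford to lose at once."""
    return ceil(n_disks / tolerated_failures)


def survives_any_vendor_loss(vendors, tolerated_failures):
    """True if losing every drive from any one vendor leaves the array alive.

    vendors is a list with one vendor label per drive in the array.
    """
    return max(Counter(vendors).values()) <= tolerated_failures


if __name__ == "__main__":
    print(min_vendors(8, 2))  # dual parity tolerates two lost drives
    print(survives_any_vendor_loss(["A"] * 4 + ["B"] * 4, 2))
    print(survives_any_vendor_loss(["A", "A", "B", "B", "C", "C", "D", "D"], 2))
```

The same check shows why a two-disk mirror split across two vendors passes trivially, while larger arrays run out of distinct vendors quickly.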
     
  7. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Which has always been my point.
    Hang on, you start off talking about "real SSD vendors" then talk about Seagate Iron Wolf - which is spinning rust.

    If I were looking to fill an eight-disk array with spinning rust and wanted to spread across vendors, I'd have two WD Reds, two Seagate Iron Wolfs, two Toshiba MN Series, and two HGST DeskStar NAS drives. Bosh, job done.¹
    'cos restoring humpty-tump terabytes from backup takes time. Restoring an array from a (couple of) failed disk(s) just degrades performance for a bit, you can carry on working as normal.

    So, to lay out my approach more clearly if I have been unclear in the past: I believe that for smaller arrays you should use differing drives to spread your risk from exactly the kind of problem that just hit HPE and the bathtub-curve's impact on identical drives. For larger arrays, that's not likely to be an option for you - in which case, y'know, sucks to be you, just buy whatever you can.

    1: What I *actually* have in my server are mostly drives shucked from USB caddies, 'cos I'm a cheapskate. But hey, at least they're still from different vendors!
     
  8. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    I start talking about spinning disk, and also refer to a similarly limited number of truly disparate SSDs.
    I know full well that you're not going to concede a point and I'm not interested in arguing my case with you in a nitpicky fashion right now. Or ever.

    Your belief is that for smaller arrays (wait, what's sma.... no... don't nitpick), one should use disparate disks. There's also an argument to say you should make sure each disk you order has a disparate supply chain, because if you get them all from the same place they're likely to be subject to all of the same environmental factors.

    My belief is that these aren't failure scenarios worth bothering to try to cover, which at best may save you from a one in a million (figuratively) need to restore data and at worst give you a false sense of security with regards to how durable your data really is.
     
  9. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Spoilsport. :p
    And my belief is that for smaller arrays (let's say four or fewer disks, why not) it takes so little effort to do it's definitely worth covering it. And, remember, I'm not just talking about guarding against "we tracked the runtime counter as a signed INT16, which turns out to have been a really bad idea" firmware flaws - it also covers you against any design, manufacturing, or material defects or deficiencies that would slide the drives the wrong way along the bathtub curve.
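For anyone wondering where 32,768 comes from: it is the first value a signed 16-bit integer cannot hold. The sketch below only illustrates that wrap-around, taking the "signed INT16" description above at face value. What the affected firmware actually does with the counter is not covered here, and the function names are invented for the example.

```python
import ctypes

HOURS_PER_YEAR = 24 * 365


def stored_hours(true_hours):
    """Value a signed 16-bit field would hold after true_hours of uptime.

    ctypes integer types do no overflow checking, so out-of-range values
    wrap the same way a C int16_t typically would.
    """
    return ctypes.c_int16(true_hours).value


def first_wrap_hour(bits=16):
    """First hour count that no longer fits in a signed field of this width."""
    return 2 ** (bits - 1)


if __name__ == "__main__":
    for hours in (32766, 32767, 32768):
        print(hours, "->", stored_hours(hours))
    print("years of uptime before the wrap:", first_wrap_hour() / HOURS_PER_YEAR)
```

Drives powered on together reach that hour count together, which is the whole problem with an array of identical units.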
     
  10. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    Of course. 10 years ago Seagate's Moose drives had a series of firmware issues that made them universally prone to early failure. 10 years before that the IBM Deathstars were simply an issue of mechanical reliability IIRC. As they say, there are only two types of disks, those that have failed, and those that will fail. Some will fail today, some will fail in 10 years, you don't know which one you have until they do.

    From a protection point of view, make the upfront assumption that at some point one or all of your drives will fail (because it's entirely correct), and buy for warranty, feature/specs, cost and convenience in that order, I say.
     
  11. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Very true - although with the proviso that some drives can be considered more likely to reach the ten-year mark than others. You've highlighted some of the biggest blunders in the last few decades, but as BackBlaze's stats show, some drive models seem to fail more than others - hello, Seagate ST4000DM000, 2.3% annualised failure rate, and the WDC WD30EFRX, 2.25% annualised failure rate. So even without a headline-grabbing "every single unit of this particular drive just babbed itself at once" firmware flaw, there's something to be said for balancing your risk across vendors. (In my opinion, of course.)
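As a rough feel for what an annualised failure rate means for a small array, the sketch below turns a per-drive rate into the chance of seeing at least one failure in a year. It assumes every drive fails independently at the same constant rate, which is a simplification and exactly the assumption a shared firmware flaw breaks. The function name is made up for the example.

```python
def p_at_least_one_failure(afr, n_drives):
    """Chance that one or more of n_drives fails within a year.

    afr is the annualised failure rate as a fraction (2.3% -> 0.023).
    Assumes independent failures at an identical, constant rate.
    """
    return 1 - (1 - afr) ** n_drives


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "drives:", round(p_at_least_one_failure(0.023, n), 4))
```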
    Sensible, though I don't tend to hold much stock in hard drive warranties. First, the data was the valuable stuff; second, they tend to replace your failed drive with a "reconditioned" drive, and I don't trust those - not since a "factory reconditioned" Crucial SSD went mammaries-skyward on me halfway through a working day...
     
  12. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    Or you could place a bet on one model only because the most likely scenario is it won't be a lemon, rather than increase your overall chances of encountering a lemon by diversifying.

    The thing about the Backblaze stats, aside from them being deeply flawed in general, is that the vast majority of even their "bad" disks will be fine for years to come. And the vast majority of all other disks, aside from the ones mentioned, will also probably run for longer than they will be useful. You just don't know which ones won't, until they don't. It's easy to look back at a sample of hundreds of disks and say "these ones failed more than those ones", but I'm not sure that helps in any way for looking forward and questioning how long an individual disk, or 2 or 4, are going to run.

    I won't use a Crucial SSD at all as I've had a total of three outright failures, or a 100% failure rate. I have no beef with reconditioned disks though if the price is right.
    That said, the reality of a warranty for me is one of investment protection - if one or all of my drives fail now or at any point within the 5y warranty, I'm still going to next-day a replacement disk. The warranty replacement just offsets the cost of that replacement (or might offer expansion if I need it).
     
    Last edited: 26 Nov 2019
  13. Bloody_Pete

    Bloody_Pete Technophile

    Joined:
    11 Aug 2008
    Posts:
    8,439
    Likes Received:
    1,112
    This is a real nightmare of risk aversion to fall into. I did my dissertation on similar stuff, just aerospace, and it's a hole you can fall in. In my mind you shouldn't even worry about the drives, as there's so much more in a NAS that can fail, so if you need to keep it reliable and safe you need multiple redundant active arrays. It's the madness of failure rates!
     
  14. silk186

    silk186 Derp

    Joined:
    1 Dec 2014
    Posts:
    1,935
    Likes Received:
    150
    Mixing up vendors in an array arguably makes the array vulnerable to all vendor-specific bugs unless the array is set up so that all drives of a single vendor can fail without compromising the array.
     
  15. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    The pain is real. I spent years working as a storage & backup architect listening to the ridiculous and infinitesimally unlikely failure scenarios people thought they wanted to protect against whilst totally missing the bleeding obvious way it was all going to go to sh__ right in front of them.
     
    Bloody_Pete likes this.
  16. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,133
    Likes Received:
    6,728
    Which was my thinking for "so why put all your eggs in one basket by buying only one model of disk?"
    I think I've had three, apart from the SSD, and of the two spinning rusts one started to clock up bad blocks so I stopped using it, and the other... Actually, I have no idea what happened to the other. It's not in any of my current systems, I know that.
    That's how I've done mine: all the drives from any one vendor can fail, and it'll keep on tickin'. But, as @Mister_Tad says, that's not something you can do very easily once you get beyond a handful of disks.
     
  17. Bloody_Pete

    Bloody_Pete Technophile

    Joined:
    11 Aug 2008
    Posts:
    8,439
    Likes Received:
    1,112
    People often forget just burning it to a disk and whacking that in a fireproof safe is the safest option!
     
  18. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    14,085
    Likes Received:
    2,451
    With the 6 in my NAS it wasn't easy. I started out thinking I might try if it's not too much of a faff.

    I was after 7.2k, 4Kn SATA disks, so that left me with three options for model.
    I briefly explored buying two of each from 6 different places until it became clear it was going to cost several hundred extra.

    I got 6 identical disks from the same place in the end, and I don't sweat it. If you post an article tomorrow about a fundamental flaw causing Exos 7E8 drives to literally catch fire spontaneously, I may start to, mind.
     
