
Storage Backblaze: 2015 Q1 HDD Reliability

Discussion in 'Hardware' started by JamesRC, 16 Jun 2015.

  1. JamesRC

    JamesRC Member

    Joined:
    5 Oct 2012
    Posts:
    272
    Likes Received:
    8
    Backblaze store customer data for an online backup business. As of the end of Q1 2015 they had 44,252 hard drives spinning in their datacenter. HERE are the hard drive reliability statistics for these drives for Q1 2015.

    Hitachi does very well, WD middle of the pack and Seagate a long way behind.

    My next drives certainly won't be Seagates; my current 7200.14s have the 'chirp' noise and it drives me mad, never mind the reliability questions...
     
  2. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    12,945
    Likes Received:
    1,214
    The accuracy and relevance of the Backblaze reports have been debunked many times, so take them with a pinch of salt.

    Buy a drive that does what you want it to do, at the price you want to pay with a warranty you're happy with, and forget about what logo it has on the box. There are only two types of HDDs, ones that have failed, and ones that will fail.
     
  3. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    Those debunkings essentially boiled down to crying "you're using non-enterprise drives in an enterprise environment!".
    The results are still applicable for home NAS use (where most NAS boxes aren't exactly gentle with drive decoupling and ventilation), for direct comparison between the physical reliability of enterprise and non-enterprise drives (enterprise drives are nigh identical, but have a longer warranty and faster replacement service), and for direct comparison between drives in the same environment.
     
  4. JamesRC

    JamesRC Member

    Joined:
    5 Oct 2012
    Posts:
    272
    Likes Received:
    8
    That's awesome. :)
     
  5. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    12,945
    Likes Received:
    1,214

    That's one of them, but I wouldn't call their use "enterprise" simply because they have lots of drives - their storage pods are mechanically and electrically home-brew and essentially designed via trial and error, comparable with neither "enterprise" nor home use, and I don't doubt this is reflected in their failure rate. I'm not being derisive, by the way - cheap and disposable is just as valid an approach as extensively and expensively validated and tested; I just don't think much weight should be put behind their findings because of this approach.

    Let's look at the other side of things: I work closely with a site running a footprint of 45,000 NL-SAS disks (all of which are getting hammered at all times, and even more densely packed than Backblaze's) from all three of the manufacturers, and they've consistently observed a 0.25% annual failure rate across the board (including proactive disk failures). A major contributor to that figure isn't the drives themselves but the enclosures and environment they're in - the complete opposite end of the spectrum to Backblaze, and no more or less relevant to the home user.
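    To put a 0.25% annual failure rate in perspective, here is a rough Python sketch of the arithmetic. The drive-days formula (failures divided by drive-years) is the usual way annualized failure rates such as Backblaze's are expressed; treat it and the example figures as illustrative assumptions rather than either site's published method.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365
    return failures / drive_years * 100


def expected_failures_per_year(num_disks, afr_percent):
    """Expected number of failed disks per year for a fleet at a given AFR."""
    return num_disks * afr_percent / 100


# 45,000 disks at 0.25% AFR is still roughly two disk swaps a week.
print(expected_failures_per_year(45_000, 0.25))  # 112.5
# Going the other way: 112.5 failures over 45,000 drive-years is about 0.25%.
print(annualized_failure_rate(112.5, 45_000 * 365))
```

    Even a very low failure rate means a steady trickle of replacements at this scale, which is why fleet-wide percentages are more useful than anecdotes about individual drives.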

    Don't get me wrong, it's interesting to read of their findings, but every time their report comes out the internet is awash with "OMG I'm never buying Seagate drives again" because individuals didn't think past the one damning graph showing the same drives each time.
     
    Last edited: 17 Jun 2015
  6. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    386
    Do they still do "shucking", i.e. stripping the HDDs out of external USB drives and the like?

    Do they still use RMA replacements and refurbished HDDs?

    I also see they're on version 4.5 of the Storage Pod, upgraded five times to improve reliability each time. Do they still include drives from older versions of the Storage Pod in the results?

    Do their Storage Pods still suffer from uneven temperature variations?

    Are the HDD still subjected to different workloads?
     
  7. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    12,945
    Likes Received:
    1,214
    I can hypothesise...

    It makes sense to do so if they're cheaper, and there's no real reason not to, depending on how quickly someone on minimum wage can whip a disk out of its enclosure :lol:

    They claim to store 150PB and this latest run of stats is across ~42k disks, so that suggests it's the entire footprint including all deployed pods.
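    A quick sanity check of that inference: spreading ~150PB over ~42,000 disks means the average disk would have to hold about 3.6TB, roughly in line with the 3TB to 4TB drive sizes that were mainstream at the time. This sketch assumes decimal units (1PB = 1000TB) and treats 150PB as raw capacity; if the figure is actually customer data, parity overhead would push the required per-disk capacity higher still.

```python
def average_tb_per_disk(total_pb, num_disks):
    """Average capacity each disk must contribute, in decimal TB."""
    return total_pb * 1000 / num_disks


print(round(average_tb_per_disk(150, 42_000), 2))  # 3.57
```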

    They use declustered RAID and what I can only imagine is an object-based front-end, as there would be no other sensible/feasible way to store 150PB of unstructured data. Given the use case, I would imagine that there are large chunks of disks sitting idle much of the time.
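    For anyone unfamiliar with declustered RAID: each stripe is spread over a different subset of a large pool of disks, so when one disk dies the rebuild reads are shared by nearly every surviving disk instead of hammering one fixed RAID group. The Python sketch below is a generic, hypothetical illustration (random stripe placement, 45 disks, 15-wide stripes); it is not a description of how Backblaze actually lays out its data.

```python
import random
from collections import Counter


def traditional_layout(num_disks, stripe_width, num_stripes):
    """Fixed RAID groups (assumes num_disks is a multiple of stripe_width)."""
    groups = [list(range(start, start + stripe_width))
              for start in range(0, num_disks, stripe_width)]
    return [groups[i % len(groups)] for i in range(num_stripes)]


def declustered_layout(num_disks, stripe_width, num_stripes, seed=0):
    """Declustered: each stripe goes on a random subset of all disks."""
    rng = random.Random(seed)
    return [rng.sample(range(num_disks), stripe_width) for _ in range(num_stripes)]


def rebuild_reads(layout, failed_disk):
    """Count how many stripe units each surviving disk reads during a rebuild."""
    reads = Counter()
    for stripe in layout:
        if failed_disk in stripe:
            for disk in stripe:
                if disk != failed_disk:
                    reads[disk] += 1
    return reads


trad = rebuild_reads(traditional_layout(45, 15, 3000), failed_disk=0)
decl = rebuild_reads(declustered_layout(45, 15, 3000), failed_disk=0)
# Traditional: the same 14 disks each read every affected stripe.
print(len(trad), max(trad.values()))  # 14 1000
# Declustered: the work is spread over all 44 survivors, so each reads far less.
print(len(decl), max(decl.values()))
```

    The trade-off is that every disk takes part in every rebuild, which is why this approach tends to show up at large scale rather than in a four-disk home box.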
     
  8. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    That was my point: Backblaze's harsh treatment of HDDs (low ventilation, lots of drives loosely coupled to a sheet-metal chassis with very little damping and lots of rattling, and just sitting right on top of the backplane with no strain relief) is closer to home usage than even a tightly packed Thumper with a sturdy chassis and proper caddies.
    While the data isn't very valuable for 'how long might my HDD last?' unless you also happen to be running a Backblaze pod, it is still useful for 'which drives are least likely to crap out in a harsh environment?'. If you have a good chassis, then pretty much any HDD is fine. If you have 5 drives crammed into a Microserver hidden in a hot cupboard (or hard-mounted inside a minimal-airflow SFF case), then maybe Seagate's drives don't handle that abuse as well for as long as Hitachi's.
     
  9. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    386
    Personally I would disagree.

    For starters, there's no way of knowing which make of HDD is in which version of their Storage Pods. Maybe early on they bought a job lot of Seagate drives and placed them in version 1, which is more likely to cause HDD failure; maybe a larger percentage of HGST drives are in version 4.5 racks, so they suffer less from vibration.

    Next up, we don't know which drives came from external USB drives and the like, how they've been handled, etc.

    What percentage of each model comes from RMAs and refurbished HDDs? How do we know they didn't get a job lot of Toshiba RMAs, or a load of cheap refurbished Seagates?

    The data is so incomplete (imo) that any subjective judgements are fairly pointless.
     
  10. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    They do split up their summary reports by quarter (they don't explicitly list when each version of the pod was introduced, but you can look through their blog to see when each one appeared), and they make the raw data available for analysis if you want a day-by-day breakdown.
    Handling should be a minimal issue (if a drive cannot handle being removed from its packaging and installed without an elevated failure rate, that's a big problem on its own), but data on which drives were sold bare and which were removed from caddies would be useful, particularly as it would show whether caddy drives have been 'binned' noticeably differently from bare drives.
    I can't recall Backblaze ever stating they've purchased second-hand drives. If any manufacturer is providing refurbished drives as replacements (and/or fiddling the SMART data to reset the run-time values), that would be a massive black mark against them in any situation.
    Is this with regard to their posted analysis, or from looking at the raw data?
     
  11. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    386
    Both, though I haven't looked at the raw data, as I CBA to install SQLite or a similar database engine.
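    For what it's worth, the raw data doesn't strictly need a database engine. Assuming the layout of Backblaze's published daily CSV files (one row per drive per day, with `model` and `failure` columns, where `failure` is 1 on the day a drive fails), Python's standard `csv` module can compute a per-model annualized failure rate directly. The rows in the usage example are made up purely to exercise the function.

```python
import csv
import io
from collections import defaultdict


def afr_by_model(csv_files):
    """Annualized failure rate (%) per model from daily drive-status CSVs.

    Each row is one drive on one day, so the row count per model is its
    drive-days; the `failure` column is assumed to be 1 on the failure day.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for handle in csv_files:
        for row in csv.DictReader(handle):
            drive_days[row["model"]] += 1
            failures[row["model"]] += int(row["failure"])
    return {model: failures[model] / (days / 365) * 100
            for model, days in drive_days.items()}


# Made-up example: 730 drive-days of a hypothetical MODEL-A with one failure.
rows = ["date,serial_number,model,capacity_bytes,failure"]
rows += [f"2015-01-01,SN{i},MODEL-A,4000000000000,{int(i == 0)}" for i in range(730)]
print(afr_by_model([io.StringIO("\n".join(rows))]))  # {'MODEL-A': 50.0}
```

    For the real files you would pass open file handles (one per daily CSV) instead of the `StringIO` stand-in.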

    Even if they split up the reports per quarter, and that could be linked to when each new version of rack came into use, that still doesn't tell us which drives are in which version of rack. Maybe they're seeing high failures on Seagate drives because they got a job lot on the cheap early on and placed them all in Storage Pod V1; maybe we're also seeing mainly Hitachi drives in the newer versions of the racks, hence the lower failure rates.

    With regard to drives coming from external USB drives and the like, how do we know they're not treated differently? The ones that they sourced from here most likely were, as delivery people can be less than careful when dealing with packages at times. The same could be said for the shops they sourced external USB drives from: do warehouse staff treat external USB drives differently from consumer HDDs?

    Moving on to the second-hand drive situation, IDK if they still purchase them, but the report from January 2014 states the following...
    Given they source drives from the cheapest available source my guess would be that some refurbished drives still find their way into the mix.

    I'm not saying their data is totally worthless, there are however holes in the methodology they used to arrive at the statistics they publish, so many holes that little meaningful conclusion can be arrived at (imo).
     
    Last edited: 18 Jun 2015
  12. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    The raw data contains the date the drive was received, so this can be correlated with the date new chassis were introduced (with a buffer period of, say, a month to allow for remaining old-design chassis being used up). Even the quarterly breakdown would show if they had purchased a bunch of dodgy drives early on (or at any point); only the Seagate Barracuda 7200.14 appears to suffer from 'spikes' in cumulative failures.
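    That correlation could be sketched roughly like this in Python: take each drive's first appearance in the data as a proxy for when it was received, then map it to the newest pod generation introduced at least a month earlier, flagging drives that land inside the buffer as ambiguous. The introduction dates below are hypothetical placeholders; the real ones would have to be looked up on Backblaze's blog.

```python
from datetime import date, timedelta

# Hypothetical pod introduction dates, oldest first (placeholders only).
POD_INTRODUCED = [
    ("v1", date(2010, 1, 1)),
    ("v2", date(2012, 1, 1)),
    ("v3", date(2014, 1, 1)),
]


def first_seen(rows):
    """Earliest date each serial number appears in (date, serial) rows."""
    seen = {}
    for day, serial in rows:
        if serial not in seen or day < seen[serial]:
            seen[serial] = day
    return seen


def likely_pod_version(first_day, introductions, buffer=timedelta(days=30)):
    """Return (version, ambiguous) for a drive first seen on first_day.

    A generation only counts once `buffer` has passed since its introduction;
    inside the buffer the drive may still have gone into an older chassis.
    """
    version, ambiguous = None, False
    for name, introduced in introductions:
        if first_day >= introduced + buffer:
            version, ambiguous = name, False
        elif first_day >= introduced:
            ambiguous = True
    return version, ambiguous


seen = first_seen([(date(2013, 6, 2), "A"), (date(2013, 6, 1), "A"),
                   (date(2012, 1, 15), "B")])
print({s: likely_pod_version(d, POD_INTRODUCED) for s, d in seen.items()})
# {'A': ('v2', False), 'B': ('v1', True)}
```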
    If handling of external drives in their original packaging is sufficient to noticeably reduce lifetime, then that itself is a useful data point and I would be wary of using them regardless. The handling those boxes experience in transport by an end-user is nothing compared to the handling they experience in back-end bulk transport from the factory to distributors and from distributors to retailers (as well as variable storage conditions). I have experienced enough external drive 'failures' due to failed USB-SATA controllers without worrying about an elevated drive failure rate on top of that. Adding a 'removed from external chassis' flag to their data would be a good idea.
    Again, if warranty replacements are provided second-hand, then that itself would be a black mark against a manufacturer.
     
  13. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    12,945
    Likes Received:
    1,214
    I see where you're coming from, but if we're looking at how drives perform in harsh environments, I'm sure The Gadget Show has done a test at some point where they play football with HDDs and see which ones last the longest, to "prove" something about reliability. What I mean is, a disk performing better than another in the harshest of environments doesn't necessarily speak to that disk's reliability in a different environment, just as a disk's reliability in a "perfect" environment may not reflect its ability to handle more punishment.

    I think home environments are pretty friendly really - sure there will be the odd enclosure or case layout that neglects the HDD a little bit, but typically these are enclosures running up to a few disks only so no biggie.

    I've also always wondered as well, if the orientation of the disks in the Storage Pods plays a role in things, as that's a fairly unique aspect of it. I've had a few optical drives that have been picky about orientation (insomuch as not working at all when on their side). Clearly these are an entirely different kettle of fish, but it just made me think about it when I saw the first Storage Pod that Backblaze launched.

    Bear in mind that disks of every type at some point have to make their way from the manufacturer to a distributor to a reseller to the end-user - at every step of the way there are people handling the disks that are unlikely to appreciate or care about the relatively fragile item within. Fortunately they're pretty hardy when they're not spinning.

    Good on them for tracking and publishing the stats, they certainly have no obligation to do so. Whilst I think their interpretation of the stats and the implication arising from it has some gaps, I think the main issue is that so many jump to conclusions about the data without doing their homework - for an uninitiated majority it's easier to read half way through a third-hand write-up and just say "Seagate is rubbish, clearly" than it is to think a bit deeper about it.

    It's analogous to publishing stats showing that Pirelli P Zero road tyres disintegrate within 1000 miles when used for rallying, and readers then thinking, well that's it, no more Pirelli tyres for me (I may have just bought a new set of tyres, but I wasn't rallying and they lasted more than 1000 miles thankfully)
     
    Last edited: 18 Jun 2015
  14. Byron C

    Byron C And now a word from our sponsor

    Joined:
    12 Apr 2002
    Posts:
    6,688
    Likes Received:
    1,541
    I always take things like this with a pinch of salt. I've seen lots of reports, articles, stats, etc about long term hard drive reliability - or the general lack thereof - but to be quite honest the only time I've experienced any data loss is when I bugger it up and accidentally delete something.

    I'm putting together a sort of home lab (VoIP, LAMP, Domain, centralised storage, "sandpit" SQL environment). I plan to use whatever cheap drives I can get my hands on - whether they're SATA or SAS - and stick them in RAID-5 arrays. If a disk goes down then I have redundancy and the array can be rebuilt with a replacement disk; I might even consider hot spares if I can get my hands on enough disks. As long as the disks are at least 150GB or so then I'm sorted - if heavy-duty storage is offloaded to another box then OS drives/arrays don't need to be huge.
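    The redundancy a RAID-5 array relies on comes from parity: each stripe stores the XOR of its data blocks, so any single missing block can be recomputed by XOR-ing everything that survives. A minimal Python sketch of one stripe on a hypothetical four-disk array (real RAID-5 also rotates the parity block across the disks, which is omitted here):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block.
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Disk 1 dies: XOR the surviving data blocks with parity to get it back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'disk1 data'
```

    This is also why a read error on any surviving block during a rebuild is fatal for that stripe: with single parity there is nothing left to recompute it from.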

    Storage is cheap. Chances are that my data will be transferred to newer disks with higher capacity before the disk gets to the point where I'm getting read errors. I thought a couple of my drives were starting to have issues but it turns out that I'd just totally borked my media server install - I've tested (as in, diagnostics testing) the same drives with a new OS build and they're rock solid.

    The bigger concern for me frankly is backup: I've got an increasing amount of data that isn't practical to store as hardcopy, so to whom/what do I entrust all the data I really *really* don't want to lose? A cloud provider where privacy/reliability/longevity isn't guaranteed? Do I rent a VPS or hosted server? Do I roll my own long-term backup solution at home and worry about enterprise vs. consumer drives and MTBF ratings? If I'm honest I probably don't have as much *truly* critical data as I think I do, but it's still probably in the order of hundreds of GB.
     
  15. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    Just a heads-up: for disks of 2TB and up, RAID5 is a BAD IDEA. With disks that size, the Uncorrectable Bit Error rate (UBR) means there is an uncomfortably high chance of hitting an uncorrectable error during an array rebuild*, which in a lot of cases means kiss your array goodbye. RAID6 (or its software equivalent, e.g. RAIDZ2) or any other double-parity solution should be the preferred choice, as it can handle errors during an array rebuild where a single-parity solution would be vulnerable.



    * General consumer UBR is 1 in 10^14 bits, or 12.5TB (11.4TiB). E.g. a rebuild of a full RAID5 array of 4x 3TB drives has to read all 9TB on the three surviving drives, which gives roughly a 50% chance of things going pear-shaped. UBR rates are a 'worst case' rather than an average rate, but that's well outside of comfortable probabilities for something you want to treat as having data redundancy rather than just operational redundancy.
    If you're in an enterprise environment and only need your array to keep working for a few hours while you image a new array from a backup in order to failover, RAID5 is fine. If you don't have another array lying around AND a separate backup to populate that array with in order to avoid having to rebuild altogether, RAID6 is a safer bet.
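    The footnote's arithmetic can be checked with a short Python sketch. It assumes a flat 1-in-10^14 error rate with independent errors, and that a rebuild must read every bit on the surviving drives of a full array; both are simplifications, and the rated figure is a worst case.

```python
import math


def tb_per_expected_error(bits_per_error=10**14):
    """Decimal TB read per expected uncorrectable error."""
    return bits_per_error / 8 / 1e12


def rebuild_failure_probability(surviving_drives, drive_tb, bits_per_error=10**14):
    """Chance of at least one uncorrectable error while reading every
    surviving drive in full, assuming independent errors per bit."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # P(no error) = (1 - p) ** n; log1p keeps this accurate for tiny p.
    p_clean = math.exp(bits_read * math.log1p(-1 / bits_per_error))
    return 1 - p_clean


print(tb_per_expected_error())  # 12.5
# 4x 3TB RAID5: the rebuild reads the three surviving 3TB drives.
print(round(rebuild_failure_probability(3, 3), 2))  # 0.51
# Same array built from 2TB drives.
print(round(rebuild_failure_probability(3, 2), 2))  # 0.38
```

    Simply dividing 9TB by 12.5TB gives about 72%, but that is the expected number of errors rather than the probability of hitting at least one, so it overstates the risk somewhat; the real figure is still far too high for comfort.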
     
  16. Byron C

    Byron C And now a word from our sponsor

    Joined:
    12 Apr 2002
    Posts:
    6,688
    Likes Received:
    1,541
    I think you missed my point: the point is to use cheap, disposable disks and run a daily/weekly/whatever backup. Even if I can't rebuild an array, the data should already be backed up. No way in hell I'll be using even 1TB disks, let alone 2TB :)
     
  17. Mister_Tad

    Mister_Tad Will work for nuts Super Moderator

    Joined:
    27 Dec 2002
    Posts:
    12,945
    Likes Received:
    1,214
    Just be careful that you're not getting complacent because of a perceived safety net with RAID5 - failures during rebuild are very real and restoring from a backup from who knows when which may or may not be valid is a very different proposition to rebuilding from a failure.

    It's a shame there's not yet a decent declustered RAID option for small-scale use - I was hoping there might be something in Windows Server 2016, but that doesn't seem to be the case.
     
  18. Parge

    Parge the worst Super Moderator

    Joined:
    16 Jul 2010
    Posts:
    12,935
    Likes Received:
    563
    I have to say, for what it's worth (not much), the Backblaze tests corroborate my personal experience. Seagate drives fail more than any other; my WDs and Hitachis have always been rock solid.

    Anyway, it'll all be over in two years or so, as SSDs will have eclipsed the size of HDDs and offer a comparable cost per GB.
     
  19. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    386
    Wow you've got deep pockets to be saying SSDs offer a comparable cost per GB, either that or I'm just a poor old tramp. :D
     
  20. Parge

    Parge the worst Super Moderator

    Joined:
    16 Jul 2010
    Posts:
    12,935
    Likes Received:
    563

    *Will* - being the operative word. Not today, but in two years or so.

    Have a read of this
     
