
News Apple confirms switch to Arm-based architecture

Discussion in 'Article Discussion' started by bit-tech, 23 Jun 2020.

  1. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    Note that both of these are a result of "shove eleventy billion cores at a task you wrote/rewrote specifically to target those cores in massive parallel" - a situation that does not hold for consumer devices. Even if we ignore all existing OS X applications built for x86, on the assumption Apple will simply tell you to shove it if you ever want to run legacy code more than a year or two after the transition (i.e. what they did last time they moved arch), we're still stuck with the fact that after a decade and a half of dual-core x86 CPUs, a decade of quad (or more) cores, and easily a decade of multi-core CPUs being the norm (including on consoles, for those who love to blame them for 'holding PCs back'), software remains largely single-threaded, with sharply diminishing returns for the parts that are threaded. Software on a consumer device is almost never something you can point at a big dataset and wander off from for a few hours, or have tens to hundreds of people using simultaneously; it's software that interacts with the user to perform serial tasks, i.e. it's all Amdahl's Law scaling. Achieving Gustafson's Law scaling requires both a task that can be expanded to fit the available resources (e.g. adding more cores will not make opening a folder any faster, it only lets you open more folders at the same time) and the ability to feed your CPU with the additional data to perform more tasks at once (which is why so much of the focus in HPC is on memory bandwidth and high-speed interconnects).
    Notable exceptions on desktop machines are CPU rendering (mostly being offloaded to the GPU now, 'cuz it's faster) and video encoding (for real-time work like streaming this gets shoved to fixed-function blocks to minimise any CPU or GPU impact, but it's a factor for offline encoding).
    The upshot being that for perceptually equivalent performance in existing tasks, you can't just drop in 4/8/etc ARM cores for every x86 core, even if you can convince every developer on your platform to rewrite their software both for a new CPU arch and for massive parallelism.
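    For the numerically inclined, here's a rough sketch of the gap between the two laws - the 50%-parallel figure and the function names are purely illustrative, not taken from any real workload:

        # Rough illustration of Amdahl vs Gustafson scaling (illustrative numbers only).
        def amdahl_speedup(p, n):
            """Speedup of a fixed-size task with parallel fraction p on n cores."""
            return 1.0 / ((1.0 - p) + p / n)

        def gustafson_speedup(p, n):
            """Scaled speedup when the parallel part of the work grows to fill n cores."""
            return (1.0 - p) + p * n

        for n in (2, 4, 8, 64, 1024):
            print(f"{n:5d} cores: Amdahl {amdahl_speedup(0.5, n):5.2f}x, "
                  f"Gustafson {gustafson_speedup(0.5, n):8.2f}x")

        # Amdahl tops out at 1/(1-p) = 2x for a 50%-parallel task no matter how many
        # cores you add; Gustafson keeps climbing because the problem grows with the cores.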
     
  2. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,129
    Likes Received:
    6,717
    ...except they aren't, because the EC2 instances don't have eleventy billion cores. You can just rent one.

    For the overwhelming majority of desktop users, a high-end Arm core is functionally equivalent to an AMD64 core. Having a 20GHz Intel Core i-69 doesn't make Facebook load any faster, or win you extra points in Candy Crush. For professionals, a small edge-case, sure - but those professionals can also benefit from more cores. I know that for my workloads, I'd rather have eight middling cores than four faster cores - I'll get done a lot quicker.

    (I mean, I literally made this choice: I bought an eight-core AMD Ryzen over a faster-single-threaded-performance Intel chip. Bonus: my next laptop is Arm-based. It'll be delivered in December, with a following wind.)
    [CITATION NEEDED]

    My web browser runs multiple threads; my office suite runs multiple threads; when I'm compiling software, you bet your sweet hiney I'm -j16ing that puppy; both my raster and vector image editors use multiple threads; my compression software uses multiple threads; my video rendering software uses multiple threads...

    While I've got you: fancy popping your head into that VAT thread?
     
  3. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    Dodging propriety: a multi-thousand-device rollout (not sure what the contract figure was, but easily into 8 figures) to replace the high-core-count dual-socket workstations, because the 'back office' 4-core single-socket machines were consistently running rings around them in actual day-to-day performance rather than in benchmarks. Nobody has asked to retain their HCC boxes, nobody has asked for their HCC boxes back, and nobody has asked for new (single-CPU or otherwise) HCC boxes to be deployed as special orders. The install base covers a pretty good swathe of tasks, from operational use to web dev to app dev to video production (admittedly most of the heavy lifting there is offloaded to dedicated broadcast hardware) to HPC dev/test to financial operations. So in my professional experience, real-world tasks that take advantage of high core counts are few and far between for boxes-that-sit-under-the-desk.
    We also trialled each generation of not-iPad ARM devices, from the original WinRT series onwards (and 'onwards' today, if you catch my drift). They all ended up in the bin within a few days. Cheap is no good if everyone hates using them so much that they'll pick a crappy remote instance in preference to even worse local performance (with half their stuff unable to even launch).
     
  4. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,129
    Likes Received:
    6,717
    Ahem:
    Also: sounds like you need better software on your workstations - it's normally the workstation-class stuff that benefits most from additional cores.
    Ahem again:
    None of those real-world, then? 'cos... well, I appear to be in the real world. (Though, given what's going on in 2020, you'd be forgiven for thinking we'd somehow all been transported to The Darkest Timeline.)

    I have a neat trick for boosting single-threaded performance, too: if I need to do a bunch of single-threaded stuff - like, I dunno, Google Guetzli'ing a bunch of images, 'cos I never did investigate that pull request someone sent in for offloading it to the GPU - I run it through GNU Parallel. Makes it about NUMBER_OF_CORES times faster, and it didn't need any of the software rewriting.
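    (For the curious, the same trick sketched in Python rather than GNU Parallel - the guetzli invocation, output naming, and file pattern are just placeholders:)

        # Rough Python equivalent of the GNU Parallel trick: one single-threaded
        # encoder process per core. Paths and output naming are placeholders.
        import glob
        import subprocess
        from concurrent.futures import ProcessPoolExecutor
        from os import cpu_count

        def encode(path):
            # Each worker runs its own single-threaded guetzli process.
            subprocess.run(["guetzli", path, path + ".jpg"], check=True)

        if __name__ == "__main__":
            images = glob.glob("*.png")
            with ProcessPoolExecutor(max_workers=cpu_count()) as pool:
                list(pool.map(encode, images))

        # No rewrite of the encoder needed: the parallelism is at the process level,
        # so throughput scales with core count even though each job stays serial.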

    Now, about that VAT thread...
     
  5. Xlog

    Xlog Minimodder

    Joined:
    16 Dec 2006
    Posts:
    714
    Likes Received:
    80
    Most of those examples are "task multithreading" - as are most programs: things like the GUI and some internal workloads (like each tab in a browser) are split into separate threads, but the tasks themselves are pretty much serial - e.g. the JavaScript engine in a browser, or GCC. Yes, you can pass "-j16" to make, but that only means it will spawn up to 16 instances; if you need to compile one big lib that everything else depends on, you're back to single-core utilisation. Can we rewrite those tasks to be multithreaded? Maybe - though, as edzieba already said, most fall under Amdahl's Law. That, and the vast majority of programmers simply can't think in or code truly multithreaded stuff.
     
  6. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    Sadly, workstations need to run the software the users actually need to run, not some hypothetical software that works the way chip manufacturers would like it to. Workstations do work, after all - they don't just run a decade-old build of the Cinebench benchmark suite.
    That's great for you. But if your task is linear, splitting it into discrete units just means those units sit idle waiting for the previous one to complete. The parallelism gain for a task that is only partly parallel is nowhere close to linear core scaling (e.g. if a task is only 50% parallel, then the best speedup you can ever hope for, no matter how many cores you have, is 2x).

    Either way, we'll see if this latest iteration of "this time, ARM for client computing won't suck!" will turn out like every previous time: hyped as the best thing since sliced bread right up until devices actually end up in the hands of people that need to use them to do things.
     
  7. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,129
    Likes Received:
    6,717
    To the end-user, though, the two are indistinguishable. And if an eight-core chip is eight times faster for the stuff you can spawn multiple workers for, it's likely to come out ahead even if it's half as fast for single-core operations.

    Take my PDF viewer, for example. It's about eight times faster running on an eight-core processor than on a single-core processor, because it's spawning eight workers to each render one-eighth of the page. Is it truly multithreaded? No. Is it still eight times faster? Yes. Does the end user care about why it's eight times faster? Nope. Does it need rewriting to be "properly" multithreaded? Eh, why bother.

    You're right about, say, the JavaScript engine of a browser being a serial doofer - but I challenge anyone to actually tell the difference by eye between that JavaScript engine doing its doings on a Raspberry Pi 4 and on a high-end gaming rig. The bottleneck's the network connection, not the processor.
    No, but they run stuff that scales to multiple cores nicely - why do you think people buy many-core workstations?
    Oh, I see how it is: a many-core Arm can't run whatever cockamamie software you use very well, and that's proof Arm on the desktop (workstation) will never work; I point out it'll run my stuff really really well, and that's just "great for me."

    How many people are using your workstation software compared to, say, an office suite?
    Set a calendar entry for... five years, check back to see if Apple's still using Arm and still selling roughly the same or better volume? Fiver plus inflation to the charity of the winner's choice?

    While we're waiting... The VAT thread?
     
    Last edited: 7 Aug 2020
  8. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    The end user cares about execution time, not throughput per fixed unit of time. If an 8-core device can only speed up a task 2x (the limit for a 50%-parallel task), and a 2-core device can speed up a task 2x, then adding more cores does not produce any net benefit for the user. But if those 8 cores are individually slower, then you end up in the situation where the 8-core device's total task execution time is longer than the 2-core device's, because the floor on total task execution time is the single-threaded component of the task.
    e.g. the task is to boil an egg and butter 8 soldiers. You can split your buttering task into 8 'threads' that can be executed at once, but your boil-an-egg task cannot be accelerated no matter how many saucepans or eggs you have available. That then becomes the execution-time floor for the total task. If your task is only to boil a single egg, the capability to boil 8 eggs at once is of no practical value to you.
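    To put rough numbers on the analogy (timings invented purely for illustration):

        # Toy model of the egg-and-soldiers task: the boil is serial, the buttering
        # splits across workers. All timings are made up for illustration only.
        BOIL_MINUTES = 6.0          # can't be sped up by adding saucepans
        BUTTER_MINUTES_TOTAL = 4.0  # divides evenly across workers

        def total_minutes(workers):
            return BOIL_MINUTES + BUTTER_MINUTES_TOTAL / workers

        for n in (1, 2, 8):
            print(f"{n} worker(s): {total_minutes(n):.1f} minutes")

        # 1 -> 10.0, 2 -> 8.0, 8 -> 6.5 minutes: the serial six-minute boil is the
        # floor, so extra workers stop paying off almost immediately.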
    Same reason they buy LCC workstations: people spec and purchase workstations to suit the tasks they perform, or they end up paying more for a slower box.
    The proof is that ARM on the desktop has flopped hard, repeatedly. Trying the same "oh, the cores are slower, but we have more of them and lower power!" over and over doesn't change things.
    I'd say on the order of 5-7k on workstations, plus another 15-20k for general BO tasks (mostly the office suite, plus some more esoteric Excel & PowerPoint data-source plugins). The BO guys were never on HCC machines, and those who ended up with them through legacy or a department move wanted their LCC machines back! That's for the UK, which I have visibility of; RoW is probably an order of magnitude above that, but the story I've heard from other regions is the same (as are the purchasing changes, which are made at a global level).
    If we limit that to comparable devices (i.e. laptops, desktops, and AIOs - no "oh, but people are buying iPad Pros now!") then I'll take that bet in terms of pure devices-shipped volumes.
     
  9. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,129
    Likes Received:
    6,717
    That's a lot of "ifs", there.

    My bitmap image editor runs most operations eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My vector image editor runs most operations eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My video renderer finishes eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My PDF viewer displays a page eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    Does my word processor run eight times faster? No, it doesn't. Is my word processor the thing that I spend my time watching a progress bar in? No, it's not - it operates way faster than I type, and has done ever since I graduated from Mini-Office II on a Commodore 128.
    So "no", then, there weren't more people using your poorly-optimised software than an office suite. And I wasn't asking about your business, apologies for the lack of clarity there: I meant globally.
    You've heard lots of stories - like not being able to claim raw materials as a business expense on your VAT return, for example...
    Yup, that was what I was after: the iPad stuff is already Arm. The terms are simple: if, in five years, Apple hasn't switched away from Arm to something else (c'mon, RISC-V, this could be your moment!) and hasn't appreciably shrunk its volume of desktops, laptops, all-in-ones, and whatever you'd call the Mac Mini, I win; if it switches before the five years is up, or if its volume over the five years has noticeably decreased - I'm not talking tiny year-on-year variance here - then you win.

    I'll pop it in my calendar! (That's gonna confuse the hell out of me when the reminder goes off in five years...)

    [Attached image: Statista chart of quarterly Apple Mac shipment figures]

    EDIT: Happy using these Statista figures? If so, the number to beat is 5.3 million in Q4 '19 (so we'll be comparing Q4 '24.)

    EDIT EDIT: Oh, and if Arm China ends up bringing down the rest of Arm, or Samsung/Nvidia/A. N. Other buys it up and decides to stop licensing the IP to third parties, or if Arm goes belly-up 'cos we're all running RISC-V instead, the bet's a wash - Apple has to choose to move from Arm, not be forced into it by Arm going away.
     
    Last edited: 7 Aug 2020
    monty-pup likes this.
  10. Anfield

    Anfield Multimodder

    Joined:
    15 Jan 2010
    Posts:
    7,061
    Likes Received:
    970
    5 years?

    Fine, I'll throw in a blinder: they'll start adding non-Arm cores to the Arm cores in a mix-and-match design.
     
  11. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,129
    Likes Received:
    6,717
    Ooh, I like it!

    So: Apple leaves Arm or *should* leave Arm 'cos sales have dropped - @edzieba wins.
    Apple adds non-Arm cores to the chip (not counting something like the T5 security chip core or other minor system-management thing, it has to be a proper core exposed to the end user for computational work) - @Anfield wins.
    Apple sticks with Arm, sales remain flat or go up: I win.
    Arm goes bust or otherwise can't provide IP to Apple any more: wildcard, nobody wins and we all throw a fiver at charity.
     
    edzieba likes this.
  12. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    I'd expect Apple to shutter their non-iOS lines sooner than switch away from ARM if sales volumes drop (e.g. the death of the MacBook in the face of the MacBook Air) - maybe leaving the Mac Pro untouched for another 6-7 years, like the last Mac Pro, and slowly losing lines as they go (I could see the MacBook and iMac lines being pared down to a single model each, and the Mac Mini just vanishing again).
     