
News Apple confirms switch to Arm-based architecture

Discussion in 'Article Discussion' started by bit-tech, 23 Jun 2020.

  1. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    590
    Note that both of these are a result of "shove eleventy billion cores at a task you wrote/rewrote specifically to target those cores in massively parallel fashion". That situation does not hold for consumer devices. Even if we ignore every existing OS X application built for x86, on the assumption Apple will simply tell you to shove it if you want to run legacy code more than a year or two after the transition (i.e. what they did the last time they moved architecture), we're still stuck with this: after a decade and a half of dual-core x86 CPUs, a decade of quad (or more) cores, and easily a decade of multi-core CPUs being the norm (including on consoles, for those who love to blame them for 'holding PCs back'), software remains largely single-threaded, with sharply diminishing returns in the parts that are threaded. That's because software on a consumer device is almost never something you can point at a big dataset and wander off from for a few hours, or have tens to hundreds of people using simultaneously. It's software that interacts with the user to perform serial tasks, i.e. it's all Amdahl's Law scaling. Achieving Gustafson's Law scaling requires both a task that can expand to fit the available resources (adding more cores will not make opening a folder any faster, it only lets you open more folders at the same time) and the ability to feed the CPU with enough additional data to perform more tasks at once (which is why so much HPC focus is on memory bandwidth and high-speed interconnects).
    Notable exceptions on desktop machines are CPU rendering (mostly offloaded to the GPU now, 'cuz it's faster) and video encoding (for real-time use like streaming this gets shoved onto fixed-function blocks to minimise any CPU or GPU impact, but it's a factor for offline encoding).
    The upshot being that, for perceptually equivalent performance on existing tasks, you can't just drop in 4/8/etc. Arm cores for every x86 core, even if you could convince every developer on your platform to rewrite their software both for a new CPU architecture and for massive parallelism.
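    The Amdahl/Gustafson distinction can be sketched numerically (a minimal illustration with the two textbook formulas, not anything from the thread itself):

    ```python
    def amdahl_speedup(p, n):
        """Fixed workload: fraction p runs in parallel on n cores, the rest stays serial."""
        return 1.0 / ((1.0 - p) + p / n)

    def gustafson_speedup(p, n):
        """Scaled workload: the parallel fraction p grows to fill n cores."""
        return (1.0 - p) + p * n

    # A 50%-parallel task tops out at 2x under Amdahl, however many cores you add,
    # while the Gustafson column keeps growing because the workload itself grows:
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.5, n), 2), gustafson_speedup(0.5, n))
    ```

    The folder example above is the Gustafson case: more cores let you open more folders at once, but never one folder faster.
    
    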
     
  2. Gareth Halfacree

    Gareth Halfacree WIIGII! Staff Administrator

    Joined:
    4 Dec 2007
    Posts:
    13,775
    Likes Received:
    2,768
    ...except they aren't, because the EC2 instances don't have eleventy billion cores. You can just rent one.

    For the overwhelming majority of desktop users, a high-end Arm core is functionally equivalent to an AMD64 core. Having a 20GHz Intel Core i-69 doesn't make Facebook load any faster, or win you extra points in Candy Crush. For professionals, a small edge-case, sure - but those professionals can also benefit from more cores. I know that for my workloads, I'd rather have eight middling cores than four faster cores - I'll get done a lot quicker.

    (I mean, I literally made this choice: I bought an eight-core AMD Ryzen over a faster-single-threaded-performance Intel chip. Bonus: my next laptop is Arm-based. It'll be delivered in December, with a following wind.)
    [CITATION NEEDED]

    My web browser runs multiple threads; my office suite runs multiple threads; when I'm compiling software, you bet your sweet hiney I'm -j16ing that puppy; both my raster and vector image editors use multiple threads; my compression software uses multiple threads; my video rendering software uses multiple threads...

    While I've got you: fancy popping your head into that VAT thread?
     
  3. edzieba

    Dodging propriety: a multi-thousand-device rollout (not sure what the contract figure was, but easily into 8 figures) to replace the high-core-count (HCC) dual-socket workstations, because the 'back office' 4-core single-socket machines were consistently running rings around them in actual day-to-day performance rather than benchmarks. Nobody has asked to retain their HCC boxes, nobody has asked for them back, and nobody has asked for new HCC boxes (single CPU or otherwise) to be deployed as special orders. The install base covers a pretty good swathe of tasks, from operational use to web dev to app dev to video production (admittedly most of the heavy lifting there is offloaded to dedicated broadcast hardware) to HPC dev/test to financial operations. So in my professional experience, real-world tasks that take advantage of high core counts are few and far between for boxes-that-sit-under-the-desk.
    We also trialed each generation of not-iPad ARM devices, from the original WinRT series onwards (and 'onwards' today if you catch my drift). They all end up in the bin within a few days. Cheap is no good if everyone hates using them so much they'll pick a crappy remote instance in preference to even worse local performance (and half their stuff unable to even launch).
     
  4. Gareth Halfacree

    Ahem:
    Also: sounds like you need better software on your workstations - it's normally the workstation-class stuff that benefits most from additional cores.
    Ahem again:
    None of those real-world, then? 'cos... well, I appear to be in the real world. (Though, given what's going on in 2020, you'd be forgiven for thinking we'd somehow all been transported to The Darkest Timeline.)

    I have a neat trick for boosting single-threaded performance, too: if I need to do a bunch of single-threaded stuff - like, I dunno, Google Guetzli'ing a bunch of images, 'cos I never did investigate that pull request someone sent in for offloading it to the GPU - I run it through GNU Parallel. Makes it about NUMBER_OF_CORES times faster, and didn't need any of the software rewriting.
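    That fan-out pattern - N independent single-threaded jobs running at once, with no change to the job itself - can be sketched with a process pool (a minimal sketch; the `compress` worker here is a hypothetical stand-in, not Guetzli):

    ```python
    from concurrent.futures import ProcessPoolExecutor

    def compress(path):
        # Stand-in for one single-threaded job, e.g. running Guetzli on one image.
        return path + ".out"

    if __name__ == "__main__":
        images = ["img%d.jpg" % i for i in range(16)]
        # Each worker stays single-threaded; throughput scales with core count
        # because the jobs are independent. The worker itself is never rewritten.
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(compress, images))
        print(results[0], results[-1])
    ```

    GNU Parallel does the same thing from the shell, scheduling one process per core by default.
    
    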

    Now, about that VAT thread...
     
  5. Xlog

    Xlog Active Member

    Joined:
    16 Dec 2006
    Posts:
    634
    Likes Received:
    57
    Most of those examples are "task multithreading", as are most programs: things like the GUI and some internal workloads (like each tab in a browser) are split into their own threads, but the tasks themselves are pretty much serial, e.g. the JavaScript engine in a browser, or GCC. Yes, you can pass -j16 to make, but that only means it will spawn up to 16 instances; if you need to compile one big library that everything else uses, you're back to single-core utilisation. Can we rewrite those tasks to be multithreaded? Maybe, though as edzieba already said, most fall under Amdahl's Law. That, and the vast majority of programmers simply can't think in, or code, truly multithreaded stuff.
     
  6. edzieba

    Sadly, workstations need to run the software the users actually need to run, not some hypothetical software that works the way chip manufacturers would like it to. Workstations do work, after all, not run a decade-old build of the Cinebench benchmark suite.
    That's great for you. But if your task is linear, splitting it into discrete units just means those units sit idle waiting for the previous one to complete. The parallelism gain from a task that is only a small fraction parallel is nowhere close to linear core scaling (e.g. if a task is only 50% parallel, the best speedup you can ever hope for, no matter how many cores you have, is 2x).

    Either way, we'll see if this latest iteration of "this time, ARM for client computing won't suck!" will turn out like every previous time: hyped as the best thing since sliced bread right up until devices actually end up in the hands of people that need to use them to do things.
     
  7. Gareth Halfacree

    To the end-user, though, the two are indistinguishable. And if an eight-core chip is eight times faster for the stuff you can spawn multiple workers for, it's likely to come out ahead even if it's half as fast for single-core operations.

    Take my PDF viewer, for example. It's about eight times faster running on an eight-core processor than on a single-core processor, because it spawns eight workers to each render one-eighth of the page. Is it truly multithreaded? No. Is it still eight times faster? Yes. Does the end user care about why it's eight times faster? Nope. Does it need rewriting to be "properly" multithreaded? Eh, why bother.

    You're right about, say, the JavaScript engine of a browser being a serial doofer - but I challenge anyone to actually tell the difference by eye between that JavaScript engine doing its doings on a Raspberry Pi 4 and on a high-end gaming rig. The bottleneck's the network connection, not the processor.
    No, but they run stuff that scales to multiple cores nicely - why do you think people buy many-core workstations?
    Oh, I see how it is: a many-core Arm can't run whatever cockamamie software you use very well, and that's proof Arm on the desktop (workstation) will never work; I point out it'll run my stuff really really well, and that's just "great for me."

    How many people are using your workstation software compared to, say, an office suite?
    Set a calendar entry for... five years, check back to see if Apple's still using Arm and still selling roughly the same or better volume? Fiver plus inflation to the charity of the winner's choice?

    While we're waiting... The VAT thread?
     
    Last edited: 7 Aug 2020
  8. edzieba

    The end user cares about execution time, not throughput per fixed unit of time. If an 8-core device can only speed up a task 2x (the limit for a 50%-parallel task), and a 2-core device can speed up the same task 2x, then adding more cores produces no net benefit for the user. And if those 8 cores are individually slower, you end up in the situation where the 8-core device's total task execution time is longer than the 2-core device's, because the floor on total execution time is the single-threaded component of the task.
    E.g. the task is to boil an egg and butter 8 soldiers. You can split the buttering into 8 'threads' that execute at once, but the boil-an-egg task cannot be accelerated no matter how many saucepans or eggs you have available. That then becomes the execution-time floor for the total task. And if your task is only to boil a single egg, the capability to boil 8 eggs at once is of no practical value to you.
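    The egg-and-soldiers point in numbers (a toy model with invented figures): eight cores at 60% of the clock lose to two full-speed cores once the serial half sets the floor.

    ```python
    def exec_time(serial_s, parallel_s, cores, clock):
        # Amdahl-style model: the serial part runs on one core, the parallel
        # part divides across all cores; `clock` scales per-core speed.
        return (serial_s + parallel_s / cores) / clock

    # A 10 s task, half serial, half parallelisable (made-up figures):
    fast_two   = exec_time(5, 5, cores=2, clock=1.0)  # two full-speed cores
    slow_eight = exec_time(5, 5, cores=8, clock=0.6)  # eight cores at 60% clock
    print(fast_two, slow_eight)  # the many-but-slower cores finish later
    ```

    Shrinking the parallel portion further only widens the gap, since the serial 5 seconds is paid at the slower clock in full.
    
    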
    Same reason they buy low-core-count (LCC) workstations: people spec and purchase workstations to the tasks they perform, or they end up paying more for a slower box.
    The proof is that ARM on the desktop has flopped hard, repeatedly. Trying the same "oh, the cores are slower, but we have more of them and lower power!" over and over doesn't change things.
    I'd say on the order of 5-7k on workstations, plus another 15-20k for general back-office (BO) tasks (mostly office suite, plus some more esoteric Excel and PowerPoint data-source plugins). The BO guys were never on HCC machines, and those who ended up with them through legacy or a department move wanted their LCC machines back! That's for the UK, which I have visibility of; RoW is probably an order of magnitude above that, but the story I've heard from other regions is the same (as are the purchasing changes, which are made at global level).
    If we limit that to comparable devices (i.e. laptops, desktop, and AIOs, no "oh, but people are buying iPad Pros now!") then I'll take that bet in terms of pure devices shipped volumes.
     
  9. Gareth Halfacree

    That's a lot of "ifs", there.

    My bitmap image editor runs most operations eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My vector image editor runs most operations eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My video renderer finishes eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    My PDF viewer displays a page eight times faster when set to eight threads/workers than when set to one. That's a NUMBER_OF_CORES gain.

    Does my word processor run eight times faster? No, it doesn't. Is my word processor the thing that I spend my time watching a progress bar in? No, it's not - it operates way faster than I type, and has done ever since I graduated from Mini-Office II on a Commodore 128.
    So "no", then, there weren't more people using your poorly-optimised software than an office suite. And I wasn't asking about your business, apologies for the lack of clarity there: I meant globally.
    You've heard lots of stories - like not being able to claim raw materials as a business expense on your VAT return, for example...
    Yup, that was what I was after: the iPad stuff is already Arm. The terms are simple: if, in five years, Apple hasn't switched away from Arm to something else (c'mon, RISC-V, this could be your moment!) and hasn't appreciably shrunk its volume of desktops, laptops, all-in-ones, and whatever you'd call the Mac Mini, I win; if it switches before the five years is up, or if its volume over the five years has noticeably decreased - I'm not talking tiny year-on-year variance here - then you win.

    I'll pop it in my calendar! (That's gonna confuse the hell out of me when the reminder goes off in five years...)

    [Attached: Statista chart of quarterly Apple Mac shipments]

    EDIT: Happy using these Statista figures? If so, the number to beat is 5.3 million in Q4 '19 (so we'll be comparing Q4 '24.)

    EDIT EDIT: Oh, and if Arm China ends up bringing down the rest of Arm, or Samsung/Nvidia/A. N. Other buys it up and decides to stop licensing the IP to third parties, or if Arm goes belly-up 'cos we're all running RISC-V instead, the bet's a wash - Apple has to choose to move from Arm, not be forced into it by Arm going away.
     
    Last edited: 7 Aug 2020
  10. Anfield

    Anfield Well-Known Member

    Joined:
    15 Jan 2010
    Posts:
    6,116
    Likes Received:
    660
    5 years?

    Fine, I'll throw in a blinder: they'll start adding non-Arm cores to the Arm cores in a mix-and-match design.
     
  11. Gareth Halfacree

    Ooh, I like it!

    So: Apple leaves Arm or *should* leave Arm 'cos sales have dropped - @edzieba wins.
    Apple adds non-Arm cores to the chip (not counting something like the T5 security chip core or other minor system-management thing, it has to be a proper core exposed to the end user for computational work) - @Anfield wins.
    Apple sticks with Arm, sales remain flat or go up: I win.
    Arm goes bust or otherwise can't provide IP to Apple any more: wildcard, nobody wins and we all throw a fiver at charity.
     
  12. edzieba

    I'd expect Apple to shutter its non-iOS lines sooner than switch away from Arm if sales volumes drop (e.g. the death of the MacBook in the face of the MacBook Air). Maybe leaving the Mac Pro untouched for another 6-7 years, like the last Mac Pro, and slowly losing lines as they go (I could see the MacBook and iMac lines being pared down to a single model each, and the Mac Mini just vanishing again).
     