News New GPGPU approach promises 20 per cent performance boost

brumgrunt · 8 Feb 2012

In a paper co-authored by AMD...

http://www.bit-tech.net/news/hardware/2012/02/08/gpgpu-performance-boost/1

Hustler · 8 Feb 2012

What's this? More promises of jam tomorrow from AMD hardware?

...sigh

debs3759 · 8 Feb 2012

I like that AMD are looking into newer and better ways of doing things. Fully integrating the CPU and GPU will, IMO, be a very good thing. We might even see AMD take back the lead in high-end computing in a few short years (Remember AMD64? You should, it's the basis for the instruction set in every 64-bit x86 CPU we see today!)

borandi · 8 Feb 2012

You act very sceptical in this news story, which is understandable. But any university that wants to work on stuff like this has to have a good partnership with some part of the manufacturer. NVIDIA, for example, have a lot of academics around the world who act as reps at their institutions, promoting GPGPU and exposing the members of those institutions to as much of it as possible. As a result, those who collaborate with the academics on a significant level (in this case, Mike Mantor) are put in as co-authors: even if they didn't directly do any research themselves, they contributed a lot to the outcome.

tonyd223 · 8 Feb 2012

but will it run Cry...

DbD · 8 Feb 2012

That 20% is a picked-out-of-the-air figure, and seems very small if they actually want to make use of it.

There's a load of overhead required to set up a bit of code to run on the GPU rather than the CPU. Equally, only some bits of code are faster: if the lump of code is too small, the overhead of setting it up for the GPU outweighs the benefit it gives. Hence that 20% figure is based on a guess for the overhead and a guess for the code that would work well on a GPU, all done without real silicon, just a simulation of some *future* silicon. Given that the research is set up to show this is possible (this is the result AMD will have wanted), 20% is a very low number to have come out with.
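A rough sketch of that break-even argument, in Python. The model is an assumption for illustration only: a fixed set-up cost plus a constant speed-up once the code is on the GPU. The function names and all the numbers are hypothetical and are not taken from the paper.

```python
def offload_time(cpu_time_ms, speedup, overhead_ms):
    """Time to run a chunk on the GPU: fixed set-up cost plus the accelerated run."""
    return overhead_ms + cpu_time_ms / speedup


def break_even_cpu_time(speedup, overhead_ms):
    """CPU run time at which offloading costs exactly the same as staying on the CPU.

    Solves cpu_time = overhead + cpu_time / speedup for cpu_time.
    """
    if speedup <= 1:
        return float("inf")  # the GPU is never faster, so offloading never pays
    return overhead_ms * speedup / (speedup - 1)


# Hypothetical figures: the GPU runs the chunk 5x faster but costs 2 ms to set up.
SPEEDUP = 5.0
OVERHEAD_MS = 2.0

threshold_ms = break_even_cpu_time(SPEEDUP, OVERHEAD_MS)
print(f"break-even chunk size: {threshold_ms} ms of CPU time")

for cpu_ms in (1.0, 10.0):
    gpu_ms = offload_time(cpu_ms, SPEEDUP, OVERHEAD_MS)
    verdict = "offload" if gpu_ms < cpu_ms else "stay on CPU"
    print(f"{cpu_ms} ms on CPU vs {gpu_ms} ms offloaded: {verdict}")
```

With these made-up numbers, anything that takes less than 2.5 ms on the CPU is slower when offloaded, which is the "lump of code is too small" case.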

Then there are the other things, e.g. power usage. Running it on a traditional CPU might have been 20% slower, but it only required the CPU, whereas your 20% performance gain now requires all the extra complexity and power requirements of a GPU. Would it have been more efficient, power-wise, to just have a more power-hungry CPU that ran 20% faster?

I think this sort of compute tends to work really well when you can allocate big lumps of code, or full jobs, to some specialised hardware (e.g. video decode), but for just working on standard code in combination with the CPU it doesn't look so effective.

azazel1024 · 8 Feb 2012

On what kind of workload? We already know that there are plenty of workloads that are single-threaded only, or at most a bare 2-6 threads. The only time a GPU MAY help is in a scenario where the work you are doing has more threads than there are CPU cores (or, with hyperthreading, really virtual cores too) to handle it. Even then, with the rather low frequency and low IPC of each individual "stream processor" in a GPU, you have to get to fairly highly threaded calculations to see a performance benefit of GPU over CPU calculations. I have no idea of the numbers, but I'd suspect we are talking at least double the thread count of the CPU's core count, if not a lot more.

Heck, look at H.264 encoding, which can be pretty massively threaded. High-end discrete GPUs only manage something like 50-150% faster encoding than a high-end Intel CPU...and they have to take shortcuts which compromise image quality somewhat. That is with something like 800+ stream processors versus 4-6 real cores.
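To put those figures side by side, here is a back-of-envelope calculation in Python. It is only a sketch: it assumes encoding throughput scales linearly with the number of cores or stream processors, which real encoders do not do, and it reads "50-150% faster" as a 1.5x to 2.5x speed ratio. The function name is made up for the example.

```python
def per_lane_throughput(speed_ratio, gpu_lanes, cpu_cores):
    """Throughput of one GPU stream processor as a fraction of one CPU core.

    speed_ratio is how many times faster the whole GPU is than the whole CPU.
    """
    return speed_ratio * cpu_cores / gpu_lanes


# Figures from the post: 800 stream processors, 4-6 CPU cores, 1.5x-2.5x overall.
low = per_lane_throughput(1.5, gpu_lanes=800, cpu_cores=4)
high = per_lane_throughput(2.5, gpu_lanes=800, cpu_cores=6)

print(f"one stream processor does roughly {low:.2%} to {high:.2%} of a CPU core's work")
```

Under those assumptions each stream processor contributes somewhere under 2% of what a single CPU core does on this job, which is why it takes hundreds of them to pull ahead.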

Sure, there are things that make sense to process on a GPU with all its many cores, but most things in general computing are still best left to the CPU, or at most might gain a little assistance from having the GPU step in to handle some of the processing (for example, rendering a webpage).

However, how much actual performance gain is there in having the CPU and GPU on one die, sharing L3 cache and workloads, as compared to, say, a discrete GPU sharing some of the processing with the CPU? I'd assume at least a small percentage gain due to significantly lower latency and shared cache...but is it as much of a gain as the processing ability of a discrete GPU has over an integrated GPU? After all, the current top-of-the-line AMD Llano GPU is not much better than a discrete 6450, basically the most bottom of the barrel. Even with shared cache and more main memory bandwidth, I doubt the iGPU performance improves that much.

Now compare that with the higher clock rates, much greater memory bandwidth and significantly more stream processors of something like a 550 or even a 6670, despite their much, much higher latency?

I see shared CPU/GPU processing on the core as important, but I really suspect that the numbers the researchers came up with are specious at best. I doubt the paper compares shared CPU/GPU computing with a discrete GPU (even a discrete GPU of exactly the same performance, but with the higher latency that results from its discreteness).

FelixTech · 8 Feb 2012

I think it will be interesting if they can reduce or remove the time spent transferring data between CPU and GPU memories. There are many things that can be done well with GPUs, but where latency is prioritised over throughput the time for a return trip to the GPU is simply too big at the moment.
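A small Python sketch of why the return trip hurts latency-sensitive work. The model is an assumption: each direction costs a fixed latency plus payload size divided by bandwidth. The 10 microsecond latency and 8 GB/s bandwidth are hypothetical round numbers, not measurements of any real bus.

```python
def round_trip_ms(payload_bytes, latency_ms, bandwidth_gb_s):
    """Copy a payload to the GPU and the same amount back again."""
    transfer_ms = payload_bytes / (bandwidth_gb_s * 1e9) * 1e3
    return 2 * (latency_ms + transfer_ms)


def latency_share(payload_bytes, latency_ms, bandwidth_gb_s):
    """Fraction of the round trip that is fixed latency rather than data movement."""
    return 2 * latency_ms / round_trip_ms(payload_bytes, latency_ms, bandwidth_gb_s)


LATENCY_MS = 0.01       # hypothetical fixed cost per direction
BANDWIDTH_GB_S = 8.0    # hypothetical link bandwidth

for label, size in (("4 KiB", 4 * 1024), ("256 MiB", 256 * 1024 * 1024)):
    share = latency_share(size, LATENCY_MS, BANDWIDTH_GB_S)
    print(f"{label}: {share:.1%} of the round trip is fixed latency")
```

For small payloads almost the whole trip is fixed latency, so more bandwidth does not help; only removing the trip does. For large payloads the fixed part is negligible and throughput dominates.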

schmidtbag · 8 Feb 2012

amd and their partners are absolutely right in the sense that in order to get the best performance, the cpu and gpu need to work together as a single unit, but as long as x86 is the dominant architecture, amd can't expect programs to follow in their desired steps. intel is "happy" with their current cpu+gpu setup because they have plans to just keep improving performance without the need to increase clock speed or slap on more cores. both sandy bridge and ivy bridge are perfect examples of this.

if intel followed amd's decision, then i think we'd get a whole other world of computing, in a good way. but as long as intel doesn't want to follow amd, the idea won't take off. i feel like if any company wants to do the fused cpu and gpu idea, they might as well create an entirely new architecture from scratch, and we all know that isn't going to happen. what amd wants to do is effectively what SPARC and PPC currently do.

velo · 8 Feb 2012

DbD said:

That 20% is a picked-out-of-the-air figure, and seems very small if they actually want to make use of it.

"Using synthetic benchmarks, Zhou's team was able to show significant performance gains using the CPU-assisted GPU model. On average, benchmarks ran 21.4 per cent faster..."

Doesn't seem particularly air-like to me...

fluxtatic · 9 Feb 2012

azazel1024 said:

Heck, look at H.264 encoding, which can be pretty massively threaded. High-end discrete GPUs only manage something like 50-150% faster encoding than a high-end Intel CPU...and they have to take shortcuts which compromise image quality somewhat. That is with something like 800+ stream processors versus 4-6 real cores.

If you mean SB, you're a bit off - SB is using dedicated hardware for Quick Sync. Not that I mean your results are wrong, just the reasoning behind it. QS is ridiculously fast, yes, but it isn't directly the result of the magic of the SB arch itself, just that Intel saw fit to cram dedicated hardware to handle that job, and only that job, onto the die. Compare it to the previous Core arch, and the picture is a bit different.

I'm always a bit suspicious of this type of research, in that it isn't actual silicon they're working on. Rather, it's a model of some future arch that may never come to be. Or they tuned it, not even necessarily intentionally, to crank out results, not taking into account that what they ended up designing won't be practical as an actual CPU.

DbD said:

That 20% is a picked-out-of-the-air figure, and seems very small if they actually want to make use of it.

Have you read the paper? I'm actually asking, as I haven't. If they were going to start pulling figures out of...some place, why go so low? Or, shall we put on our tinfoil hats and realize it's a cunning scheme - make it sound good, but not suspiciously good. Otherwise people won't believe it. What have they got to lose, with so many people shitting on them now?

Even here, it's starting to feel as if people want AMD to fail. You think Intel won't get lazy (and even more expensive) with zero competition? Intel, the company so dedicated to the enthusiast community they'll sell you insurance on your processor in case you blow it up. Never mind the fine print that essentially says they can wriggle out of the obligation to replace the hardware, leaving you with no recourse. If they're so dedicated to us, give me the crack dealer model: first one is always free. Kill it, they'll replace it no questions asked. But that's it. Blow up the replacement and you're back on Newegg or Scan like every other sucker. That's dedication to the community...and it isn't like they can't afford it. This market segment is such a tiny portion of their revenue, they could start a "send us a picture of the PC you're building and we'll send you the processor for free" program and you wouldn't even see a dent in their quarterly revenues.

On-topic, though, this is exciting. Rather than piss on their shoes, cheer them on and hope they're right. Last time they had something that was a great leap forward (one that succeeded, that is), the result is all of us using 64-bit processors. They were also the first in x86 using true, native multicore processors, as well.

Between this and Intel's Haswell announcement (http://arstechnica.com/business/new...emory-going-mainstream-with-intel-haswell.ars), this is a big day in hardware news - be happy!

Snips · 9 Feb 2012

How long ago was AMD64 again? Why is it always mentioned every time AMD release a disappointing processor?

The word for today is "Simulated"

I'm sure every processor "on paper" performs like a demon; it's in manufacturing the idea that AMD fall down.

Nexxo · 9 Feb 2012

Interesting concept. Wouldn't dismiss it just because AMD is the one experimenting with it. I remember a time when AMD beat the pants off Intel and I've been around long enough to know that whatever happened can happen again.

Guinevere · 9 Feb 2012

DbD said:

That 20% is a picked-out-of-the-air figure

Ahhh, peer review is always at its very, very best when undertaken by someone unqualified and unwilling to read the paper.

Well done sir. Well done I say.

azazel1024 · 9 Feb 2012

fluxtatic said:

azazel1024 said:

Heck, look at H.264 encoding, which can be pretty massively threaded. High-end discrete GPUs only manage something like 50-150% faster encoding than a high-end Intel CPU...and they have to take shortcuts which compromise image quality somewhat. That is with something like 800+ stream processors versus 4-6 real cores.

If you mean SB, you're a bit off - SB is using dedicated hardware for Quick Sync. Not that I mean your results are wrong, just the reasoning behind it. QS is ridiculously fast, yes, but it isn't directly the result of the magic of the SB arch itself, just that Intel saw fit to cram dedicated hardware to handle that job, and only that job, onto the die. Compare it to the previous Core arch, and the picture is a bit different.

I'm always a bit suspicious of this type of research, in that it isn't actual silicon they're working on. Rather, it's a model of some future arch that may never come to be. Or they tuned it, not even necessarily intentionally, to crank out results, not taking into account that what they ended up designing won't be practical as an actual CPU.

DbD said:

That 20% is a picked-out-of-the-air figure, and seems very small if they actually want to make use of it.

Have you read the paper? I'm actually asking, as I haven't. If they were going to start pulling figures out of...some place, why go so low? Or, shall we put on our tinfoil hats and realize it's a cunning scheme - make it sound good, but not suspiciously good. Otherwise people won't believe it. What have they got to lose, with so many people shitting on them now?

Even here, it's starting to feel as if people want AMD to fail. You think Intel won't get lazy (and even more expensive) with zero competition? Intel, the company so dedicated to the enthusiast community they'll sell you insurance on your processor in case you blow it up. Never mind the fine print that essentially says they can wriggle out of the obligation to replace the hardware, leaving you with no recourse. If they're so dedicated to us, give me the crack dealer model: first one is always free. Kill it, they'll replace it no questions asked. But that's it. Blow up the replacement and you're back on Newegg or Scan like every other sucker. That's dedication to the community...and it isn't like they can't afford it. This market segment is such a tiny portion of their revenue, they could start a "send us a picture of the PC you're building and we'll send you the processor for free" program and you wouldn't even see a dent in their quarterly revenues.

On-topic, though, this is exciting. Rather than piss on their shoes, cheer them on and hope they're right. Last time they had something that was a great leap forward (one that succeeded, that is), the result is all of us using 64-bit processors. They were also the first in x86 using true, native multicore processors, as well.

Between this and Intel's Haswell announcement (http://arstechnica.com/business/new...emory-going-mainstream-with-intel-haswell.ars), this is a big day in hardware news - be happy!

I was referring to x86 CPU encoding of H.264 compared with encoding on a GPU such as a 580 or 5870. Quick Sync is faster than GPU H.264 encoding, and it appears to deliver on-par or maybe better quality than GPU encoding. x86 CPU encoding delivers by far the best quality, though at roughly half the speed of the faster GPU cards. However, if you look at power use...overall energy used for encoding might actually be better on a Sandy Bridge processor than on a GPU. If the high-end GPU can do it twice as fast, but uses 3 times the power...
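The arithmetic behind that last sentence, as a tiny Python sketch. It assumes energy is simply average power multiplied by run time and ignores idle draw and everything else in the system; the function name is made up for the example.

```python
def relative_energy(speedup, power_ratio):
    """Energy used by the faster device relative to the baseline.

    energy = power * time, so the ratio is power_ratio * (1 / speedup).
    """
    return power_ratio / speedup


# The case from the post: twice as fast, three times the power.
ratio = relative_energy(speedup=2.0, power_ratio=3.0)
print(f"the GPU uses {ratio}x the energy of the CPU for the same encode")
```

So on those figures the GPU finishes in half the time but burns 1.5 times the energy for the same job.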

Anyway, my point is that as things stand this second, GPU encoding is nice, but it isn't a panacea. As for the APUs that AMD and Intel seem to be putting together, with both moving further and further along the path to SoCs (Haswell will pretty much be an SoC, with just about everything moved on die), the GPU is going to be "critical", but not nearly as good as discrete cards. Heck, just look at the die area of Nvidia's and AMD's high-end discrete cards right now: vaguely 300mm^2. That is about 50% bigger than Sandy Bridge, which is already using a big chunk for the GPU (roughly half? A third?). As process sizes shrink, I think we'll see the GPU portion of the APU/CPU get bigger and bigger; however, it is likely to still be smaller and less powerful than what you'll find in discrete GPUs.

I do think at some point in the next 1-3 CPU generations (maybe by Haswell?) we'll see a complete disappearance of the low and maybe even the mid-low GPU markets. Ivy Bridge looks like it may be on par with a 6550, and AMD's GPU in Llano is just about on par with that as well. Haswell sounds like it is probably going to improve on Ivy Bridge by anywhere from 25-100%, and Trinity is likely to be better than Llano. Integrated GPUs are certainly improving faster than discrete graphics are.

Two keys to integrated GPUs, though, are going to be closer integration with the CPU (Intel's shared L3 cache, AMD deciding to implement a real L3/shared cache) and a larger main memory pipeline and/or dedicated VRAM slots. However, Intel at least, and to a lesser degree AMD it seems, are moving toward lower-power CPUs/APUs, in part because of portable computing, but also because of the server space, and desktop CPUs are mirroring this as well. So the discrete GPU is always going to be much more powerful, so long as you don't mind coughing up the money. Your average mid-range card with a TDP in the 100-150W range is going to be much more powerful than a combined CPU/GPU that in total might have a TDP of 65-130W.

So integrated GPUs can accelerate some things and have significantly lower latency than a discrete GPU. However, for raw processing power, a discrete GPU is still going to be head and shoulders more powerful. It is really just going to be in situations where low latency is required that the iGPU is going to be better than a discrete card, or in a situation where there is no discrete card present, which the market is quickly moving toward as integrated GPUs start becoming "good enough" for basic users, corporate computing and casual gamers. Heck, at the rate of improvement they are going to be good enough even for heavy gamers who are on a budget or have lower-resolution displays (I'd say give it 3-4 years and iGPUs are going to be able to handle 1080p with medium/high settings at >30FPS in basically all games, though hopefully by then the >20" monitor group is going to be standardizing on something more like >1500p).
