News Researchers boost graphics performance through processing in-memory

Gareth Halfacree · 21 Apr 2017

65 percent improvements claimed.
https://www.bit-tech.net/news/hardware/2017/04/21/graphics-processing-in-memory/1

SinxarKnights · 21 Apr 2017

Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.

Can you keep us posted about this Gareth?

perplekks45 · 21 Apr 2017

Seconded.

IamSoulRider · 21 Apr 2017

"the team's work is based on the increasingly common 3D stacked memory modules available on high-end graphics hardware."

I'd assume that would be HBM2, possibly first gen HBM. In that case Memory Bandwidth should be High.

Do you see what I did there?

SinxarKnights · 21 Apr 2017

Doesn't answer the question though. I imagine even HBM3 would be a significant bottleneck processing instructions in memory instead of directly on die.

But like I said, I don't know what exactly they are doing. Need that paper to check it out.

edzieba · 21 Apr 2017

SinxarKnights said:

Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.
Click to expand...

Other way around: this would alleviate memory-bandwidth-limited operations (i.e. those that nee to operate on a lot of data, but the operations themselves are very basic) by pushing those operations out to the memory itself, so that data never needs to cross the memory bus in the first place.

Wakka · 21 Apr 2017

I'm nowhere near smart enough to know how this stuff works in detail, but how would memory chips process that kind of data? I mean, they're memory chips - surely they are designed to either store something, or pass it along to a smarter chip?

Wouldn't you be a bit pissed if you were an nvidia or AMD engineer and someone came along and was like "we can make things faster by moving instructions OFF those fancy multi-billion transistor GPU's!"...

edzieba · 21 Apr 2017

Wakka said:

I'm nowhere near smart enough to know how this stuff works in detail, but how would memory chips process that kind of data? I mean, they're memory chips - surely they are designed to either store something, or pass it along to a smarter chip?

Wouldn't you be a bit pissed if you were an nvidia or AMD engineer and someone came along and was like "we can make things faster by moving instructions OFF those fancy multi-billion transistor GPU's!"...
Click to expand...

The storage dies themselves are 'just memory' But for HBM stacks to work at all, the bottom element in the stack is a processing die to handle interface between the memory dies and the memory bus. What the researchers have done is to augment the existing processing die to allow it to do basic computations on the memory traffic it is already handling.

Corky42 · 21 Apr 2017

@Wakka...

The Article said:

Song's team added logic layers able to work directly on the stored data, effectively turning each memory chip into a co-processor. Although the capabilities of the logic layer are limited compared to the far larger GPU, it was enough to see considerable improvements: By offloading anisotropic filtering to the modified memory processors the performance of tested games was boosted by up to 65 percent.
Click to expand...

Basically they added a small ASIC that they could send a command to that said something like perform anisotropic filtering on data held in memory at X location.

Cr@1g · 21 Apr 2017

edzieba said:

SinxarKnights said:

Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.
Click to expand...

Other way around: this would alleviate memory-bandwidth-limited operations (i.e. those that nee to operate on a lot of data, but the operations themselves are very basic) by pushing those operations out to the memory itself, so that data never needs to cross the memory bus in the first place.
Click to expand...

Im wondering if AMD's new approach with HBM2 along with ts HBC creates a 512TB virtual address space and is made for this way of thinking?

Gareth Halfacree · 21 Apr 2017

To clear up some misconceptions in the comments - and apologies if the article was unclear:

The technique works, as Ed and Corky have both mentioned, by adding a processing element to each memory stack which is capable of working directly on data stored in said memory. So, instead of the GPU having to read 8GB (or whatever) of data, do its thing, and write the 8GB back again, the processing happens on the memory directly - hence 'in-memory processing.' It's not a new idea, but it's the first time I've seen it applied to graphics processing with practical results.

As Ed mentioned, it's the exact opposite of bandwidth-dependent: the data doesn't go anywhere, so where the GPU can only work on the contents of the memory at the throughput of the memory bus the in-memory processing system can operate however quickly the memory itself works at - and in parallel, too, meaning if you've got eight stacks of memory you can do your processing eight times faster than if you had one stack of memory without worrying about saturating any buses.

You're limited in what you can do, though: the die space and power envelope for adding a logic layer to stacked memory are both way, way smaller than for a GPU - so you can't have anything general-purpose going on there. Hence the proof-of-concept: a logic layer that only does one thing, anisotropic filtering - something which is fairly simple computationally but that requires massive memory bandwidth. With that tiny bit of extra processing power, you're lightening the load on the GPU by a percent or two at most - but because you're no longer bottlenecked by the memory bus you're increasing the performance by 65 percent.

Step one of commercialisation: task offload acceleration, by adding anisotropic filtering logic to the memory stacks (or whatever task ends up making sense to offload - there may be something else that would give even bigger performance increases in modern gaming engines.)

Step two: add more logic layers. As well as your anisotropic filter logic layer, why not stick a - I don't know - bump-mapping layer on there? Keep adding layers until you can't fit any more on there.

Step ???: by now your GPU is basically just there to tell the memory stacks what they should be doing, so you've effectively created a fundamentally new architecture. Instead of an ultra-powerful GPU talking to dumb memory, your graphics card is now a dumb and lightweight central controller talking to ultra-powerful in-memory processors. Likely? Who knows; the technique has to survive the prior steps first.

As for the paper, I'll drop the guys an email and see if there's a timescale on public access - or, given that it's DoE funded, whether it'll ever be publicly accessible.

greigaitken · 22 Apr 2017

@ GH
you just don't get this kind of analysis on the bbc tech section?

Gareth Halfacree · 22 Apr 2017

greigaitken said: ↑

@ GH
you just don't get this kind of analysis on the bbc tech section?
Click to expand...

I'm Ron Burgundy?

Gareth Halfacree · 28 Apr 2017

Got in touch with Shuaiwen Leon Song at PNNL, and he's out travelling this week but he's going to swing back around next with with as much additional information as he can gather. Should be interesting!

perplekks45 · 29 Apr 2017

Cheers, Gareth. Much appreciated!

Log in or Sign up

News Researchers boost graphics performance through processing in-memory

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

SinxarKnights Minimodder

perplekks45 LIKE AN ANIMAL!

IamSoulRider Minimodder

SinxarKnights Minimodder

edzieba Virtual Realist

Wakka Yo, eat this, ya?

edzieba Virtual Realist

Corky42 Where's walle?

Cr@1g What's a Dremel?

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

greigaitken Minimodder

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

perplekks45 LIKE AN ANIMAL!

Share This Page

Log in or Sign up

News Researchers boost graphics performance through processing in-memory

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

SinxarKnights Minimodder

perplekks45 LIKE AN ANIMAL!

IamSoulRider Minimodder

SinxarKnights Minimodder

edzieba Virtual Realist

Wakka Yo, eat this, ya?

edzieba Virtual Realist

Corky42 Where's walle?

Cr@1g What's a Dremel?

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

greigaitken Minimodder

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

perplekks45 LIKE AN ANIMAL!

Share This Page

Useful Searches