News Researchers boost graphics performance through processing in-memory

Discussion in 'Article Discussion' started by Gareth Halfacree, 21 Apr 2017.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Staff Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    9,628
    Likes Received:
    366
  2. SinxarKnights

    SinxarKnights Member

    Joined:
    21 Jan 2007
    Posts:
    236
    Likes Received:
    2
    Pretty sweet stuff. Seems like bandwidth would be the limiting factor, as it is with current GPUs. Then again, I don't know what they're doing exactly. I need to check out the paper when it's released and see how they did it.

    Can you keep us posted about this Gareth?
     
  3. perplekks45

    perplekks45 LIKE AN ANIMAL!

    Joined:
    9 May 2004
    Posts:
    5,099
    Likes Received:
    121
    Seconded.
     
  4. IamSoulRider

    IamSoulRider Member

    Joined:
    24 Aug 2016
    Posts:
    61
    Likes Received:
    0
    "the team's work is based on the increasingly common 3D stacked memory modules available on high-end graphics hardware."

    I'd assume that would be HBM2, possibly first gen HBM. In that case Memory Bandwidth should be High.

    Do you see what I did there? :p
     
  5. SinxarKnights

    SinxarKnights Member

    Joined:
    21 Jan 2007
    Posts:
    236
    Likes Received:
    2
    Doesn't answer the question though. I imagine even HBM3 would be a significant bottleneck processing instructions in memory instead of directly on die.

    But like I said, I don't know what exactly they are doing. Need that paper to check it out.
     
  6. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    2,082
    Likes Received:
    55
    Other way around: this would alleviate memory-bandwidth-limited operations (i.e. those that need to operate on a lot of data, but where the operations themselves are very basic) by pushing those operations out to the memory itself, so the data never needs to cross the memory bus in the first place.
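
    To make that concrete, here's a toy Python model of the idea (all numbers illustrative, not from the paper): a simple sum over a big buffer, done the conventional way versus in-memory. The point is only to show what traffic crosses the bus in each case.

    ```python
    # Toy model of why processing-in-memory helps bandwidth-bound work:
    # a simple reduction (sum) over a large buffer of 32-bit words.

    def sum_on_gpu(buffer):
        """Conventional path: every element crosses the memory bus to the GPU."""
        bytes_moved = len(buffer) * 4      # every 32-bit word crosses the bus
        return sum(buffer), bytes_moved

    def sum_in_memory(buffer):
        """PIM path: logic in the stack does the add; only the result crosses the bus."""
        bytes_moved = 4                    # one 32-bit result word
        return sum(buffer), bytes_moved

    data = list(range(1_000_000))
    gpu_result, gpu_traffic = sum_on_gpu(data)
    pim_result, pim_traffic = sum_in_memory(data)
    assert gpu_result == pim_result        # same answer, vastly less bus traffic
    print(gpu_traffic // pim_traffic)
    ```

    Same result either way; the only thing that changes is where the arithmetic happens, and therefore how many bytes ever touch the bus.
    
    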
     
  7. Wakka

    Wakka Yo, eat this, ya?

    Joined:
    23 Feb 2017
    Posts:
    892
    Likes Received:
    77
    I'm nowhere near smart enough to know how this stuff works in detail, but how would memory chips process that kind of data? I mean, they're memory chips - surely they're designed to either store something, or pass it along to a smarter chip?

    Wouldn't you be a bit pissed if you were an Nvidia or AMD engineer and someone came along and was like "we can make things faster by moving instructions OFF those fancy multi-billion-transistor GPUs!"...
     
  8. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    2,082
    Likes Received:
    55
    The storage dies themselves are 'just memory', but for HBM stacks to work at all, the bottom element in the stack is a processing die that handles the interface between the memory dies and the memory bus. What the researchers have done is augment that existing processing die so it can do basic computations on the memory traffic it is already handling.
     
  9. Corky42

    Corky42 What did walle eat for breakfast?

    Joined:
    30 Oct 2012
    Posts:
    7,784
    Likes Received:
    114
    @Wakka...
    Basically they added a small ASIC that they can send a command to, something like "perform anisotropic filtering on the data held in memory at location X".
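
    A hypothetical sketch of that command-style interface (the names, opcode, and filtering stand-in are all made up for illustration; the real hardware interface isn't public): the host sends a descriptor naming an operation and an address range, and a fixed-function unit on the stack's logic die works on the data in place.

    ```python
    # Hypothetical sketch: host issues a command descriptor to a fixed-function
    # unit on the memory stack's logic die instead of reading the data out.
    from dataclasses import dataclass

    @dataclass
    class PimCommand:
        opcode: str       # e.g. "ANISO_FILTER" - fixed-function, not general-purpose
        base_addr: int    # "location X" of the data in the stack
        length: int       # how many elements to process

    class MemoryStackLogicDie:
        def __init__(self, memory):
            self.memory = memory

        def execute(self, cmd):
            if cmd.opcode == "ANISO_FILTER":
                # Stand-in for real anisotropic filtering: a simple in-place
                # neighbour average over the addressed region.
                region = self.memory[cmd.base_addr:cmd.base_addr + cmd.length]
                self.memory[cmd.base_addr:cmd.base_addr + cmd.length] = [
                    sum(region[max(0, i - 1):i + 2]) / len(region[max(0, i - 1):i + 2])
                    for i in range(len(region))
                ]
            else:
                raise ValueError(f"unsupported opcode: {cmd.opcode}")

    stack = MemoryStackLogicDie(memory=[0.0, 8.0, 0.0, 8.0])
    stack.execute(PimCommand("ANISO_FILTER", base_addr=0, length=4))
    ```

    The key design point is that the unit only understands a handful of opcodes: it's a tiny special-purpose block, not a general-purpose processor.
    
    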
     
  10. Cr@1g

    Cr@1g New Member

    Joined:
    19 Oct 2011
    Posts:
    16
    Likes Received:
    0
    I'm wondering if AMD's new approach with HBM2, along with its HBC creating a 512TB virtual address space, is made for this way of thinking?
     
  11. Gareth Halfacree

    Gareth Halfacree WIIGII! Staff Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    9,628
    Likes Received:
    366
    To clear up some misconceptions in the comments - and apologies if the article was unclear:

    The technique works, as Ed and Corky have both mentioned, by adding a processing element to each memory stack which is capable of working directly on data stored in said memory. So, instead of the GPU having to read 8GB (or whatever) of data, do its thing, and write the 8GB back again, the processing happens on the memory directly - hence 'in-memory processing.' It's not a new idea, but it's the first time I've seen it applied to graphics processing with practical results.

    As Ed mentioned, it's the exact opposite of bandwidth-dependent: the data doesn't go anywhere. Where the GPU can only work on the contents of memory at the throughput of the memory bus, the in-memory processing system can operate at whatever speed the memory itself runs at - and in parallel, too, meaning with eight stacks of memory you can do your processing eight times faster than with one stack, without worrying about saturating any buses.
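
    The scaling argument works out like this (bandwidth figures below are assumed for illustration, not measured): the conventional path is capped by the one shared bus, while per-stack logic scales with the number of stacks.

    ```python
    # Back-of-the-envelope model of the scaling argument. All bandwidth
    # figures are assumed for illustration only.

    BUS_BANDWIDTH_GBPS = 256    # the single shared memory bus (assumed)
    STACK_BANDWIDTH_GBPS = 256  # internal bandwidth of one stack (assumed)

    def gpu_throughput(num_stacks):
        # More stacks don't help: every byte still crosses the one bus.
        return BUS_BANDWIDTH_GBPS

    def pim_throughput(num_stacks):
        # Each stack processes its own contents; no shared bus in the path.
        return num_stacks * STACK_BANDWIDTH_GBPS

    print(pim_throughput(8) / gpu_throughput(8))  # -> 8.0
    ```

    So eight stacks gives an eightfold speed-up on the in-memory path, while adding stacks does nothing for the bus-bound path.
    
    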

    You're limited in what you can do, though: the die space and power envelope for adding a logic layer to stacked memory are both way, way smaller than for a GPU - so you can't have anything general-purpose going on there. Hence the proof-of-concept: a logic layer that only does one thing, anisotropic filtering - something which is fairly simple computationally but that requires massive memory bandwidth. With that tiny bit of extra processing power, you're lightening the load on the GPU by a percent or two at most - but because you're no longer bottlenecked by the memory bus you're increasing the performance by 65 percent.

    Step one of commercialisation: task offload acceleration, by adding anisotropic filtering logic to the memory stacks (or whatever task ends up making sense to offload - there may be something else that would give even bigger performance increases in modern gaming engines.)

    Step two: add more logic layers. As well as your anisotropic filter logic layer, why not stick a - I don't know - bump-mapping layer on there? Keep adding layers until you can't fit any more on there.

    Step ???: by now your GPU is basically just there to tell the memory stacks what they should be doing, so you've effectively created a fundamentally new architecture. Instead of an ultra-powerful GPU talking to dumb memory, your graphics card is now a dumb and lightweight central controller talking to ultra-powerful in-memory processors. Likely? Who knows; the technique has to survive the prior steps first.

    As for the paper, I'll drop the guys an email and see if there's a timescale on public access - or, given that it's DoE funded, whether it'll ever be publicly accessible.
     
  12. greigaitken

    greigaitken Member

    Joined:
    26 Aug 2009
    Posts:
    381
    Likes Received:
    2
    @ GH
    You just don't get this kind of analysis on the BBC tech section.
     
  13. Gareth Halfacree

    Gareth Halfacree WIIGII! Staff Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    9,628
    Likes Received:
    366
    I'm Ron Burgundy?
     
  14. Gareth Halfacree

    Gareth Halfacree WIIGII! Staff Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    9,628
    Likes Received:
    366
    Got in touch with Shuaiwen Leon Song at PNNL; he's out travelling this week, but he's going to swing back around next week with as much additional information as he can gather. Should be interesting!
     
  15. perplekks45

    perplekks45 LIKE AN ANIMAL!

    Joined:
    9 May 2004
    Posts:
    5,099
    Likes Received:
    121
    Cheers, Gareth. Much appreciated! :thumb:
     