
Graphics AMD’s DirectX 12 Advantage Explained – GCN Architecture More Friendly To Parallelism

Discussion in 'Hardware' started by Harlequin, 24 Aug 2015.

  1. Harlequin

    Harlequin Modder

    Things are moving so fast that the comment you linked is now redundant - they can now run 10 wavefronts...

    and the 970 does use segmented RAM - Nvidia said so themselves (on the final CUDA block)

    edit:

    Would seem that even AotS isn't pushing GCN 1.2.
     
    Last edited: 7 Sep 2015
  2. loftie

    loftie Multimodder

    Now you've thoroughly lost me! Are you sure you're not getting GCN 1.0 mixed up with GCN 1.1/1.2?

    The 7970 has 2 ACEs, and they are both single threaded. Surely that means they can only handle 1 queue each? The quoted paragraph, and this is how I interpreted it, says
    IMO the "they" is referring to the Southern Islands devices, not the ACEs.
    GCN 1.1/1.2 have up to 8 ACEs, each containing 8 queues. E.g. the 290X has 8 ACEs, each having 8 queues (threads?) for a total of 64 queues (quick tally below).

    The other reason I think GCN 1.0 only has 2 queues is that, in the Anandtech article where everyone is up in arms about the GCN/Maxwell comparison, Zlatan didn't correct Ryan on GCN 1.0 - he left that listed as 2 compute queues.


    Anyone else notice the AMD PDF said ACE engines? (Asynchronous Compute Engine engines :D )
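    To put that queue arithmetic in one place, a quick tally (a Python sketch; the per-generation counts are just my reading of the public AMD material above, so treat them as assumptions):

```python
# Rough tally of front-end compute queues as discussed above.
# The counts are assumptions based on my reading of AMD's public material.
gcn_parts = {
    # name: (ACE count, queues per ACE)
    "GCN 1.0 (e.g. 7970 / Tahiti)": (2, 1),  # 2 ACEs, single queue each
    "GCN 1.1 (e.g. 290X / Hawaii)": (8, 8),  # 8 ACEs x 8 queues
    "GCN 1.2 (e.g. Fury X / Fiji)": (8, 8),
}

for name, (aces, queues_per_ace) in gcn_parts.items():
    total = aces * queues_per_ace
    print(f"{name}: {aces} ACEs x {queues_per_ace} queues = {total} compute queues")
```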
     
  3. Corky42

    Corky42 Where's walle?

    Wavefronts?
    Was that directed at loftie? Because I don't have a clue what a wavefront is, other than the weather variety. :)

    Yes, but people were using it not only to detect the segmented RAM but also to benchmark it, something the program wasn't designed to do. IIRC it bypassed the memory management in Nvidia's DX11 drivers and addressed the card directly using CUDA (Nvidia's own proprietary parallel computing platform and API).

    It's kind of the same thing with this new program: all it does is generate async compute in CUDA. It's a quick and dirty way to test whether a card can do async compute, not to measure how fast it is or anything else, as it bypasses all the optimisation work done via the DX11 drivers.
     
  4. Harlequin

    Harlequin Modder

    Then maybe you want to actually read the links you supply - wavefronts are mentioned later in that very thread...


    Still doesn't explain the stuttering people get in games on 970s that they don't get on 980s...

    And it's not the same - AotS is a game, not something 'quick and dirty' - but please, go over to B3D and tell them you have far more skill in creating programs designed to test part of the DX12 feature set than they do, at short notice, which can produce verifiable results on multiple platforms.

    AMD have created something which has potential - but why all the extremism? It's not as if they suddenly made Nvidia redundant, they've simply levelled the playing field for now, till Nvidia find yet another way to cheat.
     
  5. Corky42

    Corky42 Where's walle?

    So you're talking about the part that I initially ignored because it related to the strange results with a Fury X? I fail to see how that's relevant to a program that's not designed as a benchmark being used for exactly that purpose.

    Perhaps you'd care to point out, for us slow folks, what a wavefront is and why they're relevant to a program written specifically to test for the existence of async compute.

    Since when have we been trying to explain why stuttering happens?
    I thought we were discussing how using a program not intended as a benchmark to benchmark something is an indication of absolutely nothing, other than whether the card can do async compute.

    Wouldn't getting into a discussion on why some people supposedly get stuttering, something that's still very much debatable due to the crazily high settings used during tests, be taking this conversation off on a tangent?

    Sorry, I'm failing to see why you're getting so angry. The section you quoted from me makes no mention of AotS, so IDK why you brought it up and used it as a stick to beat me with; I was under the impression we were discussing the validity of the program MDolenc wrote to test for the existence of async compute.

    I'm also not sure how you're defining testing parts of DX12 on multiple platforms, as the program has been rewritten six times to deal with different architectures; there's plenty of doubt on the B3D forums as to the validity of using such a program as a benchmark, if you cared to read about it.

    The only extremism I see is in your reaction to what until now had seemed like a reasoned discussion on the differences in architecture between AMD & Nvidia, and what impact the different designs have on DX12 versus DX11.
     
    Last edited: 8 Sep 2015
  6. rollo

    rollo Modder

    Reading through some of the links posted, async is bad for VR, with latency times of 60-70ms instead of 15ms. If VR is as badly affected by latency as some think it will be, you could end up with some interesting choices between VR and async.

    In most of the Nvidia tests the card is sub-10ms, indicative that it's not doing async. On AMD all cards are at least 50-60ms, and that increases as the workload does. One of the VR devs was posting in those threads saying anything above 20ms would affect VR.

    That would be worrying for any would-be VR user.
     
  7. loftie

    loftie Multimodder

    If you're basing that on the charts/graphs that show GCN as a flat line and Maxwell as a lower, gradually increasing line, don't, because IIRC the guy who wrote the async test program stated it's not designed to test latency - it's only there to test if async compute is working. I think the post was linked on the Anand forums.

    Personally, as AMD stated, no card really fully supports DX12 so I'm far more interested to see the next set of cards, but then I would be regardless :D
     
  8. edzieba

    edzieba Virtual Realist

    Thus far, the 'latency impact' of Nvidia architectures (listed variously as 60-70ms, 33ms, 24ms, etc.) does not appear in practice. Fire up Oculus World and check out the latency debug overlay.
    A lot of people are seeing "asynchronous compute shaders" and "asynchronous timewarp" and thinking the two are intimately related because they share a word. The key arbiters of Asynchronous Timewarp's effectiveness are the accuracy of Late Latching timing (timing before the next VSYNC readout begins) and making sure the GPU is not busy at that time (don't use draw calls that take super-long to complete*).

    * Basically, if the GPU is taking 8ms to complete a draw call, there's a chance that the time to perform the async warp comes and goes while the GPU is still working. If you split that call into 2ms chunks, then every 2ms there's a chance to pre-empt the next chunk and fire off the async warp. This is less an "OMG, this architecture is no good for VR!" and more an "oh, that's good to know for optimisation among a whole load of other low-level considerations, because VR performance optimisation is hard".
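    A toy illustration of that footnote (just a sketch with invented numbers, not anyone's actual scheduler): the warp can only be picked up at a draw-call boundary, so smaller chunks mean a shorter worst-case wait.

```python
import math

# Toy model of the footnote above: the async timewarp can only be picked up
# at a draw-call boundary, so big chunks can leave it waiting while the GPU
# is still busy. All numbers are invented for illustration.
def warp_delay(request_ms, chunk_ms):
    """Time from the warp being requested until the next pre-emption point."""
    next_boundary = math.ceil(request_ms / chunk_ms) * chunk_ms
    return next_boundary - request_ms

for chunk_ms in (8.0, 2.0):
    worst = max(warp_delay(t / 10.0, chunk_ms) for t in range(160))
    print(f"{chunk_ms} ms draw chunks -> worst-case wait of ~{worst:.1f} ms before the warp can start")
```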
     
  9. Corky42

    Corky42 Where's walle?

    David Kanter talking about async compute, AotS, and AMD's ACE engines on a TR podcast.

    If it doesn't jump to the relevant part, they start talking about it at 1h 12m 30s, then prattle on about HPC until 1h 19m, when they actually get onto AotS and async compute.
     
    Last edited: 8 Sep 2015
  10. Corky42

    Corky42 Where's walle?

    Asynchronous compute, AMD, Nvidia, and DX12: What we know so far
    https://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far
    Having read through this article, it seems a fairly fair appraisal of the situation to me, nicely summed up by the following...
     
    Last edited: 10 Sep 2015
  11. roosauce

    roosauce Looking for xmas projects??

  12. Guest-16

    Guest-16 Guest

    It doesn't seem UE4 makes extensive use of CPU threading, which is highly disappointing considering how widely used UE is. Someone needs to contact the UE team to get more insight into their DX12 implementation, because judging from the Anand benchmarks it doesn't look like it's using everything available.
     
  13. Harlequin

    Harlequin Modder

    Also it appears that the engine has certain features 'switched on' for consoles but not for desktop, which doesn't make sense tbh.
     
  14. edzieba

    edzieba Virtual Realist

    CPU threading isn't an easy thing to use. Many tasks simply can't be threaded effectively. If you're spending 10ms on single-threaded engine simulation and 2ms on job dispatch, then cutting down that job dispatch time through threading by 4x only takes you from 12ms to 10.5ms of CPU time before render starts.
    DX12 is not, and never will be, a magic go-faster switch.
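    Or as a back-of-the-envelope formula (the numbers are the ones from the post above; this is really just Amdahl's law):

```python
# The arithmetic from the post above: only the dispatch work parallelises,
# so the serial simulation time puts a hard floor under the frame's CPU cost.
def frame_cpu_ms(serial_ms, dispatch_ms, threads):
    return serial_ms + dispatch_ms / threads

before = frame_cpu_ms(serial_ms=10.0, dispatch_ms=2.0, threads=1)  # 12.0 ms
after = frame_cpu_ms(serial_ms=10.0, dispatch_ms=2.0, threads=4)   # 10.5 ms
print(f"{before} ms -> {after} ms, about a {before / after:.2f}x gain at best")
```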
     
  15. rollo

    rollo Modder

    The Fable results are not a huge shock - it was never draw call limited on DX11, and the main benefit of DX12 is draw calls.

    RTS games will be where people see huge DX12 gains, at least in early titles. It'll be mid next year before a PC game comes with most major DX12 features included, and probably that long before there's a GPU with all the features, in truth.
     
  16. Guest-16

    Guest-16 Guest

    When is the next Civ due? Should be soon - V was out in 2010.
     
  17. Harlequin

    Harlequin Modder

    Joined:
    4 Jun 2004
    Posts:
    7,131
    Likes Received:
    194
    and BE was last year ;)
     
