Blogs AMD needs to do more to fix Threadripper's optimisation issues

Discussion in 'Article Discussion' started by bit-tech, 6 Feb 2019.

  1. bit-tech

    bit-tech Supreme Overlord Lover of bit-tech Administrator

    Joined:
    12 Mar 2001
    Posts:
    3,676
    Likes Received:
    138
    Read more
     
  2. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    388
    I'd say AMD are being aggressive and taking a proactive approach. There's only so much they can do, and we know AMD were aware of the issue Level1Techs identified since at least the initial launch of (iirc) the 2990WX. Same with the Nvidia issue: was it something like 2-3 months after launch?

    It seems to be more of a lead-time issue on engineering samples, but as IDK what the typical time is between ESs being sent and a release, I may be talking rubbish. :)

    Either way I've just made myself a big tub of popcorn. ;)
     
  3. Anfield

    Anfield Multimodder

    Joined:
    15 Jan 2010
    Posts:
    7,059
    Likes Received:
    970
    They didn't reinvent the way glue works for Zen 2 just for fun.
     
  4. Zak33

    Zak33 Staff Lover of bit-tech Administrator

    Joined:
    12 Jun 2017
    Posts:
    263
    Likes Received:
    54
    I don't understand ... explain please :)
     
  5. Anfield

    Anfield Multimodder

    Joined:
    15 Jan 2010
    Posts:
    7,059
    Likes Received:
    970
    With Zen 1 they have the Infinity Fabric connecting the individual modules.
    With Zen 2 they send all chiplet-to-chiplet as well as CPU-to-system data through a separate chip.

    Why such a big change in just two years from a company that needs to squeeze every penny 7 times?
    Logic dictates that they would never have bothered with the R&D effort to completely change the way the individual modules connect if they hadn't been aware of severe issues with the old approach.
     
  6. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    388
    Zen 1 & 2 still use Infinity Fabric. IF is just the name given to a collection of two 'fabrics', and each of those can be further broken down into on- and off-package communications.

    With Zen 2 IF is still used between chiplets and I/O die, the only thing to really change is what die contains what.
     
  7. Anakha

    Anakha Minimodder

    Joined:
    6 Sep 2002
    Posts:
    587
    Likes Received:
    7
    AMD are using the chiplet system (which uses Infinity Fabric between them) for better yields. The main IO die is a (reliable) 14nm part, which they can produce on a separate wafer. The chiplets are 7nm (which is not currently as reliable) and considerably smaller, so they can discard unusable chiplets (or lower-clocking ones) and still retain a much larger portion of the wafer, improving yields.
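
    To put rough numbers on the yield argument, here's a minimal sketch using the textbook Poisson yield model (yield = exp(-area x defect density)). This is a simplification, not whatever AMD or the foundries actually use, and the die areas and defect density below are made-up illustrative values, not real figures for any product.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical values, chosen only to illustrate the trend.
DEFECTS_PER_CM2 = 0.5      # assumed defect density on an immature process
CHIPLET_AREA_MM2 = 75.0    # assumed small chiplet
MONOLITHIC_AREA_MM2 = 600.0  # assumed large single die

chiplet_yield = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_CM2)
monolithic_yield = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_CM2)

print(f"small chiplet yield:  {chiplet_yield:.1%}")
print(f"large die yield:      {monolithic_yield:.1%}")
```

    Whatever numbers you plug in, the defect-free fraction drops off exponentially with die area, which is why small dies on a young process lose far less of the wafer.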

    It's quite possible the issue that NV identified and corrected could be related to the one CorePrio is showing up. Either way, this is a huge flaw in the Windows Kernel, something AMD has very little control over. Chances are they did know about it, but MS wouldn't fix it as "Couldn't reproduce", but now L1T has done the groundwork and given us CorePrio, MS finally has no option but to actually fix their sh... er... stuff.
     
  8. Anfield

    Anfield Multimodder

    Joined:
    15 Jan 2010
    Posts:
    7,059
    Likes Received:
    970
    Think about it:

    Zen 1 = Issues with Windows scheduler
    What does Windows love above anything else? Providing compatibility to obsolete tech like the traditional northbridge
    AMD decides to stick what is effectively a traditional northbridge in between the chiplets and the system

    Put 1 + 1 + 1 together and you might just get a fix for said issues.
     
  9. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    388
    They didn't do that because Windows throws a wobbly when it comes to ccNUMA nodes, they did it because some things, like I/O related IP blocks, don't scale as efficiently as logic blocks.
     
  10. Slithery

    Slithery What's a Dremel?

    Joined:
    11 Jul 2018
    Posts:
    7
    Likes Received:
    5
    This is a Windows issue, not an AMD issue.

    The Linux kernel handled 2nd gen Threadrippers just fine from launch day...
     
    Last edited: 6 Feb 2019
  11. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    It's a radical change in topology.
    Zen 1 uses a distributed mesh topology: core and uncore parts are spread between dies, and access between any given core (or cache) and external memory is heterogeneous due to differing paths and number of hops between the external interfaces and any given core or cache. This requires hierarchical nested NUMA domains to properly handle.
    Zen 2 uses a centralised hub-and-spoke topology: the uncore is centralised, and ALL core-to-core and core-to-external accesses go through it, and are homogeneous beyond inter-core exchanges. In most cases (where you aren't deliberately digging down into the inter-cache level of memory management) you can get away with not worrying about NUMA at all, as external memory access is now uniform across all cores.
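
    A toy illustration of the difference, counting link hops from each die to the nearest memory in two made-up graphs. The node names and link layout are assumptions for illustration only (loosely: four fully linked dies with memory hanging off two of them, versus four dies that all hang off one hub that owns the memory). It is not an accurate model of either part.

```python
from collections import deque

def hops_to_memory(graph, start, memory_nodes):
    """Breadth-first search: fewest links from start to any memory node."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in memory_nodes:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path from {start} to memory")

DIES = ["die0", "die1", "die2", "die3"]

# Hypothetical distributed layout: every die links to every other die,
# but only die0 and die2 have memory attached.
mesh = {
    "die0": ["die1", "die2", "die3", "mem0"],
    "die1": ["die0", "die2", "die3"],
    "die2": ["die0", "die1", "die3", "mem2"],
    "die3": ["die0", "die1", "die2"],
    "mem0": ["die0"],
    "mem2": ["die2"],
}

# Hypothetical hub-and-spoke layout: all dies and the memory hang off one hub.
hub = {
    "hub": DIES + ["mem"],
    "mem": ["hub"],
    **{die: ["hub"] for die in DIES},
}

mesh_hops = {die: hops_to_memory(mesh, die, {"mem0", "mem2"}) for die in DIES}
hub_hops = {die: hops_to_memory(hub, die, {"mem"}) for die in DIES}

print("distributed:", mesh_hops)
print("hub-and-spoke:", hub_hops)
```

    In the distributed graph the hop count depends on which die you start from, which is the non-uniformity a scheduler has to know about. In the hub graph every die sees the same distance.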
    Sure, but why did it end up not even mentioned by AMD until several months after release?
    Standard MO would be to be working with MS loooong before product release to work out the bugs of sticking massive convoluted multi-core multi-domain chips in a consumer OS (even before the first tapeout you have a good idea of your architecture model). And even if MS were for some reason willing to deliberately make Windows perform worse, why would AMD just sit and twiddle their thumbs in silence rather than outright stating "performance is below that on other operating systems because Windows does X"?
     
  12. Anfield

    Anfield Multimodder

    Joined:
    15 Jan 2010
    Posts:
    7,059
    Likes Received:
    970
    It doesn't matter whether AMD or Microsoft caused the issue, because regardless of who caused it the victim is the performance (and sales) of AMD CPUs, and as such it is in AMD's interest to fix it (or work around it with a change in topology).
     
  13. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    388
    I know, that's what I said. I'm glad we agree.
     
  14. Paradigm Shifter

    Paradigm Shifter de nihilo nihil fit

    Joined:
    10 May 2006
    Posts:
    2,306
    Likes Received:
    86
    To be fair, I've not had hands-on time on a 29x0 Threadripper, but I've had some interesting times with a 2700X recently in Linux with PCI-E errors cropping up in one kernel version that didn't in others (Ubuntu 4.15.-42 and -45 are fine, *-43 is horribly broken). *-43 exhibited the same issues with first-gen TR and an 1800X, with no issues on Intel systems of comparable generations. That said, Linux handled scheduling a hell of a lot better than Windows did...
     
  15. Knowbody

    Knowbody What's a Dremel?

    Joined:
    15 Aug 2012
    Posts:
    6
    Likes Received:
    0
    AMD can contribute patches to open source projects such as the Linux Kernel.
    And they have done so. Which is why the 2990WX performs well under Linux.

    AMD cannot fix Microsoft's OS. Only Microsoft can.
    AMD doesn't even have access to the Windows source code.
     