Memory: How much of an impact does quad-channel memory make?

Discussion in 'Hardware' started by Spraduke, 4 Oct 2019.

  1. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    Hi,

    At our office we do an awful lot of CFD simulations and we have a range of workstations to enable us to do this. We recently bought some i9-9900K based machines but have found the performance hasn't been as much of an improvement as anticipated versus an older Xeon E5-2630 0 @ 2.30GHz, despite the i9-9900K being faster on paper (2 more cores and higher clocks). CPU-Z reckons the Xeon is 60% slower single-thread and 70% slower multithread than the i9-9900KF (slightly different CPU but you get the idea).

    We tend to load up a simulation per thread, with an average of 2-4 GB required per simulation.

    I was wondering if the quad-channel memory of the Xeon is allowing it to bridge the CPU deficit to the i9-9900K. Does anyone here have any insights? Could there be an instruction set on the Xeon CPU that is far more optimised than on the i9-9900K?
     
    Last edited: 4 Oct 2019
  2. RedFlames

    RedFlames ...is not a Belgian football team

    Joined:
    23 Apr 2009
    Posts:
    15,427
    Likes Received:
    3,013
    Is CFD not something that most programs offload to the GPU now? Could it be differences in GPU more than CPU?
     
  3. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    Not the specific CFD program we have to use - it's quite niche and doesn't use GPUs yet (it's barely multithreaded!)
     
  4. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    General benchmarks show the difference between dual- and quad-channel as between naff and all, but for such a specific task nothing beats benchmarking that specific task: grab a Xeon and populate it with the same capacity and speed of memory in dual-channel (e.g. from 4 GB x4 to 8 GB x2) and run it against the same Xeon in quad-channel. If you get a big performance hit then that's your bottleneck. If not, then your bottleneck is something else, e.g. the 12-physical/24-logical cores of the dual-Xeon setup vs. the 9900K's 8-physical/16-logical. Or the CFD program is doing something weird and out of date with thread dispatching (throwing a "that's not a Xeon, I'd better sandbag it" silly fit, say, or using some ancient hard-coded processor ID whitelist to determine threading behavior).
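
A minimal sketch of how that dual- vs quad-channel comparison could be timed so it rests on numbers rather than impressions. Everything here is an assumption for illustration: the function names `time_command` and `slowdown` are made up, and the command line is a stand-in for whatever launches the real CFD job. The idea is simply to run the same job several times in each memory configuration and compare median wall-clock times.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times; return each run's wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the run fails
        times.append(time.perf_counter() - start)
    return times


def slowdown(dual_times, quad_times):
    """Ratio of median run times; a value above 1.0 means dual-channel was slower."""
    return statistics.median(dual_times) / statistics.median(quad_times)


if __name__ == "__main__":
    # Placeholder workload (hypothetical): swap in the real solver command line.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    print(time_command(cmd))
```

Run it once with all four sticks fitted and once with two, then feed both lists of times to `slowdown`. Halving the channels halves the theoretical bandwidth, so a ratio near 2 would point at memory bandwidth, while a ratio near 1 would point elsewhere.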
     
  5. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    Sorry, the x2 Xeons is a red herring - it came from CPU-Z, but it's only a single CPU in the workstation so there are only 6 cores.

    That was my understanding too: quad channel doesn't make much difference. I've suggested benchmarking quad v dual by pulling 2 sticks from it, just to make sure, but my boss got jumpy about "breaking" it - I might just do it when he's not looking that hard.

    There was also some crashing going on with the i9s when fully loaded, due to a memory XMP profile (setting it to auto fixed that), so this may have been causing odd performance issues at lower load too.

    Time to load up both machines with the same runs to see what real world performance differences we are getting.
     
  6. sandys

    sandys Multimodder

    Joined:
    26 Mar 2006
    Posts:
    4,934
    Likes Received:
    727
    I guess the Xeon would have separate NUMA nodes and so might share the thread load and memory in a different way. Can you submit the jobs another way, i.e. throw threads at each sim rather than a sim at each thread?

    Perhaps you have a disk I/O bottleneck. When I upgraded my machine I found that file system performance was a limiter due to the number of files per job/thread. The bottleneck lessened a lot with the move from SSD to NVMe, and even more once RAIDed, up to the point of IF saturation. That was with 3D solvers rather than CFD though; my only CFD experience is doing aero bits for my MX-5, and that takes bugger all resource.
     
  7. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    When we've tested more threads per run we find the 'return on investment' tops out at about 4-8 threads per run; any more threads gets diminishing returns. I don't think it's a very disk I/O heavy workload - either way the i9s have NVMe drives in, so it's unlikely to be that.

    A lot of this is based on subjective feedback from the people running it rather than actual numbers, so it might all be psychosomatic, with the initial crashing of the i9s (which I only just fixed) making them think they are slow!
     
  8. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    For the older Xeons here: no, not really.
    For recent Xeons: yes, if you're doing fancy AVX-512 stuff there are more features and pipelines exposed on the Xeon dies than on the desktop bins (which mostly do not have AVX-512 enabled at all yet).
     
  9. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    Just to update on this: after benchmarking across 5 different CPU architectures we discovered that the particular CFD code is totally memory bandwidth limited.

    With the CPUs loaded, average run time per thread tracked theoretical memory bandwidth at a virtually 1:1 ratio. I even tried the i9-9900K at a faster memory frequency and saw a directly equal % increase in performance.

    We recently got a Threadripper 3970X and found that we could do 5 seconds of CFD dispersion calculation in 8 mins if we gave it all 64 threads. So we're not quite real time but not far off now!

    Moral of story: don't assume software works in a particular way with hardware, benchmark it!

    At least we now know what to buy: max memory bandwidth with a good number of cores, but not necessarily the fastest cores. (A 6-core 2.3 GHz CPU with quad-channel memory was as fast as an i9-9900K with dual-channel memory!)
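
As a rough sketch of why those two chips can end up level: theoretical peak bandwidth is channels x transfer rate x 8 bytes per 64-bit channel. The memory speeds below are assumptions, not figures from this thread. They are the highest officially supported speeds as far as I know (DDR3-1333 for the Xeon E5-2630, DDR4-2666 for the i9-9900K), and the DIMMs actually fitted may run at something else. The function name `peak_bandwidth_gbs` is made up for illustration.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Each DDR channel is 64 bits wide, i.e. 8 bytes per transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Assumed configurations (not taken from the thread):
xeon_e5_2630 = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=1333)  # quad DDR3-1333
i9_9900k = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=2666)      # dual DDR4-2666

print(f"Xeon E5-2630: {xeon_e5_2630:.1f} GB/s")  # 42.7 GB/s
print(f"i9-9900K:     {i9_9900k:.1f} GB/s")      # 42.7 GB/s
```

Under those assumptions the two come out the same on paper, which fits a bandwidth-limited code running equally fast on both.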
     
    edzieba and Arboreal like this.
  10. Spraduke

    Spraduke Lurker

    Joined:
    23 Sep 2009
    Posts:
    1,152
    Likes Received:
    466
    Personally I was thinking octa-channel EPYC Rome might be our next purchase given the bandwidth needs. However, we have no immediate need for more computing power right now.
     
  11. edzieba

    edzieba Virtual Realist

    Joined:
    14 Jan 2009
    Posts:
    3,909
    Likes Received:
    591
    Preach!
     
