
News Intel's Haswell rumoured to feature L4 cache

Discussion in 'Article Discussion' started by brumgrunt, 19 Mar 2012.

  1. brumgrunt

    brumgrunt What's a Dremel?

    Joined:
    16 Dec 2011
    Posts:
    1,009
    Likes Received:
    27
  2. CAT-THE-FIFTH

    CAT-THE-FIFTH What's a Dremel?

    Joined:
    5 Apr 2009
    Posts:
    53
    Likes Received:
    0
    It seems CD from SA mentioned this a while back (not sure if the numbers are correct, though):

    http://semiaccurate.com/2012/02/08/haswell-is-a-graphics-monster/

    http://semiaccurate.com/2011/09/21/analysis-intel-shows-off-haswell-minus-the-important-bits/

    AMD is also investigating the use of stacked RAM (there are pictures):

    http://semiaccurate.com/2011/10/27/amd-far-future-prototype-gpu-pictured/

    http://semiaccurate.com/forums/showpost.php?p=140928&postcount=26

    2013 is going to be a very interesting year for integrated graphics as both Intel and AMD might be using on-die RAM!
     
  3. Adnoctum

    Adnoctum Kill_All_Humans

    Joined:
    27 Apr 2008
    Posts:
    486
    Likes Received:
    31
    The Bobcat-based APUs are already being looked at for GPGPU use in servers, and I'd imagine APUs will graduate to the high end eventually as well, so Intel would be foolish not to be ready to offer competition to AMD.

    My only issue would be to question Intel's competence at creating GPGPU-capable graphics hardware. Their driver team hasn't exactly been stellar at making up for hardware inadequacies, either.
    It's one thing to play HD video and some not-very-demanding games on Intel graphics, but I wouldn't like to crunch important 1s and 0s on it. Hopefully Intel will surprise us and turn its graphics around.
     
  4. greigaitken

    greigaitken Minimodder

    Joined:
    26 Aug 2009
    Posts:
    431
    Likes Received:
    14
    The more Intel and AMD press forward with integrated graphics, the more Nvidia has to look over its shoulder and run faster.
    Hurry up, the lot of you: I've got 30 years of gaming left and I want photorealism before then!
     
  5. schmidtbag

    schmidtbag What's a Dremel?

    Joined:
    30 Jul 2010
    Posts:
    1,082
    Likes Received:
    10
    5 years later:
    "In other news, Intel releases an L8 cache with 1GB of memory."

    Does Intel even remember the purpose of on-die caches? I suppose it's fine that the L4 is shared with the GPU, but I still don't see it improving performance on the CPU side. Just give the GPU an L2 of its own.
     
  6. yougotkicked

    yougotkicked A.K.A. YGKtech

    Joined:
    3 Jan 2010
    Posts:
    251
    Likes Received:
    9
    @schmidtbag: If you were to slap 1GB of cache (any level) on a current processor, it could probably break the TeraFLOP barrier with two cores tied behind its FSB.

    Adding more levels, and larger quantities, of cache memory is actually one of the most powerful ways to improve processor performance; it's also expensive as hell.

    Cache memory is quite literally up to 100 times faster than RAM: the CPU can access data in L1 cache in a single clock cycle, while fetching from main memory can take up to 100 cycles; L2 takes about 15 cycles and L3 about 30. Fetching a single datum from main memory in the middle of a calculation can easily double the execution time of an operation, AND potentially slow down other operations. Adding more levels of cache helps avoid these time-wasting trips to main memory; even a ~50-cycle call to an L4 cache takes half the time of a call to main memory.
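    The effect of an extra level shows up in a quick average-memory-access-time calculation. A rough sketch, using the cycle counts above but with made-up hit rates (illustrative numbers, not measurements of any real chip):

```python
# Average memory access time (AMAT) for a cache hierarchy, using the
# rough cycle counts quoted above. Hit rates are invented for illustration.
def amat(levels, memory_cycles):
    """levels: list of (hit_cycles, hit_rate), fastest cache first.
    A miss at one level falls through to the next."""
    expected = 0.0
    miss_prob = 1.0
    for hit_cycles, hit_rate in levels:
        expected += miss_prob * hit_rate * hit_cycles
        miss_prob *= 1.0 - hit_rate
    return expected + miss_prob * memory_cycles

# L1 = 1 cycle, L2 = 15, L3 = 30, main memory = 100.
three_level = amat([(1, 0.90), (15, 0.70), (30, 0.60)], 100)
# Add a hypothetical ~50-cycle L4 that catches 60% of L3 misses.
four_level = amat([(1, 0.90), (15, 0.70), (30, 0.60), (50, 0.60)], 100)
print(three_level, four_level)  # the L4 variant ends up cheaper per access
```

    Even with modest hit rates, the L4 level trims the expensive trips to main memory, which is the whole argument for adding it.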

    The reason we don't have uber-tons of cache memory is that it's stupidly expensive: something like $50 per MB.

    The idea of it being specialized to graphics makes me think it may just be a stepping stone to a full L4 cache implementation. SATA's successor is supposed to communicate over the same PCI-E bus that graphics cards use to talk to the CPU, and Intel has made a habit recently of clever staged rollouts (see its "tick-tock" strategy). L4 cache (and eventually L5) is more or less inevitable as CPUs continue to advance much faster than RAM, so I think this theory has some merit.

    [/longwindedpost]
     
  7. schmidtbag

    schmidtbag What's a Dremel?

    Joined:
    30 Jul 2010
    Posts:
    1,082
    Likes Received:
    10
    Yes, and do you know WHY cache is so much faster than RAM? Because it's on the same die as the CPU and it's small, so it's easier to address and there's little to no latency penalty. Caches were created specifically for those two benefits, and there's a reason L1 caches are still generally tiny: if you've ever run memtest, you'll have seen that the smaller the cache, the faster it is. Make a cache too small, though, and it can't hold a useful working set, hence L2 caches being notably larger.

    I made the joke about a 1GB L8 cache because if, for some stupid reason, that ever happens, it'll be about as slow as regular RAM minus the latency, and would make RAM obsolete.

    Look at it in terms of the common cache-and-RAM analogy. Imagine you walk into a library looking for a book. The front door of the library is the CPU; the bookshelves are RAM. You'd have to walk all the way to the correct bookshelf, pinpoint the location of the book within that shelf, then walk all the way back to the front desk to check it out and leave.

    However, maybe you're looking for a few popular books. Those books would already be placed at the front desk (the cache), which saves you from having to walk to each bookshelf and scan through every book.

    Now take this analogy and add some gigantic cache like 1GB: that's basically taking the bookshelves and putting them really close to the front desk. You might not walk as far, but you still waste time searching for what you want.
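    The front-desk analogy maps directly onto a small least-recently-used cache: keep the handful of popular books at the desk, send everything else back to the shelves. (Illustrative sketch; the book titles, desk size, and "shelf" dict are all invented.)

```python
from collections import OrderedDict

# The library analogy as code: a small "front desk" (LRU cache) in front
# of the big "shelves" (a dict standing in for slow RAM).
shelves = {f"book{i}": f"contents of book{i}" for i in range(1000)}

class FrontDesk:
    def __init__(self, capacity):
        self.capacity = capacity
        self.desk = OrderedDict()   # most recently used last
        self.trips_to_shelves = 0

    def fetch(self, title):
        if title in self.desk:
            self.desk.move_to_end(title)   # still popular, keep it handy
            return self.desk[title]
        self.trips_to_shelves += 1         # the long walk to the shelves
        if len(self.desk) >= self.capacity:
            self.desk.popitem(last=False)  # least popular book goes back
        self.desk[title] = shelves[title]
        return self.desk[title]

desk = FrontDesk(capacity=3)
for title in ["book1", "book2", "book1", "book1", "book3"]:
    desk.fetch(title)
print(desk.trips_to_shelves)  # only the first touch of each book walks
```

    Repeat requests for popular books never leave the desk; only first touches (and evicted books) pay for the walk.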
     
  8. iwod

    iwod What's a Dremel?

    Joined:
    22 Jul 2007
    Posts:
    86
    Likes Received:
    0
    Once Intel has stacked silicon done (it was actually announced nearly a decade ago), it could move tens to hundreds of megabytes of L4 cache onto another layer of the CPU, which could make the GPU many times faster. Look at what the 10MB of eDRAM brought to the Xbox 360.
     
  9. yougotkicked

    yougotkicked A.K.A. YGKtech

    Joined:
    3 Jan 2010
    Posts:
    251
    Likes Received:
    9
    @schmidtbag: not trying to be insulting or anything; I just saw what I felt was a misinformed statement and took the opportunity to explain some of the finer points of cache architecture for everyone.

    And, not to be a smartass, but cache is faster because it's made with static RAM, which requires six transistors for every bit of data, whereas DRAM requires only one (plus a capacitor). And the capacity of a memory device is largely independent of its access time, since a fetch operation is an indexed table jump, not a scan. The size DOES affect the cost, though, and the staggered caching architecture present in ALL digital storage is essentially a cost-saving design meant to approximate a higher-performance memory device through the combination of several cheaper ones.
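    The "indexed table jump, not a scan" point is easy to sketch: a direct-mapped cache derives the line index straight from the address with modular arithmetic, so a lookup costs the same few operations whether the cache has 8 lines or 8 million. (Toy model for illustration; real caches are set-associative and do this in hardware, and the line size and counts here are arbitrary.)

```python
# Toy direct-mapped cache: the lookup is pure index arithmetic on the
# address, so it takes the same number of steps regardless of capacity.
class DirectMappedCache:
    def __init__(self, num_lines, line_size=64):
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # one tag slot per line, no scanning

    def lookup(self, address):
        """Return True on a hit. Two integer ops and one compare, always."""
        block = address // self.line_size    # which memory block this is
        index = block % self.num_lines       # which cache line it maps to
        if self.tags[index] == block:
            return True                      # hit
        self.tags[index] = block             # miss: fill the line
        return False

small = DirectMappedCache(num_lines=8)
big = DirectMappedCache(num_lines=8192)
# First touch misses, second touch hits -- the same code path for both sizes.
print(small.lookup(0x1234), small.lookup(0x1234))  # False True
```

    Capacity only changes the modulus, not the amount of work per access; what it does change is how often two addresses collide on the same line.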
     
  10. schmidtbag

    schmidtbag What's a Dremel?

    Joined:
    30 Jul 2010
    Posts:
    1,082
    Likes Received:
    10
    I never said cache wasn't made with static RAM, and I know it's more than just a small local RAM source on the CPU. I'm not really sure what your point was with the sentence about memory capacity; it concluded nothing. I never said size doesn't affect cost. I'm aware cache is more expensive, but larger caches these days don't push the price up THAT much, so by now you'd probably see server CPUs with well over 20MB if it were practical and proven worthwhile.

    All I was trying to say in my post was that caches are deliberately small (for more than just price), and when they're smaller they're generally faster. Yes, 1GB of cache is not going to perform like 1GB of RAM even setting aside the technicalities, but my point is that it would still be relatively slow. And I'm not the one who came up with the library analogy; it's been in use for a long time.

    I don't appreciate being called "misinformed" when you said nothing to prove it. The information I didn't supply (which you did) wouldn't have changed my point at all; in fact, all you really did was prove my point further.
     
  11. technogiant

    technogiant What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    323
    Likes Received:
    17
    Well, you can call me misinformed, because compared to you guys I am, but I don't really see the point of adding more and more layers of shared cache. Once you have one level of shared cache, surely you'd just increase its size rather than add another layer (with higher latency)?
     
  12. rocket

    rocket What's a Dremel?

    Joined:
    22 Aug 2011
    Posts:
    27
    Likes Received:
    0
    If you removed the RAM and just used cache, it would be more like a system-on-a-chip.
     
  13. yougotkicked

    yougotkicked A.K.A. YGKtech

    Joined:
    3 Jan 2010
    Posts:
    251
    Likes Received:
    9
    @schmidtbag: I didn't intend to say that you WERE misinformed; I was simply saying that when I made my first reply, your first post sounded misinformed to me. After your second post I realized your understanding of the concept went much deeper than I'd initially thought. I don't want to turn this into an argument, but it's hard to disagree with someone without sounding at least a little condescending.

    I'll say this: I realize my first post came off a bit insulting. I didn't intend it to, and I apologize for it nonetheless. We both obviously know a lot about the subject, which is complicated enough for us to go back and forth like this for pages.

    @technogiant: as I understand it, the main reason (but not the only one) for multiple layers of cache is that the higher levels are shared by more cores and system components, so there have to be protocols for managing which component can do what with that pool. On current-generation i7 processors the L3 cache is shared by all the physical cores, so every time a core wants to put data there, work has to be done to decide whether and where it can be stored. At L2 and L1 the cache is exclusive to a single core, so none of that coordination is needed.

    Beyond that, there are some complicated reasons for having two layers of cache dedicated to the same core, mostly to do with how data locations are accessed. Every piece of storage in your computer is part of a "memory hierarchy", where each device serves as a sort of cache for the slower, larger, cheaper device below it. This is not only less expensive than a single device of the same capacity, it's theoretically almost as fast, so a slight loss in performance can drastically reduce the cost; and the more layers you add to the hierarchy, the smaller the performance gap becomes.

    That's why people pair a 100GB SSD with a 2TB hard drive: by putting the right data in the right places, your computer will be 80% as fast as it would be with 2.1TB of SSD, for a fraction of the cost. Put some of the savings into more/faster RAM and suddenly you're at 90%; put the rest into a better CPU and your computer is now, in many ways, FASTER than if you'd spent everything on SSDs. It's all about balance.
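    The SSD/HDD trade-off can be put into numbers. A sketch with invented latencies and an invented hit rate (purely illustrative, not benchmarks of real drives):

```python
# Effective access time of a tiered store: a small fast device caching
# a large slow one. Latencies and hit rate are illustrative guesses.
def effective_latency(tiers, backing_latency):
    """tiers: list of (latency, hit_rate), fastest first; misses fall
    through to the backing device."""
    total, miss = 0.0, 1.0
    for latency, hit_rate in tiers:
        total += miss * hit_rate * latency
        miss *= 1.0 - hit_rate
    return total + miss * backing_latency

SSD_US, HDD_US = 100.0, 10_000.0   # rough access times in microseconds
# The hot 95% of accesses land on a small SSD; the rest go to a big HDD.
tiered = effective_latency([(SSD_US, 0.95)], HDD_US)
print(tiered)  # ~595 us: most of the all-SSD speed at a fraction of the cost
```

    A 95% hit rate gets the blend within a factor of six of all-SSD latency while most of the bytes sit on cheap disk; the same arithmetic applies one level up, between cache and RAM.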
     
  14. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,066
    Likes Received:
    6,610
    We are: take the Intel Xeon X7560, for example, which has a whopping 24MB of cache, as does the Xeon E7-4830. Granted, AMD's equivalent chips only have 16MB, but they're typically used in dual- or quad-socket motherboards, for a grand total of 32MB or 64MB of cache.
     