19th Mar 2012, 11:48   #1
brumgrunt
Ultramodder
Join Date: Dec 2011
Posts: 1,009
Intel's Haswell rumoured to feature L4 cache

Intel's upcoming Haswell chips are rumoured to include an L4 cache layer at the top end for improved graphics performance.

http://www.bit-tech.net/news/hardwar...cache-rumour/1
19th Mar 2012, 12:31   #2
CAT-THE-FIFTH
Modder
Join Date: Apr 2009
Posts: 52
It seems that Charlie Demerjian over at SemiAccurate mentioned this a while back (not sure if the numbers are correct, though):

http://semiaccurate.com/2012/02/08/h...phics-monster/

http://semiaccurate.com/2011/09/21/a...mportant-bits/

AMD is also investigating the use of stacked RAM (there are pictures):

http://semiaccurate.com/2011/10/27/a...-gpu-pictured/

http://semiaccurate.com/forums/showp...8&postcount=26

2013 is going to be a very interesting year for integrated graphics, as both Intel and AMD might be using on-die RAM!
19th Mar 2012, 12:33   #3
Adnoctum
Kill_All_Humans
Join Date: Apr 2008
Posts: 482
The Bobcat-based APUs are already being looked at for GPGPU use in servers, and I would imagine that APUs will graduate to the high end eventually as well, so Intel would be foolish if they weren't prepared to offer competition to AMD.

My only issue would be to question Intel's competence in creating GPGPU-capable graphics hardware. And their driver team hasn't exactly been stellar in trying to make up for hardware inadequacies either.
It's one thing to play HD video and some not-very-demanding games on Intel's graphics, but I wouldn't like to crunch important 1s and 0s on it. Hopefully Intel will surprise us and turn their graphics around.
__________________
Main Rig: Amiga A1200 - Motorola 68EC020@14.2MHz + 68030@50MHz Acc. Card - Lisa Graphics - 2+8MB RAM - 80MB HDD
LAN Rig: Amiga A500 - Motorola 68000@8MHz - Denise Graphics - 512+512KB RAM
19th Mar 2012, 12:34   #4
greigaitken
Supermodder
Join Date: Aug 2009
Posts: 281
The more Intel and AMD press forward with integrated graphics, the more Nvidia has to look behind and run faster.
Hurry up, the lot of you - I've got 30 years of gaming left and I want photorealism before then!
19th Mar 2012, 13:50   #5
schmidtbag
Hypermodder
Join Date: Jul 2010
Location: MA, USA
Posts: 782
Five years later:
"In other news, Intel releases an L8 cache with 1GB of memory."

Does Intel even remember the purpose of on-die caches? I suppose it's fine that the L4 is shared with the GPU, but still, I don't see it making a performance improvement on the CPU side. Just give the GPU an L2 of its own.
__________________
4.4GHz FX-6300 (on an AM3 board) with C'n'Q on, 8GB of RAM, 2x ATI HD5750, ADATA SP900 64GB SSD, Arch Linux 64 bit.
19th Mar 2012, 21:02   #6
yougotkicked
A.K.A. YGKtech
Join Date: Jan 2010
Location: Minneapolis, Minnesota, USA
Posts: 243
@schmidtbag: If you were to slap 1GB of cache (any level) on a current processor, it could probably break the TeraFLOP barrier with two cores tied behind its FSB.

Adding more levels and larger quantities of cache memory is actually one of the most powerful ways to improve processor performance; it's also expensive as hell.

Cache memory is quite literally up to 100 times faster than RAM: the CPU can access data in L1 cache in a single clock cycle, while fetching from main memory can take up to 100 clock cycles; L2 takes about 15 cycles and L3 about 30. Fetching a single datum from main memory in the middle of a calculation can easily double the execution time of an operation, AND potentially slow down other operations. Adding more levels of cache helps avoid these time-wasting calls to main memory - even a call to an L4 cache taking ~50 cycles costs half the time of a call to main memory.
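
To put rough numbers together, the usual back-of-the-envelope metric is average memory access time. A minimal sketch in C - the latencies are the cycle counts above, while the fraction of accesses each level serves is a purely illustrative assumption, not a measurement:

Code:
#include <stdio.h>

/* Back-of-the-envelope average memory access time (AMAT).
 * Latencies (cycles) follow the figures quoted in the post;
 * the fraction of accesses served by each level is an assumed,
 * illustrative workload - not a measurement. */
int main(void) {
    const char *level[] = { "L1", "L2", "L3", "L4", "DRAM" };
    double latency[]    = { 1, 15, 30, 50, 100 };
    double served[]     = { 0.90, 0.06, 0.02, 0.01, 0.01 };

    double amat = 0.0;
    for (int i = 0; i < 5; i++) {
        amat += served[i] * latency[i];
        printf("%-4s: %3.0f cycles, serves %4.1f%% of accesses\n",
               level[i], latency[i], served[i] * 100);
    }
    printf("average access time: %.2f cycles\n", amat);
    return 0;
}

With those made-up hit rates the average lands around 3.9 cycles; push the L4's 1% of traffic back out to DRAM and it climbs to ~4.4 - the argument for extra levels, in miniature.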

The reason we don't have uber-tons of cache memory is that it is stupidly expensive - something like $50 per MB.

The idea of it being specialized to graphics makes me think it may just be a stepping stone to a full L4 cache implementation. SATA's successor is supposed to communicate over the PCI-E bus that graphics cards use to talk to the CPU, and Intel has made a habit recently of clever, staged production strategies (see its "tick-tock" cadence). L4 cache (and eventually L5) is more or less inevitable as CPUs continue to advance much faster than RAM, so I think this theory has some merit.

[/longwindedpost]
__________________
Corsair Carbide 500R ::::: NINE (!) case fans
Intel I5 2500K @ 4.5Ghz ::::: AIR cooled (modified tuniq tower 120)
Asus P8Z68-V LX ::::: 16GB 1866 9-9-9-27-1T (1.4v) Samsung low profile 30nm's
Gigabyte Radeon HD 6870 1GB ::::: 1TB Samsung Spinpoint F3
128GB Intel SSD :::: 2x WD 1TB drives in RAID 1
19th Mar 2012, 22:44   #7
schmidtbag
Hypermodder
Join Date: Jul 2010
Location: MA, USA
Posts: 782
Quote:
Originally Posted by yougotkicked
*snip - quoted in full above*
Yes - and do you know WHY cache is so much faster than RAM? Because it's on the same die as the CPU and it's small, so it's easier to address and there's little to no latency problem. Caches were made specifically for those two benefits, and there's a reason L1 caches are still generally very tiny. If you've ever run memtest, you'll find that the smaller the cache, the faster it is. Make a cache too small, however, and it can't hold enough instructions and data, hence L2 caches being notably larger. I made the joke about a 1GB L8 cache because, if for some stupid reason that ever happens, it'll be almost as slow as regular RAM, minus the latency, and would make RAM obsolete.

Look at it in terms of the common cache-and-RAM analogy. Imagine you walk into a library looking for a book. The front door of the library is the CPU and the bookshelves are RAM: you have to walk all the way to the correct shelf, pinpoint the book within it, then walk all the way back to the front desk to check it out and leave.
However, maybe you're looking for a few popular books. Those would already be placed at the front desk (the cache), saving you from having to walk to each shelf and scan through every book.

Now take this analogy and add some gigantic cache, like 1GB: all you're really doing is taking those bookshelves and putting them close to the front desk. You might not need to walk as far, but you still waste time searching for what you want.
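
You can actually watch the "bigger pool, slower trips" effect from userland. A crude sketch in C - sweep a working set and time the accesses; the latencies, like the timing method, are indicative only (no warm-up control or core pinning), but the steps between plateaus roughly mark each cache level being outgrown:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sweep a working set from 16KB to 64MB, touching one 64-byte
 * cache line per access. Time per access steps upward as the
 * array outgrows L1, L2, L3 and finally spills into DRAM.
 * Crude wall-clock timing - indicative, not rigorous. */
int main(void) {
    const size_t accesses = 1u << 24;              /* 16M timed accesses */
    for (size_t kb = 16; kb <= 65536; kb *= 4) {
        size_t n = kb * 1024 / sizeof(int);        /* always a power of two */
        int *a = malloc(n * sizeof(int));
        if (!a) return 1;
        for (size_t i = 0; i < n; i++) a[i] = (int)i;   /* fault pages in */

        volatile long sum = 0;
        clock_t t0 = clock();
        for (size_t k = 0, i = 0; k < accesses; k++) {
            sum += a[i];
            i = (i + 16) & (n - 1);                /* step one 64-byte line, wrap */
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%6zu KB: %6.2f ns/access  (sum %ld)\n",
               kb, 1e9 * secs / (double)accesses, (long)sum);
        free(a);
    }
    return 0;
}

On real hardware you'd want a proper timer and a random access pattern to defeat the prefetcher, but even this crude version usually shows the plateaus.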
20th Mar 2012, 03:31   #8
iwod
Multimodder
Join Date: Jul 2007
Posts: 86
Once Intel has stacked silicon done (it was actually announced nearly a decade ago), it could move tens to hundreds of megabytes of L4 cache onto another layer of the CPU, which could make the GPU many times faster. Look at what the eDRAM brought to the Xbox 360.
20th Mar 2012, 06:04   #9
yougotkicked
A.K.A. YGKtech
Join Date: Jan 2010
Location: Minneapolis, Minnesota, USA
Posts: 243
@schmidtbag: not trying to be insulting or anything - I just saw what I felt was a misinformed statement and took the opportunity to explain some of the finer points of cache architecture for everyone.

And, not to be a smartass, but cache is faster because it's made with static RAM, which requires six transistors for every bit of data, whereas DRAM requires only one. And the capacity of a memory device is fully independent of its access time, since a fetch operation is an indexed table jump, not a scan. The size DOES affect the cost, and the staggered caching architecture present in ALL digital storage devices is nothing but a cost-saving design meant to virtualize a higher-performance memory device through the optimization of several lower-cost devices.
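
For anyone wondering what "an indexed table jump, not a scan" means in practice, here's a toy direct-mapped lookup in C - the field widths (64-byte lines, 512 sets) are illustrative, not any real CPU's layout:

Code:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy direct-mapped cache lookup: the address is split into
 * tag / index / offset fields, so locating the candidate line
 * is a single array index plus one tag compare - no searching.
 * Field widths are illustrative, not any real CPU's layout. */
#define LINE_BITS  6                      /* 64-byte lines */
#define INDEX_BITS 9                      /* 512 sets      */
#define NUM_SETS   (1u << INDEX_BITS)

typedef struct { bool valid; uint64_t tag; } line_t;
static line_t cache[NUM_SETS];

static bool lookup(uint64_t addr) {
    uint64_t index = (addr >> LINE_BITS) & (NUM_SETS - 1);
    uint64_t tag   = addr >> (LINE_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    uint64_t addr  = 0x12345678;
    uint64_t index = (addr >> LINE_BITS) & (NUM_SETS - 1);
    cache[index] = (line_t){ true, addr >> (LINE_BITS + INDEX_BITS) };
    printf("hit: %s\n", lookup(addr) ? "yes" : "no");   /* hit: yes */
    return 0;
}

Whether the table has 512 entries or 512 million, the lookup itself is the same handful of operations.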
20th Mar 2012, 14:41   #10
schmidtbag
Hypermodder
Join Date: Jul 2010
Location: MA, USA
Posts: 782
Quote:
Originally Posted by yougotkicked
*snip - quoted in full above*
I never said cache wasn't made with static RAM, and I know it's more than just a small local RAM source on the CPU. I'm not really sure what your point is with that sentence about memory capacity; it concluded nothing. I never said size doesn't affect cost - I'm aware cache is more expensive, but larger caches these days don't bring the price up THAT much, so by now you'd probably be seeing server CPUs with over 20MB if it were practical and proven worth it.

All I was trying to say in my post is that caches are deliberately small (for more than just price), and that smaller caches are generally faster. Yes, I'm aware that 1GB of cache is not going to perform like 1GB of RAM even without all the technicalities, but my point is that it would still be relatively slow. And I'm not the one who came up with the library analogy; it's been used for a long time.

I don't appreciate being called "misinformed" when you said nothing to prove it. The information I didn't supply (which you did) wouldn't have changed my point at all - in fact, all you really did was help prove my point further.
20th Mar 2012, 16:01   #11
technogiant
Supermodder
Join Date: May 2009
Location: UK
Posts: 321
Well, you can call me misinformed - compared to you guys I am - but I don't really see the point of adding more and more layers of shared cache. Once you have one level of shared cache, surely you should just increase its size rather than add another layer (with slower latency)?
20th Mar 2012, 19:44   #12
rocket
Minimodder
Join Date: Aug 2011
Location: London
Posts: 27
If you removed RAM and just used cache, it would be more like a system-on-a-chip.
__________________
my rig i7 2600k /asus ROG Maximus IV Extreme-Z Z68 /sli Asus MARS II/2DIS/3GD5 Nvidia GTX580 X2, / Creative Labs X-Fi Titanium PCIe Sound Card/Sony BWU-500S-WW Blu-ray Writer x2/OCZ Technology 240GB Vertex 3 MAX IOPS SSD /
20th Mar 2012, 22:12   #13
yougotkicked
A.K.A. YGKtech
Join Date: Jan 2010
Location: Minneapolis, Minnesota, USA
Posts: 243
@schmidtbag: I didn't intend to say that you WERE misinformed; I was simply saying that when I made my first reply, your first post sounded misinformed to me. After your second post I realized your understanding of the concept went much deeper than I'd initially realized. I don't want to turn this into an argument, but it's hard to disagree with someone without sounding at least a little condescending.

I'll say this: I realize my first post came off a bit insulting. I didn't intend it to, but I apologize nonetheless. We both obviously know a lot about the subject, which is complicated enough for us to go back and forth like this for pages.

@technogiant: as I understand it, the main (but not the only) reason for having multiple levels of cache is that higher levels are accessed and written by more system processes and components. Because of this, there are protocols in place for managing which components can do what with that shared pool. On current-generation i7 processors, the L3 cache is accessed by all the physical cores, so every time one of the cores wants to put some data there, a calculation has to be done to decide whether it can be stored there and, if so, where it can go. At L2 and L1 the cache is exclusive to a single core, and those calculations aren't needed.

Besides that, there are some complicated reasons for having two levels of cache dedicated to the same core, mostly to do with how data locations are accessed. Every piece of storage in your computer is part of a "memory hierarchy", where each device serves as a sort of cache for the slower, larger, cheaper device below it. This is not only less expensive than getting the same capacity from a single device, it is theoretically almost as fast - a slight loss in performance can drastically reduce the cost, and the more levels you add to the hierarchy, the smaller the performance gap becomes. That's why people pair a 100GB SSD with a 2TB hard drive: by putting the right data in the right places, your computer will be maybe 80% as fast as it would be with a 2.1TB SSD, for a fraction of the cost. Put some of the savings into more/faster RAM and suddenly we're looking at 90%; put the rest into a better CPU and your computer is now, in many ways, FASTER than if you had put all your money into 2.1TB worth of SSDs. It's all about balance.
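
The arithmetic behind that "almost as fast" claim is the same weighted average as for CPU caches, and the hit rate dominates it. A quick sketch in C - both the latencies and the hot-data fractions below are illustrative assumptions, not benchmarks:

Code:
#include <stdio.h>

/* Effective access time of an SSD caching the "hot" data in
 * front of an HDD. Latencies and hit rates are illustrative
 * assumptions - real figures vary wildly with the workload. */
int main(void) {
    double ssd_us = 100.0;     /* assumed SSD access time (microseconds) */
    double hdd_us = 10000.0;   /* assumed HDD access time (microseconds) */

    for (double hot = 0.90; hot <= 0.999; hot += 0.03) {
        double mixed = hot * ssd_us + (1.0 - hot) * hdd_us;
        printf("hot fraction %.2f: %7.0f us effective (%4.1f%% of all-SSD)\n",
               hot, mixed, 100.0 * ssd_us / mixed);
    }
    return 0;
}

Depending on the latencies and hit rate you assume, the result can land near the 80% figure above or far from it - the takeaway is how steeply effective speed tracks the hit rate, which is exactly why putting "the right data in the right places" matters.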
21st Mar 2012, 07:58   #14
Gareth Halfacree
WIIGII!
bit-tech Staff
Join Date: Dec 2007
Location: Bradford, UK
Posts: 3,987
Quote:
Originally Posted by schmidtbag
I never said size doesn't affect cost - I'm aware cache is more expensive, but larger caches these days don't bring the price up THAT much, so by now you'd probably be seeing server CPUs with over 20MB if it were practical and proven worth it.
We are: take the Intel Xeon X7560 for example, which has a whopping 24MB of cache. As does the Xeon E7-4830. Granted, AMD's equivalent chips only have 16MB - but they're typically used in dual- or quad-socket motherboards, meaning a grand total of 32MB or 64MB of cache.
__________________
Author, Raspberry Pi User Guide Third Edition, 21 Brilliant Projects for the Raspberry Pi and more | gareth.halfacree.co.uk | twitter
bit-tech news correspondent, Custom PC columnist, other things to other people
I'm a filthy freelancer! Hire me!