Old 10th Mar 2006, 00:21   #1
WilHarris
Just another nobody
 
Join Date: Jun 2001
Location: Oxford
Posts: 2,671
The Core of Intel's new chips

http://www.bit-tech.net/hardware/200...tecture/1.html

Old 10th Mar 2006, 00:34   #2
Shadowed_fury
Natural Born Chaos
 
Join Date: Nov 2003
Location: Telford
Posts: 7,135
That's quite clever tbh..
__________________
Steam: kev_fury New Frag Video
- Gigabyte EP43-DS3L - E8200 - XFX 4870 1GB - 4GB Kingston -
- Steel Series 7.1s - Logitech G11 - Razer Diamondback 3G - 2209WA -
Old 10th Mar 2006, 00:34   #3
Shepps
Slacking off since 1986..
 
Join Date: May 2002
Location: Stourbridge
Posts: 1,408
Good read, some of it went over my head but interesting all the same.
Old 10th Mar 2006, 00:38   #4
LAGMonkey
Group 7 error
 
Join Date: Aug 2004
Location: Canada/Saudi
Posts: 1,450
NOTE: read in the afternoon and NOT the evening.

But apart from that, very informative.
__________________
---------------LAG out!---------------


http://www.ctrlaltdel-online.com
100% Brainless
Old 10th Mar 2006, 00:39   #5
Knoxxy
Minimodder
 
Join Date: Jun 2005
Posts: 29
Excellent job explaining things.
Old 10th Mar 2006, 01:11   #6
thecrownles
What's a Relix?
 
Join Date: Feb 2004
Location: Chicago, Illinois-US
Posts: 734
Sounds like they've got some killer performance enhancements and some really great power management stuff coming up in the next batch. I can't wait to buy a new computer with one of these beasts. So do these processors classify as 64-bit like the Athlon 64s, or are they now 128-bit, meaning even more performance than the Athlons?
__________________
Sam0r: Mr. Winky got a little aroused, needless to say I was tenting.

Women... know your place.
Old 10th Mar 2006, 08:37   #7
hitman012
What is a Dremel!?
Moderator
 
Join Date: May 2005
Location: Bristol / London
Posts: 4,811
Nice article

Just one bit that isn't quite right as far as I know:
Quote:
For example, if you have a 31-stage pipeline, as you do in the Pentium 4 Prescott, you can do 31 operations on an instruction in one clock cycle. That's a lot of operations. The advantage of a long pipeline is that it reduces the amount of times when an instruction comes out the end of the pipeline without being completed. When this happens, it has to loop around and go through again to be finished. Long instructions matched with long pipelines makes for efficiency.
A longer pipeline actually does less work on an instruction per clock cycle than a shorter one. In a long pipeline, such as the ones advocated in NetBurst's Hyper Pipelining Technology, the total switching time for the gates in each stage is much shorter, so the CPU does less work per stage per cycle but can hit much higher clock speeds. In the FPU, for example, you can send an FMUL instruction and run the whole thing in one unit at once, or alternatively, you can have several per-clock parts to the unit (FADD, FLD) and do it a piece at a time.

The disadvantage of a long pipeline - specifically, one as large as Prescott's - is that it hugely increases penalties for various inherent problems with the processor. Branch mispredicts, although rare, mean that the CPU has to actually flush the entire pipeline of instructions and load the correct branch again - the minimum P4 mispredict penalty is over 20 clock cycles for data in the L1 cache. Pipeline bubbles, where the processor can't schedule two instructions to be executing simultaneously (or close to one another), result in a further loss because the bubble (i.e. an empty pipeline stage) propagates all the way down the chain. I imagine that the huge amount of work that the Willamette team did with the scheduling unit to stop this happening has moved right into Conroe.

As far as I know, instructions can't go down the pipeline without being completed - however, the above flush means that some must be removed before they can finish and others put in their place.
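
To put rough numbers on the flush penalty described above, here is a minimal sketch in Python. It assumes a toy model: some fraction of instructions are branches, some fraction of those are mispredicted, and each mispredict costs a fixed number of cycles roughly equal to the pipeline depth. The function name and every number in it (branch fraction, mispredict rate, the two depths compared) are illustrative assumptions, not measurements of any real chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once mispredict flushes are counted."""
    # Each mispredicted branch adds flush_penalty_cycles of wasted work.
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Assumption: the flush penalty is about as many cycles as the pipeline has stages.
for depth in (14, 31):
    cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, depth)
    print(f"{depth}-stage pipeline: about {cpi:.2f} cycles per instruction")
```

Under these made-up figures the deeper pipeline pays more per mispredict, so it needs a correspondingly higher clock speed just to break even.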

I hope this doesn't sound like I'm criticising the article, because it does a very good job of explaining the architecture
__________________
"Nothing is more practical than a good theory"
- Kurt Lewin
Old 10th Mar 2006, 12:39   #8
Bindibadgi
Richard Swinburne
bit-tech Staff
 
Join Date: Mar 2001
Location: Omnipwntent
Posts: 28,284
Quote:
Originally Posted by thecrownles
Sounds like they've got some killer performance enhancements and some really great power management stuff coming up in the next batch. I can't wait to buy a new computer with one of these beasts. So do these processors classify as 64-bit like the Athlon 64s, or are they now 128-bit, meaning even more performance than the Athlons?
No, they are still 32/64-bit, but they can work with SSE (1/2/3) instructions, which operate on data 128 bits wide in total.
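
As a quick sanity check on the 128-bit figure, here is a small Python sketch using the standard `struct` module. It only does arithmetic on value sizes and does not execute any SSE instructions; the helper name `lanes_per_register` is made up for illustration.

```python
import struct

SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes_per_register(format_char):
    """How many values of the given struct format fit in one 128-bit register."""
    bits_per_value = struct.calcsize("<" + format_char) * 8
    return SSE_REGISTER_BITS // bits_per_value


print(lanes_per_register("f"))  # 32-bit single-precision floats
print(lanes_per_register("d"))  # 64-bit double-precision floats
```

So the 128 bits describe how much data a single SSE instruction can pack together, not the width of the processor's addresses or general-purpose registers.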
Old 10th Mar 2006, 15:23   #9
rupbert
Hypermodder
 
Join Date: Jul 2002
Location: Northern Ireland
Posts: 799
Great article!
Old 10th Mar 2006, 15:34   #10
WilHarris
Just another nobody
 
Join Date: Jun 2001
Location: Oxford
Posts: 2,671
Quote:
Originally Posted by hitman012
Nice article

Just one bit that isn't quite right as far as I know:

A longer pipeline actually does less work on an instruction per clock cycle than a shorter one. In a long pipeline, such as the ones advocated in NetBurst's Hyper Pipelining Technology, the total switching time for the gates in each stage is much shorter, so the CPU does less work per stage per cycle but can hit much higher clock speeds. In the FPU, for example, you can send an FMUL instruction and run the whole thing in one unit at once, or alternatively, you can have several per-clock parts to the unit (FADD, FLD) and do it a piece at a time.

The disadvantage of a long pipeline - specifically, one as large as Prescott's - is that it hugely increases penalties for various inherent problems with the processor. Branch mispredicts, although rare, mean that the CPU has to actually flush the entire pipeline of instructions and load the correct branch again - the minimum P4 mispredict penalty is over 20 clock cycles for data in the L1 cache. Pipeline bubbles, where the processor can't schedule two instructions to be executing simultaneously (or close to one another), result in a further loss because the bubble (i.e. an empty pipeline stage) propagates all the way down the chain. I imagine that the huge amount of work that the Willamette team did with the scheduling unit to stop this happening has moved right into Conroe.

As far as I know, instructions can't go down the pipeline without being completed - however, the above flush means that some must be removed before they can finish and others put in their place.

I hope this doesn't sound like I'm criticising the article, because it does a very good job of explaining the architecture
My head hurts
Old 10th Mar 2006, 16:03   #11
Bindibadgi
Richard Swinburne
bit-tech Staff
 
Join Date: Mar 2001
Location: Omnipwntent
Posts: 28,284
Basically, if you get a pipeline stall, for whatever reason, it has to flush part/all of it and start again. Long pipeline = massive performance hit. Longer pipeline = lower IPC but higher MHz, because NetBurst was all about scheduling and keeping a large flow of data ready from high-speed, long-latency devices like RDRAM.
Old 10th Mar 2006, 21:35   #12
hitman012
What is a Dremel!?
Moderator
 
Join Date: May 2005
Location: Bristol / London
Posts: 4,811
Quote:
Originally Posted by WilHarris
My head hurts
It is a nasty subject, isn't it?

Quote:
For example, if you have a 31-stage pipeline, as you do in the Pentium 4 Prescott, you can do 31 operations on an instruction in one clock cycle.
In essence, 31 operations are not performed on a single instruction in one cycle - 31 operations are performed on 31 different instructions in one cycle, each of which is in a different stage of execution. That means that they'll still roll off that particular pipeline at a rate of one per clock, but higher clock speeds can be hit more easily since less work - which takes less time - is performed on that instruction (and the 30 others) in one cycle.
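
The "one per clock" behaviour above can be sketched as a toy simulation in Python. It assumes an ideal pipeline with no stalls, bubbles or flushes, where every instruction spends exactly one cycle in each stage; the function name and the instruction count are arbitrary choices for illustration.

```python
def simulate_pipeline(num_stages, num_instructions):
    """Step an ideal pipeline cycle by cycle.

    Returns (total_cycles, most_instructions_in_flight_at_once).
    """
    stages = [None] * num_stages  # stages[0] is the first stage, stages[-1] the last
    next_instruction = 0
    completed = 0
    cycles = 0
    max_in_flight = 0

    while completed < num_instructions:
        cycles += 1
        # A new instruction enters the first stage if any are left.
        entering = None
        if next_instruction < num_instructions:
            entering = next_instruction
            next_instruction += 1
        # Everything moves one stage along; whatever was in the last stage leaves.
        stages = [entering] + stages[:-1]
        max_in_flight = max(max_in_flight, sum(s is not None for s in stages))
        # The instruction now in the last stage finishes at the end of this cycle.
        if stages[-1] is not None:
            completed += 1

    return cycles, max_in_flight


cycles, in_flight = simulate_pipeline(31, 1000)
print(f"{cycles} cycles, up to {in_flight} instructions in flight")
```

With 31 stages and 1,000 instructions, this model takes 1,030 cycles: the first instruction finishes on cycle 31, and after that one instruction completes every clock, with 31 different instructions in the pipeline at once.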
__________________
"Nothing is more practical than a good theory"
- Kurt Lewin
Old 10th Mar 2006, 23:44   #13
RotoSequence
Lazy Lurker
 
Join Date: Jan 2004
Location: 'States
Posts: 4,533
It stops the information from cycling through the processor again and again and again, ultimately. NetBurst is all about rapidly chewing through the data it gets all at once, storing the important stuff, figuring out what needs to be done, and putting it through quickly, quickly, quickly; we just don't live in that type of processing world.

/me wibbles hitman012 and his gift for extensive, in-depth architecture talks
__________________
Technology Schmecnology
Old 12th Mar 2006, 09:25   #14
K.I.T.T.
Hasselhoff™ Inside
 
Join Date: Jan 2005
Location: West Midlands, England
Posts: 581
This should be interesting when it comes out; I can't wait to see some performance figures. The way I like to see it is this: the P4 Prescotts are like American cars. They have BIG V8 engines (long pipelines) that chew through huge amounts of fuel (data) but still manage to produce not very much horsepower or torque (IPC), though you can strap on superchargers and suchlike (higher clock speeds etc.) to get a bit more power out of them. The Core architecture, on the other hand, will be more like a European car. It will have a sensibly sized engine (a 14-stage pipeline, the balance for both long and short instructions) and will use a sensible amount of fuel (in this case power), but still manage to jump up and down all over the V8 and produce more power and torque (IPC). Then you can add little bits here and there, like an ECU re-map (the prefetcher tweaks etc.), to give you more power.

I know it's a bit of a strange way to look at it, but it does make it nice and understandable, I think.
__________________
Founder of the RSPCM (see here)
My Blog

_____________________________________________
Q6600 (G0) @ 3.40Ghz | 4GB G.Skill (5-5-5-15 2T) @ 533Mhz | Radeon HD 4870 512MB GDDR5 | DFI Lanparty UT X48-T2R | Galaxy DXX 1000W | 640GB WD Caviar Blue| 1TB Samsung F1 | 250GB Deskstar
Old 13th Mar 2006, 13:05   #15
Meanmotion
Supermodder
 
Join Date: Nov 2003
Location: Bracknell nr. Ascot
Posts: 370
Well, the big question now is do AMD have an answer to Core? Or are they relying on their current lineup to compete for a while yet?
__________________
Athlon64 X2 4400+ || A8N SLI Premium || 2Gb Corsair XMS Pro || 2*40Gb Raptor (striped) || 250GB External || Radeon X1950Pro || WinXP

www.OutForBlood.co.uk || www.4Qradio.com || www.EdwardChester.co.uk
Old 13th Mar 2006, 21:31   #16
nemesis80
Modder
 
Join Date: Aug 2004
Location: canada
Posts: 52
Quote:
Originally Posted by Meanmotion
Well, the big question now is do AMD have an answer to Core? Or are they relying on their current lineup to compete for a while yet?
Well, the AM2s should be an answer.
__________________
Asus P4P800 SE|P4 3.0E @ 3.6ghz |Utra 1536 mb PC3200 dual channel|Maxtor 80gb| Maxtor 120gb|Saphire X800 pro vivo-> xt pe @550/560|Artic Cooling V4|Thermalright xp-120|Lian Li pc-6077
Old 19th Mar 2006, 12:33   #17
nookie
Minimodder
 
Join Date: Mar 2005
Posts: 34
Can you show us the PowerPoint documents, or whatever else you read for this article? I bet it would be interesting to see a more pro-oriented form of this! (I can't believe I'm going to study this kind of **it in a year... no really, the 286's way of working, which helps you understand how all processors work, is in the program of my technical disciplines in 12th grade.)
Is it true that the Pentium M's microarchitecture is "closer" in design to the Pentium 3's than it is to the Pentium 4's? (That being one of the reasons it's better than the P4.)
Old 19th Mar 2006, 13:26   #18
hitman012
What is a Dremel!?
Moderator
 
Join Date: May 2005
Location: Bristol / London
Posts: 4,811
Quote:
Originally Posted by nookie
Is it true that the Pentium M's microarchitecture is "closer" in design to the Pentium 3's than it is to the Pentium 4's? (That being one of the reasons it's better than the P4.)
Yes, that is correct. The NetBurst architecture - that of the Pentium 4 - was all but scrapped and, while Intel is keeping some features of it (such as its FSB), the majority of the design is based on the Pentium 3 with large improvements made. NetBurst's pipeline was simply too long; although it could hit huge clock speeds, the performance lost to continual flushing and bubbles (see my posts above) was too great.
__________________
"Nothing is more practical than a good theory"
- Kurt Lewin