Posted 02.11.2008, 20:17   #561
McG
-------
 
Registered: Aug 2005
Location: -
Posts: 7,568
Detailed overview of the Nehalem architecture

Even though the Core microarchitecture has been extremely successful, something about it no longer quite satisfies the microprocessor giant, and the reasons are not superficial. Core processors have many advantages, sell very well and are well ahead of competing solutions. It turns out that the serious drawback that makes Intel unhappy is the non-modular design of the Core microarchitecture. As a continuation of the mobile Pentium M line, Core 2 processors were originally designed as dual-core dies. When Intel later started building multi-core Core 2 and Xeon products (the quad-core models, for instance, combine two dual-core dies in one package), it ran into several drawbacks of this approach. Scaling system performance by adding more cores to each CPU and more CPUs to each system would sooner or later have led Intel to a dead end, even though the Core microarchitecture looked very successful overall. That is why Intel is working hard to switch to the new Nehalem microarchitecture.

The microarchitecture itself consists of only a few building blocks from which a processor is assembled at the final design and production stage. This set includes a processor core with its own L2 cache, an L3 cache, a QPI bus controller, a memory controller, a graphics core, etc. The Bloomfield CPU we will discuss shortly consists of four cores, an L3 cache, a memory controller and one QPI bus controller. Server processors based on the same microarchitecture, due in early 2009, will have up to eight cores, up to four QPI bus controllers for multi-processor systems, an L3 cache and a memory controller. The budget Nehalem processors scheduled for the second half of 2009 will have two cores, a memory controller, a graphics core and a DMI bus controller connecting the processor directly to the South Bridge.
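To make the "building blocks" idea concrete, here is a minimal Python sketch that describes each announced product as a combination of the blocks listed above. The class and field names (`NehalemProduct`, `qpi_links`, `dmi_link`, ...) are hypothetical, and only the blocks the text explicitly mentions for each product are switched on; anything not mentioned is left off rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NehalemProduct:
    """Hypothetical description of a product assembled from Nehalem building blocks."""
    name: str
    cores: int               # each core carries its own L2 cache
    l3_cache: bool
    memory_controller: bool
    qpi_links: int
    graphics_core: bool = False
    dmi_link: bool = False   # direct link to the South Bridge

    def blocks(self) -> list:
        parts = [f"{self.cores} x core + L2"]
        if self.l3_cache:
            parts.append("L3 cache")
        if self.memory_controller:
            parts.append("memory controller")
        if self.qpi_links:
            parts.append(f"{self.qpi_links} x QPI")
        if self.graphics_core:
            parts.append("graphics core")
        if self.dmi_link:
            parts.append("DMI")
        return parts


# Only the blocks explicitly listed in the text are set; unlisted ones stay False.
BLOOMFIELD = NehalemProduct("Bloomfield", cores=4, l3_cache=True,
                            memory_controller=True, qpi_links=1)
SERVER_MAX = NehalemProduct("server (maximum)", cores=8, l3_cache=True,
                            memory_controller=True, qpi_links=4)
BUDGET = NehalemProduct("budget (H2 2009)", cores=2, l3_cache=False,
                        memory_controller=True, qpi_links=0,
                        graphics_core=True, dmi_link=True)

for product in (BLOOMFIELD, SERVER_MAX, BUDGET):
    print(product.name, "->", ", ".join(product.blocks()))
```

The point of the sketch is that the three products differ only in which blocks are instantiated and how many times, not in the design of the blocks themselves.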



Quote:
Advanced Processor Core
Nehalem keeps the same decoders as Core 2, in the same number. Macrofusion, however, has changed significantly. First, more pairs of x86 instructions can now be fused and decoded as a single operation. Second, Nehalem supports Macrofusion in 64-bit mode, whereas Core 2 could only use it when running 32-bit code. As a result, CPUs based on the new microarchitecture can decode five instructions per clock instead of four in more cases than their predecessors could.
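The "five instructions with four decoders" arithmetic can be illustrated with a simplified model of one decode cycle. This is a sketch under stated assumptions, not Intel's decoder logic: it assumes four decoder slots, at most one fused pair per cycle, and treats any CMP or TEST followed by a conditional jump as fusible (it does not model which jump conditions actually qualify on each generation). The function and parameter names are hypothetical.

```python
# Simplified, hypothetical model of one decode cycle with Macrofusion.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(op: str) -> bool:
    # Jcc mnemonics start with "j"; JMP is unconditional and never fuses here.
    return op.startswith("j") and op != "jmp"


def decode_cycle(ops, mode_64bit: bool, nehalem: bool, decoders: int = 4) -> int:
    """Return how many x86 instructions from the front of `ops` decode in one cycle."""
    # Core 2 fuses only in 32-bit mode; Nehalem also fuses in 64-bit mode.
    fusion_allowed = nehalem or not mode_64bit
    consumed = 0
    used_slots = 0
    fused = False  # assumption: at most one fused pair per cycle
    while used_slots < decoders and consumed < len(ops):
        op = ops[consumed]
        nxt = ops[consumed + 1] if consumed + 1 < len(ops) else None
        if (fusion_allowed and not fused and op in FUSIBLE_FIRST
                and nxt is not None and is_conditional_jump(nxt)):
            consumed += 2  # the pair occupies a single decoder slot
            fused = True
        else:
            consumed += 1
        used_slots += 1
    return consumed


stream = ["mov", "add", "cmp", "jne", "sub"]
print(decode_cycle(stream, mode_64bit=True, nehalem=True))   # 5
print(decode_cycle(stream, mode_64bit=True, nehalem=False))  # 4
```

For the same 64-bit instruction stream, the Nehalem model consumes five instructions in one cycle while the Core 2 model stops at four, because only Nehalem is allowed to fuse the `cmp`/`jne` pair in 64-bit mode.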

The new processor cores support SMT (simultaneous multithreading) and can run up to two threads at once, with the threads sharing the core's resources. Nehalem processors should benefit from SMT even more than the Pentium 4 processors with Hyper-Threading did. First, their memory subsystem has much higher bandwidth and can therefore keep two threads supplied with data far better. Second, Nehalem has a "wider" microarchitecture that can process more instructions simultaneously. When SMT is enabled, each thread gets its own copy of the architectural state, while the remaining resources are either shared dynamically between the two threads or split 50/50 between them.
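The difference between the two sharing policies can be shown with a small model of a buffer shared by two hardware threads. The capacity of 128 entries, the thread names and the round-robin granting rule are hypothetical choices made for illustration; the sketch only contrasts a fixed 50/50 split with dynamic sharing and does not say which Nehalem structure uses which policy.

```python
def allocate(requests: dict, capacity: int, policy: str) -> dict:
    """Grant buffer entries to hardware threads under a sharing policy.

    requests: number of entries each thread would like to use.
    """
    if policy == "static":
        # Fixed partition (50/50 for two threads): each thread is capped at its share.
        share = capacity // len(requests)
        return {t: min(want, share) for t, want in requests.items()}
    if policy == "dynamic":
        # Hand out entries one at a time, round-robin, until demand or capacity runs out.
        granted = {t: 0 for t in requests}
        remaining = capacity
        progress = True
        while remaining > 0 and progress:
            progress = False
            for t, want in requests.items():
                if remaining > 0 and granted[t] < want:
                    granted[t] += 1
                    remaining -= 1
                    progress = True
        return granted
    raise ValueError(f"unknown policy: {policy}")


demand = {"thread0": 100, "thread1": 10}   # thread1 is mostly stalled
print(allocate(demand, 128, "static"))     # {'thread0': 64, 'thread1': 10}
print(allocate(demand, 128, "dynamic"))    # {'thread0': 100, 'thread1': 10}
```

A static split guarantees that one thread can never starve the other, but leaves capacity unused when one thread is idle; dynamic sharing lets the busy thread use those idle entries.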





TLB and Cache Memory
Intel engineers have significantly increased the size of the TLB (Translation Lookaside Buffer). The TLB is a high-speed buffer that translates virtual page addresses into physical ones. A bigger TLB increases the number of memory pages that can be accessed without costly lookups in the address translation tables stored in main memory. Moreover, the TLB in Nehalem processors now has two levels: Intel simply added a second-level buffer to the TLB inherited from Core 2 processors. The new L2 TLB is large, holding up to 512 entries, and still has relatively low latency. It is also unified, serving both instruction and data address translations.
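The lookup order of a two-level TLB can be sketched as follows: try the small first-level TLB, then the larger second-level one, and only on a double miss fall back to the in-memory translation tables. This is a simplified model with several assumptions: 4 KB pages, LRU replacement, a first-level size chosen purely for illustration, a Python dict standing in for the page tables, and a fill policy in which a page-table walk fills both levels. Only the 512-entry second level comes from the text.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KB pages


class LRUTable:
    """Fully associative table with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class TwoLevelTLB:
    def __init__(self, page_table: dict, l1_entries: int = 64, l2_entries: int = 512):
        self.page_table = page_table  # stands in for translation tables in main memory
        self.l1 = LRUTable(l1_entries)
        self.l2 = LRUTable(l2_entries)

    def translate(self, vaddr: int):
        """Return (physical address, level that supplied the translation)."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        pfn, level = self.l1.lookup(vpn), "L1"
        if pfn is None:
            pfn, level = self.l2.lookup(vpn), "L2"
            if pfn is None:
                pfn, level = self.page_table[vpn], "walk"  # the costly case
                self.l2.insert(vpn, pfn)
            self.l1.insert(vpn, pfn)
        return (pfn << PAGE_SHIFT) | offset, level


tlb = TwoLevelTLB({vpn: vpn + 100 for vpn in range(1000)}, l1_entries=2)
for addr in (0x1234, 0x1234, 0x2000, 0x3000, 0x1234):
    print(hex(addr), tlb.translate(addr)[1])  # walk, L1, walk, walk, L2
```

With a deliberately tiny two-entry first level, the last access to page 1 misses in L1 (it was evicted) but still hits in the 512-entry L2, avoiding another trip to the translation tables.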

Since the Nehalem microarchitecture allows processors with up to eight cores, it no longer has a shared L2 cache. Instead, each core gets its own relatively small 256 KB L2 cache. Nehalem also gains an L3 cache, which is shared by and connects all the cores. As a result, each core's L2 cache acts as a buffer between that core and the much larger shared L3 cache.
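The effect of private L2 caches in front of a shared L3 can be shown with a toy model: a line one core brings in from memory is later found by another core in the shared L3 instead of memory. The sketch assumes 64-byte lines, fully associative LRU caches, four cores, and ignores the L1 caches and coherence traffic entirely; the 256 KB L2 and 8 MB L3 sizes are taken from the text.

```python
from collections import OrderedDict

LINE = 64  # assumption: 64-byte cache lines


class Cache:
    """Fully associative LRU cache that only tracks which lines are present."""

    def __init__(self, size_bytes: int):
        self.max_lines = size_bytes // LINE
        self.lines = OrderedDict()

    def hit(self, tag) -> bool:
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        return False

    def fill(self, tag):
        self.lines[tag] = True
        self.lines.move_to_end(tag)
        if len(self.lines) > self.max_lines:
            self.lines.popitem(last=False)


class Hierarchy:
    def __init__(self, cores=4, l2_bytes=256 * 1024, l3_bytes=8 * 1024 * 1024):
        self.l2 = [Cache(l2_bytes) for _ in range(cores)]  # one private L2 per core
        self.l3 = Cache(l3_bytes)                          # shared by all cores

    def load(self, core: int, addr: int) -> str:
        """Return the level that served the access: 'L2', 'L3' or 'memory'."""
        tag = addr // LINE
        if self.l2[core].hit(tag):
            return "L2"
        level = "L3" if self.l3.hit(tag) else "memory"
        if level == "memory":
            self.l3.fill(tag)
        self.l2[core].fill(tag)
        return level


h = Hierarchy()
print(h.load(0, 0x1000))  # memory: first touch anywhere
print(h.load(0, 0x1000))  # L2: now in core 0's private cache
print(h.load(1, 0x1000))  # L3: core 1 finds it in the shared cache
```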





SSE4.2 Instructions
Intel continued to expand the set of supported SIMD instructions in the Nehalem microarchitecture, adding seven new instructions called SSE4.2. Intel specifically stressed that SSE4.2 is aimed less at processing streaming media content than at other tasks: text and string processing, CRC32 checksum calculation and population count (POPCNT).
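Two of the SSE4.2 additions have simple software equivalents that show what the hardware accelerates. The CRC32 instruction computes the CRC-32C (Castagnoli) checksum, and POPCNT counts set bits. Below is a plain bit-by-bit Python reference, assuming the common CRC-32C convention of inverting the value before and after processing; that inversion is applied by software around the instruction, which itself performs only the per-byte (or per-word) update step. The function names are hypothetical.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; the SSE4.2 CRC32 instruction performs the inner update in hardware."""
    crc ^= 0xFFFFFFFF  # conventional initial inversion (done in software)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, apply the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # conventional final inversion


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer (what POPCNT returns)."""
    return bin(x).count("1")


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
print(popcount(0b1011))           # 3
```

The inner loop runs eight iterations per byte; replacing it with a single instruction is exactly the kind of non-media workload (storage and networking checksums) that SSE4.2 targets.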



Integrated Memory Controller
The main feature of the Nehalem memory controller is its flexibility. Given the modular design of the whole upcoming processor family, which may include products differing dramatically in features and market positioning, Intel made it possible not only to enable or disable support for buffered modules but also to vary the memory speed and the number of channels. The first Nehalem processors will be quad-core models with a triple-channel memory controller supporting DDR3 SDRAM. As a result, desktop systems built on the new CPUs will have unprecedented memory bandwidth: with three DDR3-1067 SDRAM modules, peak bandwidth reaches 25.6 GB/s.
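The 25.6 GB/s figure follows from channels × transfer rate × bytes per transfer. The small helper below reproduces it, assuming the standard 64-bit (8-byte) DDR3 channel width and that DDR3-1067 actually runs at 1066.67 million transfers per second; the function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                       bus_width_bits: int = 64) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 8 bytes per transfer on a 64-bit channel
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


ddr3_1067 = 3200 / 3  # 1066.67 MT/s
print(round(peak_bandwidth_gbs(1, ddr3_1067), 2))  # 8.53 per channel
print(round(peak_bandwidth_gbs(3, ddr3_1067), 1))  # 25.6 for three channels
```

Each extra channel adds another 8.53 GB/s, which is why the third channel matters as much as the move to DDR3.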





QPI Bus
The Nehalem microarchitecture is universal: it is meant for desktop, mobile and server products alike. That is why Intel designed a new processor bus that suits multi-processor systems and provides sufficient bandwidth and scalability. Technically, QPI consists of two unidirectional 20-bit links, one carrying data in each direction. Of these 20 bits, 16 carry data and the remaining 4 serve auxiliary purposes: they are used by the protocol and for error detection. The bus performs up to 6.4 billion transfers per second (6.4 GT/s), which gives 12.8 GB/s of bandwidth in each direction, or 25.6 GB/s in total.
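The QPI numbers can be checked the same way: only the 16 data bits of each 20-bit transfer count toward usable bandwidth. This is a back-of-the-envelope sketch using the figures from the text; the function name and the extra "raw" value (all 20 bits, including overhead) are illustrative additions.

```python
def qpi_bandwidth_gbs(gigatransfers_per_s: float, data_bits: int = 16, total_bits: int = 20):
    """Return (usable GB/s per direction, usable GB/s both directions, raw GB/s per direction)."""
    per_direction = gigatransfers_per_s * data_bits / 8  # bits -> bytes
    raw_per_direction = gigatransfers_per_s * total_bits / 8  # includes protocol/CRC bits
    return per_direction, 2 * per_direction, raw_per_direction


print(qpi_bandwidth_gbs(6.4))  # (12.8, 25.6, 16.0)
```

So of the 16 GB/s that physically crosses a 20-bit link in one direction, 12.8 GB/s is payload and the rest is protocol and error-detection overhead.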

Depending on their market positioning, Nehalem processors may have one or several QPI interfaces. In multi-processor systems this lets each CPU be connected directly to every other processor, reducing latency when it accesses memory attached to another processor's memory controller. CPUs for single-processor desktop systems will have one QPI link connecting them to the chipset.
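Why multiple QPI interfaces help can be seen by counting hops between sockets. With up to four QPI interfaces per CPU, a four-socket system could in principle give each CPU a direct link to its three peers; the sketch below compares such a fully connected topology with a hypothetical ring, using a breadth-first search to find the worst-case number of links a remote memory request must cross. The topologies are illustrative models, not specific Intel platform layouts.

```python
from collections import deque
from itertools import combinations


def full_mesh(n: int) -> dict:
    """Every socket links directly to every other socket."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring(n: int) -> dict:
    """Each socket links only to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def hops(topology: dict, src: int, dst: int) -> int:
    """Breadth-first search for the minimum number of links between two sockets."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in topology[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("sockets are not connected")


def worst_case_hops(topology: dict) -> int:
    return max(hops(topology, a, b) for a, b in combinations(topology, 2))


print(worst_case_hops(full_mesh(4)))  # 1: any remote memory is one link away
print(worst_case_hops(ring(4)))       # 2: the opposite socket needs two links
```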



Power Management and Turbo-Mode
The Power Control Unit (PCU) is essentially another programmable microcontroller built into the CPU, whose job is to manage power consumption intelligently. Not surprisingly, the PCU is fairly complex: it consists of about 1 million transistors. Its main task is to adjust the frequency and voltage of individual cores, and it has everything it needs for that. It receives temperature, voltage and current sensor readings for all cores, analyzes this data and switches eligible cores to power-saving mode by adjusting their frequency and voltage. In particular, the PCU can shut down inactive cores, putting them into a deep sleep state where their power consumption is close to zero.

The main principle behind Turbo Boost Technology is that when some cores enter power-saving mode, the processor's overall power consumption and heat dissipation drop, which allows the frequency of the remaining cores to be raised without exceeding the TDP limit. Note, however, that Turbo mode is not automatically enabled just because one or more cores have entered power-saving mode.
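The PCU's decision described in the last two paragraphs (put idle cores to sleep, then hand their share of the power budget to the active cores) can be sketched as a single function. Every number here is a hypothetical assumption: a 130 W budget (the Bloomfield TDP mentioned below), 30 W for the non-core parts, 20 W per active core at base frequency, 5 W per extra frequency step ("bin"), and a cap of two bins. Real power does not scale linearly with frequency, so this only shows the budgeting idea, not Intel's algorithm.

```python
import math


def pcu_step(core_loads, tdp_w=130.0, uncore_w=30.0, core_base_w=20.0,
             w_per_bin=5.0, max_bins=2):
    """Return (per-core states, turbo bins) for one hypothetical PCU decision.

    core_loads: utilisation of each core in [0, 1]; 0 means the core is idle.
    """
    # Idle cores go to deep sleep and are assumed to draw ~0 W.
    states = ["sleep" if load == 0 else "active" for load in core_loads]
    active = states.count("active")
    if active == 0:
        return states, 0
    # The budget left after the non-core parts is split among active cores only.
    budget_per_core = (tdp_w - uncore_w) / active
    headroom = budget_per_core - core_base_w
    bins = 0 if headroom <= 0 else min(max_bins, math.floor(headroom / w_per_bin))
    return states, bins


print(pcu_step([1, 1, 1, 1]))  # (['active', 'active', 'active', 'active'], 1)
print(pcu_step([1, 0, 0, 0]))  # (['active', 'sleep', 'sleep', 'sleep'], 2)
```

Under these made-up numbers, a single active core receives the whole budget and reaches the two-bin cap, while four active cores still have enough headroom for one bin. The exact result depends entirely on the assumed wattages.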


The first mass-produced processors based on the new Nehalem microarchitecture will be desktop processors codenamed Bloomfield, with four cores. Besides these four cores, the Bloomfield die will also contain an 8 MB L3 cache, a triple-channel memory controller supporting DDR3 SDRAM and one QPI interface. Such a CPU consists of 731 million transistors and will be manufactured on a 45 nm process with high-k metal gates. Bloomfield processors will be marketed as Core i7. The first models, due in mid-November, will run at 3.2, 2.93 and 2.66 GHz, and all three will have a TDP of 130 W.



Source: X-Bit Labs