Original Link: https://www.anandtech.com/show/994
Intel Developer Forum Fall 2002 - Hyper-Threading & Memory Roadmap
by Anand Lal Shimpi on September 16, 2002 6:05 PM EST - Posted in Trade Shows
This year's fall Intel Developer Forum came to a close last week and we're here today to tie up some loose ends and conclude our coverage of the conference. There are many ways to measure the success of the show; attendance was relatively unchanged from the Spring show, which is a good thing considering the current state of the industry.
In terms of how exciting the show was, compared to last Spring there were definitely fewer announcements of interest; for example, here's some of what we saw just 6 months ago at IDF:
- Details of the Itanium roadmap through 2004
- Early introduction of the Prescott core
- Introduction of two new PC form factors (Tidewater and Bigwater)
- The world's first public Banias demo on the Odem chipset
This fall however, we were only given a little bit to digest:
- More Banias information
- Pentium 4 with Hyper-Threading 3.06GHz demonstration
As usual, the Technology Showcase held some surprises, but we had definitely hoped for more information on forthcoming products from Intel (a few Prescott tidbits would have helped, seeing as the chip will be sampling by the end of this year, but silence may be Intel's weapon against Hammer).
Considering that the two major products talked about at this IDF were Banias and the HT-enabled Pentium 4, you'd expect us to devote most of our time to covering those two. We've already talked Banias to death in our earlier coverage, so today we'll focus on Hyper-Threading and what it will mean for your desktop experience.
Dinner with Pat: Hyper-Threading over Cocktails
We were fortunate enough to have dinner with Intel's CTO, Pat Gelsinger, to pick his brain about technology. Being an engineer at heart, Pat will give you more than enough detail on any aspect of Intel's technologies. We asked him what he felt was the most exciting thing in Intel's current arsenal and he responded with a simple "threading."
In reference to Hyper-Threading, Pat explained exactly why this is so important to the future of Intel microprocessors. For the past decade or so the focus of Intel's microprocessor architects has been on extracting instruction level parallelism (ILP), or simply put, making sure that the CPU could execute as many instructions as possible in parallel. Today the limits of ILP engineering are kicking in, and both AMD and Intel are noticing diminishing returns from architectural enhancements aimed at improving the ILP of their CPUs. You can optimize your CPU architecture for executing many instructions in parallel, but as long as the CPU is not being fed those instructions fast enough your optimizations will go to waste; this is why there's such a focus on latency to main memory and memory bandwidth. The faster the CPU-memory link is, the more instructions the CPU can start crunching on, and those instructions are what benefit from the ILP optimizations built into these CPUs; unfortunately today's memory subsystems aren't anywhere near as fast as they need to be, and thus we are beginning to hit a wall.
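To illustrate the idea (this is our own toy example, not anything from Intel): in the first loop below the four running sums are independent of one another, so a core with several parallel execution units can work on those additions simultaneously; in the second loop every addition depends on the one before it, and no amount of extra hardware helps. In both cases, if the data isn't already in cache, the execution units stall waiting on main memory, which is exactly the wall Pat is describing.

```cpp
#include <cstddef>

// Four independent partial sums: a superscalar, out-of-order core can issue
// these additions in parallel since none of them depends on another.
long long sumWithILP(const long long* a, std::size_t n) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   // leftover elements
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}

// Every addition here depends on the previous one, so extra execution units
// sit idle; and in both versions, a cache miss on a[i] stalls the core
// no matter how wide it is.
long long sumSerial(const long long* a, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```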
Pat admitted that much of Intel's success over the past decade has been due to the fabs and manufacturing capabilities that allowed them to increase cache sizes and implement a good number of parallel execution units; moving forward, however, the challenge for Intel is to go beyond just manufacturing leadership.
The changes to the Northwood core needed to implement Hyper-Threading were minimal.
Using the Northwood Pentium 4 core as an example, doubling the cache of a processor ends up increasing performance 5 - 10% across the board in today's applications. This increase in performance comes at an extremely high price, however: die size and transistor count both go up tremendously. Pat mentioned that although Intel may implement an on-die memory controller in future CPUs (a la Hammer), this isn't the only solution to the problem of starving execution units; even with an on-die memory controller there is still noticeable latency between the CPU and main memory, and cache misses still result in a number of idle CPU clocks. There are fewer misses, but they still exist.
Pat's take on the situation is that if you can't keep the CPU busy with instructions, throw another thread at it and voila, CPU efficiency jumps from ~35% to much closer to 50%. We've explained the reasons for this before, but to recap: most current desktop processors can only execute one thread at a time. A thread can be thought of as a collection of instructions that are sent to the CPU to be executed; every application that runs dispatches its own threads to the CPU. If a single CPU could receive two threads simultaneously, it could receive more instructions to work on and thus keep its pipeline filled and busy. A single application can also send two threads itself (making it a multithreaded application); a 3D rendering application, for example, could have one thread render even-numbered lines while the other renders odd-numbered lines (see the sketch below). You could also have two single-threaded applications running simultaneously, like running Microsoft Outlook while encoding a video to MPEG-4. This is exactly what Intel's Hyper-Threading technology enables, and in Q4 it will be coming to the desktop with the 3.06GHz Pentium 4.
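As a concrete (and entirely hypothetical) illustration of that even/odd split, here is a minimal C++ sketch of a renderer handing alternating scanlines to two threads; on a Hyper-Threading processor, both threads can make progress on the same physical CPU.

```cpp
#include <cmath>
#include <thread>
#include <vector>

// Hypothetical per-scanline work; stands in for a real renderer's inner loop.
void renderLine(std::vector<float>& frame, int width, int y) {
    for (int x = 0; x < width; ++x)
        frame[y * width + x] = std::sin(x * 0.01f) * std::cos(y * 0.01f);
}

// Each thread renders every other scanline: one takes even rows, the other odd.
void renderInterleaved(std::vector<float>& frame, int width, int height, int start) {
    for (int y = start; y < height; y += 2)
        renderLine(frame, width, y);
}

int main() {
    const int width = 640, height = 480;
    std::vector<float> frame(width * height);

    std::thread evenRows(renderInterleaved, std::ref(frame), width, height, 0);
    std::thread oddRows (renderInterleaved, std::ref(frame), width, height, 1);
    evenRows.join();
    oddRows.join();
}
```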
But according to Pat, this is just the first step in a very long road for Intel. Including Hyper-Threading only costs about 5% of die space and can improve performance from 0 - 35% in everyday applications and scenarios. Intel showed off some very impressive demos of the current version of Hyper-Threading in the 3.06GHz Pentium 4, and most of our doubts about the technology's feasibility vanished. Performance in today's multitasking scenarios will improve in a very noticeable way, and gains in multithreaded applications are very respectable (DiVX encoding fans will definitely appreciate Hyper-Threading).
The future of Hyper-Threading will assume many different faces, many of which Pat is overseeing research on right now. One of the most interesting areas is in the compiler realm; future versions of Intel's compilers may be able to generate support for pseudo-multithreading on their own. As we've described before, pseudo-multithreading involves the creation of helper threads that run further ahead in program execution, speculate on data that will be used in the near future, and pull it into the processor's cache. Assuming a helper thread correctly pulls something into cache that wouldn't normally have been there, performance can improve considerably, as a cache miss has just been avoided.
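Here's a rough sketch of what such a helper thread might look like if written by hand; the structure and names are our own guesses at the technique, not Intel's compiler output. The worker does the real computation, while the helper stays a few thousand elements ahead and simply touches the data the worker is about to need, so that it is (hopefully) already sitting in cache when the worker gets there.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<std::size_t> workerPos{0};   // how far the real computation has gotten

// The "real" work: latency-bound on reads from a large array.
long long work(const std::vector<int>& data) {
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += data[i] * data[i];
        workerPos.store(i, std::memory_order_relaxed);
    }
    return sum;
}

// The helper: runs ahead of the worker, touching data it will soon need so
// that those accesses become cache hits instead of misses.
void helper(const std::vector<int>& data, std::size_t lookahead) {
    volatile int sink = 0;               // keep the speculative loads from being optimized away
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Don't run too far ahead, or the prefetched data gets evicted again.
        while (i > workerPos.load(std::memory_order_relaxed) + lookahead)
            std::this_thread::yield();
        sink = data[i];                  // the speculative touch
    }
    (void)sink;
}

int main() {
    std::vector<int> data(1 << 22, 3);   // ~16 MB, far larger than a Pentium 4's cache
    std::thread h(helper, std::cref(data), 4096);
    long long result = work(data);
    h.join();
    std::printf("%lld\n", result);
}
```

Whether this actually wins depends on the helper not evicting useful data and not stealing too many execution resources from the worker, which is exactly the sort of tradeoff a compiler generating these threads would have to reason about.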
Another area for the future of Hyper-Threading is allowing more than two threads to be executed on the CPU simultaneously. Remember that the number of instructions the CPU executes per clock (IPC) can be improved by around 40% by moving to two threads; the theory is that with more threads performance could improve even further. There is one thing to watch out for, though: performance degradation could result if these threads begin competing for the same resources, so more than two threads is still a little while away.
Pat sees Hyper-Threading and a general focus on thread level parallelism (TLP) as the future trend of microprocessor development, at least at Intel. We asked Pat what he thought about AMD's approach and he responded that every day AMD doesn't have Hyper-Threading makes him happier. AMD could take a different approach by moving to affordable multiprocessor designs with future versions of Hammer; remember that a multiprocessor system can still run more than one thread at a time (one thread per CPU, multiplied across the number of CPUs), thus moving towards a TLP-optimized design. With the relatively small size of AMD's cores this may end up being their answer to Intel's Hyper-Threading, although it's a more expensive route.
Hyper-Threading and Memory Bandwidth
The concept of Hyper-Threading brings up an interesting question: does Hyper-Threading increase a CPU's dependency on memory bandwidth? The reasoning behind the question is simple; if two threads are executing on the CPU concurrently then you've effectively halved the amount of cache that each individual thread has access to. Halving the cache means that each thread will end up requesting more data from memory, which increases the CPU's dependency on a high-bandwidth and highly efficient memory subsystem. With RDRAM offering very high bandwidth and much higher theoretical utilization than DDR SDRAM, could Hyper-Threading be RDRAM's killer app?
We posed this question to Pat Gelsinger and his response took an angle that we hadn't originally thought of. The premise of the increased memory bandwidth requirements argument is that the two threads that are executing don't exhibit much (if any) locality. Remember that data is pulled into the CPU's cache based on two principles: spatial locality and temporal locality. The principle of spatial locality states that if one block of data is requested, it's very likely that data located around it will be requested in the near future; as a result, the CPU pulls into cache not only the block of data that's requested but also what's around it in memory.
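To see why this matters, here's a purely hypothetical example of our own: both loops below sum the same matrix, but the first walks it in the order it's laid out in memory and rides along with the cache lines already pulled in (spatial locality), while the second strides down columns and touches a fresh cache line on nearly every access, forcing far more trips out to main memory.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int N = 2048;
    std::vector<int> matrix(N * N, 1);
    long long sum = 0;

    // Good spatial locality: consecutive elements share a cache line, so once
    // the CPU fetches the line holding matrix[i*N + j], the next few accesses
    // are effectively free.
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            sum += matrix[i * N + j];

    // Poor locality: striding down a column touches a different cache line on
    // nearly every access, so far more requests go out to main memory.
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            sum += matrix[i * N + j];

    std::printf("%lld\n", sum);
}
```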
If the threads being executed do exhibit certain levels of locality then most of what they need will already be in cache, but what if they don't? Well, it turns out that in most multitasking/multithreaded scenarios there is a good deal of locality of reference present, but let's say for the sake of argument that there's not. Another very common situation occurs when you're running one small background application (e.g. virus scanning software) alongside a much larger application. The memory footprint of the background application is usually tiny, and its working set is small enough to fit entirely within the cache of the CPU; although this does take away some of the cache space available to the larger program, it's usually a small enough amount that it doesn't matter, or the larger program is going to main memory a lot anyway and thus performance doesn't change.
The final scenario is one where you have two very large applications running with next to no locality in their memory accesses, in which case dependency on a high-bandwidth, highly efficient memory subsystem does go up with Hyper-Threading enabled. These situations are much more specialized and harder to find, but they are something to keep in mind when investigating HT performance.
Intel's Memory Roadmap - RDRAM or DDR II?
In the late '90s Intel made quite a few mistakes when it came to their chipsets. Intel tried to put all of their weight behind one memory technology (Rambus) and designed all of their chipsets around it; Intel even designed CPUs around RDRAM (Timna), and it backfired in a major way. Motherboard manufacturers rejected Intel's first RDRAM chipset, as the industry was not ready to switch to Rambus at the time, which paved the way for VIA to rise to the top with a number of SDRAM-equipped solutions.
Today Intel has learned from those mistakes, and although many (ourselves included) expected Intel to completely drop RDRAM from their roadmap, it turns out that they haven't. If Intel has truly learned from their mistakes, the solution to their memory problems isn't to completely ditch one memory type, but to let fate play its role and let the industry decide what memory technology to embrace; luckily (for Intel's sake) that's what they seem to be doing.
By the end of this year Intel will adopt DDR333 support in their chipsets, but as of now there are no plans for DDR400 support, citing the lack of an industry standard and difficulties in mass-producing DDR400. Intel will also embrace PC1066, which ended up requiring no more than a simple modification to the motherboard design. What about after DDR333 and RDRAM?
Intel's current roadmap shows both DDR-II and RDRAM support moving forward, with DDR-II not arriving until 2004. This is refreshing to see as it does illustrate Intel learning from past mistakes - they're not tying themselves down to a single memory technology, a huge improvement over what we saw a couple of years ago.
As you can see from the roadmap above, RDRAM is aimed only at the highest market segments, with DDR/DDR-II taking the mainstream. The one thing that Intel did mention was that the technology eventually embraced by the industry would have to accommodate all segments; this division according to price cannot continue for much longer. As far as which memory technology will prevail, thankfully Intel is leaving that up to the rest of the industry this time around.
Going forward there needs to be a single memory technology for all segments (much like DDR today)
Until then, it looks like we'll see both RDRAM and DDR (and eventually DDR-II) support from Intel.
Final Words
On this note we conclude yet another week at the Intel Developer Forum; as always, we'll be back in six months covering the Spring 2003 IDF in February.