Original Link: https://www.anandtech.com/show/1248
Intel Developer Forum Spring 2004 - Wrapup
by Derek Wilson on February 23, 2004 8:44 PM EST - Posted in
- Trade Shows
The final day's keynote is always a thought-provoking experience. This is the time during the forum when Intel looks deep into its R&D labs and gives us a little glimpse of what the future holds. We heard from Sean Maloney, VP and GM of Intel's Communications Group, and Pat Gelsinger, the CTO of Intel, on all the latest and greatest ideas Intel is focusing on.
In addition to the final day's keynote, this wrap-up will take a look at the floor of the Technology Showcase. We will also look a little more in depth at what exactly is going on with PCI Express, ATI, and NVIDIA.
We are still reading through documents and doing research on Intel's x86-64 extensions, though there isn't any more news we can bring you at this moment. As with other processor technologies, when x86-64 is finally enabled (when Nocona launches), we will have an in-depth analysis of the architecture enhancements.
Broadband Wireless Technology
Back in the days of the original Quake, average users first realized that their computers just weren't fast enough. In response, processors, graphics cards, and systems were pushed to run games well. Even now, games are the applications that tend to push users' systems to their limits. Sean Maloney pinpoints broadband as the next area that will push computers to their limit. As broadband wireless becomes a reality, portable wide pipes will push PDAs and other devices to actually use the data to which they have access.
In looking at future technology to push portable devices, Intel is targeting key areas that are current bottlenecks in portable systems. The first announcement of the keynote was a 90nm NOR flash memory device intended to help speed up the normally slow memory used in these devices. Sean then ran a demo of a portable visualization technology (codenamed Carbonado) that can play full motion video and push enough polygons per second to run 3D games at smooth frame rates. At this rate, we may have to expand our graphics coverage to include cell phone GPUs.
Unfortunately, Sean didn't want to talk much about their radio enhancements (indicating that the next IDF might shed a little more light on this area). He did indicate that Intel is exploring MEMS devices for use in radios.
The success or failure of products using these technologies depends heavily on the availability of wireless broadband and pervasive networking. Intel isn't going to leave those technologies alone either. We saw a demo of Xilinx's implementation of the recently finalized AS interconnect standard. In addition, Intel is working on 10Gbps and 1Gbps network switch silicon (90nm, of course), 4Gbps optical transceivers (due out 2H '04), and even a 10 Gigabit PCI-X Ethernet card. Sean was also very happy with the current push toward 802.16 and WiMAX. One of the most interesting numbers Intel threw out is that they expect 802.16e (portable WiMAX) to pop up in 2006.
The Information Age
Pat Gelsinger's presentation began with a bit of a history lesson taking us from kilobytes to gigabytes. What followed was a discussion of how Intel will attack the "Era of Tera," as Pat dubbed the next step up.
Much of the rest of the keynote was geared toward answering the questions of why we need "tera" anything, and how Intel plans to approach the problem of achieving such high performance computing. The answer to the first question came back to reflect what Sean Maloney had said about broadband pushing computer systems to their limit: recognition, mining, and synthesis of data (for which Gelsinger used the poorly chosen acronym RMS). Essentially, Gelsinger is saying that the "Era of Tera" will allow us to operate quickly on massive data sets to identify very complex patterns and situations, as well as help us generate data that blurs the lines of reality.
As examples of usefulness, Pat mentioned that computers were able to detect the possibility of what happened on September 11, but they were a week or two late in doing so. Data mining would allow us to do such things as search the web for image data based on what the image actually looks like. As for an example of synthesis, we were shown a demo of realtime raytracing. Visualization being the infinitely parallelizable problem that it is, this demo was a software renderer running on a cluster of 23 dual 2.2GHz Xeon machines. The world will be a beautiful place when we can pack this kind of power into a GPU and call it a day.
Of course, we still need to answer the question of how we are going to get from here to there. As surprising as it may seem, Intel's answer isn't to push for ever increasing frequencies. With some nifty charts and graphs, Pat showed us that we won't be able to rely on increases in clock frequency giving us the same increases in performance as we have had in the past. The graphs showed the power density of Intel processors approaching that of the sun if it stays on its current trend, as well as a graph showing that the faster a processor runs, the more cycles it wastes waiting for data from memory (since memory latency hasn't decreased at the same rate as clock speed has increased). Also, as chips are fabbed on smaller and smaller processes, increasing clock speeds will lead to problems with moving data around a chip in less than one clock cycle (because of interconnect RC delays).
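To make the memory latency point concrete, here's a quick back-of-the-envelope sketch (the 100ns miss latency and the clock speeds are our own illustrative assumptions, not figures Intel quoted): a cache miss that takes a fixed amount of wall-clock time eats more and more cycles as the core clock climbs.

```c
#include <stdio.h>

int main(void)
{
    /* Assumption: a trip to main memory costs roughly 100 ns no matter
       how fast the core itself is clocked. */
    const double miss_latency_ns = 100.0;
    const double clocks_ghz[] = { 1.0, 2.0, 3.2, 4.0, 6.0 };
    const int n = sizeof(clocks_ghz) / sizeof(clocks_ghz[0]);

    for (int i = 0; i < n; i++) {
        /* Cycles stalled per miss = latency (ns) x clock (GHz),
           since a 1 GHz core completes one cycle per nanosecond. */
        double stalled = miss_latency_ns * clocks_ghz[i];
        printf("%.1f GHz core: ~%.0f cycles stalled per cache miss\n",
               clocks_ghz[i], stalled);
    }
    return 0;
}
```

The memory access costs the same 100ns either way; the faster core simply throws away more cycles waiting for it.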
In addition to clock speed not being able to pull us out of the mud, architectural advances in processors are limited by the maximum instruction level parallelism (ILP) available in any given program (the maximum amount of work a processor can do is limited because not all instructions can be completed in parallel: some instructions are dependent on the results of other instructions). Since the average maximum ILP isn't increasing in programs, we will need to find another way to increase the performance of a processor.
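As a rough illustration of that ceiling (our own sketch, not an Intel example): the first loop below is one long dependency chain, so a wider core gains nothing, while the second breaks the same work into four independent chains that out-of-order hardware can overlap.

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* One serial dependency chain: every add waits on the previous
       result, so effective ILP is 1 no matter how wide the core is. */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Four independent accumulators: the hardware can keep several
       adds in flight at once, exposing more ILP for the same work. */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    printf("%f %f\n", sum, s0 + s1 + s2 + s3);
    return 0;
}
```

The catch, of course, is that most real code looks more like the first loop than the second, and compilers and architects have already harvested most of the easy wins.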
If clock frequency isn't going to get us anywhere, and we are hitting a wall with how many instructions per cycle we can complete in a single program, the only other option is to increase parallelism at the thread level. Rather than trying to get more done in a single program or thread, we will have to have multiple processors running independent code at the same time. Intel's first step in this direction was the baby step of Hyper-Threading, but dual core, multicore, and massively multicore processors are on the horizon for Intel.
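A minimal pthreads sketch of that idea (again our own example, not Intel code): instead of squeezing more out of one instruction stream, the work is split into independent threads that a Hyper-Threaded or multicore chip can run simultaneously.

```c
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];

/* Each thread sums its own slice of the array, independently of the others. */
static void *worker(void *arg)
{
    long t = (long)arg;
    long chunk = N / NTHREADS;
    double sum = 0.0;
    for (long i = t * chunk; i < (t + 1) * chunk; i++)
        sum += data[i];
    partial[t] = sum;
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    /* On a multicore (or Hyper-Threaded) CPU these threads really do
       execute at the same time; on a single core they just time-slice. */
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];
    }

    printf("total = %f\n", total);
    return 0;
}
```

The hard part isn't writing this kind of code for a toy sum; it's finding thread-level parallelism in the desktop applications people actually run.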
In addition to massively multicore architectures, Intel needs to eliminate bottlenecks from other parts of the system as well. One of the ways they plan on doing this is via a feature called Helper Threads. Apparently, half of the execution time of any given process is spent waiting for data. If that data could be sitting in the cache when the process needed it, everything would run much faster. Helper Threads are apparently able to warm up the cache for a specific process where it would normally take a cache miss. In the demo of Helper Threads, Intel ran a benchmark on a standard Itanium processor and a "research Itanium," and the Helper Thread enabled side showed an 8.9% speedup and 23% fewer cache misses.
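Intel didn't explain how the research Itanium generates its helper threads, but the basic idea can be sketched conceptually (everything below is a hypothetical illustration, not Intel's implementation): a lightweight helper thread streams through the data the main thread is about to use, so the main thread's accesses tend to hit in cache instead of stalling on memory. In a real design the helper would be generated by the compiler or hardware and kept just far enough ahead of the main thread's misses.

```c
#include <pthread.h>
#include <stdio.h>

#define N (1 << 22)          /* 4M doubles, bigger than the caches */

static double data[N];
static volatile double sink; /* keeps the warm-up reads from being optimized away */

/* Hypothetical helper thread: does no useful work of its own, it just
   touches the data so the lines are already in cache for the main thread. */
static void *helper_thread(void *arg)
{
    (void)arg;
    double s = 0.0;
    for (long i = 0; i < N; i++)
        s += data[i];
    sink = s;
    return NULL;
}

int main(void)
{
    pthread_t helper;

    for (long i = 0; i < N; i++)
        data[i] = (double)i;

    /* The helper races ahead (it does less work per element), warming
       the cache; the main thread below does the "real" computation. */
    pthread_create(&helper, NULL, helper_thread, NULL);

    double result = 0.0;
    for (long i = 0; i < N; i++)
        result += data[i] * 2.0;

    pthread_join(helper, NULL);
    printf("result = %f\n", result);
    return 0;
}
```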
One of the other paths Intel is looking down is adaptability. Adaptive body biasing (forward biasing a transistor when it is on, and reverse biasing it when it is off) to increase performance and decrease power lost to leakage is being explored at the silicon level. On a larger scale, Intel is also exploring adaptive architectures and platforms. Reconfigurable designs such as adaptive wireless radio arrays that can easily be reconfigured to work with multiple types of wireless networks are another example of the kind of adaptability Intel wants to see evolve in the future.
By utilizing massive multiprocessing and adaptive/programmable architectures, the hope is that systems will be able to form themselves to the needs of the programs they are running while doing as many things as possible in any given nanosecond (or part thereof as the case may be).
Of course, that's the future. Dual core processors aren't even going to show up this year (though next year might be a different story if we are lucky), and reconfigurable and adaptive computing has been discussed for a very long time. It is very exciting to hear what some of Intel's farthest-looking people have to say about where we are headed, but it also makes us feel a little like the next few years will be an eternal day-before-Christmas.
The Technology Showcase
Aside from the PCI Express and RAM technologies we talked about in our IDF Day 1 article, the most exciting things on the floor were the new motherboard chipsets spread all around for us to drool over.
First off, we were able to check out Grantsdale and Alderwood in their cute little BTX boxes. Here's what they looked like:
After checking out those bad boys, we decided we needed a look at the new Xeon line of motherboards as well. We sneaked a peek at the workstation class board using the Tumwater chipset from Tyan here:
This Tyan board is based on a reference design from Intel. The reference Tumwater board is pictured in action here:
In a very convenient twist of fate, one of the designers of the Tumwater and Lindenhurst reference boards happens to be a reader and forum member here (and over at ArsTechnica) who goes by KalTorak. We talked about such cool things as the weight of those nifty cooling solutions in the picture (they are about a pound each and need to be anchored to the chassis since the motherboard can't support them) and just why we couldn't crack open the other systems to take a look at the boards within.
Apparently, there were quite a few boards on display at IDF that shouldn't have been, as Intel didn't want to disclose the maximum amount of RAM or the number of PCI Express slots available on Tumwater and Lindenhurst. That's why we couldn't take a look inside the systems running any of the cool PCI Express graphics demos either.
We did manage to get some pictures of a server board that we've edited to keep everyone out of trouble. Take a look:
We saw another Lindenhurst board over at the Corsair booth showing off their DDR2 modules. It had a little contraband material on it as well, but that's been taken care of here, too.
Most of the rest of the show was Itanium and server hardware, or how everyone had an example of (insert hardware here) with a PCI Express interface. The only thing left to tackle on the Technology Showcase is the differences between ATI and NVIDIA's PCI Express solutions.
More on PCI Express and Graphics
There is going to be a lot of fighting over the next few months about the performance of PCI Express based graphics solutions. Unfortunately, we won't really know what's what until we have hardware in our hands that we can play with to test real world performance. In order to straighten out some of the madness that will surely be going on, we will try to present a better picture of what things look like with PCI Express graphics cards at the moment. First, we will attempt to explain why the contenders chose their paths.
ATI went with a native PCI Express interface because it gives them a full 4GB/s of upstream and downstream bandwidth at the same time. This will allow massive amounts of data to move between the GPU and the CPU/main memory in both directions. They also have the advantage of not needing an extra component on the graphics card itself.
NVIDIA chose to use a bridged solution (which they like to call their High Speed Interconnect, or HSI), which gives them the ability to produce only one GPU for both AGP and PCI Express based solutions while the transition is being made to the new platform. This gives them the advantage of being more flexible in responding to demand for AGP and PCI Express based products, and they won't have to forecast just how many of which type of GPU they will sell in any given silicon run.
The main point of contention between the two camps is bandwidth. We know what ATI is capable of; now let's take a look at NVIDIA. Because the bridge is in such close proximity to the AGP interface of the GPU, NVIDIA is able to run the AGP side of their bridge at 4GB/s (which is twice the bandwidth of AGP 8x). As NVIDIA is bridging PCI Express's 4GB/s up and 4GB/s down to an AGP 16x bus, they will not be able to sustain the full bandwidth of the x16 PCI Express interface in both directions at the same time. If data is moving in only one direction, there is no bandwidth loss. If data needs to move up and down at the same time, one data stream will have to sacrifice some bandwidth to the other.
So, what kind of impact will this have? We can't really say until we have hardware. Of course, DDR400 will only give us 3.2GB/s of bandwidth (or 6.4GB/s in dual channel mode), so transfers from memory to the GPU will actually be memory-speed limited on systems that don't have 8GB/s of memory bandwidth. As for games, we can take a look at the history of the impact of increasing AGP bandwidths. It is possible that future games (and possibly games ported by lazy console developers) may want to use the CPU and main memory a great deal and therefore benefit from PCI Express, but again, we won't know until the hardware and the games are out there.
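The numbers being argued over reduce to simple peak-bandwidth arithmetic; the little sketch below just lays the published theoretical figures side by side (peak numbers only, and real sustained throughput will certainly be lower).

```c
#include <stdio.h>

int main(void)
{
    /* Peak theoretical bandwidths, in GB/s. */
    const double agp8x         = 2.1;  /* AGP 8x, one direction at a time        */
    const double nv_bridge_agp = 4.0;  /* AGP side of NVIDIA's bridge, ~2x AGP 8x,
                                          shared between directions              */
    const double pcie_x16_up   = 4.0;  /* native PCI Express x16, upstream       */
    const double pcie_x16_down = 4.0;  /* native PCI Express x16, downstream     */
    const double ddr400_single = 3.2;  /* DDR400, single channel                 */
    const double ddr400_dual   = 6.4;  /* DDR400, dual channel                   */

    printf("AGP 8x:                     %.1f GB/s\n", agp8x);
    printf("Bridged AGP side (NVIDIA):  %.1f GB/s shared between directions\n",
           nv_bridge_agp);
    printf("Native PCIe x16 (ATI):      %.1f GB/s up + %.1f GB/s down\n",
           pcie_x16_up, pcie_x16_down);
    printf("DDR400 single/dual channel: %.1f / %.1f GB/s\n",
           ddr400_single, ddr400_dual);

    /* Even dual channel DDR400 can't feed the full 8 GB/s aggregate
       that a native x16 link could theoretically move. */
    printf("PCIe aggregate vs dual channel DDR400: %.1f vs %.1f GB/s\n",
           pcie_x16_up + pcie_x16_down, ddr400_dual);
    return 0;
}
```

Which is exactly why the real question is less about peak numbers and more about whether games and system memory can actually keep either interface busy.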
NVIDIA's NV5x line is slated to be native PCI Express, but they also plan on continuing AGP support through that line by flipping their bridge chip upside down. In fact, we were told that if demand persists into the future, it is very possible that NVIDIA will bridge its NV6x GPUs back to AGP as well. This bus transition is a really tough one to call. ISA took a long time to disappear, and PCI graphics cards are still selling (generally to those who want two graphics cards, but the point is that they are still out there). The difference here is that Intel is pushing really hard for this transition, and AGP is a much more targeted solution than either the ISA or PCI buses.
From a business standpoint, NVIDIA has made a safer choice than ATI. Of course, if performance ends up suffering, or if ATI shows effective, affordable applications for the PCI Express bus that NVIDIA can't take advantage of, NVIDIA will pay for that choice. In this business, performance is everything. Obviously, having the full bandwidth of PCI Express is a desirable thing, and eventually everyone will be making native PCI Express GPUs. But when is eventually? And when will that available bandwidth be tapped, let alone necessary? Only time will tell, but hopefully we've filled in some of the gaps until that time comes.
Final Words
With the exception of the announcement that Intel would be picking up 64 bit extensions for their x86 line of processors, this spring's forum has been fairly subdued. Much of the interest at the forum focused on PCI Express as an emerging technology, and there were plenty of booths with PCI Express based technology being shown off. While our 128 core processors are quite a ways down the road, the next generation of graphics technology is rapidly approaching, and it will be very interesting to see what impact the new bus technology has on performance.
While there wasn't as much excitement as the P4EE announcement of the fall, everyone who was at IDF this spring witnessed the first of an incredible number of changes that are coming to a platform near you. With new sockets, buses, platforms, memory technologies, and everything else coming our way, it is sure to be an exciting year.
We hope you've enjoyed our coverage of this year's Intel Developer Forum. We will be back in San Francisco in the Fall to bring you more coverage from a time when all the questions we've asked here will have been answered, yet our dreams of multicore CPUs will continue.