Original Link: https://www.anandtech.com/show/13239/intel-at-hot-chips-2018-showing-the-ankle-of-cascade-lake



The final presentation of the Hot Chips event this week is from Intel, with a talk on its next generation Xeon Scalable platform, Cascade Lake. We recently learned about Intel’s Xeon roadmap at the Datacenter Insider Summit, consisting of Cascade Lake in 2018, Cooper Lake in 2019, and Ice Lake in 2020, and now Hot Chips is the first chance for Intel to add some more information to the mix. Previously this would have been done at events like IDF, over several hours, but Intel only has 30 minutes on stage here. We picked up the slides before the presentation.

The Key Takeaways

Intel is using the opportunity to expand on Cascade Lake’s previously announced features: new instructions for machine learning by taking advantage of the AVX-512 unit, and how the platform is set to be protected / hardened against attacks such as Spectre and Meltdown. We also have confirmation about how the new Optane DIMMs, Apache Pass, will be enabled through the platform.

Unfortunately, for those expecting an IDF-like substantial talk about the design of the chips, how the SKUs will be separated, or what the product stack will look like, this was not it. We have a feeling that Intel will be drip feeding information about Cascade Lake in this manner.

However, Intel’s main play here is that a significant amount of the server and enterprise industry desperately wants Spectre and Meltdown hardened processors, and will pay for them. When Intel stated that it expects Cascade Lake to be its fastest ramping processor, make no mistake that this is likely to be true, for reasons of security. The question marks will obviously come on price, which has yet to be announced, but Intel could easily argue ‘how much is security worth?’.

Protecting for Spectre, Meltdown, and Similar Attacks

With the range of new attacks, Intel and others moved quickly to enable firmware and operating system remedies. The downside of those remedies was a potential loss in performance, mainly due to kernel switching, that on the latest platforms could account for 3-10% of performance (or more on older systems). When we spoke with Lisa Spelman, VP of Intel’s DCG, we were told that the hardware-based mitigations in Cascade Lake would have an impact on the performance loss – exactly how much was not stated, and we were told that ‘the comparison is sort of apples to oranges – either way performance is set to be increased [because of platform updates]’. Lisa did categorically state that ‘the hardware fixes put the performance back on track’, which is the key takeaway.

For the variants of side-channel attacks, Intel is applying the following updates to Cascade Lake:

Variant 1 is still to be tackled at the OS level, with variants 3a and 4 through firmware and OS updates. Variants 2, 3, and 5 will be solved in hardware, requiring no extra additions.

So while the new processors have fixes in place, not all of them will be hardware fixes. The firmware fixes might as well be hardware, given that the system will launch with these by default, but the OS fixes will have to be pushed before platforms are released. The non-hardware fixes have the potential for performance regression, however as stated above, the platform as a whole should be at a higher performance level than Skylake.

Intel did not state what the lead time is for different variant attacks to be added in hardware beyond Cascade Lake, however the earlier they know, the better. Lisa Spelman did state that every new processor generation features security updates, and that the teams will be working hard to provide the best solutions.


Top image originally from ServeTheHome



Purley Mark Two: Cascade Lake-SP

On the processor front, the on-paper hardware specifications of Cascade Lake Xeons offer no surprises, mainly because the stock design is identical to Skylake Xeons. Users will be offered up to 28 cores with hyperthreading, the same levels of cache, the same UPI connectivity, the same number of PCIe lanes, the same number of channels of memory, and the same maximum supported frequency of memory.

Questions still to be answered include whether the XCC/HCC/LCC silicon dies, from which the processor stack is built, will be the same. There is also no information about memory capacity limitations.

What Intel is saying on this slide, however, is in the second bullet point:

  • Process tuning, frequency push, targeted performance improvements

We believe this is a tie-in to Intel improving its 14nm process further, tuning it for voltage and frequency, or better binning. At this point Intel has not stated if Cascade Lake is using the ‘14++’ process node, to use Intel’s own naming scheme, although we expect that to be the case; if this isn’t disclosed closer to launch, we suspect Intel might drop the ‘++’ naming scheme altogether. Either way, a drive to 10% better frequency at the same voltage would be warmly welcomed.

Where some of the performance will come from is in the new deep learning instructions, as well as the support for Optane DIMMs.

AVX-512 VNNI Instructions for Deep Learning

The world of AVX-512 instruction support is completely confusing. Different processors and different families support various sets of instructions, and it is hard to keep track of them all, let alone code for them. Luckily for Intel (and others), companies that invest in deep learning tend to focus on one particular set of microarchitectures for their work. As a result, Intel has been working with software developers to optimize code paths for Xeon Scalable systems. In fact, Intel is claiming to have already secured a 5.4x increase in inference throughput on Caffe / ResNet50 since the launch of Skylake – partially through code and parallelism optimizations, but partially through reduced precision and multiple concurrent instances as well.

With VNNI, or Vector Neural Network Instructions, Intel expects to double its neural network performance with Cascade Lake. Behind VNNI are two key instructions that fuse what were previously multi-instruction sequences, reducing work:

Both instructions aim to reduce the number of required manipulations within inner convolution loops for neural networks.

VPDPWSSD, the INT16 version of the two instructions, fuses two INT16 multiplies with an INT32 accumulate, replacing the separate PMADDWD and VPADDD math that current AVX-512 code would use:

VPDPBUSD does a similar thing, but goes one stage further, using INT8 inputs to reduce a three-instruction path into a single-instruction implementation:

The key part from Intel here is that with the right data-set, these two instructions will improve the number of elements processed per cycle by 2x and 3x respectively.
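As a rough illustration of what the two instructions compute, here is a pure-Python model of the per-lane math. The function names and scalar framing are our own; the real instructions operate on 512-bit registers, i.e. sixteen INT32 lanes at once, which is where the throughput gain comes from.

```python
# Pure-Python sketch of the per-lane semantics of the two VNNI instructions.
# Illustrative only: overflow/wrap-around behavior of the real hardware
# is ignored here.

def vpdpwssd_lane(acc, a_words, b_words):
    """One INT32 lane of VPDPWSSD: acc += a[0]*b[0] + a[1]*b[1],
    where a_words and b_words each hold two signed 16-bit values."""
    return acc + a_words[0] * b_words[0] + a_words[1] * b_words[1]

def vpdpbusd_lane(acc, a_bytes, b_bytes):
    """One INT32 lane of VPDPBUSD: acc += sum of four byte products,
    with a_bytes unsigned (0..255) and b_bytes signed (-128..127)."""
    return acc + sum(u * s for u, s in zip(a_bytes, b_bytes))

# The fused INT16 instruction replaces the old two-step path:
acc = 100
pmaddwd = 3 * 7 + (-2) * 5            # PMADDWD: multiply-add word pairs
old_path = acc + pmaddwd              # VPADDD: separate vector add
assert vpdpwssd_lane(acc, (3, -2), (7, 5)) == old_path
```

Per lane, VPDPWSSD folds two INT16 products into the accumulator and VPDPBUSD folds four INT8 products, which is consistent with Intel’s 2x and 3x elements-per-cycle figures relative to the unfused sequences.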

Framework and library support for these new instructions will come through Caffe, MXNet, TensorFlow, and Intel’s MKL-DNN.



Making the Most of Memory: Optane DC Persistent Memory

The week before Computex, Intel announced its new Optane DIMMs, and stated that they will be coming to market in three capacities: 128GB, 256GB, and 512GB. The new persistent memory was explained as high-capacity storage that acts like DRAM with similar latencies, able to hold large databases or let systems recover quickly from power loss to improve uptime.

The Hot Chips presentation confirms that the new Optane DIMMs will be enabled at one per memory channel, allowing a single socket to contain six memory modules and six Optane DIMMs at once. For those counting along at home, that is one 128 GB LRDIMM + 512 GB of Optane per channel, or 3840 GB per socket.

What this doesn’t state is whether Optane will be supported on all processors, or only on select high-memory SKUs at extra cost. We have seen a few prices flying around for the 512 GB DIMMs, although we cannot verify them. Intel’s own @IntelBusiness Twitter account recently posted this picture, attempting to show that the DIMMs were shipping.

I’m pretty sure the people on the left are making the money hand gesture

If Intel is shipping Optane DIMMs in this quantity already, that means that high-profile customers that are part of Intel’s Early Sampling program are already buying them in bulk quantities. It will be interesting to see if they ever post any data about the product.

Unanswered Questions

Sure, it is frustrating that Intel has not opened the lid fully on Cascade Lake yet. The pure takeaway I can give you is that I suspect the processors will be optimized for efficiency and frequencies will improve, but the core designs will likely look very much the same as we have now. Intel will be using the opportunity, alongside the DIMMs and VNNI, to offer a product that has a number of the Spectre and Meltdown variants fixed in hardware. A lot of people are waiting for these parts, and are prepared to pay for them. It will be interesting to see what the pricing will be later in the year.

