I may be way off here, but to me the transition to heterogeneous cores is more of a long-term than a short-term play, anticipating an era where we'll have many different kinds of cores in our systems.
Even today's performance cores have circuitry dedicated to making certain workloads faster (video decoding, crypto operations, etc.). Other work has mostly been offloaded to GPUs (actual graphics work, machine learning).
In an architecture where you can mix and match different kinds of cores, it doesn't sound crazy to have cores specially designed for those tasks. Processors could ship with one "crypto core", a "video-decoding core", etc. These should provide performance similar to full performance cores on their target workloads at _much_ lower cost, energy consumption and die area.
If (and that's a big if) they can get to a point where:
a) designing/implementing new special-purpose small cores isn't outrageously expensive, because the coordination/integration part is already solved, and
b) they have a combination of cores that provides performance similar to current P-cores-only CPUs at a similar processor cost and power budget,
that would be a huge win for processor companies. They could open lines of specialized processors commanding huge premiums in certain industries (think processors with 128 ML cores, 128 video-encoding cores, etc.).
Or maybe I'm just delusional :)
Intel's Cell B.E., just about 12 years late.
Calling Gracemont (the E cores) “Atom technology” is a bit weak. They support out of order execution and AVX2.
Gracemont is faster clock-for-clock than Broadwell, and nearly as fast as Skylake, running ChaCha20-Poly1305 as used in WireGuard, largely thanks to AVX2.
Intel's benchmarking shows 80% peak-vs-peak performance for 4C of Gracemont vs 2C4T of Skylake.
Even more reason not to bother writing code for AVX-512. Intel seems committed to having no advantage over AMD.
It's strange that Agner seems so against this hybrid design despite it having proven to work so well for ARM. I'd agree this architecture is very difficult to optimize a single application for, but on the other hand he doesn't seem to consider whether it can lead to better performance and battery life in everyday use, with mixed workloads using less than 100% of the available performance.
Just trying to figure this out.
So the low-power/high-power core split does seem inspired by the M1. Whether they can implement that easily on x86 is another matter, but if anyone can, Intel can, I guess.
The half-precision FP instructions are obviously aimed at ML, but how on earth are they hoping to compete with GPUs on this front? What's the use case here (genuine question)?
The complex-number instructions make me think of quantum circuit simulation (but that could be bias on my part, as that's what I'm looking at). That's a relatively small field at the moment, though, and even in normal use Nvidia's backend simulator blows CPU performance out of the water (both in speed and in number of qubits simulated).
I assume I'm missing the point and they're heavily used in compression or something, but again, if someone has an idea of the use case for complex-number instructions, that would be useful! Thank you
I don't know much about CPU design. Can this lead to M1-like performance?
This reminds me of the transition to multi-core code that started in the "Core" days (or really with AMD's Athlon X2). The switch to multi-core/thread/process (all steps toward that approach) had to be implemented at every layer (hardware, OS, software).
We're now seeing the move from symmetric to heterogeneous computing become mainstream on all platforms.
Just a few weeks ago there was a discussion here about how Apple's Activity Monitor measures efficiency in a flawed way.
Just as multi-threaded code paradigms eventually thrived, asymmetric cores will also become part of the toolchain.
Funny that he defines Chimera as the mythical beast right at the start. I was sure he was thinking of the alternative definition:
> a thing that is hoped or wished for but in fact is illusory or impossible to achieve. "the economic sovereignty you claim to defend is a chimera"
That said, Alder Lake does open up a bunch of new possibilities and isn't an illusion.
I feel the P/E split requires putting an asterisk next to the core count. It's dishonest to sell a CPU with N fully featured cores and M gimped cores as a CPU with N+M cores.
My understanding is that Intel boosts power so a single core can chew through a program quickly, rather than being inherently energy efficient (performance-per-watt). Most of the time this benefits Intel, because a core will boost up, finish the program, and drop back down sooner than a more efficient AMD core would.
(This is my layman understanding.)
I solidly believe AMD is the king for efficiency, but I wish I could find better benchmarks showing idle power use for AMD vs Intel (not peak power use). My understanding is that Intel has deeper power states its processors settle into.
I'm surprised nobody artificially power-limits or undervolts their Intel processor to approach or surpass what we're seeing from AMD. Would it significantly lengthen the total execution time of the program? Would there still be a significant difference in "performance"?
Performance-per-watt is important, and so is the total energy used to execute the program. That would be measured in kilowatt-hours (power × time), not kilowatts per hour.
I want to see how many kilowatt-hours something like Cinebench consumes on comparable AMD and Intel processors, so we can derive the "real computing efficiency".
I wish there were at least a switch to let the CPU report the proper CPUID on each core and, ideally, keep AVX-512 enabled. You could have a BIOS setting writing to an MSR. But I guess some idiot gamer somewhere might still enable that setting without understanding a word of the warning it's accompanied by, and Intel wants to avoid that.
But I'm hopeful this is a stop-gap and we'll eventually just make the P cores more power efficient under medium load, and merely have the OS clock them down when something is deemed a background task. Then we "only" have to get that part smart enough.
I very much don't understand why Intel didn't disable AVX-512 on the P cores unless and until the OS writes to a new MSR meaning "I understand that the P cores can do AVX-512 while the E cores cannot", and only then enable AVX-512 on the P cores.
Old OS versions continue to work fine, and newer OS versions can opt into the new world and benefit.
Perhaps the easier option would be for DRM software, and anything else that needs instructions present only on a specific core type, to stick to those cores.
It might involve changes to DRM software, but compared with the current compromises it looks like a better deal.
Of course, Intel must have thought of this and still didn't pick this route. I'd love to see the reasoning, though.
Other than obvious feature differences between the P and E cores (AVX-512, although I thought that wasn't even officially supported at one point, despite not being disabled?), how does ARM manage this with big.LITTLE cores? Surely that's similar to a degree (out-of-order P cores, in-order E cores) to what Intel is attempting?
Are Intel just taking it too far in terms of disparity between the architectures?
Agner is very late to this one.
Personally, I don't really see the point of a hybrid architecture like this unless it leads to massively increased core counts, and so far it doesn't look like it does. AMD still beats Intel in both core count and power efficiency using only P cores, so... what's the point of the E cores?
But maybe that's just because it's first-generation technology for Intel? I hope things improve, because I sure would love more cores (currently rocking a 32-core Threadripper), and AMD is sure as hell not interested in actually pushing HEDT forward since they became top dog. (I have a few grand burning a hole in my pocket waiting for a product they don't want to release.)
Hasn't it been at least six months since release? What an odd, self-promoting entry.
For a desktop or mobile processor this might make sense, but many hosting providers sell dedicated hardware with desktop processors to be used in a server environment. (E.g. Hetzner https://www.hetzner.com/dedicated-rootserver/ex100)
So the provider can save energy and benefit from this, but customers won't see the energy savings in their pocket; they'll only notice that they can use just 8 full cores.
When I tested this a few months ago under Linux, every request to our server application would land on a different CPU, leading to funny performance characteristics. Even Ubuntu 22.04 does not seem to ship kernel 5.16+, which fully supports this CPU (https://www.phoronix.com/scan.php?page=article&item=adl-linu...)
Since we use the JVM, the only interesting use case I can imagine would be assigning the slower efficiency cores to the garbage collector.
Anyone else slightly annoyed that the CPUID had to be made the same for both P and E cores just to accommodate DRM? It makes it harder for all software to properly utilize the technology.
You get a bunch of smart hardware guys in a room and they design this funky exotic architecture. Then the software goes "allocate these threads to whatever is idle", and suddenly you've completely lost any possible advantage and are thrashing around with no idea what you're doing. Apple's big.LITTLE-style architecture shipped with software that basically handles that for you. From what I heard there were similar problems with Xeon Phi: great theoretical performance, but a very difficult programming model, and as a result very challenging sales for the Intel sales guys.