If you're already executing native code on the machine, you probably have the ability to read and write all the other memory of every other user mode process, so you don't need this to attack cryptographic keys stored there. This attack is more against secure enclaves.
ReplyOk, I see how this works in theory. But until I see an exploit that uses this method in real life to extract keys (or any memory content) from a server running real-life workloads, I am extremely skeptical. How many samples are needed to get anything useful? And wouldn't the time required to acquire these samples be longer than the time required to detect the attack (or even to rotate all the keys)?
ReplyFrom the AMD advisory, it seems like desktop Ryzen 5000 series aren't affected, nor 3rd gen and later EPYC. Pretty much everything else is:
https://www.amd.com/en/corporate/product-security/bulletin/a...
ReplyWhat's the impact on AES-NI specifically? If hardware AES is impacted and no microcode updates are coming, this would be bad news.
Assuming ChaPoly needs expensive masking mitigations and AES-NI is safe, ChaPoly just became a lot less attractive, too.
Reply> What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent. Should you do it? Probably not unless you're particularly vulnerable (journalist, human rights activist, etc.)
Wouldn't another option be to run something that temporarily maxes out the CPU and forces it into boost mode, immediately prior to executing the crypto operation? But not for such a long duration that the CPU reaches a thermal limit and drops its speed again.
Obviously energy inefficient and not good for laptops or portable devices.
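For concreteness, a minimal Python sketch of that warm-up idea (assumptions: Linux, pinning to core 0, and a spin length short enough to stay under thermal limits; whether the core actually settles at a stable boost frequency depends entirely on the part and its headroom):
```
import os, time

def warm_then_run(crypto_op, *args, warmup_s=0.05):
    try:
        os.sched_setaffinity(0, {0})   # pin to core 0 (Linux only)
    except (AttributeError, OSError):
        pass                           # affinity control not available here
    deadline = time.perf_counter() + warmup_s
    x = 1
    while time.perf_counter() < deadline:
        x = (x * 48271) % 2147483647   # pointless arithmetic to load the ALU
    return crypto_op(*args)
```
Worth noting: the paper's data-dependent frequency differences show up precisely when the CPU is sitting at its power limit, so forcing boost may not actually remove the signal; the mitigation quoted elsewhere in the thread is disabling boost, not forcing it.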
ReplyI'm mind-blown at this vuln being exploitable remotely. How is that possible?
ReplyI don't care!
There, I said it.
I didn't care about Spectre, Meltdown, or any of the other obscure timing side-channels that came after them either, because they relied on so much detailed information about the environment being attacked that you'd almost certainly be able to get the information you wanted by some much easier way.
Attacking something that doesn't seem to be in much use doesn't make me any more worried either. Go after e.g. TLS, SSH, AES, RSA, etc. if you want to get our attention, but I suspect that anyone trying this in practice is going to be so overwhelmed by all the other sources of noise -- especially over a network connection -- that they won't be very successful at all. They mention 36h and 89h to get the key (a few dozen bytes), and I assume that was in a basically ideal environment with nothing else to measure.
Those of us familiar with hardware know that things like this are pretty natural; but unlike these people, we don't go feeding the paranoia machine and driving everyone even further towards the growing dystopia.
ReplyThis is why we can't have nice things, dear security researchers.
ReplyJoke's on them: I disable turbo boost on all my machines because modern portables can't handle turbo boost anyway.
ReplySome cryptographic implementations are blinded such that, as the number of attempts increases, the amount of 'secret' data recovered (e.g. via power/EMI side channels, which this acts like) also increases. If the rate of uncertainty increases faster than the rate of leaked data, then the attack should fail.
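For readers unfamiliar with the term, a minimal sketch of one classic blinding technique, RSA base blinding, purely to illustrate the principle described above (toy Python, requires Python 3.8+ for the modular inverse; use a vetted library in practice):
```
import secrets

def blinded_rsa_decrypt(c, d, e, n):
    # Fresh random blind for every call; for a real modulus, gcd(r, n) == 1
    # with overwhelming probability.
    r = secrets.randbelow(n - 2) + 2
    c_blind = (c * pow(r, e, n)) % n      # c' = c * r^e mod n
    m_blind = pow(c_blind, d, n)          # (c')^d = m * r mod n
    return (m_blind * pow(r, -1, n)) % n  # strip the blind to recover m
```
Because r changes on every call, the power and timing profile of the exponentiation is decorrelated from the attacker-chosen ciphertext, which is exactly the "uncertainty grows faster than leakage" effect the parent describes.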
ReplyAt first I thought it had something to do with the company Hertz...
ReplyIf the frequency scale is known to user applications, I presume jittering response times in proportion to the scale factor just before write() would be effective.
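For concreteness, a minimal sketch of that idea on Linux, assuming the standard cpufreq sysfs files (scaling_cur_freq, cpuinfo_max_freq) are readable; whether this actually destroys the signal, rather than adding noise an attacker can average away, is an open question:
```
import random, time

def freq_scale(cpu=0):
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq"
    cur = int(open(f"{base}/scaling_cur_freq").read())
    top = int(open(f"{base}/cpuinfo_max_freq").read())
    return cur / top

def jittered_send(sock, payload, max_jitter_s=0.002):
    # Random delay scaled by how boosted the core currently is.
    time.sleep(random.random() * max_jitter_s * freq_scale())
    sock.sendall(payload)
```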
ReplyWhy do we never get proactive defense against this sort of thing? As with speculative execution, caching, out-of-order execution, dispatching instructions to multiple ALUs depending on availability, etc., it was clear from the get-go that in principle the timing can depend on the payload, so in principle it can be a problem for crypto.
The need for constant time should have first class support on the language/compiler level, the OS level, the ISA level, and the hardware level. E.g. the processor could guarantee that the instructions of a certain section of code are executed at a constant rate, the OS could guarantee that the thread remains pinned to one core and the frequency fixed, and the compiler could guarantee that only branchless assembly gets emitted.
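A minimal illustration of what "only branchless assembly" means at the source level (Python here purely for readability; real constant-time code is written in C or assembly and audited at the instruction level, and the whole point of Hertzbleed is that even this style can still leak through frequency scaling):
```
def ct_select(bit, a, b):
    """Return a if bit == 1 else b, for 32-bit values, without branching."""
    mask = (-bit) & 0xFFFFFFFF            # all-ones when bit == 1, else zero
    return (a & mask) | (b & ~mask & 0xFFFFFFFF)
```
Compiler- or ISA-level support would essentially have to guarantee that patterns like this are never "optimized" back into a secret-dependent branch.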
ReplySeems like the simplest way to mitigate this is to randomly throw some junk at the problem. Some random crypto code, some random no-purpose cryptographic calculations, should prevent any listener from gaining any useful information. It shouldn't take much; a single-digit percentage increase during crypto functions would be enough, IMHO.
ReplySo I take it when they say "constant time" for things like SIKE, they aren't sleeping for X milliseconds, but are just using some operation that is thought to be effectively constant time, hence this vulnerability? What is the countermeasure for this? Are crypto systems that always wait a full second using system timers, for example, immune to this sort of thing, or is it still detectable even in those circumstances?
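On the "what does constant time mean" question: it refers to code whose instruction trace and memory accesses don't depend on secrets, not to padding with sleeps. A minimal sketch of the idiom (Python's standard library already ships it as hmac.compare_digest):
```
import hmac

def ct_equal(a: bytes, b: bytes) -> bool:
    # Accumulate differences instead of returning at the first mismatch,
    # so the running time doesn't depend on where the bytes differ.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# In practice, use the vetted primitive:
assert ct_equal(b"tag", b"tag") == hmac.compare_digest(b"tag", b"tag")
```
Hertzbleed's contribution is showing that even code like this, constant in cycles, is not constant in wall-clock time once the frequency itself depends on the data.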
ReplyThis reminds me of a story I remember hearing in the 90s about someone hacking a supposedly 'impossible' remote machine for a competition. They did it by analysing the response times and using that info to derive the key -- at the time, a novel approach. Can anyone remember the story I must be dimly remembering?
ReplyA lot of people here are commenting about shared hosting in clouds, but I don't see any actual text saying that shared environments are more vulnerable.
It sounds like a black box timing attack that could target my laptop, my phone, my server, anything that does cpu frequency scaling and is performing a computation that is susceptible to this attack.
Is that accurate?
ReplyCan't Intel and AMD just change how long a core stays at a turbo frequency to mitigate this? I.e., if it scales up, it can't scale back down until N cycles have passed.
ReplyIt's so cool that x86 is completely fucked security-wise because of all the perf hacks that have been introduced - and yet, computers never seem to get any faster.
ReplyI don't get it. Is this only a problem for platforms that can read the current CPU frequency?
Does this mean platforms such as JavaScript in the browser are unable to exploit this?
Ditto for WebAssembly? If you don't give it the CPU frequency and don't give it the ability to make a syscall for it, then it's unaffected?
Is the longer term fix then to make reading of any compute metrics a privileged operation?
ReplyI did my MEng, in part, on analyzing data dependent power usage in adders and I'm sort of embarrassed I didn't think of this.
ReplyIsn't it possible to monitor for this type of attack, and then apply mitigations?
ReplyI'm not too much of a cryptography expert. How do I know if I'm using a
> constant-time cryptographic library
?
Edit: thanks everyone, I just wasn't familiar with the terminology.
ReplyCan anyone ELI-CS graduate this vulnerability?
My understanding from reading the page is that modern processors process certain data at a higher frequency, and somehow that allows an attacker to guess private keys.
However, I don't understand the connection between those two things. How would an attacker trigger a lot of almost-identical CPU runs without hitting some rate limit somewhere? And how is this different than just guessing the password?
ReplySo if the encryption function looked at an actual timer and inserted bogus calculations at random places during encryption to pad the execution time, would that remove the information this attack needs?
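A minimal sketch of the padding idea (pad every call out to a fixed worst-case deadline; WORST_CASE_S is an assumed bound). This hides wall-clock differences from a remote observer, but it does not remove the underlying data-dependent power and frequency behaviour, and the deadline must genuinely cover the slowest case:
```
import time

WORST_CASE_S = 0.010  # assumed upper bound on the operation's runtime

def padded(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    # Busy-wait (rather than sleep) to the deadline so the early finish
    # isn't exposed through scheduling behaviour.
    while time.perf_counter() - start < WORST_CASE_S:
        pass
    return result
```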
ReplyCan this attack be used to extract Widevine decryption key?
ReplyThis paper relies on Turbo P-states, where they measure the oscillation when that is active; it is not measuring general SpeedStep (OS software controlled) as some seem to have taken away from it. Turbo state is the HWP (hardware P-state controlled) layer above SpeedStep; turning off Turbo in the BIOS still fully allows OS controlled SpeedStep P-states to function, it just disables the hardware level bursting P-states above that max listed CPU level for short periods of time. As others have noted, Turbo state can really kill a laptop battery and/or drive up the thermals pretty quick, a lot of folks disable it anyways if they've tinkered around before.
The abstract writes it as "When frequency boost is disabled, the frequency stays fixed at the base frequency during workload execution, preventing leakage via Hertzbleed. This is not a recommended mitigation strategy as it will very significantly impact performance." This is a grammatically confusing way to state it, as SpeedStep will still work at the OS layer: you'll scale min to max "as usual" and just lose the temporary hardware boost above the listed max when under stress (full load at the P0 state) -- not really "fixed", as it were, in layperson's terms. That wording would be more akin to saying SpeedStep had to be disabled, IMHO.
https://www.kernel.org/doc/html/v4.19/admin-guide/pm/intel_p...
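For anyone who wants to turn off only the hardware boost layer while leaving OS-level SpeedStep/P-state scaling alone, a minimal sketch (root required; these are the usual Linux knobs, intel_pstate's no_turbo and the generic cpufreq boost file, and availability depends on the driver in use):
```
import os

INTEL = "/sys/devices/system/cpu/intel_pstate/no_turbo"  # write "1" to disable turbo
GENERIC = "/sys/devices/system/cpu/cpufreq/boost"        # write "0" to disable boost

def disable_boost():
    if os.path.exists(INTEL):
        open(INTEL, "w").write("1")
    elif os.path.exists(GENERIC):
        open(GENERIC, "w").write("0")
    else:
        raise RuntimeError("no boost control found for this cpufreq driver")
```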
ReplyHere’s a simple mitigation — don’t have your encryption depend on 2022 + 23823 being compared to 2022 + 24436.
The idea that a CPU frequency change (based on CPU load) could be detected, and, if detected, that it could lead to any useful information for an attacker, is laughably preposterous.
The only theoretical vulnerability is if someone in a shared data center were able to gain control over a system on dedicated hardware with nothing else running on it, exploit some code on it that triggers an expected frequency, open the cage and case, detect the frequency (by turning off all other hardware in the vicinity, meaning you already know which machine it is), and then, by exploiting the machine you already control (and have already isolated), physically identify the machine you have exploited.
ReplyDjb says:
"This particular attack demo succeeded with toy models and toy signal processing, so I'd expect state-of-the-art models and state-of-the-art signal processing to extract secrets from many more programs, _except_ when users protect themselves by setting constant CPU frequencies."
https://twitter.com/hashbreaker/status/1537188851440943105
ReplyCan someone explain this to a non-crypto expert? I understand the concept that information can leak via timing measurements. However, I don't understand how the exact bits of a signing key can be extracted from this.
ReplyInteresting that the mitigation is to turn off Turbo/Precision Boost.
Four or five years ago there was an article submitted here (I wish I could find it) about a developer who keeps a machine with Turbo Boost disabled specifically because it seemed to interfere with their performance testing. By keeping it disabled they were able to eliminate a number of factors that prevented them from getting consistent results. It sounded like they preferred this approach for working on optimizing their code.
I am not pointing this out to disparage this performance boosting feature, only calling it out as a point of interest
ReplyIs the libsecp256k1 library affected? How hard is to fix it?
ReplyI suspect what we are seeing in the last few years is the slow death of purely symmetric multiprocessing. At the end of this I wonder if we'll see processors with one or two cores dedicated to cryptographic primitives, where the ALU has a fixed IPC, the core has a very limited number of clock rates, and the caches are sized to prevent eviction when running common cryptographic algorithms.
ReplyWhich ARM processors could be affected? Can't find an overview of ARM processors that implement frequency scaling.
ReplyI do wonder if only the Turbo P-states are what cause the vulnerability. Is relying on deep C-states, for instance, an alternative way to get power savings? On my server during idle, when cores enter C6, the power savings are at their maximum and no frequency scaling can match that. Why not just rely on that? (Ignoring the loss of turbo boost, of course.)
ReplyHow was Cloudflare chosen over, say, the Linux Foundation or Red Hat for disclosure?
ReplyInteresting, and seems like a natural followup to this side channel: http://www.cs.tau.ac.il/~tromer/papers/acoustic-20131218.pdf (RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis), in which researchers deduced that the high-pitched sounds made by CPUs could leak the operations that GPG was performing to decrypt some encrypted content, and thus leak the private key. All you need is a microphone and the ability to trick the victim into decrypting some known content.
ReplySomething about this doesn't bother me as much as other side channels.
To me, this reads like trying to predict the presence, make, model & operational schedule of someone's washing machine just by observing how fast their power meter spins over time. Unless you have an intimate awareness of all of the other power consuming appliances, as well as habits of the homeowner, you would have a hell of a time reaching any meaningful conclusions.
ReplyAs I've said before, these announcements could benefit from better "action items" or "TLDR" for the average person with other problems to think about. What libraries are affected, what do I need to upgrade exactly, on Ubuntu, etc etc. And I'm guessing this is intended to reach those people (among others) given the effort they put into the graphics, etc.
In this case. "Am I affected?" "Likely, yes. It's on all the CPUs". Okay, but how does this work exactly? Is the Hertzbleed going to take over my computer and steal my private keys from my hard drive? Do I need to be running a server? Do I need to be in the middle of a cryptographic operation with the key in question? Etc.
"What is the impact" Ah, this sounds like the useful part. "...modern x86 cpus... side channel attacks...power consumption...constant-time execution". Nope, this isn't it either.
I think this is simply a matter of being so deeply embedded in something, one forgets how much is assumed. If they showed it to an outsider first they'd get the necessary feedback.
ReplyCan't wait until 2050 when all of our computers are bogged down with energy hungry security chips and processors that barely get any real work done because the security arms race demands ever increasing resources...
ReplyI haven't looked at the article but this sounds like a local exploit, right? Those were important in the timesharing era, but with personal computers we temporarily had an era when we didn't have to let hostile code run on our computers. When will we learn that we shouldn't have given that up? Local exploits will never go away, at least on high performance machines.
ReplyWould it help to slightly reduce the granularity of the frequency adjustment? Just enough to make the analysis infeasible? It doesn't have to be all or nothing. We had a similar issue with browsers restricting access to high-precision timers in JavaScript.
ReplyAside: giving a new exploit a catchy name, a top-level domain and a logo doesn't make it more dangerous than it really is.
After Heartbleed, this trend is becoming common and annoying and feels more and more like crying wolf.
ReplyThis is probably a naive question, but could this be mitigated by fencing off a part of the code with some "frequency fence" of some sort? This is of course a long-term mitigation, as it may require compiler support and may affect performance, other threads and whatnot, but I wonder what a proper solution would look like.
ReplyWe need an industry-wide effort for coordination between cryptography library owners & device/chip vendors to ensure the use of constant CPU frequencies during cryptographic operations.
It's odd that the authors haven't chosen to initiate this themselves, as it seems like the proper solution to this vulnerability.
ReplyBrilliant approach, really. Never occurred to me to try something like this!
Are you affected? Very likely. What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent. Should you do it? Probably not unless you're particularly vulnerable (journalist, human rights activist, etc.)
One thing I found interesting that may get changed later, so I'm documenting it here, is in their FAQ they say:
> Why did Intel ask for a long embargo, considering they are not deploying patches?
>
> Ask Intel.
So Intel did ask for a long embargo, then apparently did nothing about it. My guess is they investigated "can we actually mitigate this thing with a microcode update?" and arrived at the conclusion after actually trying - or possibly after external influences were exerted (you be the judge) - that no, there's not much you can really do about this one.
Later in the document another FAQ says:
> [...] Both Cloudflare and Microsoft deployed the mitigation suggested by De Feo et al. (who, while our paper was under the long Intel embargo, independently re-discovered how to exploit anomalous 0s in SIKE for power side channels). [...]
Which is again telling us that there indeed WAS a long embargo placed on this research by Intel.
Only mentioning this here just in case the PR spin doctors threaten the researchers into removing mention of Intel on this one. Which honestly I hope doesn't happen because my interpretation is that Intel asked for that long embargo so they could investigate really fixing the problem (state agencies have more methods at their disposal and wouldn't need much time to exert influence over Intel if they decided to). Which speaks well of them IMO. But then again, not everybody's going to come to that same conclusion which is why I'm slightly concerned those facts may get memory-holed.
Reply> This means that, on modern processors, the same program can run at a different CPU frequency (and therefore take a different wall time) when computing, for example, 2022 + 23823 compared to 2022 + 24436.
I'm a layman when it comes to things this low level; however, I always assumed that different addition inputs would take different amounts of wall time. Looking it up, it turns out that in theory I was wrong, but I guess I'm actually correct. ¯\_(ツ)_/¯
ReplyI think it's worth noting that the main attack described in the paper, against SIKE, depends on exploiting some behavior peculiar to that particular algorithm (what the paper calls "anomalous 0s"):
> The attacker simultaneously sends n requests with a challenge ciphertext meant to trigger an anomalous 0 and measures the time t it takes to receive responses for all n requests. When an anomalous 0 is triggered, power decreases, frequency increases, SIKE decapsulation executes faster, and t should be smaller. Based on the observed t and the previously recovered secret key bits, the attacker can infer the value of the target bit, then repeat the attack for the next bit.
While any leakage of information can in principle be exploited, it might be that this technique is impractical against a target which doesn't exhibit some sort of behavior that facilitates it.
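Paraphrasing the quoted procedure as a schematic (this is only the decision rule described above, not the authors' code; craft_challenge, send_batch and threshold are hypothetical stand-ins that would need calibration against the target):
```
import time

def recover_key(num_bits, n, craft_challenge, send_batch, threshold):
    # craft_challenge, send_batch and threshold are hypothetical stand-ins.
    known_bits = []
    for _ in range(num_bits):
        # Challenge built so an "anomalous 0" appears during decapsulation
        # only if the target bit has the guessed value.
        challenge = craft_challenge(known_bits, guess=1)
        start = time.perf_counter()
        send_batch(challenge, n)          # n simultaneous requests
        t = time.perf_counter() - start
        # Faster => lower power => higher frequency => anomalous 0 => guess hit.
        known_bits.append(1 if t < threshold else 0)
    return known_bits
```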
ReplyHonestly, I don't think this is some universal remote exploit, despite it being remotely exploitable. "Under certain circumstances" seems to be the key phrase here.
This is incredibly clever and devious, but I think it's mostly practical locally. Since different CPUs have different power usage, and systems roll different configurations, I'd expect the most reliable use case would be, for instance, to install a custom OS on a confiscated device to learn what power-throttling patterns it has related to this kind of attack, and then perform that attack on the system's original installation to decrypt it (something along those lines). As a counterexample, I think it is unlikely that someone's online service or personal system would ever be exploited by this. Why? Because the system runs a lot of threads in general when it's being used, making it much harder to predict what measurements mean what. If a system is not idle during the attack, it's hard to deduce whether timing differences are related to the attack or, for example, just to other tasks/threads being executed during the attack.
Am I wrong?
ReplyThat first paragraph is perfect. It's an exact description of the concept, and it's impossible to know whether this is a shower thought or whether 1,000 Intel engineers are going to spend the next 3 years adding RNGs to their clock generation circuitry.
ReplyWow, I didn't know that frequency scaling on CPUs was a function of the workload being processed. I thought it was a function of CPU temperature, which would be much harder to glean meaningful data from (presumably it has a great deal of hysteresis, and you'd have to somehow run one computation millions of times, then run another computation millions of times, and compare them). I'm not convinced that I'm wrong.
Reply"Hertz" means "heart" in German, so the name is a pun on "heartbleed" and the unit for frequency
ReplySo would this rely on a known compiled binary they could reliably project/simulate/anticipate?
This seems vaguely like when they would use page fault boundaries to extract passwords. An OS/hardware event that occurs within some if-then to leak parts of the "key".
Do we need some sort of "random delays" or binary execution randomization?
Reply`cpupower frequency-set --governor performance`
Reply> This means that, on modern processors, the same program can run at a different CPU frequency (and therefore take a different wall time) when computing, for example, 2022 + 23823 compared to 2022 + 24436.
I'm not a hardware expert, and I was a bit surprised at this.
Is that because the transistors heat up more with certain input values, which then results in a lower frequency when the CPU gets hot enough? Something like AND(1,1) using more energy than AND(1,0) on the transistor level?
As far as I can tell [1], addition typically takes a constant number of cycles on x86 CPUs at least, so any difference should happen at a very low level.
[1] https://www.agner.org/optimize/instruction_tables.pdf
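One plausible intuition, hedged as the standard power-analysis story rather than anything specific from the paper: the instruction takes the same number of cycles either way, but dynamic power tracks how many bits switch, so operands and results with more set bits draw slightly more power, and under a power/thermal cap that feeds back into the frequency. The example operands differ in exactly this way, as a quick check shows:
```
def hw(x):
    # Number of set bits; a rough proxy for switching activity.
    return bin(x).count("1")

for rhs in (23823, 24436):
    print(rhs, hw(rhs), hw(2022 + rhs))
```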
ReplyAssume you’re a tenant on a cloud service provider and you don’t care about power consumption… can you mitigate this by running a process with a busy loop that forces the CPU into max frequency at all times, with `nice` set to run it at lower priority than your actual workload?
ReplyAn awesome attack vector and kudos to the authors. I do wish it had never been discovered for environmental reasons though :(
ReplyYet another embargo assented to, which appears to be pointless. Maybe the spy agencies wanted time to develop an exploit? Who knows.
"Responsible" disclosure is cancerous.
ReplyNaive question:
Would another possible defense be for the kernel to introduce a small random delay in the task scheduler?
ReplyI suppose future systems will just run crypto operations at a fixed speed, and turn dynamic stuff back on when they're done?
ReplySo this can be used on so-called "airgapped" devices, but what if you house the machine in a giant Faraday cage to prevent this? Maybe a little paranoid, but if your threat model requires it, then surely Faraday cages would make sense, no?
ReplyIsn't this sort of a stand-in for power draw side-channel analysis? I guess it is cool that you can do it purely from software rather than needing physical access.
ReplyIs it not possible to add noise by running other processes in parallel that will also cause frequency boosts to occur and colour the results? Basically, the mitigation is to disable boost, but instead boosting more often, or boosting in a controlled way (with another process triggering it), should also help mitigate... That said, if it were that trivial, surely Intel or someone would suggest it.
ReplyAnd yet folks will keep using cloud services and multi tenant offerings until we have regulations forbidding multi tenant computing for sensitive data.
ReplyThe amount of time and energy that's wasted because people insist on having secrets is absolutely insane
ReplySometimes I wonder if the main use of quantum computers will just be to verifiably have no side channels, because any such side channel would act as a measurement (which produces phase errors that you can check for). It wouldn't be efficient but the computation would be provably isolated from the rest of the universe. Well... other than the input at the start and the output at the end.
Reply> We disclosed our findings, together with proof-of-concept code, to Intel, Cloudflare and Microsoft in Q3 2021 and to AMD in Q1 2022.
Why did they choose to disclose their findings to just two software companies (Cloudflare and Microsoft)? Why not other software companies like Amazon or Google? Or developers behind open source cryptography libraries?
ReplyWhy did they wait 2 extra quarters to tell AMD about this?
ReplyHmm, it feels like Intel/AMD are ducking and just hoping that the implications of this are not large.
Here's a video from Intel chatting with the researchers: https://community.intel.com/t5/Blogs/Products-and-Solutions/...
The questions are incredibly weak from the interviewers. They first state that it's not practical because the attack could take many hours, even days. But they don't describe why a day-long attack is not practical.
They then bring in the researchers and ask them the same question. The researchers say that the attack is very practical because it only takes... a few hours or days to execute. Here's the specific part: https://youtu.be/BiRPr839dSU?t=1476
Instead of discussing this discrepancy further, they just ignore it and ask the researchers how they feel about their new popularity.
From what I can tell from the advisory from Intel, it's simply that people should understand the attack and mitigate it in software. It's very vague. The specifics (i.e. a list of example popular programs that are vulnerable) seem entirely missing.
ReplyWould quantizing the boost to a few specific levels, or even randomizing the levels, mitigate something like this?
ReplyIt’s clear the secure enclave was a genius move a decade ago.
ReplyMy first highly amateur idea was to modify the frequency scaling algorithm with some randomness. How stupid is my idea?
ReplyOverclocking FTW
ReplyWould this apply to M1 Macs as well?
ReplyIs the long-term solution some kind of balanced binary encoding where two wires represent a bit, with exactly one wire high and one wire low?
ReplyIsn’t the real long term mitigation here to do all crypto operations on a separate chip? Rewire platform libraries to use the TPM/SecureEnclave backends exclusively. Then if you need “soft crypto” you are kinda on your own in “you better know what you’re doing” territory?
ReplyWebsite, Brand, everything, nice!
But, if they had a merchandise shop they would look more professional.
ReplyDoes this mean that an evil process may gain information about a foreign process by measuring its own execution speed variations?
ReplyGiven that cloud providers oversubscribe their rack power supplies for $ reasons, I'm waiting for the cloud-level equivalent of this DVFS attack, where you throttle a competitor's cloud instances by bursting on colocated instances of yours :)
Replysite design / logo © 2022 Box Piper