This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum
ClosedPublic

Authored by lenary on Feb 14 2022, 7:30 AM.

Details

Summary

This adds a late Machine Pass to work around a Cortex CPU Erratum
affecting Cortex-A57 and Cortex-A72:

  • Cortex-A57 Erratum 1742098
  • Cortex-A72 Erratum 1655431

The pass inserts instructions to make the inputs to the fused AES
instruction pairs no longer trigger the erratum. Here the pass errs on
the side of caution, inserting the instructions wherever we cannot prove
that the inputs came from a safe instruction.

As the pass is executed very late in the ARM backend pipeline, it has to
reconstruct the Register Dataflow Graph, for which it uses the RDFGraph
utilities used by other backends.

This initial version will stop at the start of basic block containing
the first AES instruction, but having the full RDF Graph available means
we should be able to be more efficient with AES encryption and
decryption loops in future, if we wish.

The pass is used:

  • for Cortex-A57 and Cortex-A72,
  • for "generic" cores (which are used when using -march=),
  • when the user specifies -mfix-cortex-a57-aes-1742098 or mfix-cortex-a72-aes-1655431 in the command-line arguments to clang.

Diff Detail

Event Timeline

lenary created this revision.Feb 14 2022, 7:30 AM
lenary requested review of this revision.Feb 14 2022, 7:30 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 14 2022, 7:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@kparzysz I've tagged you due to the changes in RDFGraph, which I believe you are the owner of. The asserts are hit in llvm/test/CodeGen/ARM/inlineasm-error-t-toofewregs.ll - the Register Allocator Chokes on the test due to not having enough registers for the inline assembly. llc then reports the error, but continues running passes (including this one), which then core dumps on the asserts I've removed (The first one hit is in DataFlowGraph::buildStmt). When I run the same testcase with -verify-machineinstrs, there is no verification error, but the verifier seems to have decided that analysing physical reg liveness is too hard (see e.g. MachineVerifier.cpp line 2990). I haven't worked out how else to prevent this pass running if the register allocation is known to be wrong, without analysing every instruction before building the RDFGraph. I wasn't sure the right approach, so I wanted to post the patch and then discuss it, rather than the other way around (so you can see the testcase).

I have a high level question regarding RDF, as I've not seen it used in many other places, so it may be under-tested on Arm systems at the moment. This currently, for all code, builds an rdf graph, analyze the rdf graph for a fairly rare instructions, then fixes up the code based on that. It might be best to avoid the (possibly expensive?) rdf graph generation for the common case where the instructions are not present. Check that the instruction exists first.

It might then be simpler to just search back for the def of a register, considering in most code the instruction we are looking for should be fairly rare and we won't expect to need to find def's in bulk. That might be simpler overall, and avoid some of the difficulties of RDF.

llvm/test/CodeGen/ARM/aes-erratum-fix.ll
49

Adding arm_aapcs_vfpcc will make the function "hardfp", which might be useful for testing inputs from argument that don't need to be passed via gpr regs.

lenary planned changes to this revision.Mar 28 2022, 3:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 3:54 AM
Herald added a subscriber: MaskRay. · View Herald Transcript

I have a high level question regarding RDF, as I've not seen it used in many other places, so it may be under-tested on Arm systems at the moment. This currently, for all code, builds an rdf graph, analyze the rdf graph for a fairly rare instructions, then fixes up the code based on that. It might be best to avoid the (possibly expensive?) rdf graph generation for the common case where the instructions are not present. Check that the instruction exists first.

I haven't had time to see what effect this has on compile times, but I don't see the point in iterating over every instruction in the function checking the opcode, to then iterate over them all constructing the rdfgraph, that seems to make the approach a lot more expensive, if I'm then going to iterate over the rdfgraph looking for specific instructions again.

It might then be simpler to just search back for the def of a register, considering in most code the instruction we are looking for should be fairly rare and we won't expect to need to find def's in bulk. That might be simpler overall, and avoid some of the difficulties of RDF.

The reason for using the rdfgraph was to allow us to improve this at a later date if we found the performance issues a problem, i.e. by hoisting the vorr past phis so we can save on executing them in loops. Given the issues with rdfgraph, I'm going to reimplement this as a Block-local analysis and fixup pass without the rdfgraph, which is effectively what the pass does in the current patch.

lenary updated this revision to Diff 419202.Mar 30 2022, 10:14 AM
  • Rewrite pass in terms of ReachingDefAnalysis
  • Split tests into separate commit, for ease of review.

@kparzysz I have rewritten this to avoid using RDFGraph, so I don't think this needs you to review it any more.

llvm/test/CodeGen/ARM/aes-erratum-fix.ll
49

Yeah, seems I was too zealous with removing some of these attributes.

lenary planned changes to this revision.Mar 30 2022, 10:16 AM
lenary added inline comments.
llvm/lib/CodeGen/RDFGraph.cpp
1096

I forgot to remove these comments, will update shortly.

llvm/lib/Target/ARM/ARMSubtarget.h
543

I missed this, will update again shortly.

lenary updated this revision to Diff 419208.Mar 30 2022, 10:20 AM
  • Remove whitespace change in ARMSubtarget
  • Remove commented-out debug lines in RDFGraph
lenary marked an inline comment as done.Mar 30 2022, 11:48 AM
dmgreen added inline comments.Mar 31 2022, 1:10 AM
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146

Can/should all these use findFirstPredOperandIdx?

And is it worth checking for more instruction? Anything that sets a Q register, or is that too broad?

251

This needn't scan through checking for the instruction that the loop below is going to check for too.

307

Can you explain more about the IsLiveIn && UnsafeCount==0 case. Am I understanding that correctly that it would be:

function(q0, ...)
  lotsofcode...
  q0 = load
  aes q0

Is there a better way to detect that the live-in doesn't matter in cases like that?

llvm/lib/Target/ARM/ARMTargetMachine.cpp
585

Passes can't insert new instructions (or move things further apart) after ConstantIslandPass. The branches/constant pools it has created may go out of range of the instructions that use them. Would it be OK to move it before that?

Thanks for the review. Lots of comments inline, hopefully Phab doesn't mangle the large one.

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146

findFirstPredOperandIdx doesn't work as lots of these instructions are not marked isPredicable in the tablegen. I'm not sure if we want to solve that in this work, or as a follow-up (I'd lean towards follow-up).

I believe "anything that sets a Q register" is too broad, as we model subregister insertion as setting the the whole register in LLVM, but I'm not sure that micro-architecturally they are actually doing that. This is why I've tried to only add 64- and 128-bit setting instructions rather than ones that are less wide. Originally I also included the VMOVv2f32 instructions that are now at the bottom of this switch, but I felt that might have been too risky.

251

Ack. Will remove. I think this is vestigal from a previous (unshared) version of the patch which was doing something more complex in the loop.

307

I don't believe there is, and this comes down to issues with the RDA.getGlobalReachingDefs implementation, which I want to fix/enhance, but in a follow-up patch.

To start with, this is actually not a problem, as the pass is intended to be conservative, and we know the clobbers are a no-op at the architectural level (we insert them for their micro-architectural effects). So code will still do the right thing, but maybe with a little too much overhead in the case you showed.

However, this is necessary in some other cases, such as:

function(q0)
   code
   conditional branch to L2
L1:
   q0 = safe_op(…)
   branch to L3
L2:
   code without update to q0
L3:
   aes q0

In this case, AllDefs is a set containing one single defining instruction for Q0, because there is only one within the function (which is all that RDA.getGlobalReachingDefs can report instructions for).

But in my example, we *need* to protect q0 on the other paths, because the safe definitions of q0, when considered as a set, do not entirely dominate the AES use of q0 (this is slightly stretching the conventional definition of dominance, but think of this as "there exists a path from entry to the aes, which does not contain any of the safe instructions". Sadly, MachineDominance doesn't allow us to make this kind of query either!).

In this case though, it is safe to insert the protection at function entry, because that will (by definition) dominate all the AES uses, and the protection doesn't need to be dominated by the safe definitions, as we know they're safe.

I intend to follow-up this initial patch with an enhancement to ReachingDefAnalysis which will also provide the information that you have a set of defs inside the function, and also you're live-in, as this is required info for any conservative pass using the ReachingDefAnalysis. I felt, however, that given the pass is safe as-is, it was good to proceed without this enhancement.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
585

TIL. I'll add a comment about the constant island pass as well.

Should I also look at the restrictions on the Branch Targets pass? I can imagine we also don't want to separate instructions once we've calculated their targets locally?

dmgreen added inline comments.Mar 31 2022, 2:41 AM
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
201–208

Are these vmov of an immediate? Are they not safe?

I was expecting it to be the lanes sets (VSETLNi8) and other scalar instructions that were unsafe.

307

OK sounds good.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
585

Yeah - It sounds like the BTI would need to remain as the first instruction in the block.

lenary marked 6 inline comments as done.Apr 5 2022, 8:59 AM

A few comments before I post the next version of the patch.

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146

I'm wrong about this. The tablegen isPredicable is an override, other code might also set isPredicable to true, so I think findFirstPredOperandIdx should work too.

201–208

I have received the information on what is safe and what is not, and the next version of the patch will have this correct.

307

One note is that the exact problem you describe does show up in the tests (in aese_set64_via_ptr, where the vldr is "safe"), so when the pass is enhanced, we will notice the improvements.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
585

Turns out the AES pass doesn't *have* to come before the BTI pass, because AES instructions are only available on A-profile, and BTI is M-profile. I still will move it to before all these passes anyway, just so it's clear what is going on.

lenary updated this revision to Diff 421193.Apr 7 2022, 7:00 AM
lenary marked 2 inline comments as done.
  • Updated set of safe instructions
  • Address reviewer feedback, including reordering passes.
lenary marked an inline comment as done.Apr 7 2022, 7:01 AM

Looks OK to me, as far as I can see. If it worth adding a few extra instructions that may come up?

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147

Perhaps add these, if they are safe:
VBICd/q
VBICi's, VORRi's
VBIF/VBIT/VBSL/VBSP
VCEQ/VCNE/etc?
VDUP? VEXT?
VMVN imm equivalents of VMOV's
VREV's?
VSHL's, VSHR's?
I'm not sure if they will be very useful, but they are the kind of instructions that may come up in aes algorithms.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
582–583

"No new instructions may be inserted" -> "Block sizes cannot be increased"
And maybe "will affect the offsets used for accessing these constants." -> "may push the branch ranges and load offsets of accessing constant pools out of range."

583–585

It's not about "not inserting instructions" exactly - it will replace psuedos with all kinds of new instructions :)
The pseudos needed to have a conservative size through ConstantIslandPass though to allow that though. It does make sure that it will not move instructions further apart from their targets.

simon_tatham added inline comments.Apr 11 2022, 6:16 AM
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
13–20

This description would leave me still confused if I didn't happen to already know roughly what the plan was. It jumps in half way through the explanation that someone would need if they were coming to this pass cold. (E.g. it talks about "the VORRq" before having even mentioned that there is one, let alone why there is one.)

How about the suggested text as a rewording?

311

nit: repeated word

Ack to all the comment clarifications, will update patch with those soon (probably tomorrow).

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147

I'm very keen to avoid scope-creep on this patch, so I'm going to push back on this comment.

We know this list as given is safe (and have had internal confirmation). I've sent a new email internally with your list of instructions, to find out of they're safe too, but I'd like any answer to that to be part of a follow-up patch rather than blocking this patch for yet longer.

I believe what I have today is correct, even if the list is not optimal for all expected AES code sequences.

dmgreen added inline comments.May 10 2022, 5:54 AM
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147

Yeah that sounds OK. So long as you address Simons comments and follow up with the instructions at a later date, this LGTM.

lenary updated this revision to Diff 428978.May 12 2022, 9:41 AM
lenary marked 3 inline comments as done.
  • Address comment nits
  • Rebase
lenary marked 3 inline comments as done.May 12 2022, 9:42 AM
lenary added inline comments.
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
13–20

I have taken this feedback on board, and reworded the explanation based on your suggestion, with some slight ordering changes and a little more detail about the complex cases and some separation between the erratum itself, and the workaround we have chosen.

dmgreen accepted this revision.May 13 2022, 1:21 AM

Thanks. LGTM

This revision is now accepted and ready to land.May 13 2022, 1:21 AM
simon_tatham accepted this revision.May 13 2022, 1:28 AM

Thanks, that explanation looks fine. (And I agree that re-paragraphing it made more sense than my version)

This revision was landed with ongoing or failed builds.May 13 2022, 2:48 AM
This revision was automatically updated to reflect the committed changes.