Page MenuHomePhabricator

AArch64: support i128 cmpxchg in GlobalISel.
ClosedPublic

Authored by t.p.northover on Apr 27 2021, 3:00 AM.

Details

Reviewers
paquette
aemerson
Summary

Most operations get expanded to cmpxchg at i128 because it's the only primitive we have. So this is the bulk of the support needed for all i128 atomics.

There are three essentially different cases to handle:

  • -O0, no LSE. The IR is expanded to ldxp/stxp and we need patterns & code to select them.
  • -O1, no LSE. We get G_ATOMIC_CMPXCHG, and need to produce CMP_SWAP_N pseudos. The registers are all 64-bit so this is easy in the existing framework.
  • LSE. We get G_ATOMIC_CMPXCHG and need to produce a CASP instruction with XSeqPair registers.

The last case is by far the hardest, and and adds 128-bit GPR support as a byproduct. The only way I could get it through all phases (and I tried many variants) was:

G_UNMERGE
REG_SEQUENCE
CAS
G_EXTRACT
G_MERGE

The merge/unmerge pair are needed to cancel out similar in surrounding instructions, but then the REG_SEQUENCE and G_EXTRACT are needed to put the values back into 128-bit vregs again.

Unfortunately the G_EXTRACT had to be handled with quite a nasty hack in RegisterBankInfo. Please tell me if anyone has any better ideas, but I couldn't make a RegBank stick from Legalizer to there, or think of any other way to check for an int op.

Diff Detail

Event Timeline

t.p.northover created this revision.Apr 27 2021, 3:00 AM
t.p.northover requested review of this revision.Apr 27 2021, 3:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2021, 3:00 AM

I think it would be good to add dedicated legalizer + regbankselect MIR tests?

llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
2531–2532

It seems like this assert can probably be replaced with something like

assert(SrcRB.getID() == DstRB.getID() && "Src and dst register banks must match!");

And then moved above the if (SrcRB.getID() == AArch64::GPRRegBankID...

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1062

Some comments on the various cases would be nice here? (e.g. LSE, no LSE, etc.)

llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
879–880

I guess this is just for the s128 case, so maybe this can be refactored a little

...
if (SrcTy.getSizeInBits() != 128)
   break;
auto Idx = (MRI.getRegClassOrNull(Src) == &AArch64::XSeqPairsClassRegClass) ? PMI_FirstGPR : PMI_FirstFPR;
...

Can't think of a better way to decide this without checking the register class directly, unfortunately.

t.p.northover marked 3 inline comments as done.

Thanks for the improvements, I think I've implemented them.

This revision is now accepted and ready to land.May 10 2021, 9:24 AM
paquette added inline comments.May 10 2021, 9:25 AM
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
2535

With the above assert, you don't have to check both SrcRB and DstRB here.

This might be a bit crazy, but instead of emitting CASPX directly from globalisel, did you consider just emitting CMP_SWAP_128, then converting it to CASPX after isel? Not sure it's worth shifting around the complexity like this, but maybe worth considering.

  • -O0, no LSE. The IR is expanded to ldxp/stxp and we need patterns & code to select them.
  • -O1, no LSE. We get G_ATOMIC_CMPXCHG, and need to produce CMP_SWAP_N pseudos. The registers are all 64-bit so this is easy in the existing framework.
  • LSE. We get G_ATOMIC_CMPXCHG and need to produce a CASP instruction with XSeqPair registers.

I think you accidentally switched -O0 and -O1.

t.p.northover marked an inline comment as done.May 14 2021, 2:07 AM

This might be a bit crazy, but instead of emitting CASPX directly from globalisel, did you consider just emitting CMP_SWAP_128, then converting it to CASPX after isel?

Unforuntately that won't work. The reason CMP_SWAP_128 works so easily is that its 64-bit register operands can be chosen independently (because LDXP/STXP allow that), CASP needs them to be consecutive (and the first one to be even).

I think you accidentally switched -O0 and -O1.

Yes. Thanks, I'll fix it.

t.p.northover closed this revision.May 14 2021, 2:42 AM

Thanks both. I've committed it as 4789fc75d350.

This might be a bit crazy, but instead of emitting CASPX directly from globalisel, did you consider just emitting CMP_SWAP_128, then converting it to CASPX after isel?

Unforuntately that won't work. The reason CMP_SWAP_128 works so easily is that its 64-bit register operands can be chosen independently (because LDXP/STXP allow that), CASP needs them to be consecutive (and the first one to be even).

I don't see any reason it can't work, as long as you make sure to do the conversion before regalloc. (After regalloc would be a problem for the reason you note.)