This is an archive of the discontinued LLVM Phabricator instance.

Fix PR25526 by adding back limited cmpxchg pseudo-Insts
ClosedPublic

Authored by t.p.northover on Jan 15 2016, 12:36 PM.

Details

Reviewers
ab
Summary

Hi,

This is an attempt to fix the failure of cmpxchg at -O0 on AArch64. The issue is that fast-regalloc puts spills into ldxr/stxr loops, which can continually clear the exclusive monitor and block progress.

I really don't like having to re-add the pseudo-instructions, since they're error-prone and ugly, but I think I've tried everything else (in approximate order of preference):

  • Emit libcalls: they don't exist in libgcc, so even if we could herd all other cats we'd break compatibility with existing versions of GCC.
  • Handle the expanded @llvm.aarch64.ldxr/stxr intrinsics to do away with any vregs (so fast-regalloc can't botch them), either in DAG or FastISel. I tried both, but the IR comparison always creates more. No change.
  • Change to a different register allocator at -O0. Too costly for compile-time performance and debug.
  • A usesCustomInserter Pseudo-inst. Not really any better, but doesn't work anyway because we still have the vregs immediately after DAG.

About the one silver lining is that we only need to handle a very small subset of possible atomic operations, and not even many cmpxchg operations (I think we can treat them all as seq_cst, strong exchanges without violating correctness).

If anyone has any other ideas, please speak up. I'm quite happy to give implementing them a try if there's any hope of avoiding this hack.

Otherwise, OK to commit?

Cheers.

Tim.

Diff Detail

Event Timeline

t.p.northover retitled this revision from to Fix PR25526 by adding back limited cmpxchg pseudo-Insts.
t.p.northover updated this object.
t.p.northover added a subscriber: llvm-commits.
mcrosier added inline comments.
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2360

return nullptr;

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:2360
@@ -2325,1 +2359,3 @@
+ SelectCMP_SWAP(Node);
+ return 0;

case ISD::READ_REGISTER:

return nullptr;

Thanks Chad. Will do, sorry about that. I also realised I'd not
updated the comments properly yet. The one in AArch64InstrAtomics.td
is misleading and more elsewhere would probably help.

Tim.

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:2360
@@ -2325,1 +2359,3 @@
+ SelectCMP_SWAP(Node);
+ return 0;

case ISD::READ_REGISTER:

return nullptr;

Thanks Chad. Will do, sorry about that. I also realised I'd not
updated the comments properly yet. The one in AArch64InstrAtomics.td
is misleading and more elsewhere would probably help.

Np. Wish I could offer more assistance..

Chad

test/CodeGen/AArch64/cmpxchg-O0.ll
76

Please add a new line.

davide added a subscriber: davide.Jan 15 2016, 3:54 PM
ab added a subscriber: ab.Jan 15 2016, 5:58 PM

This makes sense to me, but:

Handle the expanded @llvm.aarch64.ldxr/stxr intrinsics to do away with any vregs (so fast-regalloc can't botch them), either in DAG or FastISel. I tried both, but the IR comparison always creates more. No change.

I don't understand: what "IR comparison"? If it worked, this seems to me like the ideal solution.

lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
596โ€“597

MBB.addSuccessor(LoadCmpBB) ?

608

Use a variable for MI.getDebugLoc(), perhaps?

632โ€“633

Is this sufficient? What happens to a def inside MBB used in DoneBB?

My MI-fu is weak, and I don't recall ever seeing code doing this when splicing MBBs (a quick grep confirms that), so I might be missing something obvious.

634

Should this also transferSuccessorsAndUpdatePHIs()?

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2328

MF is already available in SelectionDAGISel.

lib/Target/AArch64/AArch64ISelLowering.cpp
10049

missing word: "to the,"

lib/Target/AArch64/AArch64InstrAtomics.td
380

mayLoad/mayStore here as well?

ab added inline comments.Jan 15 2016, 6:02 PM
lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
634

On second thought, I guess this is late enough for transferSuccessors() to be sufficient.

Thanks for taking a look Ahmed.

emaste added a subscriber: emaste.Jan 15 2016, 8:05 PM

Think I've fixed most of the issues people mentioned.

ab added a comment.Mar 28 2016, 11:22 AM

There's still the live-in issue, but I'm not sure anyone cares after pseudo expansion. Quentin tells me you can use LivePhysRegs (::stepBackwards) to compute that for you.

Also, it occurs to me that nothing prevents a spill from being inserted with optimized regalloc. It's probably extremely unlikely, but maybe we should consider using the pseudo regardless of the opt level..

ab added a comment.Mar 28 2016, 11:22 AM

(and sorry about the delay)

New diff preserving MBB LiveIns during the transformation.

ab accepted this revision.Apr 12 2016, 10:03 AM
ab added a reviewer: ab.

Liveness in expandCMP_SWAP_128 as well?

With that, LGTM.

This revision is now accepted and ready to land.Apr 12 2016, 10:03 AM

Oops, well spotted. I've committed a fixed version as r266339. Thanks for reviewing.

Tim.