This is an archive of the discontinued LLVM Phabricator instance.

Fix PR25526 by adding back limited cmpxchg pseudo-Insts
ClosedPublic

Authored by t.p.northover on Jan 15 2016, 12:36 PM.

Details

Reviewers
ab
Summary

Hi,

This is an attempt to fix the failure of cmpxchg at -O0 on AArch64. The issue is that fast-regalloc puts spills into ldxr/stxr loops, which can continually clear the exclusive monitor and block progress.

I really don't like having to re-add the pseudo-instructions, since they're error-prone and ugly, but I think I've tried everything else (in approximate order of preference):

  • Emit libcalls: they don't exist in libgcc, so even if we could herd all other cats we'd break compatibility with existing versions of GCC.
  • Handle the expanded @llvm.aarch64.ldxr/stxr intrinsics to do away with any vregs (so fast-regalloc can't botch them), either in DAG or FastISel. I tried both, but the IR comparison always creates more. No change.
  • Change to a different register allocator at -O0. Too costly for compile-time performance and debug.
  • A usesCustomInserter Pseudo-inst. Not really any better, but doesn't work anyway because we still have the vregs immediately after DAG.

About the one silver lining is that we only need to handle a very small subset of possible atomic operations, and not even many cmpxchg operations (I think we can treat them all as seq_cst, strong exchanges without violating correctness).

If anyone has any other ideas, please speak up. I'm quite happy to give implementing them a try if there's any hope of avoiding this hack.

Otherwise, OK to commit?

Cheers.

Tim.

Diff Detail

Event Timeline

t.p.northover retitled this revision from to Fix PR25526 by adding back limited cmpxchg pseudo-Insts.
t.p.northover updated this object.
t.p.northover added a subscriber: llvm-commits.
mcrosier added inline comments.
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2355

return nullptr;

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:2360
@@ -2325,1 +2359,3 @@
+ SelectCMP_SWAP(Node);
+ return 0;

case ISD::READ_REGISTER:

return nullptr;

Thanks Chad. Will do, sorry about that. I also realised I'd not
updated the comments properly yet. The one in AArch64InstrAtomics.td
is misleading and more elsewhere would probably help.

Tim.

Comment at: lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:2360
@@ -2325,1 +2359,3 @@
+ SelectCMP_SWAP(Node);
+ return 0;

case ISD::READ_REGISTER:

return nullptr;

Thanks Chad. Will do, sorry about that. I also realised I'd not
updated the comments properly yet. The one in AArch64InstrAtomics.td
is misleading and more elsewhere would probably help.

Np. Wish I could offer more assistance..

Chad

test/CodeGen/AArch64/cmpxchg-O0.ll
76

Please add a new line.

davide added a subscriber: davide.Jan 15 2016, 3:54 PM
ab added a subscriber: ab.Jan 15 2016, 5:58 PM

This makes sense to me, but:

Handle the expanded @llvm.aarch64.ldxr/stxr intrinsics to do away with any vregs (so fast-regalloc can't botch them), either in DAG or FastISel. I tried both, but the IR comparison always creates more. No change.

I don't understand: what "IR comparison"? If it worked, this seems to me like the ideal solution.

lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
604โ€“605

MBB.addSuccessor(LoadCmpBB) ?

616

Use a variable for MI.getDebugLoc(), perhaps?

640โ€“641

Is this sufficient? What happens to a def inside MBB used in DoneBB?

My MI-fu is weak, and I don't recall ever seeing code doing this when splicing MBBs (a quick grep confirms that), so I might be missing something obvious.

642

Should this also transferSuccessorsAndUpdatePHIs()?

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2323

MF is already available in SelectionDAGISel.

lib/Target/AArch64/AArch64ISelLowering.cpp
10160

missing word: "to the,"

lib/Target/AArch64/AArch64InstrAtomics.td
380

mayLoad/mayStore here as well?

ab added inline comments.Jan 15 2016, 6:02 PM
lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
642

On second thought, I guess this is late enough for transferSuccessors() to be sufficient.

Thanks for taking a look Ahmed.

emaste added a subscriber: emaste.Jan 15 2016, 8:05 PM

Think I've fixed most of the issues people mentioned.

ab added a comment.Mar 28 2016, 11:22 AM

There's still the live-in issue, but I'm not sure anyone cares after pseudo expansion. Quentin tells me you can use LivePhysRegs (::stepBackwards) to compute that for you.

Also, it occurs to me that nothing prevents a spill from being inserted with optimized regalloc. It's probably extremely unlikely, but maybe we should consider using the pseudo regardless of the opt level..

ab added a comment.Mar 28 2016, 11:22 AM

(and sorry about the delay)

New diff preserving MBB LiveIns during the transformation.

ab accepted this revision.Apr 12 2016, 10:03 AM
ab added a reviewer: ab.

Liveness in expandCMP_SWAP_128 as well?

With that, LGTM.

This revision is now accepted and ready to land.Apr 12 2016, 10:03 AM

Oops, well spotted. I've committed a fixed version as r266339. Thanks for reviewing.

Tim.