This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Reimplement the 1-byte compare-and-swap logic
ClosedPublic

Authored by jonpa on Feb 26 2021, 6:19 PM.

Details

Summary

Even though the implementation in emitAtomicCmpSwapW() was correct, it made Valgrind report an error.

Instead of using a RISBG on CmpVal, an LL[CH]R can be applied to OldVal, which avoids the problem.
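A minimal sketch of what this change means inside emitAtomicCmpSwapW() (only the compare sequence is shown; variable names such as OldValRot and ZExtDest are placeholders, not the exact patch):

  // Old: rotate/mask CmpVal with RISBG so that only the field bits take
  // part in the compare; Valgrind flags this because the high bits of
  // CmpVal may be uninitialized.
  // New: zero-extend the rotated OldVal instead, then compare directly.
  unsigned ZExtOpc = (BitSize == 8) ? SystemZ::LLCR : SystemZ::LLHR;
  BuildMI(MBB, DL, TII->get(ZExtOpc), ZExtDest).addReg(OldValRot);
  BuildMI(MBB, DL, TII->get(SystemZ::CR)).addReg(ZExtDest).addReg(CmpVal);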

CmpVal: this should not need an LL[HC]R, as it should already be zero-extended even in the non-constant case, should it?

Test updating has only just begun...

Diff Detail

Event Timeline

jonpa created this revision.Feb 26 2021, 6:19 PM
jonpa requested review of this revision.Feb 26 2021, 6:19 PM
Herald added a project: Restricted Project.Feb 26 2021, 6:19 PM

CmpVal: this should not need an LL[HC]R, as it should already be zero-extended even in the non-constant case, should it?

Not necessarily. Our ABI does require that "char" and "short" parameters and return values be extended, but that can be either a zero- or a sign-extension depending on the type. Also, this is implemented via the zeroext/signext type attributes on the parameters in code generated by clang; with LLVM IR generated elsewhere (like in those test cases!), we may get a plain i8 or i16 that is not extended. And of course if the i8 or i16 in question is not a function parameter but the result of some intermediate computation, it is not guaranteed to be extended anyway.

So in short, yes, the CmpVal may have to be extended. However, it is probably worthwhile to detect those (common) cases where it already *is* extended to avoid redundant effort. This is hard(er) to do at the MI level, so I think the extension is best done in SystemZTargetLowering::lowerATOMIC_CMP_SWAP at the SelectionDAG level before emitting the ATOMIC_CMP_SWAPW MI instruction.
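A sketch of what such an extension could look like in SystemZTargetLowering::lowerATOMIC_CMP_SWAP (names like NarrowVT and CmpVal are assumed from context; an AND is one way to express the zero-extension so that it can fold away when redundant):

  // Zero-extend CmpVal from the narrow i8/i16 type by masking the high
  // bits. If CmpVal is a constant, or already carries an AssertZext, the
  // AND is expected to fold away again.
  uint64_t Mask = (NarrowVT == MVT::i8) ? 0xff : 0xffff;
  CmpVal = DAG.getNode(ISD::AND, DL, MVT::i32, CmpVal,
                       DAG.getConstant(Mask, DL, MVT::i32));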

As an aside, it seems the code does now require one extra register. It might be worthwhile to avoid this by rearranging the statements a bit:

//  LoopMBB:
//   %OldVal       = phi [ %OrigOldVal, EntryBB ], [ %RetryOldVal, SetMBB ]
//   %SwapVal      = phi [ %OrigSwapVal, EntryBB ], [ %RetrySwapVal, SetMBB ]
//   %Dest         = RLL %OldVal, BitSize(%BitShift)
//                     ^^ The low BitSize bits contain the field
//                        of interest.
//   %RetrySwapVal = RISBG32 %SwapVal, %Dest, 32, 63-BitSize, 0
//                     ^^ Replace the upper 32-BitSize bits of the
//                        swap value with those that we loaded and rotated.
//   %Dest         = LL[CH]R %Dest
//   CR %Dest, %CmpVal

//  SetMBB:
//   %StoreVal     = RLL %RetrySwapVal, -BitSize(%NegBitShift)
//                      ^^ Rotate the new field to its proper position.
//   %RetryOldVal  = CS %OldVal, %StoreVal, Disp(%Base)
//   JNE LoopMBB

As an added bonus, this would make Dest already zero-extended, so it might be possible to avoid any extra zero-extension on the result (by adding an AssertZext ISD node after the ATOMIC_CMP_SWAPW).
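Something along these lines, presumably (a sketch; AtomicOp and NarrowVT are assumed names for the custom node and the original i8/i16 memory type):

  // Assert that the ATOMIC_CMP_SWAPW result is already zero-extended from
  // the narrow type, so DAGCombine can remove a redundant extension of it.
  SDValue Res = DAG.getNode(ISD::AssertZext, DL, MVT::i32,
                            AtomicOp.getValue(0), DAG.getValueType(NarrowVT));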

jonpa added a comment.Mar 1 2021, 5:17 PM

Not necessarily. Our ABI does require that "char" and "short" parameters and return values be extended, but that can be either a zero- or a sign-extension depending on the type. Also, this is implemented via the zeroext/signext type attributes on the parameters in code generated by clang; with LLVM IR generated elsewhere (like in those test cases!), we may get a plain i8 or i16 that is not extended. And of course if the i8 or i16 in question is not a function parameter but the result of some intermediate computation, it is not guaranteed to be extended anyway.

So in short, yes, the CmpVal may have to be extended. However, it is probably worthwhile to detect those (common) cases where it already *is* extended to avoid redundant effort. This is hard(er) to do at the MI level, so I think the extension is best done in SystemZTargetLowering::lowerATOMIC_CMP_SWAP at the SelectionDAG level before emitting the ATOMIC_CMP_SWAPW MI instruction.

I added an AND to zero out the high bits to perform the zero-extension from the narrow type. It seemed that if the source was a constant (e.g. '1'), the DAG.getNode(ISD::AND, ...) call folded the AND on the fly. And if the source was a parameter with the 'zeroext' attribute (or rather any result with an AssertZext node), the AND goes away during DAGCombine2. So from what I could see, there is not much extra work to do here.

As an aside, it seems the code does now require one extra register. It might be worthwhile to avoid this by rearranging the statements a bit:

As an added bonus, this would make Dest already zero-extended, so it might be possible to avoid any extra zero-extension on the result (by adding an AssertZext ISD node after the ATOMIC_CMP_SWAPW).

Ah, yes, that makes sense now.

At this point I am wondering how to treat the signedness of the input/output: if the template type of std::atomic is signed, then the result of e.g. a signed char should be sign-extended to i32, right? So CmpVal and Dest should both be either sign- or zero-extended. The patch currently always zero-extends...

I see that with signed char:

Optimized lowered selection DAG: %bb.0 '_Z3funa:entry'
SelectionDAG has 15 nodes:
          t0: ch = EntryToken
        t8: ch = lifetime.start<0 to 1> t0, TargetFrameIndex:i64<0>
      t12: ch = store<(store 1 into %ir.0, align 2)> t8, Constant:i8<1>, FrameIndex:i64<0>, undef:i64
    t14: i8,i1,ch = AtomicCmpSwapWithSuccess<(volatile load store acquire monotonic 1 on %ir.0)> t12, FrameIndex:i64<0>, Constant:i8<1>, Constant:i8<0>
  t15: i8,ch = AtomicLoad<(volatile dereferenceable load seq_cst 1 from %ir.0, align 2)> t14:2, FrameIndex:i64<0>
    t16: ch = lifetime.end<0 to 1> t15:1, TargetFrameIndex:i64<0>
    t18: i64 = sign_extend t15
  t20: ch,glue = CopyToReg t16, Register:i64 $r2d, t18
  t21: ch = SystemZISD::RET_FLAG t20, Register:i64 $r2d, t20:1


Type-legalized selection DAG: %bb.0 '_Z3funa:entry'
SelectionDAG has 17 nodes:
    t16: ch = lifetime.end<0 to 1> t27:1, TargetFrameIndex:i64<0>
      t28: i64 = any_extend t27
    t30: i64 = sign_extend_inreg t28, ValueType:ch:i8
  t20: ch,glue = CopyToReg t16, Register:i64 $r2d, t30
          t0: ch = EntryToken
        t8: ch = lifetime.start<0 to 1> t0, TargetFrameIndex:i64<0>
      t24: ch = store<(store 1 into %ir.0, align 2), trunc to i8> t8, Constant:i32<1>, FrameIndex:i64<0>, undef:i64
    t26: i32,i32,ch = AtomicCmpSwapWithSuccess<(volatile load store acquire monotonic 1 on %ir.0)> t24, FrameIndex:i64<0>, Constant:i32<1>, Constant:i32<0>
  t27: i32,ch = AtomicLoad<(volatile dereferenceable load seq_cst 1 from %ir.0, align 2)> t26:2, FrameIndex:i64<0>
  t21: ch = SystemZISD::RET_FLAG t20, Register:i64 $r2d, t20:1

For a signed type (above), there is a sign_extend of the result (and for the unsigned case a zero_extend). But after the type legalizer, the AtomicCmpSwapWithSuccess has been replaced with a new one of i32 type and the extension node is gone. The memory operand carries no information about what extension is supposed to happen, so I wonder how the result (the original value) is supposed to be properly extended?

This happens in DAGTypeLegalizer::PromoteIntRes_AtomicCmpSwap(), which uses TLI.getExtendForAtomicCmpSwapArg(). That hook also seems to ignore the original type (it takes no parameters), so I am not sure whether I am missing something.
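For reference, the promotion described here boils down to something like the following (simplified; the actual DAGTypeLegalizer code handles more cases, and PromotedVT is an assumed name for the widened type):

  // The extension kind for the compare operand comes from a parameterless
  // target hook, so the narrow type's signedness cannot influence it here.
  ISD::NodeType ExtOp = TLI.getExtendForAtomicCmpSwapArg();
  SDValue NewOp2 = DAG.getNode(ExtOp, SDLoc(N), PromotedVT, N->getOperand(2));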

It would be nice if we could extend CmpVal and Dest with the same signedness and just change the opcodes for the extensions with this new approach...

I added an AND to zero out the high bits to perform the zero-extension from the narrow type. It seemed that if the source was a constant (e.g. '1'), the DAG.getNode(ISD::AND, ...) call folded the AND on the fly. And if the source was a parameter with the 'zeroext' attribute (or rather any result with an AssertZext node), the AND goes away during DAGCombine2. So from what I could see, there is not much extra work to do here.

Ah, thanks for pointing out the TLI.getExtendForAtomicCmpSwapArg() and TLI.getExtendForAtomicOps() routines, I had overlooked those. This means you don't have to do any of the extends "by hand", you just use the proper opcode in those routines. The TLI.getExtendForAtomicCmpSwapArg() routine specifies how common code is supposed to extend the compare value before passing it to compare-and-swap, while the TLI.getExtendForAtomicOps() routine specifies what extension platform-specific code will have already performed on the result so common code may rely on it.

These are currently both set to ANY_EXTEND on SystemZ; I think for the new algorithm you can just set them both to ZERO_EXTEND and everything should just work.
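In SystemZISelLowering.h, that would presumably look like the following (a sketch of the two overrides only):

  // Tell common code that the compare value must be zero-extended before
  // ATOMIC_CMP_SWAP, and that atomic results are produced zero-extended.
  ISD::NodeType getExtendForAtomicCmpSwapArg() const override {
    return ISD::ZERO_EXTEND;
  }
  ISD::NodeType getExtendForAtomicOps() const override {
    return ISD::ZERO_EXTEND;
  }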

At this point I am wondering how to treat the signedness of the input/output: if the template type of std::atomic is signed, then the result of e.g. a signed char should be sign-extended to i32, right? So CmpVal and Dest should both be either sign- or zero-extended. The patch currently always zero-extends...

Well, I guess there could be multiple variants (sign/zero-extend to i32/i64) that we might support. This may need improvements to the common-code TLI.getExtendForAtomic... logic (or could possibly be done via DAGCombine). In any case, I think for now we should just do the ZERO_EXTEND; any improvement can be done as a follow-up.

jonpa updated this revision to Diff 327560.Mar 2 2021, 1:27 PM

Updated per review.

Tests updated and passing.

The change in lowerATOMIC_CMP_SWAP should be removed now. Otherwise this LGTM.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
4060

This is not needed any more -- it is already done by common code now that you set getExtendForAtomicOps() to ZERO_EXTEND.

jonpa added a comment.Mar 3 2021, 10:36 AM

This is not needed any more -- it is already done by common code now that you set getExtendForAtomicOps() to ZERO_EXTEND.

I also thought so, but I found that it did make a difference on this test case:

Not sure exactly why, but I thought we might as well have it there... Or is this a bug in common code that we should fix?

Or is it perhaps even good without the AssertZext - the difference in this case is an LLGFR instead of an LLGCR. I thought maybe that could make a difference in other programs...

Before isel:

Optimized legalized selection DAG: %bb.0 '_Z3funh:entry'                        Optimized legalized selection DAG: %bb.0 '_Z3funh:entry'
SelectionDAG has 28 nodes:                                                |     SelectionDAG has 27 nodes:
  t0: ch = EntryToken                                                             t0: ch = EntryToken
    t19: ch = lifetime.end<0 to 1> t45:2, TargetFrameIndex:i64<0>                   t19: ch = lifetime.end<0 to 1> t45:2, TargetFrameIndex:i64<0>
      t34: i64 = any_extend t45                                           |           t49: i32 = AssertZext t45, ValueType:ch:i8
    t36: i64 = and t34, Constant:i64<255>                                 |         t59: i64 = zero_extend t49
  t22: ch,glue = CopyToReg t19, Register:i64 $r2d, t36                    |       t22: ch,glue = CopyToReg t19, Register:i64 $r2d, t59
      t11: ch = lifetime.start<0 to 1> t0, TargetFrameIndex:i64<0>                    t11: ch = lifetime.start<0 to 1> t0, TargetFrameIndex:i64<0>
    t30: ch = store<(store 1 into %ir.0, align 2), trunc to i8> t11, C              t30: ch = store<(store 1 into %ir.0, align 2), trunc to i8> t11, C
    t39: i64 = and FrameIndex:i64<0>, Constant:i64<-4>                              t39: i64 = and FrameIndex:i64<0>, Constant:i64<-4>
        t2: i64,ch = CopyFromReg t0, Register:i64 %0                                    t2: i64,ch = CopyFromReg t0, Register:i64 %0
      t24: i64 = AssertZext t2, ValueType:ch:i8                                       t24: i64 = AssertZext t2, ValueType:ch:i8
    t29: i32 = truncate t24                                                         t29: i32 = truncate t24
    t43: i32 = sub Constant:i32<0>, t57                                   |         t43: i32 = sub Constant:i32<0>, t58
  t45: i32,i32,ch = SystemZISD::ATOMIC_CMP_SWAPW<(volatile load store     |       t45: i32,i32,ch = SystemZISD::ATOMIC_CMP_SWAPW<(volatile load store 
    t56: i32 = truncate FrameIndex:i64<0>                                 |         t57: i32 = truncate FrameIndex:i64<0>
  t57: i32 = shl t56, Constant:i32<3>                                     |       t58: i32 = shl t57, Constant:i32<3>
  t23: ch = SystemZISD::RET_FLAG t22, Register:i64 $r2d, t22:1                    t23: ch = SystemZISD::RET_FLAG t22, Register:i64 $r2d, t22:1

Output:

        .text                                                                           .text
        .file   "boolean_cmpxchg.cpp"                                                   .file   "boolean_cmpxchg.cpp"
        .globl  _Z3funh                         # -- Begin function _Z                  .globl  _Z3funh                         # -- Begin function _Z
        .p2align        4                                                               .p2align        4
        .type   _Z3funh,@function                                                       .type   _Z3funh,@function
_Z3funh:                                # @_Z3funh                              _Z3funh:                                # @_Z3funh
        .cfi_startproc                                                                  .cfi_startproc
# %bb.0:                                # %entry                                # %bb.0:                                # %entry
        stmg    %r13, %r15, 104(%r15)                                                   stmg    %r13, %r15, 104(%r15)
        .cfi_offset %r13, -56                                                           .cfi_offset %r13, -56
        .cfi_offset %r14, -48                                                           .cfi_offset %r14, -48
        .cfi_offset %r15, -40                                                           .cfi_offset %r15, -40
        aghi    %r15, -168                                                              aghi    %r15, -168
        .cfi_def_cfa_offset 328                                                         .cfi_def_cfa_offset 328
        la      %r3, 166(%r15)                                                          la      %r3, 166(%r15)
        mvi     166(%r15), 1                                                            mvi     166(%r15), 1
        risbgn  %r1, %r3, 0, 189, 0                                                     risbgn  %r1, %r3, 0, 189, 0
        l       %r5, 0(%r1)                                                             l       %r5, 0(%r1)
        sll     %r3, 3                                                                  sll     %r3, 3
        lcr     %r4, %r3                                                                lcr     %r4, %r3
        lhi     %r0, 0                                                                  lhi     %r0, 0
.LBB0_1:                                # %entry                                .LBB0_1:                                # %entry
                                        # =>This Inner Loop Header: De                                                  # =>This Inner Loop Header: De
        rll     %r14, %r5, 8(%r3)                                                       rll     %r14, %r5, 8(%r3)
        risbg   %r0, %r14, 32, 55, 0                                                    risbg   %r0, %r14, 32, 55, 0
        llcr    %r14, %r14                                                              llcr    %r14, %r14
        crjlh   %r14, %r2, .LBB0_3                                                      crjlh   %r14, %r2, .LBB0_3
# %bb.2:                                # %entry                                # %bb.2:                                # %entry
                                        #   in Loop: Header=BB0_1 Dept                                                  #   in Loop: Header=BB0_1 Dept
        rll     %r13, %r0, -8(%r4)                                                      rll     %r13, %r0, -8(%r4)
        cs      %r5, %r13, 0(%r1)                                                       cs      %r5, %r13, 0(%r1)
        jl      .LBB0_1                                                                 jl      .LBB0_1
.LBB0_3:                                # %entry                                .LBB0_3:                                # %entry
        llgcr   %r2, %r14                                                 |             llgfr   %r2, %r14
        lmg     %r13, %r15, 272(%r15)                                                   lmg     %r13, %r15, 272(%r15)
        br      %r14                                                                    br      %r14
.Lfunc_end0:                                                                    .Lfunc_end0:
uweigand accepted this revision.Mar 3 2021, 11:05 AM

Ahh, you're right. It's done in common code by the default ATOMIC_CMP_SWAP_WITH_SUCCESS expander -- but we're not using that, since we use our own custom expander! So it indeed has to be done there in the back end.

Patch LGTM. Thanks!

This revision is now accepted and ready to land.Mar 3 2021, 11:05 AM
This revision was landed with ongoing or failed builds.Mar 3 2021, 12:06 PM
This revision was automatically updated to reflect the committed changes.