Details

Reviewers

nemanjai
stefanp
lkail

Group Reviewers

Restricted Project

Commits

rG3d259a82da3e: [PowerPC] Fix LQ-STQ instructions to use correct offset and base

Summary

This patch fixes the load and store quadword instructions on
PowerPC to use correct offset and base address.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saghir created this revision. Jun 1 2022, 12:16 PM

Herald added a project: Restricted Project. Jun 1 2022, 12:16 PM

Herald added subscribers: steven.zhang, shchenz, kbarton and 2 others.

saghir requested review of this revision. Jun 1 2022, 12:16 PM

Herald added a subscriber: llvm-commits.

saghir added reviewers: Restricted Project, nemanjai, stefanp. Jun 1 2022, 12:17 PM

Harbormaster completed remote builds in B167322: Diff 433490. Jun 1 2022, 1:00 PM

Use string substitution blocks for matching register and offset.

Harbormaster completed remote builds in B167335: Diff 433505. Jun 1 2022, 1:45 PM

I think the root cause here is that lq/stq lack an X-form. LQX_PSEUDO and STQX_PSEUDO are defined to help access large offsets inside the stack/heap, but they are expanded pre-RA. I think we should expand them in PPCAtomicExpandPass (which runs after prologepilog) and map the X-form of lq/stq to LQX_PSEUDO/STQX_PSEUDO.

In D126807#3551902, @lkail wrote:

I think the root cause here is that lq/stq lack an X-form. LQX_PSEUDO and STQX_PSEUDO are defined to help access large offsets inside the stack/heap, but they are expanded pre-RA. I think we should expand them in PPCAtomicExpandPass (which runs after prologepilog) and map the X-form of lq/stq to LQX_PSEUDO/STQX_PSEUDO.

This really doesn't seem like an instruction we should be expanding prior to RA (i.e. there is no reason for it to be a PPCCustomInserterPseudo). It should probably just be a PPCPostRAExpPseudo. That would be expanded in PPCInstrInfo::expandPostRAPseudo(). Wouldn't that be late enough?

In any case, this works if the offset fits but not if it doesn't. So we're just kicking the can down the road a little bit. We need to add the X-Form pseudo to the ImmToIdxMap and expand it at the appropriate time.
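The distinction between an offset that fits and one that does not can be illustrated with a small sketch. It assumes a signed 16-bit displacement that must also be a multiple of some alignment, passed in as a parameter; `fitsDisplacement` is a hypothetical helper for illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns true if `offset` could be
// encoded directly as the displacement of a D-form-style memory instruction,
// assuming the displacement is a signed 16-bit value that must be a multiple
// of `align` bytes.
inline bool fitsDisplacement(int64_t offset, int64_t align) {
  assert(align > 0 && "alignment must be positive");
  return offset >= INT16_MIN && offset <= INT16_MAX && offset % align == 0;
}
```

Under these assumptions, a small aligned offset such as -16 can stay in the immediate form, while an offset of 997072 (a frame holding a roughly 1 MB array) cannot and needs the base plus index register path discussed in this review.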

This revision now requires changes to proceed. Jun 1 2022, 5:56 PM

That would be expanded in PPCInstrInfo::expandPostRAPseudo()

That also looks viable, as long as we do it after prologepilog.

Note that we might need an additional register in outs to hold the sum of the two registers in memrr.

// handle x-form during isel.
def LQX_PSEUDO : PPCPostRAExpPseudo<(outs g8prc:$RTp, g8rc:$scratch),
                                    (ins memrr:$src), "#LQX_PSEUDO", []>;
def STQX_PSEUDO : PPCPostRAExpPseudo<(outs g8rc:$scratch),
                                     (ins g8prc:$RSp, memrr:$dst),
                                     "#STQX_PSEUDO", []>;

Handle unaligned offsets and offsets that do not fit in the instruction.

This fix seems to make offset handling more complex. We can simplify it by adding LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap.

ImmToIdxMap[PPC::LQ]  = PPC::LQX_PSEUDO;
ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;

Then expand PPC::LQX_PSEUDO and PPC::STQX_PSEUDO post-RA (they are currently expanded in PPCTargetLowering::EmitInstrWithCustomInserter). Something like:

$x6 = LQX_PSEUDO $x0, $x1
=>
$x3 = ADD $x0, $x1
$x6 = LQ 0($x3)

This way LQ/STQ fit into the existing frame index handling, and no extra code is needed to handle the offset for LQ/STQ (when LQ/STQ are selected by ISel, alignment is guaranteed to be 16 bytes since they are used for atomic operations).
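The rewrite suggested above (pseudo with base and index registers becomes an add followed by a zero-displacement quadword access) can be modeled in a few lines. This is only a toy sketch: `ToyInstr` and `expandQuadwordPseudo` are hypothetical names, opcodes are plain strings, and nothing here uses LLVM's MachineInstr API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's type).
struct ToyInstr {
  std::string opcode;
  std::string reg;   // register loaded into (LQ) or stored from (STQ)
  std::string base;  // base register
  std::string index; // index register for the X-form pseudo, else empty
  long imm;          // displacement for the D-form instruction
};

// Rewrites "LQX_PSEUDO/STQX_PSEUDO reg, base, index" into
//   scratch = ADD base, index
//   LQ/STQ  reg, 0(scratch)
// The caller supplies the scratch register; in a real backend it would come
// from the register scavenger or a virtual register.
inline std::vector<ToyInstr> expandQuadwordPseudo(const ToyInstr &pseudo,
                                                  const std::string &scratch) {
  assert(pseudo.opcode == "LQX_PSEUDO" || pseudo.opcode == "STQX_PSEUDO");
  ToyInstr add{"ADD", scratch, pseudo.base, pseudo.index, 0};
  ToyInstr mem{pseudo.opcode == "LQX_PSEUDO" ? "LQ" : "STQ", pseudo.reg,
               scratch, "", 0};
  return {add, mem};
}
```

Applied to the example in the comment, `LQX_PSEUDO` with `$x0` and `$x1` and scratch `$x3` yields an `ADD` into `$x3` followed by `LQ` with displacement 0 off `$x3`.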

Harbormaster completed remote builds in B169703: Diff 436760. Jun 14 2022, 7:18 AM

Added LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap, and rebased.

In D126807#3581621, @lkail wrote:
This fix seems to make offset handling more complex. We can simplify it by adding LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap.
ImmToIdxMap[PPC::LQ]  = PPC::LQX_PSEUDO;
ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;
Then expand PPC::LQX_PSEUDO and PPC::STQX_PSEUDO post-RA (they are currently expanded in PPCTargetLowering::EmitInstrWithCustomInserter). Something like:
$x6 = LQX_PSEUDO $x0, $x1
=>
$x3 = ADD $x0, $x1
$x6 = LQ 0($x3)
This way LQ/STQ fit into the existing frame index handling, and no extra code is needed to handle the offset for LQ/STQ (when LQ/STQ are selected by ISel, alignment is guaranteed to be 16 bytes since they are used for atomic operations).

Thanks for the suggestion. Adding LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap fixes both cases:

Offset fits in the instruction
Offset does not fit in the instruction

I have taken out the test case for the unaligned case since alignment is guaranteed to be 16 bytes for atomic operations.
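For the case where the offset does not fit, the test output in this revision materializes the offset with a `lis`/`ori` pair before the add. A minimal sketch of that split, assuming a 32-bit offset and the usual arithmetic right shift on signed values; `splitOffset` and `rebuildOffset` are hypothetical names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// High and low 16-bit halves of a 32-bit offset (hypothetical helper type).
struct HiLo {
  int32_t hi;  // signed immediate for a lis-style "load immediate shifted"
  uint32_t lo; // unsigned 16-bit immediate for an ori-style "or immediate"
};

// Splits an offset the way a lis/ori sequence needs it: the high half keeps
// the sign, the low half is OR-ed in as an unsigned value.
inline HiLo splitOffset(int32_t offset) {
  return HiLo{offset >> 16, static_cast<uint32_t>(offset) & 0xFFFFu};
}

// Recombines the halves: (hi << 16) | lo.
inline int32_t rebuildOffset(HiLo p) {
  return static_cast<int32_t>((static_cast<uint32_t>(p.hi) << 16) | p.lo);
}
```

With these definitions, 997072 splits into 15 and 14032, and -997088 splits into -16 and 51488, which matches the immediates in the `lis`/`ori` pairs of the autogenerated CHECK lines.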

nemanjai added inline comments. Jun 14 2022, 4:40 PM

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
13–19	No regex please. Let's produce actual instructions that show the registers as well as offsets.
llvm/test/CodeGen/PowerPC/LQ-STQ.ll
13	Same as above.

Harbormaster completed remote builds in B169853: Diff 436955. Jun 14 2022, 4:41 PM

nemanjai added inline comments. Jun 14 2022, 4:55 PM

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
2	What is this test case actually testing? It produces the same code with and without this patch.

nemanjai added inline comments. Jun 14 2022, 5:05 PM

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
2	Sorry, this isn't actually true. However with this patch, the frame index isn't actually removed during frame index elimination. So that doesn't test the intent of this test case.

If this IR is compiled with -mattr=+quadword-atomics, we emit #STQX_PSEUDO which is definitely not what we want:

%struct.StructA = type { [16 x i8] }

@s1 = dso_local global i128 324929342, align 16

; Function Attrs: mustprogress noinline nounwind optnone uwtable
define dso_local void @_Z4testv() #0 {
entry:
  %s2 = alloca %struct.StructA, align 16
  %s3 = alloca %struct.StructA, align 16
  %arr = alloca [997003 x i8], align 1
  %tmp = alloca %struct.StructA, align 16
  call void @llvm.memcpy.p0.p0.i64(ptr align 16 %tmp, ptr align 16 @s1, i64 16, i1 false)
  %0 = load i128, ptr %tmp, align 16
  store atomic i128 %0, ptr %s2 seq_cst, align 16
  ret void
}

; Function Attrs: argmemonly nofree nounwind willreturn
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #1

attributes #0 = { noinline optnone }

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
35	Please get rid of all of this and just compile with `-mcpu=pwr10`.

This revision now requires changes to proceed. Jun 14 2022, 5:42 PM

Address review comments

saghir added inline comments. Jun 15 2022, 11:17 AM

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
28	This does not look right, I am looking into it.

Harbormaster completed remote builds in B170057: Diff 437257. Jun 15 2022, 12:37 PM

updated test case.

update test case output

saghir added inline comments. Jun 15 2022, 2:43 PM

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
28	Updated test case and now it looks good.

lkail requested changes to this revision. Jun 15 2022, 3:45 PM

lkail added inline comments.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
496	This looks redundant. Test cases can pass without this change.
1589	Expanding two `PPCCustomInserterPseudo` instructions here looks odd. I would expect these two instructions to be expanded after `prologepilog`, in `postrapseudos`. As written, we expand these two instructions in two places in the backend. It would be more appropriate to make them `PPCPostRAExpPseudo` rather than `PPCCustomInserterPseudo`.

This revision now requires changes to proceed. Jun 15 2022, 3:45 PM

Harbormaster completed remote builds in B170114: Diff 437347. Jun 15 2022, 4:40 PM

nemanjai added inline comments. Jun 15 2022, 4:56 PM

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
496	It would be difficult to write a test case that ensures all registers are allocated and we need to rely on this in order to guarantee that the register scavenger is able to spill a register to be able to scavenge one.
1589	This code is the only place we convert D-Form instructions to X-Form instructions post-RA. Expanding them here makes sense because:
We create them here to begin with.
Producing X-Forms is what this portion of the code already does.
We have the register scavenger here and a slot saved for the scavenger to ensure it can always find a free GPR.
If we do this expansion somewhere later, there is no guarantee that the scavenger we acquire will be able to scavenge a register.

LGTM, as nemanjai has given a detailed explanation that addresses my concern.

LGTM. Thanks for the fix and all the updates.

This revision is now accepted and ready to land. Jun 16 2022, 7:56 AM

This revision was landed with ongoing or failed builds. Jun 16 2022, 8:47 AM

Closed by commit rG3d259a82da3e: [PowerPC] Fix LQ-STQ instructions to use correct offset and base (authored by saghir).

This revision was automatically updated to reflect the committed changes.

saghir added a commit: rG3d259a82da3e: [PowerPC] Fix LQ-STQ instructions to use correct offset and base.

Diff 437347

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp

[108 lines not shown] PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)
   ImmToIdxMap[PPC::LWA_32] = PPC::LWAX_32;

   // 64-bit
   ImmToIdxMap[PPC::LHA8] = PPC::LHAX8; ImmToIdxMap[PPC::LBZ8] = PPC::LBZX8;
   ImmToIdxMap[PPC::LHZ8] = PPC::LHZX8; ImmToIdxMap[PPC::LWZ8] = PPC::LWZX8;
   ImmToIdxMap[PPC::STB8] = PPC::STBX8; ImmToIdxMap[PPC::STH8] = PPC::STHX8;
   ImmToIdxMap[PPC::STW8] = PPC::STWX8; ImmToIdxMap[PPC::STDU] = PPC::STDUX;
   ImmToIdxMap[PPC::ADDI8] = PPC::ADD8;
+  ImmToIdxMap[PPC::LQ] = PPC::LQX_PSEUDO;
+  ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;

   // VSX
   ImmToIdxMap[PPC::DFLOADf32] = PPC::LXSSPX;
   ImmToIdxMap[PPC::DFLOADf64] = PPC::LXSDX;
   ImmToIdxMap[PPC::SPILLTOVSR_LD] = PPC::SPILLTOVSR_LDX;
   ImmToIdxMap[PPC::SPILLTOVSR_ST] = PPC::SPILLTOVSR_STX;
   ImmToIdxMap[PPC::DFSTOREf32] = PPC::STXSSPX;
   ImmToIdxMap[PPC::DFSTOREf64] = PPC::STXSDX;
[359 lines not shown] for (unsigned i = 0; i < Info.size(); i++) {
     // 2) A not fixed object but in that case we now know that the min required
     // alignment is no more than 1 based on the previous check.
     if (InstrInfo->isXFormMemOp(Opcode)) {
       LLVM_DEBUG(dbgs() << "Memory Operand: " << InstrInfo->getName(Opcode)
                         << " for register " << printReg(Reg, this) << ".\n");
       LLVM_DEBUG(dbgs() << "TRUE - Memory operand is X-Form.\n");
       return true;
     }
+
+    // This is a spill/restore of a quadword.
+    if ((Opcode == PPC::RESTORE_QUADWORD) || (Opcode == PPC::SPILL_QUADWORD)) {
+      LLVM_DEBUG(dbgs() << "Memory Operand: " << InstrInfo->getName(Opcode)
+                        << " for register " << printReg(Reg, this) << ".\n");
+      LLVM_DEBUG(dbgs() << "TRUE - Memory operand is a quadword.\n");
+      return true;
+    }
   }
   LLVM_DEBUG(dbgs() << "FALSE - Scavenging is not required.\n");
   return false;
 }

 bool PPCRegisterInfo::requiresVirtualBaseRegisters(
     const MachineFunction &MF) const {
   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
[1,028 lines not shown] PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
   // offset in.

   bool is64Bit = TM.isPPC64();
   const TargetRegisterClass *G8RC = &PPC::G8RCRegClass;
   const TargetRegisterClass *GPRC = &PPC::GPRCRegClass;
   const TargetRegisterClass *RC = is64Bit ? G8RC : GPRC;
   Register SRegHi = MF.getRegInfo().createVirtualRegister(RC),
            SReg = MF.getRegInfo().createVirtualRegister(RC);
+  unsigned NewOpcode = 0u;

   // Insert a set of rA with the full offset value before the ld, st, or add
   if (isInt<16>(Offset))
     BuildMI(MBB, II, dl, TII.get(is64Bit ? PPC::LI8 : PPC::LI), SReg)
         .addImm(Offset);
   else if (isInt<32>(Offset)) {
     BuildMI(MBB, II, dl, TII.get(is64Bit ? PPC::LIS8 : PPC::LIS), SRegHi)
         .addImm(Offset >> 16);
[12 lines not shown] PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
   unsigned OperandBase;

   if (noImmForm)
     OperandBase = 1;
   else if (OpC != TargetOpcode::INLINEASM &&
            OpC != TargetOpcode::INLINEASM_BR) {
     assert(ImmToIdxMap.count(OpC) &&
            "No indexed form of load or store available!");
-    unsigned NewOpcode = ImmToIdxMap.find(OpC)->second;
+    NewOpcode = ImmToIdxMap.find(OpC)->second;
     MI.setDesc(TII.get(NewOpcode));
     OperandBase = 1;
   } else {
     OperandBase = OffsetOperandNo;
   }

   Register StackReg = MI.getOperand(FIOperandNum).getReg();
   MI.getOperand(OperandBase).ChangeToRegister(StackReg, false);
   MI.getOperand(OperandBase + 1).ChangeToRegister(SReg, false, false, true);
+
+  // Since these are not real X-Form instructions, we must
+  // add the registers and access 0(NewReg) rather than
+  // emitting the X-Form pseudo.
+  if (NewOpcode == PPC::LQX_PSEUDO || NewOpcode == PPC::STQX_PSEUDO) {
+    assert(is64Bit && "Quadword loads/stores only supported in 64-bit mode");
+    Register NewReg = MF.getRegInfo().createVirtualRegister(&PPC::G8RCRegClass);
+    BuildMI(MBB, II, dl, TII.get(PPC::ADD8), NewReg)
+        .addReg(SReg, RegState::Kill)
+        .addReg(StackReg);
+    MI.setDesc(TII.get(NewOpcode == PPC::LQX_PSEUDO ? PPC::LQ : PPC::STQ));
+    MI.getOperand(OperandBase + 1).ChangeToRegister(NewReg, false);
+    MI.getOperand(OperandBase).ChangeToImmediate(0);
+  }
 }

 Register PPCRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
   const PPCFrameLowering *TFI = getFrameLowering(MF);

   if (!TM.isPPC64())
     return TFI->hasFP(MF) ? PPC::R31 : PPC::R1;
   else
[145 lines not shown]

llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mcpu=pwr10 -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mattr=+quadword-atomics -ppc-asm-full-reg-names -o - %s | FileCheck %s

				%struct.StructA = type { [16 x i8] }

				@s1 = dso_local global i128 324929342, align 16

				; Function Attrs: mustprogress noinline nounwind optnone uwtable
				define dso_local void @STQ() #0 {
				; CHECK-LABEL: STQ:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lis r0, -16
				; CHECK-NEXT: ori r0, r0, 51488
				; CHECK-NEXT: stdux r1, r1, r0
				; CHECK-NEXT: .cfi_def_cfa_offset 997088
				; CHECK-NEXT: pld r3, s1@PCREL+8(0), 1
				; CHECK-NEXT: std r3, 40(r1)
				; CHECK-NEXT: pld r3, s1@PCREL(0), 1
				; CHECK-NEXT: std r3, 32(r1)
				; CHECK-NEXT: ld r3, 40(r1)
				; CHECK-NEXT: ld r4, 32(r1)
				; CHECK-NEXT: sync
				; CHECK-NEXT: mr r5, r4
				; CHECK-NEXT: mr r4, r3
				; CHECK-NEXT: lis r3, 15
				; CHECK-NEXT: ori r3, r3, 14032
				; CHECK-NEXT: add r3, r3, r1
				; CHECK-NEXT: stq r4, 0(r3)
				; CHECK-NEXT: ld r1, 0(r1)
				; CHECK-NEXT: blr
				entry:
				%s2 = alloca %struct.StructA, align 16
				%s3 = alloca %struct.StructA, align 16
				%arr = alloca [997003 x i8], align 1
				%tmp = alloca %struct.StructA, align 16
				call void @llvm.memcpy.p0.p0.i64(ptr align 16 %tmp, ptr align 16 @s1, i64 16, i1 false)
				%0 = load i128, ptr %tmp, align 16
				store atomic i128 %0, ptr %s2 seq_cst, align 16
				ret void
				}

				define dso_local void @LQ() #0 {
				; CHECK-LABEL: LQ:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lis r0, -16
				; CHECK-NEXT: ori r0, r0, 51488
				; CHECK-NEXT: stdux r1, r1, r0
				; CHECK-NEXT: .cfi_def_cfa_offset 997088
				; CHECK-NEXT: pld r3, s1@PCREL+8(0), 1
				; CHECK-NEXT: std r3, 40(r1)
				; CHECK-NEXT: pld r3, s1@PCREL(0), 1
				; CHECK-NEXT: std r3, 32(r1)
				; CHECK-NEXT: sync
				; CHECK-NEXT: lis r3, 15
				; CHECK-NEXT: ori r3, r3, 14016
				; CHECK-NEXT: add r3, r3, r1
				; CHECK-NEXT: lq r4, 0(r3)
				; CHECK-NEXT: cmpd cr7, r5, r5
				; CHECK-NEXT: bne- cr7, .+4
				; CHECK-NEXT: isync
				; CHECK-NEXT: ld r1, 0(r1)
				; CHECK-NEXT: blr
				entry:
				%s2 = alloca %struct.StructA, align 16
				%s3 = alloca %struct.StructA, align 16
				%arr = alloca [997003 x i8], align 1
				%tmp = alloca %struct.StructA, align 16
				call void @llvm.memcpy.p0.p0.i64(ptr align 16 %tmp, ptr align 16 @s1, i64 16, i1 false)
				%0 = load i128, ptr %tmp, align 16
				%1 = load atomic i128, ptr %s3 seq_cst, align 16
				ret void
				}

				; Function Attrs: argmemonly nofree nounwind willreturn
				declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #1

				attributes #0 = { noinline optnone }

llvm/test/CodeGen/PowerPC/LQ-STQ.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mcpu=pwr10 -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mattr=+quadword-atomics -ppc-asm-full-reg-names -o - %s | FileCheck %s

				%struct.StructA = type { [16 x i8] }

				@s1 = dso_local global %struct.StructA { [16 x i8] c"\0B\0C\0D\0E\0F\10\11\12\13\14\15\16\17\18\19\1A" }, align 16

				define dso_local void @test() {
				; CHECK-LABEL: test:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: plxv vs0, s1@PCREL(0), 1
				; CHECK-NEXT: stxv vs0, -48(r1)
				; CHECK-NEXT: ld r3, -40(r1)
				; CHECK-NEXT: ld r4, -48(r1)
				; CHECK-NEXT: sync
				; CHECK-NEXT: mr r5, r4
				; CHECK-NEXT: mr r4, r3
				; CHECK-NEXT: stq r4, -16(r1)
				; CHECK-NEXT: sync
				; CHECK-NEXT: lq r4, -16(r1)
				; CHECK-NEXT: cmpd cr7, r5, r5
				; CHECK-NEXT: bne- cr7, .+4
				; CHECK-NEXT: isync
				; CHECK-NEXT: std r4, -24(r1)
				; CHECK-NEXT: std r5, -32(r1)
				; CHECK-NEXT: blr
				entry:
				%s2 = alloca %struct.StructA, align 16
				%s3 = alloca %struct.StructA, align 16
				%agg.tmp.ensured = alloca %struct.StructA, align 16
				call void @llvm.memcpy.p0.p0.i64(ptr align 16 %agg.tmp.ensured, ptr align 16 @s1, i64 16, i1 false)
				%0 = load i128, ptr %agg.tmp.ensured, align 16
				store atomic i128 %0, ptr %s2 seq_cst, align 16
				%atomic-load = load atomic i128, ptr %s2 seq_cst, align 16
				store i128 %atomic-load, ptr %s3, align 16
				ret void
				}

				declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)

This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix LQ-STQ instructions to use correct offset and base
ClosedPublic
