
Details

Reviewers

arsenm
paquette
qcolombet
aditya_nandakumar
dsanders

Commits

rGdaf6e66ac5d2: [GlobalISel] Add legalization support for non-power-2 loads and stores
rL358613: [GlobalISel] Add legalization support for non-power-2 loads and stores

Summary

Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This change also adds an artifact combiner for G_INSERT -> G_EXTRACT, where we can just forward the inserted value straight to the user of the extract. To do this I had to add G_INSERT to the list of artifact opcodes. That in turn caused test failures because of the legalization order, so we now try to combine these artifacts away before they're legalized, which means we don't have to look through G_TRUNC ops in between.
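The split itself is simple arithmetic: the larger piece is the biggest power of 2 that fits in the original width, and the smaller piece is the remainder. A minimal sketch of that computation, assuming nothing beyond what the summary describes; `SplitSizes`, `powerOf2Floor` and `splitNonPow2` are hypothetical names for illustration, not LLVM APIs:

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not part of LLVM.
struct SplitSizes {
  uint64_t LargeBits; // largest power of 2 that is <= the original width
  uint64_t SmallBits; // the remaining bits
};

// Largest power of 2 that is <= Bits. Bits must be non-zero.
inline uint64_t powerOf2Floor(uint64_t Bits) {
  uint64_t P = 1;
  while (P <= Bits / 2)
    P *= 2;
  return P;
}

// One splitting step: e.g. 24 bits -> 16 bits + 8 bits.
inline SplitSizes splitNonPow2(uint64_t Bits) {
  uint64_t Large = powerOf2Floor(Bits);
  return {Large, Bits - Large};
}
```

For a width such as 56 bits, one step gives 32 + 24, so the remainder is itself not a power of 2 and would presumably need another round of splitting.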

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. Mar 28 2019, 4:58 PM

Herald added subscribers: Petar.Avramovic, volkan, hiraditya and 6 others. Mar 28 2019, 4:58 PM

@arsenm Matt there are changes to the AMDGPU tests, but I'm not sure if they're ok or not. The tests run with -global-isel-abort=0 and the change of adding G_INSERT to the artifacts list means that it doesn't get legalized the same way (as an artifact, it doesn't get pushed onto the legalization list so it's deferred until later). Does it look benign?

In D59971#1447031, @aemerson wrote:

@arsenm Matt there are changes to the AMDGPU tests, but I'm not sure if they're ok or not. The tests run with -global-isel-abort=0 and the change of adding G_INSERT to the artifacts list means that it doesn't get legalized the same way (as an artifact, it doesn't get pushed onto the legalization list so it's deferred until later). Does it look benign?

The -global-isel-abort=0 is a workaround for the broken artifact handling. I didn't have any particular expectations for these other than not crashing

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1379–1389 ↗	(On Diff #192742)	I think this breaks the alignment. You should use MF.getMachineMemOperand(MMO, Size, Offset)

arsenm added inline comments. Mar 28 2019, 8:03 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1379–1389 ↗	(On Diff #192742)	I got the order wrong, it's: MachineMemOperand *getMachineMemOperand(const MachineMemOperand *MMO, int64_t Offset, uint64_t Size);

Fix alignment for smaller MMO. I couldn't use your suggested getMachineMemOperand() method because it doesn't update the alignment it seems.

Actually I've realized now that there are two problems with this approach:

This doesn't work when we try to legalize other non-power-of-2 operations. The artifacts for arithmetic ops, for example, will try to use G_ANYEXT to legalize their source, but if loads are producing the type with G_INSERTs, it's difficult to combine them away.
It's not correct when the alignment is smaller than the largest sub-memop, e.g. if the i24 load had an alignment of 1. In this case, we have no choice but to split the op into alignment-sized components and use something like G_MERGE on the resulting values.
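One way to picture the pessimistic case in the second point is to cut the access into alignment-sized chunks. This is only a sketch of that idea under stated assumptions: `Piece` and `splitByAlignment` are hypothetical names, sizes are in bytes, and the alignment is assumed to be non-zero. It is not what the patch ended up doing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-access; not an LLVM type.
struct Piece {
  uint64_t OffsetBytes;
  uint64_t SizeBytes;
};

// Split a SizeBytes-wide access into chunks no larger than AlignBytes.
// AlignBytes must be non-zero. The last chunk may be smaller.
inline std::vector<Piece> splitByAlignment(uint64_t SizeBytes,
                                           uint64_t AlignBytes) {
  std::vector<Piece> Pieces;
  for (uint64_t Off = 0; Off < SizeBytes; Off += AlignBytes) {
    uint64_t Remaining = SizeBytes - Off;
    Pieces.push_back({Off, Remaining < AlignBytes ? Remaining : AlignBytes});
  }
  return Pieces;
}
```

With a 3-byte access and alignment 1 this yields three 1-byte pieces, which would then have to be merged back into one value.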

I'm instead going to work on first adding the legalization support for the pessimistic case of smaller alignment memops, and then deal with the common case.

In D59971#1447273, @aemerson wrote:

Fix alignment for smaller MMO. I couldn't use your suggested getMachineMemOperand() method because it doesn't update the alignment it seems.

Are you confusing the alignment and base alignment? I had this problem last time I touched this. getMachineMemOperand should do the right thing

In D59971#1448481, @arsenm wrote:

In D59971#1447273, @aemerson wrote:

Fix alignment for smaller MMO. I couldn't use your suggested getMachineMemOperand() method because it doesn't update the alignment it seems.

Are you confusing the alignment and base alignment? I had this problem last time I touched this. getMachineMemOperand should do the right thing

Ah, probably. When I changed to using the getMachineMemOperand() function the test still passed, but I was expecting the MMO printer to print "align 2" instead. It seems it prints the base pointer alignment, not the alignment of the access.
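The distinction discussed here can be sketched as follows: the memory operand keeps the base pointer's alignment plus a byte offset, and the alignment of the access itself is the largest power of 2 that divides both. This is a standalone illustration; `minAlign` and `MemOperandSketch` are hypothetical names, and the real MachineMemOperand has more state than this.

```cpp
#include <cstdint>

// Largest power of 2 dividing both A and B. A zero offset leaves A unchanged.
inline uint64_t minAlign(uint64_t A, uint64_t B) {
  uint64_t Bits = A | B;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

// Hypothetical stand-in for a memory operand; not the LLVM class.
struct MemOperandSketch {
  uint64_t BaseAlign; // alignment of the base pointer, in bytes
  int64_t Offset;     // byte offset of this access from the base

  // Alignment of the access itself, derived from base alignment and offset.
  uint64_t accessAlign() const {
    return minAlign(BaseAlign, static_cast<uint64_t>(Offset));
  }
};
```

Under this model, a 1-byte piece at offset 2 from a 4-byte-aligned base has an access alignment of 2 even though the base alignment is still 4.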

In D59971#1448481, @arsenm wrote:

In D59971#1447273, @aemerson wrote:

Fix alignment for smaller MMO. I couldn't use your suggested getMachineMemOperand() method because it doesn't update the alignment it seems.

Are you confusing the alignment and base alignment? I had this problem last time I touched this. getMachineMemOperand should do the right thing

Reviving this as the overall approach was fine, it seems the alignment of non pow2 types is assumed to be the alignment of the next largest pow-2 type, so we don't need to worry about alignment during the breakdown.

I did however change the legalization method to not use extracts/inserts, but instead use extending loads and truncating stores, so that the artifacts get combined away and it Just Works.

New and improved patch.
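The revised approach (extending loads combined with a shift and an or, then a truncate; an extend followed by truncating stores) can be pictured on a plain byte buffer. The following is a minimal sketch rather than the LegalizerHelper code: it assumes a little-endian layout, uses zero-extended pieces where the patch uses any-extending loads, and `load24`/`store24` are hypothetical names. The operand order follows little-endian byte order and is not copied from the patch.

```cpp
#include <cstdint>

// Load a 24-bit value as a 2-byte piece plus a 1-byte piece (little-endian).
inline uint32_t load24(const uint8_t *P) {
  // "Large" piece: 2 bytes at offset 0, zero-extended to 32 bits.
  uint32_t Large = static_cast<uint32_t>(P[0]) |
                   (static_cast<uint32_t>(P[1]) << 8);
  // "Small" piece: 1 byte at offset 2, zero-extended to 32 bits.
  uint32_t Small = P[2];
  // The piece at the higher address supplies the high bits.
  uint32_t Wide = (Small << 16) | Large;
  return Wide & 0xFFFFFFu; // truncate back down to 24 bits
}

// Store the low 24 bits of V as a 2-byte piece plus a 1-byte piece.
inline void store24(uint8_t *P, uint32_t V) {
  uint32_t Small = V >> 16; // shift away the large piece
  // Truncating 2-byte store of the wide value at offset 0.
  P[0] = static_cast<uint8_t>(V & 0xFF);
  P[1] = static_cast<uint8_t>((V >> 8) & 0xFF);
  // Truncating 1-byte store of the shifted value at offset 2.
  P[2] = static_cast<uint8_t>(Small & 0xFF);
}
```

Because both directions go through the wider 32-bit type, the only non-power-of-2 operations left are the extend and truncate at the edges, which is what lets the artifact combiner clean them up.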

In D59971#1467546, @aemerson wrote:

Reviving this as the overall approach was fine, it seems the alignment of non pow2 types is assumed to be the alignment of the next largest pow-2 type, so we don't need to worry about alignment during the breakdown.

I did however change the legalization method to not use extracts/inserts, but instead use extending loads and truncating stores, so that the artifacts get combined away and it Just Works.

I don't like how we do everything in bits, and then the mem operand forces bytes. Would it cost anything to switch MemOperands to also be in bits?

arsenm added inline comments. Apr 16 2019, 12:36 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1379–1389 ↗	(On Diff #192742)	This is still using the new function?

In D59971#1467947, @arsenm wrote:

In D59971#1467546, @aemerson wrote:

Reviving this as the overall approach was fine, it seems the alignment of non pow2 types is assumed to be the alignment of the next largest pow-2 type, so we don't need to worry about alignment during the breakdown.

I did however change the legalization method to not use extracts/inserts, but instead use extending loads and truncating stores, so that the artifacts get combined away and it Just Works.

I don't like how we do everything in bits, and then the mem operand forces bytes. Would it cost anything to switch MemOperands to also be in bits?

Yes I think we could introduce a getSizeInBits() accessor to make it clearer, but I think it should be a separate patch from this.
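A sketch of what such an accessor might look like, assuming the operand keeps storing its size in bytes and only the accessor converts. `MemSizeSketch` and `isNonExtendingAccess` are hypothetical names for illustration, not the eventual LLVM interface.

```cpp
#include <cstdint>

// Hypothetical stand-in for the size part of a memory operand.
struct MemSizeSketch {
  uint64_t SizeInBytes;

  uint64_t getSize() const { return SizeInBytes; }           // bytes
  uint64_t getSizeInBits() const { return SizeInBytes * 8; } // bits
};

// Compare a register width in bits against the memory width without
// scattering "* 8" conversions through the caller.
inline bool isNonExtendingAccess(uint64_t RegBits, const MemSizeSketch &MMO) {
  return RegBits == MMO.getSizeInBits();
}
```

The point is only readability: checks such as "register width equals memory width" can then be written entirely in bits.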

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1379–1389 ↗	(On Diff #192742)	It should do, I forgot to make that change in the latest patch.

Use MF.getMachineMemOperand()

In D59971#1467947, @arsenm wrote:

In D59971#1467546, @aemerson wrote:

Reviving this as the overall approach was fine, it seems the alignment of non pow2 types is assumed to be the alignment of the next largest pow-2 type, so we don't need to worry about alignment during the breakdown.

I did however change the legalization method to not use extracts/inserts, but instead use extending loads and truncating stores, so that the artifacts get combined away and it Just Works.

I don't like how we do everything in bits, and then the mem operand forces bytes. Would it cost anything to switch MemOperands to also be in bits?

I just had a go at doing this, and some places still work in terms of bytes, not bits. I don't think it's worth changing the internal representation; getSizeInBits() should help anyway.

arsenm added inline comments. Apr 17 2019, 2:41 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1352–1357 ↗	(On Diff #195412)	Why can't this use getMachineMemOperand for both halves from the original?

Yep, in that case might as well remove the entire helper.

LGTM

This revision is now accepted and ready to land. Apr 17 2019, 12:04 PM

Closed by commit rL358613: [GlobalISel] Add legalization support for non-power-2 loads and stores (authored by aemerson). Apr 17 2019, 2:28 PM

This revision was automatically updated to reflect the committed changes.

Diff 195628

llvm/trunk/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h

public:
}		}
LegalizeRuleSet &unsupportedIf(LegalityPredicate Predicate) {		LegalizeRuleSet &unsupportedIf(LegalityPredicate Predicate) {
return actionIf(LegalizeAction::Unsupported, Predicate);		return actionIf(LegalizeAction::Unsupported, Predicate);
}		}
LegalizeRuleSet &unsupportedIfMemSizeNotPow2() {		LegalizeRuleSet &unsupportedIfMemSizeNotPow2() {
return actionIf(LegalizeAction::Unsupported,		return actionIf(LegalizeAction::Unsupported,
LegalityPredicates::memSizeInBytesNotPow2(0));		LegalityPredicates::memSizeInBytesNotPow2(0));
}		}
		LegalizeRuleSet &lowerIfMemSizeNotPow2() {
		return actionIf(LegalizeAction::Lower,
		LegalityPredicates::memSizeInBytesNotPow2(0));
		}

LegalizeRuleSet &customIf(LegalityPredicate Predicate) {		LegalizeRuleSet &customIf(LegalityPredicate Predicate) {
// We have no choice but conservatively assume that a custom action with a		// We have no choice but conservatively assume that a custom action with a
// free-form user provided Predicate properly handles all type indices:		// free-form user provided Predicate properly handles all type indices:
markAllTypeIdxsAsCovered();		markAllTypeIdxsAsCovered();
return actionIf(LegalizeAction::Custom, Predicate);		return actionIf(LegalizeAction::Custom, Predicate);
}		}
LegalizeRuleSet &customFor(std::initializer_list<LLT> Types) {		LegalizeRuleSet &customFor(std::initializer_list<LLT> Types) {
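Both unsupportedIfMemSizeNotPow2() and the new lowerIfMemSizeNotPow2() use the same predicate, memSizeInBytesNotPow2(0), and differ only in the action they attach to it. A rough standalone sketch of what such a predicate checks, assuming it simply tests whether the memory size in bytes is a power of 2; the real predicate takes a LegalityQuery, and the function names below are hypothetical:

```cpp
#include <cstdint>

// True if V is a non-zero power of 2.
inline bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical stand-in for the predicate: true when the memory access size,
// given in bits, is not a power-of-2 number of bytes.
inline bool memSizeInBytesNotPow2Sketch(uint64_t SizeInBits) {
  return !isPowerOf2(SizeInBits / 8);
}
```

A 3-byte (i24) access matches, so with the new rule it is lowered instead of being reported as unsupported.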

llvm/trunk/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

LegalizerHelper::lower(MachineInstr &MI, unsigned TypeIdx, LLT Ty) {
case TargetOpcode::G_ZEXTLOAD: {		case TargetOpcode::G_ZEXTLOAD: {
// Lower to a memory-width G_LOAD and a G_SEXT/G_ZEXT/G_ANYEXT		// Lower to a memory-width G_LOAD and a G_SEXT/G_ZEXT/G_ANYEXT
unsigned DstReg = MI.getOperand(0).getReg();		unsigned DstReg = MI.getOperand(0).getReg();
unsigned PtrReg = MI.getOperand(1).getReg();		unsigned PtrReg = MI.getOperand(1).getReg();
LLT DstTy = MRI.getType(DstReg);		LLT DstTy = MRI.getType(DstReg);
auto &MMO = **MI.memoperands_begin();		auto &MMO = **MI.memoperands_begin();

if (DstTy.getSizeInBits() == MMO.getSize() /* in bytes */ * 8) {		if (DstTy.getSizeInBits() == MMO.getSize() /* in bytes */ * 8) {
// In the case of G_LOAD, this was a non-extending load already and we're		if (MI.getOpcode() == TargetOpcode::G_LOAD) {
// about to lower to the same instruction.		// This load needs splitting into power of 2 sized loads.
if (MI.getOpcode() == TargetOpcode::G_LOAD)		if (DstTy.isVector())
return UnableToLegalize;		return UnableToLegalize;
		if (isPowerOf2_32(DstTy.getSizeInBits()))
		return UnableToLegalize; // Don't know what we're being asked to do.

		// Our strategy here is to generate anyextending loads for the smaller
		// types up to next power-2 result type, and then combine the two larger
		// result values together, before truncating back down to the non-pow-2
		// type.
		// E.g. v1 = i24 load =>
		// v2 = i32 load (2 byte)
		// v3 = i32 load (1 byte)
		// v4 = i32 shl v2, 16
		// v5 = i32 or v4, v3
		// v1 = i24 trunc v5
		// By doing this we generate the correct truncate which should get
		// combined away as an artifact with a matching extend.
		uint64_t LargeSplitSize = PowerOf2Floor(DstTy.getSizeInBits());
		uint64_t SmallSplitSize = DstTy.getSizeInBits() - LargeSplitSize;

		MachineFunction &MF = MIRBuilder.getMF();
		MachineMemOperand *LargeMMO =
		MF.getMachineMemOperand(&MMO, 0, LargeSplitSize / 8);
		MachineMemOperand *SmallMMO = MF.getMachineMemOperand(
		&MMO, LargeSplitSize / 8, SmallSplitSize / 8);

		LLT PtrTy = MRI.getType(PtrReg);
		unsigned AnyExtSize = NextPowerOf2(DstTy.getSizeInBits());
		LLT AnyExtTy = LLT::scalar(AnyExtSize);
		unsigned LargeLdReg = MRI.createGenericVirtualRegister(AnyExtTy);
		unsigned SmallLdReg = MRI.createGenericVirtualRegister(AnyExtTy);
		auto LargeLoad =
		MIRBuilder.buildLoad(LargeLdReg, PtrReg, *LargeMMO);

		auto OffsetCst =
		MIRBuilder.buildConstant(LLT::scalar(64), LargeSplitSize / 8);
		unsigned GEPReg = MRI.createGenericVirtualRegister(PtrTy);
		auto SmallPtr = MIRBuilder.buildGEP(GEPReg, PtrReg, OffsetCst.getReg(0));
		auto SmallLoad = MIRBuilder.buildLoad(SmallLdReg, SmallPtr.getReg(0),
		*SmallMMO);

		auto ShiftAmt = MIRBuilder.buildConstant(AnyExtTy, LargeSplitSize);
		auto Shift = MIRBuilder.buildShl(AnyExtTy, LargeLoad, ShiftAmt);
		auto Or = MIRBuilder.buildOr(AnyExtTy, Shift, SmallLoad);
		MIRBuilder.buildTrunc(DstReg, {Or.getReg(0)});
		MI.eraseFromParent();
		return Legalized;
		}
MIRBuilder.buildLoad(DstReg, PtrReg, MMO);		MIRBuilder.buildLoad(DstReg, PtrReg, MMO);
MI.eraseFromParent();		MI.eraseFromParent();
return Legalized;		return Legalized;
}		}

if (DstTy.isScalar()) {		if (DstTy.isScalar()) {
unsigned TmpReg = MRI.createGenericVirtualRegister(		unsigned TmpReg = MRI.createGenericVirtualRegister(
LLT::scalar(MMO.getSize() /* in bytes */ * 8));		LLT::scalar(MMO.getSize() /* in bytes */ * 8));
// ... 12 lines not shown ...
break;		break;
}		}
MI.eraseFromParent();		MI.eraseFromParent();
return Legalized;		return Legalized;
}		}

return UnableToLegalize;		return UnableToLegalize;
}		}
		case TargetOpcode::G_STORE: {
		// Lower a non-power of 2 store into multiple pow-2 stores.
		// E.g. split an i24 store into an i16 store + i8 store.
		// We do this by first extending the stored value to the next largest power
		// of 2 type, and then using truncating stores to store the components.
		// By doing this, likewise with G_LOAD, generate an extend that can be
		// artifact-combined away instead of leaving behind extracts.
		unsigned SrcReg = MI.getOperand(0).getReg();
		unsigned PtrReg = MI.getOperand(1).getReg();
		LLT SrcTy = MRI.getType(SrcReg);
		MachineMemOperand &MMO = **MI.memoperands_begin();
		if (SrcTy.getSizeInBits() != MMO.getSize() /* in bytes */ * 8)
		return UnableToLegalize;
		if (SrcTy.isVector())
		return UnableToLegalize;
		if (isPowerOf2_32(SrcTy.getSizeInBits()))
		return UnableToLegalize; // Don't know what we're being asked to do.

		// Extend to the next pow-2.
		const LLT ExtendTy = LLT::scalar(NextPowerOf2(SrcTy.getSizeInBits()));
		auto ExtVal = MIRBuilder.buildAnyExt(ExtendTy, SrcReg);

		// Obtain the smaller value by shifting away the larger value.
		uint64_t LargeSplitSize = PowerOf2Floor(SrcTy.getSizeInBits());
		uint64_t SmallSplitSize = SrcTy.getSizeInBits() - LargeSplitSize;
		auto ShiftAmt = MIRBuilder.buildConstant(ExtendTy, LargeSplitSize);
		auto SmallVal = MIRBuilder.buildLShr(ExtendTy, ExtVal, ShiftAmt);

		// Generate the GEP and truncating stores.
		LLT PtrTy = MRI.getType(PtrReg);
		auto OffsetCst =
		MIRBuilder.buildConstant(LLT::scalar(64), LargeSplitSize / 8);
		unsigned GEPReg = MRI.createGenericVirtualRegister(PtrTy);
		auto SmallPtr = MIRBuilder.buildGEP(GEPReg, PtrReg, OffsetCst.getReg(0));

		MachineFunction &MF = MIRBuilder.getMF();
		MachineMemOperand *LargeMMO =
		MF.getMachineMemOperand(&MMO, 0, LargeSplitSize / 8);
		MachineMemOperand *SmallMMO =
		MF.getMachineMemOperand(&MMO, LargeSplitSize / 8, SmallSplitSize / 8);
		MIRBuilder.buildStore(ExtVal.getReg(0), PtrReg, *LargeMMO);
		MIRBuilder.buildStore(SmallVal.getReg(0), SmallPtr.getReg(0), *SmallMMO);
		MI.eraseFromParent();
		return Legalized;
		}
case TargetOpcode::G_CTLZ_ZERO_UNDEF:		case TargetOpcode::G_CTLZ_ZERO_UNDEF:
case TargetOpcode::G_CTTZ_ZERO_UNDEF:		case TargetOpcode::G_CTTZ_ZERO_UNDEF:
case TargetOpcode::G_CTLZ:		case TargetOpcode::G_CTLZ:
case TargetOpcode::G_CTTZ:		case TargetOpcode::G_CTTZ:
case TargetOpcode::G_CTPOP:		case TargetOpcode::G_CTPOP:
return lowerBitCount(MI, TypeIdx, Ty);		return lowerBitCount(MI, TypeIdx, Ty);
case G_UADDO: {		case G_UADDO: {
unsigned Res = MI.getOperand(0).getReg();		unsigned Res = MI.getOperand(0).getReg();

llvm/trunk/lib/Target/AArch64/AArch64LegalizerInfo.cpp

getActionDefinitionsBuilder(G_LOAD)
{v8s16, p0, 128, 8},		{v8s16, p0, 128, 8},
{v2s32, p0, 64, 8},		{v2s32, p0, 64, 8},
{v4s32, p0, 128, 8},		{v4s32, p0, 128, 8},
{v2s64, p0, 128, 8}})		{v2s64, p0, 128, 8}})
// These extends are also legal		// These extends are also legal
.legalForTypesWithMemDesc({{s32, p0, 8, 8},		.legalForTypesWithMemDesc({{s32, p0, 8, 8},
{s32, p0, 16, 8}})		{s32, p0, 16, 8}})
.clampScalar(0, s8, s64)		.clampScalar(0, s8, s64)
.widenScalarToNextPow2(0)		.lowerIfMemSizeNotPow2()
// TODO: We could support sum-of-pow2's but the lowering code doesn't know
// how to do that yet.
.unsupportedIfMemSizeNotPow2()
// Lower any any-extending loads left into G_ANYEXT and G_LOAD		// Lower any any-extending loads left into G_ANYEXT and G_LOAD
.lowerIf([=](const LegalityQuery &Query) {		.lowerIf([=](const LegalityQuery &Query) {
return Query.Types[0].getSizeInBits() != Query.MMODescrs[0].SizeInBits;		return Query.Types[0].getSizeInBits() != Query.MMODescrs[0].SizeInBits;
})		})
		.widenScalarToNextPow2(0)
.clampMaxNumElements(0, s32, 2)		.clampMaxNumElements(0, s32, 2)
.clampMaxNumElements(0, s64, 1)		.clampMaxNumElements(0, s64, 1)
.customIf(IsPtrVecPred);		.customIf(IsPtrVecPred);

getActionDefinitionsBuilder(G_STORE)		getActionDefinitionsBuilder(G_STORE)
.legalForTypesWithMemDesc({{s8, p0, 8, 8},		.legalForTypesWithMemDesc({{s8, p0, 8, 8},
{s16, p0, 16, 8},		{s16, p0, 16, 8},
		{s32, p0, 8, 8},
		{s32, p0, 16, 8},
{s32, p0, 32, 8},		{s32, p0, 32, 8},
{s64, p0, 64, 8},		{s64, p0, 64, 8},
{p0, p0, 64, 8},		{p0, p0, 64, 8},
{v16s8, p0, 128, 8},		{v16s8, p0, 128, 8},
{v4s16, p0, 64, 8},		{v4s16, p0, 64, 8},
{v8s16, p0, 128, 8},		{v8s16, p0, 128, 8},
{v2s32, p0, 64, 8},		{v2s32, p0, 64, 8},
{v4s32, p0, 128, 8},		{v4s32, p0, 128, 8},
{v2s64, p0, 128, 8}})		{v2s64, p0, 128, 8}})
.clampScalar(0, s8, s64)		.clampScalar(0, s8, s64)
.widenScalarToNextPow2(0)		.lowerIfMemSizeNotPow2()
// TODO: We could support sum-of-pow2's but the lowering code doesn't know
// how to do that yet.
.unsupportedIfMemSizeNotPow2()
.lowerIf([=](const LegalityQuery &Query) {		.lowerIf([=](const LegalityQuery &Query) {
return Query.Types[0].isScalar() &&		return Query.Types[0].isScalar() &&
Query.Types[0].getSizeInBits() != Query.MMODescrs[0].SizeInBits;		Query.Types[0].getSizeInBits() != Query.MMODescrs[0].SizeInBits;
})		})
.clampMaxNumElements(0, s32, 2)		.clampMaxNumElements(0, s32, 2)
.clampMaxNumElements(0, s64, 1)		.clampMaxNumElements(0, s64, 1)
.customIf(IsPtrVecPred);		.customIf(IsPtrVecPred);


llvm/trunk/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll

true:
store atomic i32 42, i32* %addr seq_cst, align 4		store atomic i32 42, i32* %addr seq_cst, align 4
br label %end		br label %end

false:		false:
br label %end		br label %end

}		}

; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: %3:_(s32) = G_LOAD %1:_(p0) :: (load 3 from `i24* undef`, align 1) (in function: odd_type_load)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_type_load
; FALLBACK-WITH-REPORT-OUT-LABEL: odd_type_load
define i32 @odd_type_load() {
entry:
%ld = load i24, i24* undef, align 1
%cst = zext i24 %ld to i32
ret i32 %cst
}

; General legalizer inability to handle types whose size wasn't a power of 2.
; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: G_STORE %1:_(s42), %0:_(p0) :: (store 6 into %ir.addr, align 8) (in function: odd_type)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_type
; FALLBACK-WITH-REPORT-OUT-LABEL: odd_type:
define void @odd_type(i42* %addr) {
%val42 = load i42, i42* %addr
store i42 %val42, i42* %addr
ret void
}

; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: G_STORE %1:_(<7 x s32>), %0:_(p0) :: (store 28 into %ir.addr, align 32) (in function: odd_vector)		; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: G_STORE %1:_(<7 x s32>), %0:_(p0) :: (store 28 into %ir.addr, align 32) (in function: odd_vector)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_vector		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_vector
; FALLBACK-WITH-REPORT-OUT-LABEL: odd_vector:		; FALLBACK-WITH-REPORT-OUT-LABEL: odd_vector:
define void @odd_vector(<7 x i32>* %addr) {		define void @odd_vector(<7 x i32>* %addr) {
%vec = load <7 x i32>, <7 x i32>* %addr		%vec = load <7 x i32>, <7 x i32>* %addr
store <7 x i32> %vec, <7 x i32>* %addr		store <7 x i32> %vec, <7 x i32>* %addr
ret void		ret void
}		}

llvm/trunk/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=aarch64 -run-pass=legalizer %s -o - -verify-machineinstrs | FileCheck %s
				--- |
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64"

				define i32 @load_store_test(i24* %ptr, i24* %ptr2) {
				%val = load i24, i24* %ptr
				store i24 %val, i24* %ptr2
				ret i32 0
				}

				...
				---
				name: load_store_test
				alignment: 2
				tracksRegLiveness: true
				body: |
				bb.1 (%ir-block.0):
				liveins: $x0, $x1

				; CHECK-LABEL: name: load_store_test
				; CHECK: liveins: $x0, $x1
				; CHECK: [[COPY:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[COPY1:%[0-9]+]]:_(p0) = COPY $x1
				; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p0) :: (load 2 from %ir.ptr, align 4)
				; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
				; CHECK: [[GEP:%[0-9]+]]:_(p0) = G_GEP [[COPY]], [[C1]](s64)
				; CHECK: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[GEP]](p0) :: (load 1 from %ir.ptr + 2, align 4)
				; CHECK: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[LOAD]], [[C2]](s32)
				; CHECK: [[OR:%[0-9]+]]:_(s32) = G_OR [[SHL]], [[LOAD1]]
				; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[OR]](s32)
				; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY2]], [[C2]](s32)
				; CHECK: [[GEP1:%[0-9]+]]:_(p0) = G_GEP [[COPY1]], [[C1]](s64)
				; CHECK: G_STORE [[COPY2]](s32), [[COPY1]](p0) :: (store 2 into %ir.ptr2, align 4)
				; CHECK: G_STORE [[LSHR]](s32), [[GEP1]](p0) :: (store 1 into %ir.ptr2 + 2, align 4)
				; CHECK: $w0 = COPY [[C]](s32)
				; CHECK: RET_ReallyLR implicit $w0
				%0:_(p0) = COPY $x0
				%1:_(p0) = COPY $x1
				%3:_(s32) = G_CONSTANT i32 0
				%2:_(s24) = G_LOAD %0(p0) :: (load 3 from %ir.ptr, align 4)
				G_STORE %2(s24), %1(p0) :: (store 3 into %ir.ptr2, align 4)
				$w0 = COPY %3(s32)
				RET_ReallyLR implicit $w0

				...

This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add legalization support for non-power-2 loads and stores
ClosedPublic
