This is an archive of the discontinued LLVM Phabricator instance.

[WIP][LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types.
Abandoned Public

Authored by bcl5980 on Sep 7 2022, 2:33 AM.

Details

Reviewers

efriedma
dmgreen
paulwalker-arm
RKSimon
craig.topper
spatel

Commits

rG5fa13212d1c9: [AArch64] add tests for non-power2 int types; NFC

Summary

When loading an integer type whose size is not a power of 2, we split the load into several non-overlapping loads of legal types.
For example, legalizing an i56 load splits it into three loads: i32 + i16 + i8.
This change instead uses two i32 loads that overlap by 8 bits, which reduces the number of loads.
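
As an illustration only, the two strategies for i56 can be sketched in plain C++ rather than SelectionDAG code. The helper names (load_le, load_i56_split, load_i56_overlap) are hypothetical and do not appear in the patch. The sketch assumes little-endian memory layout, which is the only case the patch changes, and it assembles values byte by byte so it does not depend on the host.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian load of `bytes` bytes, assembled byte by byte so the sketch
// is independent of host endianness and alignment rules.
static uint64_t load_le(const unsigned char *p, unsigned bytes) {
  assert(bytes <= 8);
  uint64_t v = 0;
  for (unsigned i = 0; i < bytes; ++i)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// Existing split: i32 at +0, i16 at +4, i8 at +6 (three loads).
uint64_t load_i56_split(const unsigned char *p) {
  uint64_t lo = load_le(p, 4);
  uint64_t mid = load_le(p + 4, 2);
  uint64_t hi = load_le(p + 6, 1);
  return lo | (mid << 32) | (hi << 48);
}

// Proposed split: i32 at +0 and i32 at +3 (two loads). Byte 3 is read by
// both loads, so bits 24..31 agree and a plain OR joins the halves.
uint64_t load_i56_overlap(const unsigned char *p) {
  uint64_t lo = load_le(p, 4);      // bits 0..31
  uint64_t hi = load_le(p + 3, 4);  // bits 24..55
  return lo | (hi << 24);
}
```

Both functions read only the 7 bytes that belong to the i56 value, so the overlapping form does not touch memory beyond the original access.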

The motivation comes from ARM64EC (https://reviews.llvm.org/D125418#inline-1267564).
For now, we don't apply it to stores, because that would introduce an extra dependency in the CPU load/store queue.

Diff Detail

Event Timeline

bcl5980 created this revision. Sep 7 2022, 2:33 AM

Herald added a project: Restricted Project. Sep 7 2022, 2:33 AM

Herald added subscribers: hiraditya, kristof.beyls.

bcl5980 requested review of this revision. Sep 7 2022, 2:33 AM

Herald added a project: Restricted Project. Sep 7 2022, 2:33 AM

Herald added a subscriber: llvm-commits.

bcl5980 added a commit: rG5fa13212d1c9: [AArch64] add tests for non-power2 int types; NFC. Sep 7 2022, 2:37 AM

Harbormaster completed remote builds in B185375: Diff 458396. Sep 7 2022, 3:04 AM

Fix test failure.

Harbormaster completed remote builds in B185383: Diff 458409. Sep 7 2022, 4:16 AM

efriedma added inline comments. Sep 7 2022, 10:42 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
801	The alignment you're passing in here doesn't seem right; you want to pass in the alignment of the load you're planning to generate, right?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2045 ↗	(On Diff #458409)	None of this logic looks like it's target-specific; can we just do this in target-independent code?

bcl5980 added inline comments. Sep 7 2022, 9:43 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
801	Ah, yeah. I missed it. Thanks for the catch.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2045 ↗	(On Diff #458409)	Yes, we can move this code into the target-independent layer, but doing so causes regressions on other platforms such as x86. They come from load/store pairs: I haven't enabled the optimization for stores yet, so when a load and a store access the same location, the load is split differently from the store. For example:
define void @i56_or(ptr %a) {
  %aa = load i56, ptr %a, align 1
  %b = or i56 %aa, 384
  store i56 %b, ptr %a, align 1
  ret void
}
This is a general issue, but on ARM the gain from the load covers the extra instructions, whereas on x86 it does not, so I limited the change to ARM. Enabling the optimization for stores as well might fix the issue, but I still have some concerns about doing that.

Move the code to the target-independent layer.
Sending the patch for review, although x86 still has some regressions for now.

Herald added subscribers: StephenFan, pengfei. Sep 8 2022, 2:01 AM

bcl5980 added a reviewer: spatel. Sep 8 2022, 2:02 AM

Harbormaster completed remote builds in B185583: Diff 458686. Sep 8 2022, 3:25 AM

Do we have test coverage for load+store patterns involving integers larger than 56 bits?

If the code quality issues are specific to 56 bits, it might make sense to make the initial patch skip handling 56-bit integers, and try to address them in a followup.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
815	The alignment on this load isn't right. The "alignment" of a load SDNode isn't the alignment of the load itself; it's the alignment of the base of the MachinePointerInfo. So if you're using getWithOffset() to adjust the MachinePointerInfo, the alignment doesn't change based on the offset. (This distinction helps make alias analysis more accurate.)
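
The distinction drawn in this comment can be illustrated with a small stand-alone sketch. It is not LLVM code, and the function name effective_alignment is hypothetical. The assumption is that the alignment actually guaranteed at base + offset is the largest power of two that divides both the base alignment and the byte offset, which is how I understand LLVM's commonAlignment helper to behave.

```cpp
#include <cassert>
#include <cstdint>

// Alignment guaranteed at (base + offset), given that `base` is aligned to
// `base_align` bytes. `base_align` must be a non-zero power of two.
uint64_t effective_alignment(uint64_t base_align, uint64_t offset) {
  assert(base_align != 0 && (base_align & (base_align - 1)) == 0);
  if (offset == 0)
    return base_align;
  // Lowest set bit of the offset is the largest power of two dividing it.
  uint64_t offset_align = offset & (~offset + 1);
  return offset_align < base_align ? offset_align : base_align;
}
```

Under this reading, a load node keeps the base alignment (for example 4) while its pointer info carries the offset (for example 3), and the effective alignment of 1 is derived from the pair instead of being stored on the node.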

Updated the load+store tests with integers larger than 56 bits. The results look even worse than for 56 bits.

bcl5980 marked 2 inline comments as done. Sep 8 2022, 7:27 PM

Harbormaster completed remote builds in B185764: Diff 458946. Sep 8 2022, 7:28 PM

bcl5980 planned changes to this revision. Sep 8 2022, 7:28 PM

bcl5980 retitled this revision from [LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types. to [WIP][LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types..

bcl5980 abandoned this revision. Oct 24 2022, 10:45 PM

Revision Contents

Path                                                      Size
llvm/include/llvm/CodeGen/TargetLowering.h                6 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp             21 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    53 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp                   20 lines
llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll          104 lines
llvm/test/CodeGen/X86/funnel-shift.ll                     34 lines
llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll       191 lines
llvm/test/CodeGen/X86/shrink-compare-pgso.ll              8 lines
llvm/test/CodeGen/X86/shrink-compare.ll                   8 lines

Diff 458946

llvm/include/llvm/CodeGen/TargetLowering.h

[1,715 lines not shown; context: #include "llvm/IR/ConstrainedOps.def"]
   /// LLT handling variant.
   virtual bool allowsMisalignedMemoryAccesses(
       LLT, unsigned AddrSpace = 0, Align Alignment = Align(1),
       MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
       bool * /*Fast*/ = nullptr) const {
     return false;
   }

+  /// Use unaligned memory access for non-power2 types
+  bool allowMisalignedMemForNonPow2Type(
+      LLVMContext &Context, unsigned SrcBits, EVT ExtraVT,
+      unsigned AddrSpace = 0, Align Alignment = Align(1),
+      MachineMemOperand::Flags Flags = MachineMemOperand::MONone) const;
+
   /// This function returns true if the memory access is aligned or if the
   /// target allows this specific unaligned memory access. If the access is
   /// allowed, the optional final parameter returns if the access is also fast
   /// (as defined by the target).
   bool allowsMemoryAccessForAlignment(
       LLVMContext &Context, const DataLayout &DL, EVT VT,
       unsigned AddrSpace = 0, Align Alignment = Align(1),
       MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
[3,291 lines not shown]

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[775 lines not shown; context: if (SrcWidth != SrcVT.getStoreSizeInBits() &&]
 unsigned LogSrcWidth = Log2_32(SrcWidthBits);
 assert(LogSrcWidth < 32);
 unsigned RoundWidth = 1 << LogSrcWidth;
 assert(RoundWidth < SrcWidthBits);
 unsigned ExtraWidth = SrcWidthBits - RoundWidth;
 assert(ExtraWidth < RoundWidth);
 assert(!(RoundWidth % 8) && !(ExtraWidth % 8) &&
        "Load size not an integral number of bytes!");
-EVT RoundVT = EVT::getIntegerVT(*DAG.getContext(), RoundWidth);
-EVT ExtraVT = EVT::getIntegerVT(*DAG.getContext(), ExtraWidth);
+LLVMContext &Context = *DAG.getContext();
+EVT RoundVT = EVT::getIntegerVT(Context, RoundWidth);
+EVT ExtraVT = EVT::getIntegerVT(Context, ExtraWidth);
 SDValue Lo, Hi, Ch;
 unsigned IncrementSize;
 auto &DL = DAG.getDataLayout();

 if (DL.isLittleEndian()) {
   // EXTLOAD:i24 -> ZEXTLOAD:i16 | (shl EXTLOAD@+2:i8, 16)
   // Load the bottom RoundWidth bits.
   Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, Node->getValueType(0), Chain, Ptr,
                       LD->getPointerInfo(), RoundVT, LD->getOriginalAlign(),
                       MMOFlags, AAInfo);

+  Align ExtraAlign = Align(1ull << countTrailingZeros(ExtraWidth / 8));
+  unsigned IncSizeBits = RoundWidth;
+  if (TLI.allowMisalignedMemForNonPow2Type(Context, SrcWidthBits, RoundVT,
+                                           LD->getAddressSpace(),
+                                           ExtraAlign, MMOFlags)) {
+    IncSizeBits = ExtraWidth;
+    ExtraVT = RoundVT;
+    ExtraWidth = RoundWidth;
+  } else {
+    ExtraAlign = LD->getOriginalAlign();
+  }
+
   // Load the remaining ExtraWidth bits.
-  IncrementSize = RoundWidth / 8;
+  IncrementSize = IncSizeBits / 8;
   Ptr = DAG.getMemBasePlusOffset(Ptr, TypeSize::Fixed(IncrementSize), dl);
   Hi = DAG.getExtLoad(ExtType, dl, Node->getValueType(0), Chain, Ptr,
                       LD->getPointerInfo().getWithOffset(IncrementSize),
                       ExtraVT, LD->getOriginalAlign(), MMOFlags, AAInfo);

   // Build a factor node to remember that this load is independent of
   // the other one.
   Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                    Hi.getValue(1));

   // Move the top bits to the right place.
   Hi = DAG.getNode(
       ISD::SHL, dl, Hi.getValueType(), Hi,
-      DAG.getConstant(RoundWidth, dl,
+      DAG.getConstant(IncSizeBits, dl,
                       TLI.getShiftAmountTy(Hi.getValueType(), DL)));

   // Join the hi and lo parts.
   Value = DAG.getNode(ISD::OR, dl, Node->getValueType(0), Lo, Hi);
 } else {
   // Big endian - avoid unaligned loads.
   // EXTLOAD:i24 -> (shl EXTLOAD:i16, 8) | ZEXTLOAD@+2:i8
   // Load the top RoundWidth bits.
[4,307 lines not shown]

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[3,558 lines not shown; context: if (ExtType == ISD::SEXTLOAD) {]
     // The high part is just a zero.
     Hi = DAG.getConstant(0, dl, NVT);
   } else {
     assert(ExtType == ISD::EXTLOAD && "Unknown extload!");
     // The high part is undefined.
     Hi = DAG.getUNDEF(NVT);
   }
 } else if (DAG.getDataLayout().isLittleEndian()) {
+  LLVMContext &Context = *DAG.getContext();
+  unsigned SrcBits = N->getMemoryVT().getSizeInBits();
+  unsigned RoundBits = NVT.getSizeInBits();
   // Little-endian - low bits are at low addresses.
   Lo = DAG.getLoad(NVT, dl, Ch, Ptr, N->getPointerInfo(),
                    N->getOriginalAlign(), MMOFlags, AAInfo);

-  unsigned ExcessBits =
-      N->getMemoryVT().getSizeInBits() - NVT.getSizeInBits();
-  EVT NEVT = EVT::getIntegerVT(*DAG.getContext(), ExcessBits);
+  unsigned ExcessBits = SrcBits - RoundBits;
+  EVT NEVT = EVT::getIntegerVT(Context, ExcessBits);

+  EVT HiVT = NEVT.getRoundIntegerType(Context);
+  unsigned OverlapedBits = HiVT.getSizeInBits() - ExcessBits;
+  unsigned IncBits = RoundBits - OverlapedBits;
+  Align ExtraAlign = Align(1ull << countTrailingZeros((IncBits + 7) / 8));
+  // We only enable the unaligned load when the extra part is
+  // less than the largest integer type the target support,
+  // which means "VT == getTypeToTransformTo(VT)"
+  if (HiVT == TLI.getTypeToTransformTo(Context, HiVT) &&
+      TLI.allowMisalignedMemForNonPow2Type(Context, SrcBits, HiVT,
+                                           N->getAddressSpace(), ExtraAlign,
+                                           MMOFlags)) {
+    // use unaligned load to simplify the non-power2 types
+    unsigned IncrementSize = IncBits / 8;
+    Ptr = DAG.getMemBasePlusOffset(Ptr, TypeSize::Fixed(IncrementSize), dl);
+    Hi = DAG.getExtLoad(ExtType, dl, NVT, Ch, Ptr,
+                        N->getPointerInfo().getWithOffset(IncrementSize),
+                        HiVT, N->getOriginalAlign(), MMOFlags, AAInfo);
+    unsigned Opcode = ExtType == ISD::SEXTLOAD ? ISD::SRA : ISD::SRL;
+    Hi = DAG.getNode(Opcode, dl, NVT, Hi,
+                     DAG.getConstant(OverlapedBits, dl, NVT));
+  } else {
     // Increment the pointer to the other half.
-    unsigned IncrementSize = NVT.getSizeInBits()/8;
+    unsigned IncrementSize = RoundBits / 8;
     Ptr = DAG.getMemBasePlusOffset(Ptr, TypeSize::Fixed(IncrementSize), dl);
     Hi = DAG.getExtLoad(ExtType, dl, NVT, Ch, Ptr,
-                        N->getPointerInfo().getWithOffset(IncrementSize), NEVT,
-                        N->getOriginalAlign(), MMOFlags, AAInfo);
+                        N->getPointerInfo().getWithOffset(IncrementSize),
+                        NEVT, N->getOriginalAlign(), MMOFlags, AAInfo);

     // Build a factor node to remember that this load is independent of the
     // other one.
     Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                      Hi.getValue(1));
+  }
 } else {
   // Big-endian - high bits are at low addresses. Favor aligned loads at
   // the cost of some bit-fiddling.
   EVT MemVT = N->getMemoryVT();
   unsigned EBytes = MemVT.getStoreSize();
   unsigned IncrementSize = NVT.getSizeInBits()/8;
   unsigned ExcessBits = (EBytes - IncrementSize)*8;

[1,970 lines not shown]
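
The offset and shift arithmetic in the added little-endian branch can be checked with a stand-alone sketch. This is plain C++, not SelectionDAG code; OverlapPlan, plan_overlap, load_le64 and load_i120_overlap are hypothetical names introduced here. It assumes the high-part width is the excess rounded up to a power of two (at least 8 bits), mirroring what getRoundIntegerType is used for in the patch, and it models only the zero-extending case.

```cpp
#include <cassert>
#include <cstdint>

// Parameters of the overlapping high-part load (illustrative names).
struct OverlapPlan {
  unsigned excess_bits;   // bits left after the first round_bits-wide load
  unsigned hi_bits;       // excess rounded up to a power of two, at least 8
  unsigned overlap_bits;  // low bits of the high load that repeat the low part
  unsigned offset_bytes;  // byte offset of the high load
};

OverlapPlan plan_overlap(unsigned src_bits, unsigned round_bits) {
  assert(src_bits > round_bits && src_bits % 8 == 0);
  OverlapPlan p{};
  p.excess_bits = src_bits - round_bits;
  p.hi_bits = 8;
  while (p.hi_bits < p.excess_bits)
    p.hi_bits <<= 1;
  p.overlap_bits = p.hi_bits - p.excess_bits;
  assert(p.overlap_bits <= round_bits);
  p.offset_bytes = (round_bits - p.overlap_bits) / 8;
  return p;
}

// Little-endian 64-bit load assembled byte by byte (host-independent).
static uint64_t load_le64(const unsigned char *p) {
  uint64_t v = 0;
  for (unsigned i = 0; i < 8; ++i)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// i120 as two 64-bit halves: the low half at +0, the high half from an
// overlapping load whose duplicated low bits are shifted out.
void load_i120_overlap(const unsigned char *p, uint64_t &lo, uint64_t &hi) {
  OverlapPlan plan = plan_overlap(120, 64);
  lo = load_le64(p);
  hi = load_le64(p + plan.offset_bytes) >> plan.overlap_bits;
}
```

For i120 this gives an 8-byte load at offset 7 followed by a right shift of 8, which corresponds to the ldur x8, [x0, #7] and lsr x1, x8, #8 pair in the updated ldi120 test.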

llvm/lib/CodeGen/TargetLoweringBase.cpp

[1,754 lines not shown; context: bool TargetLoweringBase::allowsMemoryAccess(LLVMContext &Context,]
                                             const DataLayout &DL, LLT Ty,
                                             const MachineMemOperand &MMO,
                                             bool *Fast) const {
   EVT VT = getApproximateEVTForLLT(Ty, DL, Context);
   return allowsMemoryAccess(Context, DL, VT, MMO.getAddrSpace(), MMO.getAlign(),
                             MMO.getFlags(), Fast);
 }

+bool TargetLoweringBase::allowMisalignedMemForNonPow2Type(
+    LLVMContext &Context, unsigned SrcBits, EVT ExtraVT, unsigned AddrSpace,
+    Align Alignment, MachineMemOperand::Flags Flags) const {
+  // If pop count is less or equal than 2 we can't get
+  // any benifit from unaligned load/store
+  if (countPopulation(SrcBits) <= 2)
+    return false;
+
+  // If load bits is not byte align we don't use unaligned load/store
+  if ((SrcBits & 7) != 0)
+    return false;
+
+  bool fast;
+  if (!allowsMisalignedMemoryAccesses(ExtraVT, AddrSpace, Alignment, Flags,
+                                      &fast))
+    return false;
+
+  return fast;
+}
+
 //===----------------------------------------------------------------------===//
 // TargetTransformInfo Helpers
 //===----------------------------------------------------------------------===//

 int TargetLoweringBase::InstructionOpcodeToISD(unsigned Opcode) const {
   enum InstructionOpcodes {
 #define HANDLE_INST(NUM, OPCODE, CLASS) OPCODE = NUM,
 #define LAST_OTHER_INST(NUM) InstructionOpcodesCount = NUM
[552 lines not shown]
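
The two width checks in allowMisalignedMemForNonPow2Type can be tried out in isolation with the sketch below. It is a stand-alone approximation: width_may_use_overlap is a hypothetical name, and the target query (allowsMisalignedMemoryAccesses and its "fast" result) is deliberately left out. My reading of the population-count check is that each set bit of the width corresponds to one power-of-two piece in the existing split, so a width with at most two set bits already needs at most two loads.

```cpp
#include <cassert>

// Width-only part of the filter; the patch additionally requires the target
// to report the misaligned access as allowed and fast.
bool width_may_use_overlap(unsigned src_bits) {
  assert(src_bits > 0);
  // Count set bits by clearing the lowest one until none remain.
  unsigned pop = 0;
  for (unsigned v = src_bits; v != 0; v &= v - 1)
    ++pop;
  if (pop <= 2)
    return false;  // one or two pieces already: nothing to gain
  if ((src_bits & 7) != 0)
    return false;  // width is not a whole number of bytes
  return true;
}
```

With these checks, widths such as i56, i120 and i280 pass, while i24 and i80 do not, which is consistent with ldi24 and ldi80 being unchanged in the AArch64 test.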

llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll

	[9 lines not shown]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = load i24, i24* %p			%r = load i24, i24* %p
	ret i24 %r			ret i24 %r
	}			}

	define i56 @ldi56(ptr %p) nounwind {			define i56 @ldi56(ptr %p) nounwind {
	; CHECK-LABEL: ldi56:			; CHECK-LABEL: ldi56:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #6]			; CHECK-NEXT: ldur w8, [x0, #3]
	; CHECK-NEXT: ldrh w9, [x0, #4]			; CHECK-NEXT: ldr w9, [x0]
	; CHECK-NEXT: ldr w0, [x0]			; CHECK-NEXT: orr x0, x9, x8, lsl #24
	; CHECK-NEXT: bfi w9, w8, #16, #16
	; CHECK-NEXT: bfi x0, x9, #32, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = load i56, i56* %p			%r = load i56, i56* %p
	ret i56 %r			ret i56 %r
	}			}

	define i80 @ldi80(ptr %p) nounwind {			define i80 @ldi80(ptr %p) nounwind {
	; CHECK-LABEL: ldi80:			; CHECK-LABEL: ldi80:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: ldrh w1, [x0, #8]			; CHECK-NEXT: ldrh w1, [x0, #8]
	; CHECK-NEXT: mov x0, x8			; CHECK-NEXT: mov x0, x8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = load i80, i80* %p			%r = load i80, i80* %p
	ret i80 %r			ret i80 %r
	}			}

	define i120 @ldi120(ptr %p) nounwind {			define i120 @ldi120(ptr %p) nounwind {
	; CHECK-LABEL: ldi120:			; CHECK-LABEL: ldi120:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #14]			; CHECK-NEXT: ldur x8, [x0, #7]
	; CHECK-NEXT: ldrh w9, [x0, #12]
	; CHECK-NEXT: ldr w1, [x0, #8]
	; CHECK-NEXT: ldr x0, [x0]			; CHECK-NEXT: ldr x0, [x0]
	; CHECK-NEXT: bfi w9, w8, #16, #16			; CHECK-NEXT: lsr x1, x8, #8
	; CHECK-NEXT: bfi x1, x9, #32, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = load i120, i120* %p			%r = load i120, i120* %p
	ret i120 %r			ret i120 %r
	}			}

	define i280 @ldi280(ptr %p) nounwind {			define i280 @ldi280(ptr %p) nounwind {
	; CHECK-LABEL: ldi280:			; CHECK-LABEL: ldi280:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldp x8, x1, [x0]			; CHECK-NEXT: ldp x8, x1, [x0]
	; CHECK-NEXT: ldrb w9, [x0, #34]			; CHECK-NEXT: ldur w9, [x0, #31]
	; CHECK-NEXT: ldrh w4, [x0, #32]
	; CHECK-NEXT: ldp x2, x3, [x0, #16]			; CHECK-NEXT: ldp x2, x3, [x0, #16]
	; CHECK-NEXT: mov x0, x8			; CHECK-NEXT: mov x0, x8
	; CHECK-NEXT: bfi x4, x9, #16, #8			; CHECK-NEXT: ubfx x4, x9, #8, #24
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = load i280, i280* %p			%r = load i280, i280* %p
	ret i280 %r			ret i280 %r
	}			}

	define void @sti24(ptr %p, i24 %a) nounwind {			define void @sti24(ptr %p, i24 %a) nounwind {
	; CHECK-LABEL: sti24:			; CHECK-LABEL: sti24:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	[53 lines not shown]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	store i280 %a, i280* %p			store i280 %a, i280* %p
	ret void			ret void
	}			}

	define void @i56_or(ptr %a) {			define void @i56_or(ptr %a) {
	; CHECK-LABEL: i56_or:			; CHECK-LABEL: i56_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov x8, x0			; CHECK-NEXT: ldur w8, [x0, #3]
	; CHECK-NEXT: ldr w9, [x0]			; CHECK-NEXT: ldr w9, [x0]
	; CHECK-NEXT: ldrh w10, [x8, #4]!			; CHECK-NEXT: lsr x10, x8, #8
	; CHECK-NEXT: ldrb w11, [x8, #2]			; CHECK-NEXT: orr w9, w9, w8, lsl #24
				; CHECK-NEXT: lsr x8, x8, #24
	; CHECK-NEXT: orr w9, w9, #0x180			; CHECK-NEXT: orr w9, w9, #0x180
	; CHECK-NEXT: bfi w10, w11, #16, #16			; CHECK-NEXT: strh w10, [x0, #4]
				; CHECK-NEXT: strb w8, [x0, #6]
	; CHECK-NEXT: str w9, [x0]			; CHECK-NEXT: str w9, [x0]
	; CHECK-NEXT: strb w11, [x8, #2]
	; CHECK-NEXT: strh w10, [x8]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%aa = load i56, ptr %a, align 1			%aa = load i56, ptr %a, align 1
	%b = or i56 %aa, 384			%b = or i56 %aa, 384
	store i56 %b, ptr %a, align 1			store i56 %b, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i56_and_or(ptr %a) {			define void @i56_and_or(ptr %a) {
	; CHECK-LABEL: i56_and_or:			; CHECK-LABEL: i56_and_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov x8, x0			; CHECK-NEXT: ldr w8, [x0]
	; CHECK-NEXT: ldr w9, [x0]			; CHECK-NEXT: ldur w9, [x0, #3]
	; CHECK-NEXT: ldrh w10, [x8, #4]!			; CHECK-NEXT: orr w8, w8, w9, lsl #24
	; CHECK-NEXT: ldrb w11, [x8, #2]			; CHECK-NEXT: lsr x10, x9, #8
	; CHECK-NEXT: orr w9, w9, #0x180			; CHECK-NEXT: orr w8, w8, #0x180
	; CHECK-NEXT: and w9, w9, #0xffffff80			; CHECK-NEXT: lsr x9, x9, #24
	; CHECK-NEXT: bfi w10, w11, #16, #16			; CHECK-NEXT: and w8, w8, #0xffffff80
	; CHECK-NEXT: strb w11, [x8, #2]			; CHECK-NEXT: strh w10, [x0, #4]
	; CHECK-NEXT: str w9, [x0]			; CHECK-NEXT: strb w9, [x0, #6]
	; CHECK-NEXT: strh w10, [x8]			; CHECK-NEXT: str w8, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = load i56, ptr %a, align 1			%b = load i56, ptr %a, align 1
	%c = and i56 %b, -128			%c = and i56 %b, -128
	%d = or i56 %c, 384			%d = or i56 %c, 384
	store i56 %d, ptr %a, align 1			store i56 %d, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {			define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {
	; CHECK-LABEL: i56_insert_bit:			; CHECK-LABEL: i56_insert_bit:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov x8, x0			; CHECK-NEXT: ldur w8, [x0, #3]
	; CHECK-NEXT: ldr w11, [x0]			; CHECK-NEXT: ldr w9, [x0]
	; CHECK-NEXT: ldrh w9, [x8, #4]!			; CHECK-NEXT: orr x8, x9, x8, lsl #24
	; CHECK-NEXT: ldrb w10, [x8, #2]			; CHECK-NEXT: and x8, x8, #0xffffffffffffdfff
	; CHECK-NEXT: bfi w9, w10, #16, #8			; CHECK-NEXT: lsr x9, x8, #32
	; CHECK-NEXT: strb w10, [x8, #2]			; CHECK-NEXT: lsr x10, x8, #48
	; CHECK-NEXT: bfi x11, x9, #32, #24			; CHECK-NEXT: orr w8, w8, w1, lsl #13
	; CHECK-NEXT: strh w9, [x8]			; CHECK-NEXT: strh w9, [x0, #4]
	; CHECK-NEXT: and x11, x11, #0xffffffffffffdfff			; CHECK-NEXT: strb w10, [x0, #6]
	; CHECK-NEXT: orr w11, w11, w1, lsl #13			; CHECK-NEXT: str w8, [x0]
	; CHECK-NEXT: str w11, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%extbit = zext i1 %bit to i56			%extbit = zext i1 %bit to i56
	%b = load i56, ptr %a, align 1			%b = load i56, ptr %a, align 1
	%extbit.shl = shl nuw nsw i56 %extbit, 13			%extbit.shl = shl nuw nsw i56 %extbit, 13
	%c = and i56 %b, -8193			%c = and i56 %b, -8193
	%d = or i56 %c, %extbit.shl			%d = or i56 %c, %extbit.shl
	store i56 %d, ptr %a, align 1			store i56 %d, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i120_or(ptr %a) {			define void @i120_or(ptr %a) {
	; CHECK-LABEL: i120_or:			; CHECK-LABEL: i120_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
				; CHECK-NEXT: ldur x9, [x0, #7]
	; CHECK-NEXT: orr x8, x8, #0x180			; CHECK-NEXT: orr x8, x8, #0x180
				; CHECK-NEXT: lsr x10, x9, #56
				; CHECK-NEXT: lsr x11, x9, #40
				; CHECK-NEXT: lsr x9, x9, #8
	; CHECK-NEXT: str x8, [x0]			; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: strb w10, [x0, #14]
				; CHECK-NEXT: strh w11, [x0, #12]
				; CHECK-NEXT: str w9, [x0, #8]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%aa = load i120, ptr %a, align 1			%aa = load i120, ptr %a, align 1
	%b = or i120 %aa, 384			%b = or i120 %aa, 384
	store i120 %b, ptr %a, align 1			store i120 %b, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i200_or(ptr %a) {			define void @i200_or(ptr %a) {
	; CHECK-LABEL: i200_or:			; CHECK-LABEL: i200_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: orr x8, x8, #0x180			; CHECK-NEXT: orr x8, x8, #0x180
	; CHECK-NEXT: str x8, [x0]			; CHECK-NEXT: str x8, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%aa = load i200, ptr %a, align 1			%aa = load i200, ptr %a, align 1
	%b = or i200 %aa, 384			%b = or i200 %aa, 384
	store i200 %b, ptr %a, align 1			store i200 %b, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i248_or(ptr %a) {			define void @i248_or(ptr %a) {
	; CHECK-LABEL: i248_or:			; CHECK-LABEL: i248_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
				; CHECK-NEXT: ldur x9, [x0, #23]
				; CHECK-NEXT: ldr x10, [x0, #16]
	; CHECK-NEXT: orr x8, x8, #0x180			; CHECK-NEXT: orr x8, x8, #0x180
				; CHECK-NEXT: lsr x11, x9, #56
				; CHECK-NEXT: str x10, [x0, #16]
				; CHECK-NEXT: lsr x10, x9, #40
				; CHECK-NEXT: lsr x9, x9, #8
	; CHECK-NEXT: str x8, [x0]			; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: strb w11, [x0, #30]
				; CHECK-NEXT: strh w10, [x0, #28]
				; CHECK-NEXT: str w9, [x0, #24]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%aa = load i248, ptr %a, align 1			%aa = load i248, ptr %a, align 1
	%b = or i248 %aa, 384			%b = or i248 %aa, 384
	store i248 %b, ptr %a, align 1			store i248 %b, ptr %a, align 1
	ret void			ret void
	}			}

	define void @i304_or(ptr %a) {			define void @i304_or(ptr %a) {
	; CHECK-LABEL: i304_or:			; CHECK-LABEL: i304_or:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldur x8, [x0, #30]
	; CHECK-NEXT: orr x8, x8, #0x180			; CHECK-NEXT: ldr x9, [x0]
	; CHECK-NEXT: str x8, [x0]			; CHECK-NEXT: ldr x10, [x0, #24]
				; CHECK-NEXT: ldur q0, [x0, #8]
				; CHECK-NEXT: orr x9, x9, #0x180
				; CHECK-NEXT: str x10, [x0, #24]
				; CHECK-NEXT: lsr x10, x8, #48
				; CHECK-NEXT: lsr x8, x8, #16
				; CHECK-NEXT: stur q0, [x0, #8]
				; CHECK-NEXT: str x9, [x0]
				; CHECK-NEXT: strh w10, [x0, #36]
				; CHECK-NEXT: str w8, [x0, #32]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%aa = load i304, ptr %a, align 1			%aa = load i304, ptr %a, align 1
	%b = or i304 %aa, 384			%b = or i304 %aa, 384
	store i304 %b, ptr %a, align 1			store i304 %b, ptr %a, align 1
	ret void			ret void
	}			}

llvm/test/CodeGen/X86/funnel-shift.ll

	[977 lines not shown]
	define void @PR45265(i32 %0, %struct.S* nocapture readonly %1) nounwind {			define void @PR45265(i32 %0, %struct.S* nocapture readonly %1) nounwind {
	; X86-SSE2-LABEL: PR45265:			; X86-SSE2-LABEL: PR45265:
	; X86-SSE2: # %bb.0:			; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: pushl %edi			; X86-SSE2-NEXT: pushl %edi
	; X86-SSE2-NEXT: pushl %esi			; X86-SSE2-NEXT: pushl %esi
	; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-SSE2-NEXT: leal (%eax,%eax,2), %edx			; X86-SSE2-NEXT: leal (%eax,%eax,2), %edx
	; X86-SSE2-NEXT: movzwl 8(%ecx,%edx,4), %esi			; X86-SSE2-NEXT: movl 4(%ecx,%edx,4), %esi
	; X86-SSE2-NEXT: movl 4(%ecx,%edx,4), %edi			; X86-SSE2-NEXT: movl 7(%ecx,%edx,4), %ecx
	; X86-SSE2-NEXT: shrdl $8, %esi, %edi
	; X86-SSE2-NEXT: xorl %eax, %edi
	; X86-SSE2-NEXT: sarl $31, %eax
	; X86-SSE2-NEXT: movzbl 10(%ecx,%edx,4), %ecx
	; X86-SSE2-NEXT: shll $16, %ecx
	; X86-SSE2-NEXT: orl %esi, %ecx
	; X86-SSE2-NEXT: shll $8, %ecx
	; X86-SSE2-NEXT: movl %ecx, %edx			; X86-SSE2-NEXT: movl %ecx, %edx
	; X86-SSE2-NEXT: sarl $8, %edx			; X86-SSE2-NEXT: movl %ecx, %edi
	; X86-SSE2-NEXT: sarl $31, %ecx			; X86-SSE2-NEXT: shrl $8, %ecx
	; X86-SSE2-NEXT: shldl $24, %edx, %ecx			; X86-SSE2-NEXT: shldl $24, %esi, %ecx
	; X86-SSE2-NEXT: xorl %eax, %ecx			; X86-SSE2-NEXT: xorl %eax, %ecx
	; X86-SSE2-NEXT: orl %ecx, %edi			; X86-SSE2-NEXT: sarl $31, %eax
				; X86-SSE2-NEXT: sarl $8, %edx
				; X86-SSE2-NEXT: sarl $31, %edi
				; X86-SSE2-NEXT: shldl $24, %edx, %edi
				; X86-SSE2-NEXT: xorl %eax, %edi
				; X86-SSE2-NEXT: orl %edi, %ecx
	; X86-SSE2-NEXT: jne .LBB46_1			; X86-SSE2-NEXT: jne .LBB46_1
	; X86-SSE2-NEXT: # %bb.2:			; X86-SSE2-NEXT: # %bb.2:
	; X86-SSE2-NEXT: popl %esi			; X86-SSE2-NEXT: popl %esi
	; X86-SSE2-NEXT: popl %edi			; X86-SSE2-NEXT: popl %edi
	; X86-SSE2-NEXT: jmp _Z3foov # TAILCALL			; X86-SSE2-NEXT: jmp _Z3foov # TAILCALL
	; X86-SSE2-NEXT: .LBB46_1:			; X86-SSE2-NEXT: .LBB46_1:
	; X86-SSE2-NEXT: popl %esi			; X86-SSE2-NEXT: popl %esi
	; X86-SSE2-NEXT: popl %edi			; X86-SSE2-NEXT: popl %edi
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	;			;
	; X64-AVX2-LABEL: PR45265:			; X64-AVX2-LABEL: PR45265:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: movslq %edi, %rax			; X64-AVX2-NEXT: movslq %edi, %rax
	; X64-AVX2-NEXT: leaq (%rax,%rax,2), %rcx			; X64-AVX2-NEXT: leaq (%rax,%rax,2), %rcx
	; X64-AVX2-NEXT: movsbq 10(%rsi,%rcx,4), %rdx			; X64-AVX2-NEXT: movq (%rsi,%rcx,4), %rdx
	; X64-AVX2-NEXT: shlq $16, %rdx			; X64-AVX2-NEXT: movslq 7(%rsi,%rcx,4), %rcx
	; X64-AVX2-NEXT: movzwl 8(%rsi,%rcx,4), %edi			; X64-AVX2-NEXT: shrq $8, %rcx
	; X64-AVX2-NEXT: orq %rdx, %rdi			; X64-AVX2-NEXT: shldq $24, %rdx, %rcx
	; X64-AVX2-NEXT: movq (%rsi,%rcx,4), %rcx
	; X64-AVX2-NEXT: shrdq $40, %rdi, %rcx
	; X64-AVX2-NEXT: cmpq %rax, %rcx			; X64-AVX2-NEXT: cmpq %rax, %rcx
	; X64-AVX2-NEXT: jne .LBB46_1			; X64-AVX2-NEXT: jne .LBB46_1
	; X64-AVX2-NEXT: # %bb.2:			; X64-AVX2-NEXT: # %bb.2:
	; X64-AVX2-NEXT: jmp _Z3foov # TAILCALL			; X64-AVX2-NEXT: jmp _Z3foov # TAILCALL
	; X64-AVX2-NEXT: .LBB46_1:			; X64-AVX2-NEXT: .LBB46_1:
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	%3 = sext i32 %0 to i64			%3 = sext i32 %0 to i64
	%4 = getelementptr inbounds %struct.S, %struct.S* %1, i64 %3			%4 = getelementptr inbounds %struct.S, %struct.S* %1, i64 %3
	[279 lines not shown]

llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll

[102 lines not shown; context: ; X64-NEXT: retq]
%d = or i24 %c, %extbit.shl		%d = or i24 %c, %extbit.shl
store i24 %d, ptr %a, align 1		store i24 %d, ptr %a, align 1
ret void		ret void
}		}

define void @i56_or(ptr %a) {		define void @i56_or(ptr %a) {
; X86-LABEL: i56_or:		; X86-LABEL: i56_or:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: orl $384, (%eax) # imm = 0x180		; X86-NEXT: movl 3(%ecx), %eax
		; X86-NEXT: movl %eax, %edx
		; X86-NEXT: shrl $8, %edx
		; X86-NEXT: movw %dx, 4(%ecx)
		; X86-NEXT: shrl $24, %eax
		; X86-NEXT: movb %al, 6(%ecx)
		; X86-NEXT: orl $384, (%ecx) # imm = 0x180
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i56_or:		; X64-LABEL: i56_or:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: movzbl 6(%rdi), %eax		; X64-NEXT: movl 3(%rdi), %eax
; X64-NEXT: shll $16, %eax		; X64-NEXT: movq %rax, %rcx
; X64-NEXT: movzwl 4(%rdi), %ecx		; X64-NEXT: shlq $24, %rcx
; X64-NEXT: movw %cx, 4(%rdi)		; X64-NEXT: movl (%rdi), %edx
; X64-NEXT: shrq $16, %rax		; X64-NEXT: orl %ecx, %edx
		; X64-NEXT: orl $384, %edx # imm = 0x180
		; X64-NEXT: shrq $24, %rax
; X64-NEXT: movb %al, 6(%rdi)		; X64-NEXT: movb %al, 6(%rdi)
; X64-NEXT: orl $384, (%rdi) # imm = 0x180		; X64-NEXT: shrq $32, %rcx
		; X64-NEXT: movw %cx, 4(%rdi)
		; X64-NEXT: movl %edx, (%rdi)
; X64-NEXT: retq		; X64-NEXT: retq
%aa = load i56, ptr %a, align 1		%aa = load i56, ptr %a, align 1
%b = or i56 %aa, 384		%b = or i56 %aa, 384
store i56 %b, ptr %a, align 1		store i56 %b, ptr %a, align 1
ret void		ret void
}		}

define void @i56_and_or(ptr %a) {		define void @i56_and_or(ptr %a) {
; X86-LABEL: i56_and_or:		; X86-LABEL: i56_and_or:
; X86: # %bb.0:		; X86: # %bb.0:
		; X86-NEXT: pushl %esi
		; X86-NEXT: .cfi_def_cfa_offset 8
		; X86-NEXT: .cfi_offset %esi, -8
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl $384, %ecx # imm = 0x180		; X86-NEXT: movl 3(%eax), %ecx
; X86-NEXT: orl (%eax), %ecx		; X86-NEXT: movl %ecx, %edx
; X86-NEXT: andl $-128, %ecx		; X86-NEXT: shrl $8, %edx
; X86-NEXT: movl %ecx, (%eax)		; X86-NEXT: movl $384, %esi # imm = 0x180
		; X86-NEXT: orl (%eax), %esi
		; X86-NEXT: andl $-128, %esi
		; X86-NEXT: movl %esi, (%eax)
		; X86-NEXT: movw %dx, 4(%eax)
		; X86-NEXT: shrl $24, %ecx
		; X86-NEXT: movb %cl, 6(%eax)
		; X86-NEXT: popl %esi
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i56_and_or:		; X64-LABEL: i56_and_or:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: movzwl 4(%rdi), %eax
; X64-NEXT: movzbl 6(%rdi), %ecx
; X64-NEXT: movb %cl, 6(%rdi)
; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
; X64-NEXT: shll $16, %ecx
; X64-NEXT: orl %eax, %ecx
; X64-NEXT: shlq $32, %rcx
; X64-NEXT: movl (%rdi), %eax		; X64-NEXT: movl (%rdi), %eax
; X64-NEXT: orq %rcx, %rax		; X64-NEXT: movl 3(%rdi), %ecx
; X64-NEXT: orq $384, %rax # imm = 0x180		; X64-NEXT: movq %rcx, %rdx
; X64-NEXT: movabsq $72057594037927808, %rcx # imm = 0xFFFFFFFFFFFF80		; X64-NEXT: shlq $24, %rdx
; X64-NEXT: andq %rax, %rcx		; X64-NEXT: orq %rax, %rdx
; X64-NEXT: movl %ecx, (%rdi)		; X64-NEXT: orq $384, %rdx # imm = 0x180
; X64-NEXT: shrq $32, %rcx		; X64-NEXT: movabsq $72057594037927808, %rax # imm = 0xFFFFFFFFFFFF80
; X64-NEXT: movw %cx, 4(%rdi)		; X64-NEXT: andq %rdx, %rax
		; X64-NEXT: shrq $24, %rcx
		; X64-NEXT: movb %cl, 6(%rdi)
		; X64-NEXT: movl %eax, (%rdi)
		; X64-NEXT: shrq $32, %rax
		; X64-NEXT: movw %ax, 4(%rdi)
; X64-NEXT: retq		; X64-NEXT: retq
%b = load i56, ptr %a, align 1		%b = load i56, ptr %a, align 1
%c = and i56 %b, -128		%c = and i56 %b, -128
%d = or i56 %c, 384		%d = or i56 %c, 384
store i56 %d, ptr %a, align 1		store i56 %d, ptr %a, align 1
ret void		ret void
}		}

define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {		define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {
; X86-LABEL: i56_insert_bit:		; X86-LABEL: i56_insert_bit:
; X86: # %bb.0:		; X86: # %bb.0:
		; X86-NEXT: pushl %edi
		; X86-NEXT: .cfi_def_cfa_offset 8
		; X86-NEXT: pushl %esi
		; X86-NEXT: .cfi_def_cfa_offset 12
		; X86-NEXT: .cfi_offset %esi, -12
		; X86-NEXT: .cfi_offset %edi, -8
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx		; X86-NEXT: movzbl {{[0-9]+}}(%esp), %edx
; X86-NEXT: shll $13, %ecx		; X86-NEXT: movl 3(%eax), %ecx
; X86-NEXT: movl $-8193, %edx # imm = 0xDFFF		; X86-NEXT: movl %ecx, %esi
; X86-NEXT: andl (%eax), %edx		; X86-NEXT: shrl $8, %esi
; X86-NEXT: orl %ecx, %edx		; X86-NEXT: shll $13, %edx
; X86-NEXT: movl %edx, (%eax)		; X86-NEXT: movl $-8193, %edi # imm = 0xDFFF
		; X86-NEXT: andl (%eax), %edi
		; X86-NEXT: orl %edx, %edi
		; X86-NEXT: movw %si, 4(%eax)
		; X86-NEXT: movl %edi, (%eax)
		; X86-NEXT: shrl $24, %ecx
		; X86-NEXT: movb %cl, 6(%eax)
		; X86-NEXT: popl %esi
		; X86-NEXT: .cfi_def_cfa_offset 8
		; X86-NEXT: popl %edi
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i56_insert_bit:		; X64-LABEL: i56_insert_bit:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: movzwl 4(%rdi), %eax		; X64-NEXT: movl 3(%rdi), %eax
; X64-NEXT: movzbl 6(%rdi), %ecx		; X64-NEXT: movq %rax, %rcx
; X64-NEXT: movb %cl, 6(%rdi)		; X64-NEXT: shlq $24, %rcx
; X64-NEXT: # kill: def $ecx killed $ecx def $rcx		; X64-NEXT: movl (%rdi), %edx
; X64-NEXT: shll $16, %ecx		; X64-NEXT: orl %ecx, %edx
; X64-NEXT: orl %eax, %ecx
; X64-NEXT: shlq $32, %rcx
; X64-NEXT: movl (%rdi), %eax
; X64-NEXT: orq %rcx, %rax
; X64-NEXT: shll $13, %esi		; X64-NEXT: shll $13, %esi
; X64-NEXT: andq $-8193, %rax # imm = 0xDFFF		; X64-NEXT: andl $-8193, %edx # imm = 0xDFFF
; X64-NEXT: orl %eax, %esi		; X64-NEXT: orl %esi, %edx
; X64-NEXT: shrq $32, %rax		; X64-NEXT: shrq $24, %rax
; X64-NEXT: movw %ax, 4(%rdi)		; X64-NEXT: movb %al, 6(%rdi)
; X64-NEXT: movl %esi, (%rdi)		; X64-NEXT: shrq $32, %rcx
		; X64-NEXT: movw %cx, 4(%rdi)
		; X64-NEXT: movl %edx, (%rdi)
; X64-NEXT: retq		; X64-NEXT: retq
%extbit = zext i1 %bit to i56		%extbit = zext i1 %bit to i56
%b = load i56, ptr %a, align 1		%b = load i56, ptr %a, align 1
%extbit.shl = shl nuw nsw i56 %extbit, 13		%extbit.shl = shl nuw nsw i56 %extbit, 13
%c = and i56 %b, -8193		%c = and i56 %b, -8193
%d = or i56 %c, %extbit.shl		%d = or i56 %c, %extbit.shl
store i56 %d, ptr %a, align 1		store i56 %d, ptr %a, align 1
ret void		ret void
}		}

define void @i120_or(ptr %a) {		define void @i120_or(ptr %a) {
; X86-LABEL: i120_or:		; X86-LABEL: i120_or:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: pushl %esi
; X86-NEXT: orl $384, (%eax) # imm = 0x180		; X86-NEXT: .cfi_def_cfa_offset 8
		; X86-NEXT: .cfi_offset %esi, -8
		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
		; X86-NEXT: movl 8(%ecx), %edx
		; X86-NEXT: movl 11(%ecx), %eax
		; X86-NEXT: movl %eax, %esi
		; X86-NEXT: shrl $8, %esi
		; X86-NEXT: movl %edx, 8(%ecx)
		; X86-NEXT: movw %si, 12(%ecx)
		; X86-NEXT: shrl $24, %eax
		; X86-NEXT: movb %al, 14(%ecx)
		; X86-NEXT: orl $384, (%ecx) # imm = 0x180
		; X86-NEXT: popl %esi
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i120_or:		; X64-LABEL: i120_or:
; X64: # %bb.0:		; X64: # %bb.0:
		; X64-NEXT: movq 7(%rdi), %rax
		; X64-NEXT: movq %rax, %rcx
		; X64-NEXT: shrq $8, %rcx
		; X64-NEXT: movq %rax, %rdx
		; X64-NEXT: shrq $56, %rdx
		; X64-NEXT: movb %dl, 14(%rdi)
		; X64-NEXT: shrq $40, %rax
		; X64-NEXT: movw %ax, 12(%rdi)
		; X64-NEXT: movl %ecx, 8(%rdi)
; X64-NEXT: orq $384, (%rdi) # imm = 0x180		; X64-NEXT: orq $384, (%rdi) # imm = 0x180
; X64-NEXT: retq		; X64-NEXT: retq
%aa = load i120, ptr %a, align 1		%aa = load i120, ptr %a, align 1
%b = or i120 %aa, 384		%b = or i120 %aa, 384
store i120 %b, ptr %a, align 1		store i120 %b, ptr %a, align 1
ret void		ret void
}		}

; X64-NEXT: retq
%b = or i200 %aa, 384		%b = or i200 %aa, 384
store i200 %b, ptr %a, align 1		store i200 %b, ptr %a, align 1
ret void		ret void
}		}

define void @i248_or(ptr %a) {		define void @i248_or(ptr %a) {
; X86-LABEL: i248_or:		; X86-LABEL: i248_or:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: pushl %esi
; X86-NEXT: orl $384, (%eax) # imm = 0x180		; X86-NEXT: .cfi_def_cfa_offset 8
		; X86-NEXT: .cfi_offset %esi, -8
		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
		; X86-NEXT: movl 24(%ecx), %edx
		; X86-NEXT: movl 27(%ecx), %eax
		; X86-NEXT: movl %eax, %esi
		; X86-NEXT: shrl $8, %esi
		; X86-NEXT: movl %edx, 24(%ecx)
		; X86-NEXT: movw %si, 28(%ecx)
		; X86-NEXT: shrl $24, %eax
		; X86-NEXT: movb %al, 30(%ecx)
		; X86-NEXT: orl $384, (%ecx) # imm = 0x180
		; X86-NEXT: popl %esi
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i248_or:		; X64-LABEL: i248_or:
; X64: # %bb.0:		; X64: # %bb.0:
		; X64-NEXT: movq 16(%rdi), %rax
		; X64-NEXT: movq 23(%rdi), %rcx
		; X64-NEXT: movq %rcx, %rdx
		; X64-NEXT: shrq $8, %rdx
		; X64-NEXT: movq %rax, 16(%rdi)
		; X64-NEXT: movq %rcx, %rax
		; X64-NEXT: shrq $56, %rax
		; X64-NEXT: movb %al, 30(%rdi)
		; X64-NEXT: shrq $40, %rcx
		; X64-NEXT: movw %cx, 28(%rdi)
		; X64-NEXT: movl %edx, 24(%rdi)
; X64-NEXT: orq $384, (%rdi) # imm = 0x180		; X64-NEXT: orq $384, (%rdi) # imm = 0x180
; X64-NEXT: retq		; X64-NEXT: retq
%aa = load i248, ptr %a, align 1		%aa = load i248, ptr %a, align 1
%b = or i248 %aa, 384		%b = or i248 %aa, 384
store i248 %b, ptr %a, align 1		store i248 %b, ptr %a, align 1
ret void		ret void
}		}

define void @i304_or(ptr %a) {		define void @i304_or(ptr %a) {
; X86-LABEL: i304_or:		; X86-LABEL: i304_or:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: orl $384, (%eax) # imm = 0x180		; X86-NEXT: orl $384, (%eax) # imm = 0x180
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: i304_or:		; X64-LABEL: i304_or:
; X64: # %bb.0:		; X64: # %bb.0:
		; X64-NEXT: movq 30(%rdi), %rax
		; X64-NEXT: movq %rax, %rcx
		; X64-NEXT: shrq $16, %rcx
		; X64-NEXT: movq 8(%rdi), %r8
		; X64-NEXT: movq 16(%rdi), %rsi
		; X64-NEXT: movq 24(%rdi), %rdx
		; X64-NEXT: movq %rdx, 24(%rdi)
		; X64-NEXT: movq %rsi, 16(%rdi)
		; X64-NEXT: movq %r8, 8(%rdi)
		; X64-NEXT: shrq $48, %rax
		; X64-NEXT: movw %ax, 36(%rdi)
		; X64-NEXT: movl %ecx, 32(%rdi)
; X64-NEXT: orq $384, (%rdi) # imm = 0x180		; X64-NEXT: orq $384, (%rdi) # imm = 0x180
; X64-NEXT: retq		; X64-NEXT: retq
%aa = load i304, ptr %a, align 1		%aa = load i304, ptr %a, align 1
%b = or i304 %aa, 384		%b = or i304 %aa, 384
store i304 %b, ptr %a, align 1		store i304 %b, ptr %a, align 1
ret void		ret void
}		}
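
For readers following the i56 cases above, here is a minimal C sketch of the load-side idea described in the summary: instead of three non-overlapping loads (i32 + i16 + i8), issue two i32 loads at byte offsets 0 and 3 and discard the one overlapping byte. It is not code from this patch. The helper names `load_le32`, `load_i56_split`, and `load_i56_overlap` are hypothetical, and the byte-wise `load_le32` stands in for a single little-endian 32-bit load at an arbitrary (possibly misaligned) address.

```c
#include <stdint.h>

/* Hypothetical helper: models one little-endian 32-bit load from an
 * arbitrary (possibly misaligned) address. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Baseline split: i32 at offset 0, i16 at offset 4, i8 at offset 6. */
static uint64_t load_i56_split(const unsigned char *p) {
    uint64_t lo  = load_le32(p);
    uint64_t mid = (uint64_t)p[4] | ((uint64_t)p[5] << 8);
    uint64_t hi  = p[6];
    return lo | (mid << 32) | (hi << 48);
}

/* Overlapped form: i32 at offset 0 and i32 at offset 3.
 * Byte 3 is read twice; shifting the second load right by 8 drops it,
 * leaving bytes 4..6, which become bits 32..55 of the result. */
static uint64_t load_i56_overlap(const unsigned char *p) {
    uint64_t lo = load_le32(p);
    uint64_t hi = load_le32(p + 3) >> 8;
    return lo | (hi << 32);
}
```

Both functions read only bytes 0 through 6, so the overlapped form does not touch memory past the end of the i56 object. The `movl 3(%rdi)` / `movl (%rdi)` pairs in the new X64 output correspond to the two loads in `load_i56_overlap`, although the generated code combines them with a different shift sequence.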

llvm/test/CodeGen/X86/shrink-compare-pgso.ll

	}			}

	@x = dso_local global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8 0, i8 1, i8 0, i8 0, i8 1 }, align 4			@x = dso_local global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8 0, i8 1, i8 0, i8 0, i8 1 }, align 4

	; PR16551			; PR16551
	define dso_local void @test5(i32 %X) nounwind !prof !14 {			define dso_local void @test5(i32 %X) nounwind !prof !14 {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movzbl x+6(%rip), %eax			; CHECK-NEXT: movl x+3(%rip), %eax
	; CHECK-NEXT: shll $16, %eax			; CHECK-NEXT: shrq $8, %rax
	; CHECK-NEXT: movzwl x+4(%rip), %ecx			; CHECK-NEXT: cmpl $1, %eax
	; CHECK-NEXT: orl %eax, %ecx
	; CHECK-NEXT: cmpl $1, %ecx
	; CHECK-NEXT: jne bar # TAILCALL			; CHECK-NEXT: jne bar # TAILCALL
	; CHECK-NEXT: # %bb.1: # %if.end			; CHECK-NEXT: # %bb.1: # %if.end
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%bf.load = load i56, ptr @x, align 4			%bf.load = load i56, ptr @x, align 4
	%bf.lshr = lshr i56 %bf.load, 32			%bf.lshr = lshr i56 %bf.load, 32
	%bf.cast = trunc i56 %bf.lshr to i32			%bf.cast = trunc i56 %bf.lshr to i32
	%cmp = icmp ne i32 %bf.cast, 1			%cmp = icmp ne i32 %bf.cast, 1

llvm/test/CodeGen/X86/shrink-compare.ll

	}			}

	@x = dso_local global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8 0, i8 1, i8 0, i8 0, i8 1 }, align 4			@x = dso_local global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8 0, i8 1, i8 0, i8 0, i8 1 }, align 4

	; PR16551			; PR16551
	define dso_local void @test5(i32 %X) nounwind minsize {			define dso_local void @test5(i32 %X) nounwind minsize {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movzbl x+6(%rip), %eax			; CHECK-NEXT: movl x+3(%rip), %eax
	; CHECK-NEXT: shll $16, %eax			; CHECK-NEXT: shrq $8, %rax
	; CHECK-NEXT: movzwl x+4(%rip), %ecx			; CHECK-NEXT: cmpl $1, %eax
	; CHECK-NEXT: orl %eax, %ecx
	; CHECK-NEXT: cmpl $1, %ecx
	; CHECK-NEXT: jne bar # TAILCALL			; CHECK-NEXT: jne bar # TAILCALL
	; CHECK-NEXT: # %bb.1: # %if.end			; CHECK-NEXT: # %bb.1: # %if.end
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%bf.load = load i56, ptr @x, align 4			%bf.load = load i56, ptr @x, align 4
	%bf.lshr = lshr i56 %bf.load, 32			%bf.lshr = lshr i56 %bf.load, 32
	%bf.cast = trunc i56 %bf.lshr to i32			%bf.cast = trunc i56 %bf.lshr to i32
	%cmp = icmp ne i32 %bf.cast, 1			%cmp = icmp ne i32 %bf.cast, 1

Revision Contents

Diff 458946

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

llvm/lib/CodeGen/TargetLoweringBase.cpp

llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll

llvm/test/CodeGen/X86/funnel-shift.ll

llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll

llvm/test/CodeGen/X86/shrink-compare-pgso.ll

llvm/test/CodeGen/X86/shrink-compare.ll