Details

Reviewers

t.p.northover
ab
dmgreen
paquette

Commits

rG81a11da76257: [CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.

Summary

This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.
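
As a rough illustration of why four masks are needed, the sketch below emulates a two-register byte table lookup in plain C++ and uses it to widen 16 bytes to 16 32-bit lanes. The names `tbl2`, `makeZExtMasks` and `zextViaTbl` are hypothetical helpers, not part of the patch, and the sketch assumes the little-endian layout in which the source byte is the first byte of each 4-byte group.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Emulates a two-register byte table lookup (the shape of
// `tbl.16b vd, { vn, vm }, vidx`): indices 0..15 read from `lo`,
// 16..31 read from `hi`, and anything larger yields 0.
Vec16 tbl2(const Vec16 &lo, const Vec16 &hi, const Vec16 &idx) {
  Vec16 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    const std::uint8_t k = idx[i];
    out[i] = k < 16 ? lo[k] : (k < 32 ? hi[k - 16] : 0);
  }
  return out;
}

// Builds the four loop-invariant masks (hypothetical helper). Mask m covers
// source bytes 4*m .. 4*m+3; index 16 selects byte 0 of the zero register.
std::array<Vec16, 4> makeZExtMasks() {
  std::array<Vec16, 4> masks{};
  for (std::size_t m = 0; m < 4; ++m) {
    masks[m].fill(16);
    for (std::size_t j = 0; j < 4; ++j)
      masks[m][4 * j] = static_cast<std::uint8_t>(4 * m + j);
  }
  return masks;
}

// Zero-extends 16 bytes to 16 32-bit lanes using four table lookups.
std::array<std::uint32_t, 16> zextViaTbl(const Vec16 &src) {
  const Vec16 zero{};
  const auto masks = makeZExtMasks(); // in a loop, this would sit in the preheader
  std::array<std::uint32_t, 16> out{};
  for (std::size_t m = 0; m < 4; ++m) {
    const Vec16 bytes = tbl2(src, zero, masks[m]);
    for (std::size_t j = 0; j < 4; ++j) {
      // Assemble each lane in little-endian byte order.
      out[4 * m + j] = std::uint32_t(bytes[4 * j]) |
                       std::uint32_t(bytes[4 * j + 1]) << 8 |
                       std::uint32_t(bytes[4 * j + 2]) << 16 |
                       std::uint32_t(bytes[4 * j + 3]) << 24;
      // Sanity check: the lane equals the zero-extended source byte.
      assert(out[4 * m + j] == src[4 * m + j]);
    }
  }
  return out;
}
```

Each lookup produces one 128-bit result, so a 512-bit v16i32 value takes four lookups with four different index vectors. Those index vectors are constants, which is why hoisting them out of the loop matters.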

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

int foo(uint8_t *p, int N) {
  unsigned long long sum = 0;
  for (int i = 0; i < N ; i++, p++) {
    unsigned int v = *p;
    sum += (v < 127) ? v : 256 - v;
  }
  return sum;
}

https://clang.godbolt.org/z/Wco866MjY
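
To show where a 16-lane i8 to i32 extension comes from in such a loop, here is a hand-blocked sketch of the same computation. `sumScalar` and `sumBlocked` are hypothetical names; the blocking by 16 is an assumption standing in for what a vectorizer might do, not the compiler's actual output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference with the same loop body as the example above.
unsigned long long sumScalar(const std::uint8_t *p, int n) {
  unsigned long long sum = 0;
  for (int i = 0; i < n; ++i) {
    unsigned int v = p[i];
    sum += (v < 127) ? v : 256 - v;
  }
  return sum;
}

// Hand-blocked variant: 16 bytes per step are first widened to 32 bits,
// which corresponds to a zext <16 x i8> to <16 x i32> inside the loop.
unsigned long long sumBlocked(const std::uint8_t *p, int n) {
  assert(n >= 0);
  unsigned long long sum = 0;
  int i = 0;
  for (; i + 16 <= n; i += 16) {
    std::array<std::uint32_t, 16> wide{};
    for (int lane = 0; lane < 16; ++lane)
      wide[static_cast<std::size_t>(lane)] = p[i + lane]; // zext i8 -> i32
    for (std::uint32_t v : wide)
      sum += (v < 127) ? v : 256 - v;
  }
  for (; i < n; ++i) { // scalar tail for the remaining elements
    unsigned int v = p[i];
    sum += (v < 127) ? v : 256 - v;
  }
  return sum;
}
```

Both functions return the same value for any input; the blocked form only makes the widening step explicit.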

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Feb 25 2022, 9:30 AM

Herald added subscribers: hiraditya, kristof.beyls. Feb 25 2022, 9:30 AM

fhahn requested review of this revision. Feb 25 2022, 9:30 AM

Herald added a project: Restricted Project. Feb 25 2022, 9:30 AM

Harbormaster completed remote builds in B151490: Diff 411435. Feb 25 2022, 9:30 AM

fhahn mentioned this in rGc679fbee2a76: [AArch64] Add tests for tbl + cmp splitting. Feb 25 2022, 9:59 AM

rebase on top of additional tests

Harbormaster completed remote builds in B151522: Diff 411477. Feb 25 2022, 11:30 AM

fhahn added a child revision: D120582: [AArch64] Match shuffle/cast as zext. Feb 25 2022, 11:32 AM

Fixed the extraction mask so the elements extracted from the v16i8 are at the front of each 4 x i8 block, so the vector before bitcasting looks like orig[0], 0, 0, 0, orig[1], 0, 0, 0, ....
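
The layout described above can be checked with a small byte-level model of the shuffle. This is a sketch only: `makeShuffleMask` and `shuffleBytes` are hypothetical helpers that mimic two-operand shuffle semantics on plain arrays, and the second operand is modelled as all zeros even though only its element 0 is ever read.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-entry mask over the concatenation (Op, FirstEltZero): indices 0..15
// pick a source byte, index 16 picks the zero byte of the second operand.
std::array<int, 64> makeShuffleMask() {
  std::array<int, 64> mask;
  mask.fill(16);
  for (int i = 0; i < 16; ++i)
    mask[static_cast<std::size_t>(i) * 4] = i;
  return mask;
}

// Two-operand byte shuffle: mask values 0..15 select from `a`,
// 16..31 select from `b`.
std::array<std::uint8_t, 64> shuffleBytes(const std::array<std::uint8_t, 16> &a,
                                          const std::array<std::uint8_t, 16> &b,
                                          const std::array<int, 64> &mask) {
  std::array<std::uint8_t, 64> out{};
  for (std::size_t i = 0; i < 64; ++i) {
    const int k = mask[i];
    assert(k >= 0 && k < 32 && "mask index out of range");
    out[i] = k < 16 ? a[static_cast<std::size_t>(k)]
                    : b[static_cast<std::size_t>(k - 16)];
  }
  return out;
}
```

With this mask, byte 4*i of the result holds orig[i] and the three bytes after it are zero. On a little-endian target the first byte of a 32-bit lane is its least significant byte, so reinterpreting the 64 bytes as 16 x i32 gives the zero-extended values.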

Modeled in Alive2: https://alive2.llvm.org/ce/z/hnrmsP (although a reduced version)

Harbormaster completed remote builds in B151610: Diff 411604. Feb 26 2022, 6:10 AM

SjoerdMeijer added a subscriber: SjoerdMeijer. Mar 3 2022, 1:18 AM

SjoerdMeijer added inline comments.

llvm/test/CodeGen/AArch64/vselect-ext.ll
193	I haven't looked too carefully at this, so this is more of a drive by comment, but I think I see more code/instructions so was wondering if we shouldn't be doing this for code-size, and if this is always a win (e.g. smaller/in-order aarch64 implementations).

Herald added a project: Restricted Project. Mar 3 2022, 1:18 AM

Rebase & ping.

This is a major update over the original version. The interface has been changed to an optimizeExtendOrTruncateConversion helper, which is able to apply the optimization directly. This was needed because I will also soon share follow-up patches that lower truncates using tbl4. For that we need to be able to generate code using NEON intrinsics, which is why it is convenient to also implement the actual transform in the backend-specific parts.

It also addresses @SjoerdMeijer's comment to skip this transform when optimizing for size.

fhahn added a child revision: D133494: [AArch64] Lower extending uitofp using tbl. Sep 8 2022, 7:39 AM

fhahn added a child revision: D133495: [AArch64] Lower vector trunc using tbl. Sep 8 2022, 7:52 AM

Harbormaster completed remote builds in B185620: Diff 458736. Sep 8 2022, 8:37 AM

fhahn added a parent revision: D133491: [AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4. Sep 14 2022, 8:16 AM

Looks reasonable, apart from a glitch in a comment.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12387	Destination type seems wrong.

This revision is now accepted and ready to land. Sep 15 2022, 1:21 AM

fhahn mentioned this in rG8f19de848b96: [AArch64] Add big-endian tests for zext-to-tbl.ll. Sep 15 2022, 6:03 AM

Thanks Tim! Rebased and also adjusted the index as discussed offline for big-endian targets.

fhahn marked an inline comment as done. Sep 15 2022, 6:40 AM

fhahn added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12387	Thanks, should be fixed; updated to i32.

Harbormaster completed remote builds in B186845: Diff 460387. Sep 15 2022, 7:18 AM

This revision was landed with ongoing or failed builds. Sep 15 2022, 11:18 AM

Closed by commit rG81a11da76257: [CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn marked an inline comment as done.

fhahn added a commit: rG81a11da76257: [CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl..

nilanjana_basu mentioned this in D137221: [MicroBenchmarks] Add benchmarks to check runtime of truncate or zero-extend vector operations in AArch64. Nov 1 2022, 6:31 PM

nilanjana_basu mentioned this in rT3b44b6bdd3e8: [MicroBenchmarks] Add benchmarks to check runtime of truncate or zero-extend…. Nov 2 2022, 2:05 PM

aaronpuchert added a subscriber: aaronpuchert. Nov 4 2022, 3:49 PM

aaronpuchert added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12352–12354	The static analyzer finds this suspicious: `SrcTy` is the result of a `dyn_cast`, but is unconditionally dereferenced. Perhaps it should be a `cast`?
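
The distinction raised here can be sketched with stand-in types. Nothing below is LLVM code: the `Type` hierarchy and the helpers `dynCastFixedVector` and `castFixedVector` are hypothetical analogues built on standard `dynamic_cast`, meant only to show the two idioms (a checked cast that may yield null, and an asserting cast for types already known).

```cpp
#include <cassert>

// Hypothetical stand-ins for a small type hierarchy; these are not LLVM classes.
struct Type {
  virtual ~Type() = default;
};
struct FixedVectorType : Type {
  unsigned numElements;
  explicit FixedVectorType(unsigned n) : numElements(n) {}
};
struct ScalarType : Type {};

// Analogue of dyn_cast: may return nullptr, so the caller has to check.
const FixedVectorType *dynCastFixedVector(const Type *t) {
  return dynamic_cast<const FixedVectorType *>(t);
}

// Analogue of cast: the caller states that the type is already known,
// and a mismatch trips an assertion instead of returning null.
const FixedVectorType &castFixedVector(const Type &t) {
  const auto *v = dynamic_cast<const FixedVectorType *>(&t);
  assert(v && "cast on a value of the wrong type");
  return *v;
}

// Checked use of the dyn_cast analogue: safe for any input type.
bool hasSixteenElements(const Type *t) {
  const FixedVectorType *v = dynCastFixedVector(t);
  return v && v->numElements == 16;
}
```

If a result is going to be dereferenced without a null check, the asserting form documents that expectation and keeps static analyzers quiet; otherwise the null check belongs next to the checked cast.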

nilanjana_basu mentioned this in D138059: [MicroBenchmarks,AArch64] Added correctness test & other performance tests for truncate or zero-extend vector operations. Nov 15 2022, 1:13 PM

nilanjana_basu mentioned this in rT08de51078b0a: [MicroBenchmarks,AArch64] Added correctness test & other performance tests for…. Dec 1 2022, 10:09 PM

nilanjana_basu mentioned this in rG955c0f13cd70: [AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl…. Dec 9 2022, 12:51 AM

Regression reported at https://github.com/llvm/llvm-project/issues/62620 . I think the issue is that this transform isn't aware of widening instructions; if the <16 x i32> output is used in a way that can be optimized to use 16-bit inputs, the six ushll/ushll2 instructions are actually lowered to just two ushll/ushll2 instructions, so transforming that into tbl isn't profitable.

Herald added a subscriber: StephenFan. May 10 2023, 9:56 AM

There are quite a few places where this transform is not profitable because it blocks folds in SelectionDAG. They would be more obvious, but we don't have many tests with loops in the backend. I have lately been looking into whether it makes sense to replace it with something that happens either during or after ISel. I think doing it after ISel should work as a (larger) peephole optimization, which should then work with both SDAG and GISel and spare us from handling it in both places. We just need to recognize the patterns of USHLLs, and that way we only optimize if it turns out to really be useful.

In D120571#4332504, @efriedma wrote:

Regression reported at https://github.com/llvm/llvm-project/issues/62620 . I think the issue is that this transform isn't aware of widening instructions; if the <16 x i32> output is used in a way that can be optimized to use 16-bit inputs, the six ushll/ushll2 instructions are actually lowered to just two ushll/ushll2 instructions, so transforming that into tbl isn't profitable.

In D120571#4335168, @dmgreen wrote:

There are quite a few places where this transform is not profitable because it blocks folds in SelectionDAG. They would be more obvious, but we don't have many tests with loops in the backend. I have lately been looking into whether it makes sense to replace it with something that happens either during or after ISel. I think doing it after ISel should work as a (larger) peephole optimization, which should then work with both SDAG and GISel and spare us from handling it in both places. We just need to recognize the patterns of USHLLs, and that way we only optimize if it turns out to really be useful.

Yeah I saw the report, thanks! The current logic only tries to introduce tbl if there are at least 2 casting steps required, but doesn't know about widening instructions, so misses cases where only one step will be needed. I think we should be able to catch (hopefully) most cases using the existing logic in TTI: D150482
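
The numbers in this exchange can be reproduced with a back-of-the-envelope model. The helpers below are hypothetical and are not the logic in the patch or in TTI; they assume 128-bit vector registers and that each widening instruction fills exactly one result register.

```cpp
#include <cassert>

// Number of doubling steps needed to widen srcBits to dstBits,
// e.g. i8 -> i32 takes two steps (i8 -> i16 -> i32).
unsigned castingSteps(unsigned srcBits, unsigned dstBits) {
  assert(srcBits > 0 && dstBits >= srcBits);
  unsigned steps = 0;
  for (unsigned bits = srcBits; bits < dstBits; bits *= 2)
    ++steps;
  return steps;
}

// Count of ushll/ushll2-style instructions needed to widen numElts elements,
// assuming each instruction produces one 128-bit result register.
unsigned widenInstructionCount(unsigned numElts, unsigned srcBits,
                               unsigned dstBits) {
  unsigned count = 0;
  for (unsigned bits = srcBits * 2; bits <= dstBits; bits *= 2)
    count += numElts * bits / 128;
  return count;
}

// Hypothetical gate mirroring the rule described above: only consider tbl
// when at least two casting steps are required.
bool wouldTryTbl(unsigned srcBits, unsigned dstBits) {
  return castingSteps(srcBits, dstBits) >= 2;
}
```

Under these assumptions, v16i8 to v16i32 costs six widening instructions against four tbl lookups, while a use that only needs 16-bit elements costs two. A step count based on the destination type of the zext alone cannot see that second case, which is the gap described in the regression report.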

Diff 411604

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
   /// instruction during instruction selection. After calling the function
   /// \p Ops contains the Uses to sink ordered by dominance (dominating users
   /// come first).
   virtual bool shouldSinkOperands(Instruction *I,
                                   SmallVectorImpl<Use *> &Ops) const {
     return false;
   }

+  virtual bool shouldReplaceZExtWithShuffle(ZExtInst *I) const { return false; }
+
   /// Return true if the target supplies and combines to a paired load
   /// two loaded values of type LoadedType next to each other in memory.
   /// RequiredAlignment gives the minimal alignment constraints that must be met
   /// to be able to select this paired load.
   ///
   /// This information is not used to generate actual paired loads, but it is
   /// used to generate a sequence of loads that is easier to combine into a
   /// paired load.

llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ ... @@ bool CodeGenPrepare::optimizeExt(Instruction *&Inst) {

   // Continue promoting SExts if known as considerable depending on targets.
   if (ATPConsiderable &&
       performAddressTypePromotion(Inst, AllowPromotionWithoutCommonHeader,
                                   HasPromoted, TPT, SpeculativelyMovedExts))
     return true;

   TPT.rollback(LastKnownGood);

   return false;
 }

 // Perform address type promotion if doing so is profitable.
 // If AllowPromotionWithoutCommonHeader == false, we should find other sext
 // instructions that sign extended the same initial value. However, if
 // AllowPromotionWithoutCommonHeader == true, we expect promoting the
 // extension is just profitable.
@@ ... @@ for (auto *VisitedSExt : UnhandledExts) {
     }
   }
   return Promoted;
 }

 bool CodeGenPrepare::optimizeExtUses(Instruction *I) {
   BasicBlock *DefBB = I->getParent();

+  ZExtInst *ZExt = dyn_cast<ZExtInst>(I);
+  // Try to lower zext v16i8 as a shuffle, if it is profitable for the target,
+  // like on AArch64 where such shuffles can be lowered directly using tbl
+  // instructions.
+  if (ZExt && TLI->shouldReplaceZExtWithShuffle(ZExt) &&
+      LI->getLoopFor(ZExt->getParent())) {
+    Value *Op = ZExt->getOperand(0);
+    auto *SrcTy = dyn_cast<FixedVectorType>(Op->getType());
+    auto *DstTy = dyn_cast<FixedVectorType>(ZExt->getType());
+    if (SrcTy && SrcTy->getNumElements() == 16 &&
+        SrcTy->getElementType()->isIntegerTy(8) &&
+        DstTy->getElementType()->isIntegerTy(32)) {
+
+      IRBuilder<> Builder(ZExt);
+      SmallVector<int> Mask(64, 16);
+      for (unsigned i = 0; i < 16; i++)
+        Mask[i * 4] = i;
+
+      auto *FirstEltZero = Builder.CreateInsertElement(
+          PoisonValue::get(SrcTy), Builder.getInt8(0), uint64_t(0));
+      Value *Result = Builder.CreateShuffleVector(Op, FirstEltZero, Mask);
+      Result = Builder.CreateBitCast(Result, DstTy);
+      ZExt->replaceAllUsesWith(Result);
+      ZExt->eraseFromParent();
+      return true;
+    }
+  }
+
   // If the result of a {s|z}ext and its source are both live out, rewrite all
   // other uses of the source with result of extension.
   Value *Src = I->getOperand(0);
   if (Src->hasOneUse())
     return false;

   // Only do this xform if truncating is free.
   if (!TLI->isTruncateFree(I->getType(), Src->getType()))

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ public:

   bool isZExtFree(Type *Ty1, Type *Ty2) const override;
   bool isZExtFree(EVT VT1, EVT VT2) const override;
   bool isZExtFree(SDValue Val, EVT VT2) const override;

   bool shouldSinkOperands(Instruction *I,
                           SmallVectorImpl<Use *> &Ops) const override;

+  bool shouldReplaceZExtWithShuffle(ZExtInst *I) const override;
+
   bool hasPairedLoad(EVT LoadedType, Align &RequiredAligment) const override;

   unsigned getMaxSupportedInterleaveFactor() const override { return 4; }

   bool lowerInterleavedLoad(LoadInst *LI,
                             ArrayRef<ShuffleVectorInst *> Shuffles,
                             ArrayRef<unsigned> Indices,
                             unsigned Factor) const override;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ case Instruction::Mul: {
     return IsProfitable;
   }
   default:
     return false;
   }
   return false;
 }

+bool AArch64TargetLowering::shouldReplaceZExtWithShuffle(ZExtInst *I) const {
+  auto *SrcTy = dyn_cast<FixedVectorType>(I->getOperand(0)->getType());
+  auto *DstTy = dyn_cast<FixedVectorType>(I->getType());
+  return SrcTy && SrcTy->getNumElements() == 16 &&
+         SrcTy->getElementType()->isIntegerTy(8) &&
+         DstTy->getElementType()->isIntegerTy(32);
+}
+
 bool AArch64TargetLowering::hasPairedLoad(EVT LoadedType,
                                           Align &RequiredAligment) const {
   if (!LoadedType.isSimple() ||
       (!LoadedType.isInteger() && !LoadedType.isFloatingPoint()))
     return false;
   // Cyclone supports unaligned accesses.
   RequiredAligment = Align(1);
   unsigned NumBits = LoadedType.getSizeInBits();
@@ ... @@ AArch64TargetLowering::getTargetMMOFlags(const Instruction &I) const {
   if (Subtarget->getProcFamily() == AArch64Subtarget::Falkor &&
       I.getMetadata(FALKOR_STRIDED_ACCESS_MD) != nullptr)
     return MOStridedAccess;
   return MachineMemOperand::MONone;
 }

 bool AArch64TargetLowering::isLegalInterleavedAccessType(
     VectorType *VecTy, const DataLayout &DL, bool &UseScalable) const {

   unsigned VecSize = DL.getTypeSizeInBits(VecTy);
   unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
   unsigned NumElements = cast<FixedVectorType>(VecTy)->getNumElements();

   UseScalable = false;

   // Ensure the number of vector elements is greater than 1.
   if (NumElements < 2)

llvm/test/CodeGen/AArch64/vselect-ext.ll

@@ ... @@
 entry:
   %ext = zext <16 x i8> %a to <16 x i32>
   %cmp = icmp sgt <16 x i8> %a, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
   %sel = select <16 x i1> %cmp, <16 x i32> %ext, <16 x i32> zeroinitializer
   ret <16 x i32> %sel
 }

 define void @extension_in_loop_v16i8_to_v16i32(i8* %src, i32* %dst) {
 ; CHECK-LABEL: extension_in_loop_v16i8_to_v16i32:
 ; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: movi.2d v0, #0x0000ff000000ff
+; CHECK-NEXT: Lloh0:
+; CHECK-NEXT: adrp x9, lCPI7_0@PAGE
+; CHECK-NEXT: Lloh1:
+; CHECK-NEXT: adrp x10, lCPI7_1@PAGE
+; CHECK-NEXT: Lloh2:
+; CHECK-NEXT: adrp x11, lCPI7_2@PAGE
+; CHECK-NEXT: Lloh3:
+; CHECK-NEXT: adrp x12, lCPI7_3@PAGE
+; CHECK-NEXT: movi.2d v2, #0000000000000000
 ; CHECK-NEXT: mov x8, xzr
+; CHECK-NEXT: movi.2d v4, #0xffffffffffffffff
+; CHECK-NEXT: Lloh4:
+; CHECK-NEXT: ldr q0, [x9, lCPI7_0@PAGEOFF]
+; CHECK-NEXT: Lloh5:
+; CHECK-NEXT: ldr q3, [x10, lCPI7_1@PAGEOFF]
+; CHECK-NEXT: Lloh6:
+; CHECK-NEXT: ldr q5, [x11, lCPI7_2@PAGEOFF]
+; CHECK-NEXT: Lloh7:
+; CHECK-NEXT: ldr q6, [x12, lCPI7_3@PAGEOFF]
 ; CHECK-NEXT: LBB7_1: ; %loop
 ; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: ldr q1, [x0, x8]
 ; CHECK-NEXT: add x8, x8, #16
 ; CHECK-NEXT: cmp x8, #128
-; CHECK-NEXT: ushll2.8h v2, v1, #0
-; CHECK-NEXT: ushll.8h v1, v1, #0
-; CHECK-NEXT: ushll2.4s v3, v2, #0
-; CHECK-NEXT: ushll.4s v2, v2, #0
-; CHECK-NEXT: cmhi.4s v5, v0, v3
-; CHECK-NEXT: cmhi.4s v6, v0, v2
-; CHECK-NEXT: ushll2.4s v4, v1, #0
-; CHECK-NEXT: ushll.4s v1, v1, #0
-; CHECK-NEXT: and.16b v3, v3, v5
-; CHECK-NEXT: and.16b v2, v2, v6
-; CHECK-NEXT: cmhi.4s v7, v0, v4
-; CHECK-NEXT: stp q2, q3, [x1, #32]
-; CHECK-NEXT: cmhi.4s v3, v0, v1
-; CHECK-NEXT: and.16b v2, v4, v7
-; CHECK-NEXT: and.16b v1, v1, v3
-; CHECK-NEXT: stp q1, q2, [x1], #64
+; CHECK-NEXT: cmgt.16b v7, v1, v4
+; CHECK-NEXT: tbl.16b v16, { v1, v2 }, v0
+; CHECK-NEXT: tbl.16b v17, { v1, v2 }, v3
+; CHECK-NEXT: sshll2.8h v20, v7, #0
+; CHECK-NEXT: tbl.16b v18, { v1, v2 }, v5
+; CHECK-NEXT: sshll2.4s v21, v20, #0
+; CHECK-NEXT: sshll.4s v20, v20, #0
+; CHECK-NEXT: tbl.16b v19, { v1, v2 }, v6
+; CHECK-NEXT: sshll.8h v7, v7, #0
+; CHECK-NEXT: and.16b v16, v16, v21
+; CHECK-NEXT: and.16b v17, v17, v20
+; CHECK-NEXT: stp q17, q16, [x1, #32]
+; CHECK-NEXT: sshll2.4s v16, v7, #0
+; CHECK-NEXT: sshll.4s v7, v7, #0
+; CHECK-NEXT: and.16b v16, v18, v16
+; CHECK-NEXT: and.16b v7, v19, v7
+; CHECK-NEXT: stp q7, q16, [x1], #64
 ; CHECK-NEXT: b.ne LBB7_1
 ; CHECK-NEXT: ; %bb.2: ; %exit
 ; CHECK-NEXT: ret
+; CHECK-NEXT: .loh AdrpLdr Lloh3, Lloh7
+; CHECK-NEXT: .loh AdrpLdr Lloh2, Lloh6
+; CHECK-NEXT: .loh AdrpLdr Lloh1, Lloh5
+; CHECK-NEXT: .loh AdrpLdr Lloh0, Lloh4
 entry:
   br label %loop

 loop:
   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
   %src.gep = getelementptr i8, i8* %src, i64 %iv
   %src.gep.cast = bitcast i8* %src.gep to <16 x i8>*
   %load = load <16 x i8>, <16 x i8>* %src.gep.cast
@@ ... @@

 exit:
   ret void
 }

 define void @extension_in_loop_as_shuffle_v16i8_to_v16i32(i8* %src, i32* %dst) {
 ; CHECK-LABEL: extension_in_loop_as_shuffle_v16i8_to_v16i32:
 ; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: Lloh0:
+; CHECK-NEXT: Lloh8:
 ; CHECK-NEXT: adrp x9, lCPI8_0@PAGE
-; CHECK-NEXT: Lloh1:
+; CHECK-NEXT: Lloh9:
 ; CHECK-NEXT: adrp x10, lCPI8_1@PAGE
-; CHECK-NEXT: Lloh2:
+; CHECK-NEXT: Lloh10:
 ; CHECK-NEXT: adrp x11, lCPI8_2@PAGE
-; CHECK-NEXT: Lloh3:
+; CHECK-NEXT: Lloh11:
 ; CHECK-NEXT: adrp x12, lCPI8_3@PAGE
 ; CHECK-NEXT: movi.2d v1, #0xffffffffffffffff
 ; CHECK-NEXT: mov x8, xzr
 ; CHECK-NEXT: movi.2d v3, #0000000000000000
-; CHECK-NEXT: Lloh4:
+; CHECK-NEXT: Lloh12:
 ; CHECK-NEXT: ldr q0, [x9, lCPI8_0@PAGEOFF]
-; CHECK-NEXT: Lloh5:
+; CHECK-NEXT: Lloh13:
 ; CHECK-NEXT: ldr q2, [x10, lCPI8_1@PAGEOFF]
-; CHECK-NEXT: Lloh6:
+; CHECK-NEXT: Lloh14:
 ; CHECK-NEXT: ldr q5, [x11, lCPI8_2@PAGEOFF]
-; CHECK-NEXT: Lloh7:
+; CHECK-NEXT: Lloh15:
 ; CHECK-NEXT: ldr q6, [x12, lCPI8_3@PAGEOFF]
 ; CHECK-NEXT: LBB8_1: ; %loop
 ; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: ldr q4, [x0, x8]
 ; CHECK-NEXT: add x8, x8, #16
 ; CHECK-NEXT: cmp x8, #128
 ; CHECK-NEXT: cmgt.16b v7, v4, v1
 ; CHECK-NEXT: tbl.16b v16, { v3, v4 }, v0
@@ ... @@
 ; CHECK-NEXT: sshll2.4s v16, v7, #0
 ; CHECK-NEXT: sshll.4s v7, v7, #0
 ; CHECK-NEXT: and.16b v16, v18, v16
 ; CHECK-NEXT: and.16b v7, v19, v7
 ; CHECK-NEXT: stp q7, q16, [x1], #64
 ; CHECK-NEXT: b.ne LBB8_1
 ; CHECK-NEXT: ; %bb.2: ; %exit
 ; CHECK-NEXT: ret
-; CHECK-NEXT: .loh AdrpLdr Lloh3, Lloh7
-; CHECK-NEXT: .loh AdrpLdr Lloh2, Lloh6
-; CHECK-NEXT: .loh AdrpLdr Lloh1, Lloh5
-; CHECK-NEXT: .loh AdrpLdr Lloh0, Lloh4
+; CHECK-NEXT: .loh AdrpLdr Lloh11, Lloh15
+; CHECK-NEXT: .loh AdrpLdr Lloh10, Lloh14
+; CHECK-NEXT: .loh AdrpLdr Lloh9, Lloh13
+; CHECK-NEXT: .loh AdrpLdr Lloh8, Lloh12
 entry:
   br label %loop

 loop:
   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
   %src.gep = getelementptr i8, i8* %src, i64 %iv
   %src.gep.cast = bitcast i8* %src.gep to <16 x i8>*
   %load = load <16 x i8>, <16 x i8>* %src.gep.cast
@@ ... @@

 exit:
   ret void
 }

 define void @shuffle_in_loop_is_no_extend_v16i8_to_v16i32(i8* %src, i32* %dst) {
 ; CHECK-LABEL: shuffle_in_loop_is_no_extend_v16i8_to_v16i32:
 ; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: Lloh8:
+; CHECK-NEXT: Lloh16:
 ; CHECK-NEXT: adrp x9, lCPI9_0@PAGE
-; CHECK-NEXT: Lloh9:
+; CHECK-NEXT: Lloh17:
 ; CHECK-NEXT: adrp x10, lCPI9_1@PAGE
-; CHECK-NEXT: Lloh10:
+; CHECK-NEXT: Lloh18:
 ; CHECK-NEXT: adrp x11, lCPI9_2@PAGE
-; CHECK-NEXT: Lloh11:
+; CHECK-NEXT: Lloh19:
 ; CHECK-NEXT: adrp x12, lCPI9_3@PAGE
 ; CHECK-NEXT: movi.2d v2, #0000000000000000
 ; CHECK-NEXT: mov x8, xzr
 ; CHECK-NEXT: movi.2d v5, #0xffffffffffffffff
-; CHECK-NEXT: Lloh12:
+; CHECK-NEXT: Lloh20:
 ; CHECK-NEXT: ldr q0, [x9, lCPI9_0@PAGEOFF]
-; CHECK-NEXT: Lloh13:
+; CHECK-NEXT: Lloh21:
 ; CHECK-NEXT: ldr q4, [x10, lCPI9_1@PAGEOFF]
-; CHECK-NEXT: Lloh14:
+; CHECK-NEXT: Lloh22:
 ; CHECK-NEXT: ldr q6, [x11, lCPI9_2@PAGEOFF]
-; CHECK-NEXT: Lloh15:
+; CHECK-NEXT: Lloh23:
 ; CHECK-NEXT: ldr q7, [x12, lCPI9_3@PAGEOFF]
 ; CHECK-NEXT: LBB9_1: ; %loop
 ; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: ldr q1, [x0, x8]
 ; CHECK-NEXT: add x8, x8, #16
 ; CHECK-NEXT: cmp x8, #128
 ; CHECK-NEXT: cmgt.16b v16, v1, v5
 ; CHECK-NEXT: mov.16b v3, v1
@@ ... @@
 ; CHECK-NEXT: and.16b v17, v19, v17
 ; CHECK-NEXT: stp q17, q18, [x1, #32]
 ; CHECK-NEXT: and.16b v17, v20, v23
 ; CHECK-NEXT: and.16b v16, v21, v16
 ; CHECK-NEXT: stp q16, q17, [x1], #64
 ; CHECK-NEXT: b.ne LBB9_1
 ; CHECK-NEXT: ; %bb.2: ; %exit
 ; CHECK-NEXT: ret
-; CHECK-NEXT: .loh AdrpLdr Lloh11, Lloh15
-; CHECK-NEXT: .loh AdrpLdr Lloh10, Lloh14
-; CHECK-NEXT: .loh AdrpLdr Lloh9, Lloh13
-; CHECK-NEXT: .loh AdrpLdr Lloh8, Lloh12
+; CHECK-NEXT: .loh AdrpLdr Lloh19, Lloh23
+; CHECK-NEXT: .loh AdrpLdr Lloh18, Lloh22
+; CHECK-NEXT: .loh AdrpLdr Lloh17, Lloh21
+; CHECK-NEXT: .loh AdrpLdr Lloh16, Lloh20
 entry:
   br label %loop

 loop:
   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
   %src.gep = getelementptr i8, i8* %src, i64 %iv
   %src.gep.cast = bitcast i8* %src.gep to <16 x i8>*
   %load = load <16 x i8>, <16 x i8>* %src.gep.cast

llvm/test/CodeGen/AArch64/zext-to-tbl.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -o - %s \| FileCheck %s			; RUN: llc -o - %s \| FileCheck %s

	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
	target triple = "arm64-apple-ios"			target triple = "arm64-apple-ios"

	; It's profitable to convert the zext to a shuffle, which in turn will be			; It's profitable to convert the zext to a shuffle, which in turn will be
	; lowered to 4 tbl instructions. The masks are materialized outside the loop.			; lowered to 4 tbl instructions. The masks are materialized outside the loop.
	define void @zext_v16i8_to_v16i32_in_loop(i8* %src, i32* %dst) {			define void @zext_v16i8_to_v16i32_in_loop(i8* %src, i32* %dst) {
	; CHECK-LABEL: zext_v16i8_to_v16i32_in_loop:			; CHECK-LABEL: zext_v16i8_to_v16i32_in_loop:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
				; CHECK-NEXT: Lloh0:
				; CHECK-NEXT: adrp x9, lCPI0_0@PAGE
				; CHECK-NEXT: Lloh1:
				; CHECK-NEXT: adrp x10, lCPI0_1@PAGE
				; CHECK-NEXT: Lloh2:
				; CHECK-NEXT: adrp x11, lCPI0_2@PAGE
				; CHECK-NEXT: Lloh3:
				; CHECK-NEXT: adrp x12, lCPI0_3@PAGE
				; CHECK-NEXT: movi.2d v3, #0000000000000000
	; CHECK-NEXT: mov x8, xzr			; CHECK-NEXT: mov x8, xzr
				; CHECK-NEXT: Lloh4:
				; CHECK-NEXT: ldr q0, [x9, lCPI0_0@PAGEOFF]
				; CHECK-NEXT: Lloh5:
				; CHECK-NEXT: ldr q1, [x10, lCPI0_1@PAGEOFF]
				; CHECK-NEXT: Lloh6:
				; CHECK-NEXT: ldr q4, [x11, lCPI0_2@PAGEOFF]
				; CHECK-NEXT: Lloh7:
				; CHECK-NEXT: ldr q5, [x12, lCPI0_3@PAGEOFF]
	; CHECK-NEXT: LBB0_1: ; %loop			; CHECK-NEXT: LBB0_1: ; %loop
	; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldr q0, [x0, x8]			; CHECK-NEXT: ldr q2, [x0, x8]
	; CHECK-NEXT: add x8, x8, #16			; CHECK-NEXT: add x8, x8, #16
	; CHECK-NEXT: cmp x8, #128			; CHECK-NEXT: cmp x8, #128
	; CHECK-NEXT: ushll2.8h v1, v0, #0			; CHECK-NEXT: tbl.16b v6, { v2, v3 }, v5
	; CHECK-NEXT: ushll.8h v0, v0, #0			; CHECK-NEXT: tbl.16b v7, { v2, v3 }, v4
	; CHECK-NEXT: ushll2.4s v2, v1, #0			; CHECK-NEXT: tbl.16b v16, { v2, v3 }, v1
	; CHECK-NEXT: ushll.4s v1, v1, #0			; CHECK-NEXT: tbl.16b v17, { v2, v3 }, v0
	; CHECK-NEXT: ushll2.4s v3, v0, #0			; CHECK-NEXT: stp q7, q6, [x1, #32]
	; CHECK-NEXT: ushll.4s v0, v0, #0			; CHECK-NEXT: stp q17, q16, [x1], #64
	; CHECK-NEXT: stp q1, q2, [x1, #32]
	; CHECK-NEXT: stp q0, q3, [x1], #64
	; CHECK-NEXT: b.ne LBB0_1			; CHECK-NEXT: b.ne LBB0_1
	; CHECK-NEXT: ; %bb.2: ; %exit			; CHECK-NEXT: ; %bb.2: ; %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
				; CHECK-NEXT: .loh AdrpLdr Lloh3, Lloh7
				; CHECK-NEXT: .loh AdrpLdr Lloh2, Lloh6
				; CHECK-NEXT: .loh AdrpLdr Lloh1, Lloh5
				; CHECK-NEXT: .loh AdrpLdr Lloh0, Lloh4
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
	%src.gep = getelementptr i8, i8* %src, i64 %iv			%src.gep = getelementptr i8, i8* %src, i64 %iv
	%src.gep.cast = bitcast i8* %src.gep to <16 x i8>*			%src.gep.cast = bitcast i8* %src.gep to <16 x i8>*
	%load = load <16 x i8>, <16 x i8>* %src.gep.cast			%load = load <16 x i8>, <16 x i8>* %src.gep.cast
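For readers following the new CHECK lines above: the loop body now issues four two-register `tbl.16b` lookups against `{ v2, v3 }`, where `v2` holds the loaded bytes and `v3` is zeroed by `movi.2d`. The following C++ sketch emulates that lookup to show why the four results together form the zero-extended vector. The helper names (`tbl2`, `quarterMask`, `zextViaTbl`) are hypothetical, and the mask contents are an assumption derived from the shuffle mask in the CGP test, since the constant pool entries `lCPI0_0`..`lCPI0_3` are not shown in this diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Emulates the two-register form `tbl.16b vd, { vn, vn+1 }, vm`:
// indices 0..15 read from `lo`, 16..31 read from `hi`, anything else gives 0.
Vec16 tbl2(const Vec16 &lo, const Vec16 &hi, const Vec16 &idx) {
  Vec16 out{};
  for (unsigned i = 0; i < 16; ++i) {
    const std::uint8_t k = idx[i];
    if (k < 16)
      out[i] = lo[k];
    else if (k < 32)
      out[i] = hi[k - 16];
    else
      out[i] = 0;
  }
  return out;
}

// Assumed mask for output quarter q (0..3): each 4-byte group is
// {source lane, 16, 16, 16}, where index 16 selects byte 0 of the zero register.
// In the generated code these masks are loop-invariant and loaded once in the
// preheader; here they are simply recomputed on demand.
Vec16 quarterMask(unsigned q) {
  Vec16 m{};
  for (unsigned lane = 0; lane < 4; ++lane) {
    m[4 * lane] = static_cast<std::uint8_t>(4 * q + lane);
    m[4 * lane + 1] = m[4 * lane + 2] = m[4 * lane + 3] = 16;
  }
  return m;
}

// Widens 16 bytes to 16 x u32 with four table lookups, reading each 4-byte
// group as a little-endian 32-bit lane.
std::array<std::uint32_t, 16> zextViaTbl(const Vec16 &src) {
  const Vec16 zeros{};
  std::array<std::uint32_t, 16> out{};
  for (unsigned q = 0; q < 4; ++q) {
    const Vec16 bytes = tbl2(src, zeros, quarterMask(q));
    for (unsigned lane = 0; lane < 4; ++lane) {
      std::uint32_t v = 0;
      for (unsigned b = 0; b < 4; ++b)
        v |= static_cast<std::uint32_t>(bytes[4 * lane + b]) << (8 * b);
      out[4 * q + lane] = v;
    }
  }
  return out;
}
```

Under these assumptions each lookup replaces one of the final `ushll`/`ushll2` results on the left-hand side, so the loop body drops from six widening instructions to four lookups, at the cost of the four mask loads hoisted into the entry block.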

llvm/test/Transforms/CodeGenPrepare/AArch64/zext-to-shuffle.ll

	; CHECK-LABEL: @zext_v16i8_to_v16i32_in_loop(			; CHECK-LABEL: @zext_v16i8_to_v16i32_in_loop(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[SRC_GEP:%.*]] = getelementptr i8, i8* [[SRC:%.*]], i64 [[IV]]			; CHECK-NEXT: [[SRC_GEP:%.*]] = getelementptr i8, i8* [[SRC:%.*]], i64 [[IV]]
	; CHECK-NEXT: [[SRC_GEP_CAST:%.*]] = bitcast i8* [[SRC_GEP]] to <16 x i8>*			; CHECK-NEXT: [[SRC_GEP_CAST:%.*]] = bitcast i8* [[SRC_GEP]] to <16 x i8>*
	; CHECK-NEXT: [[LOAD:%.*]] = load <16 x i8>, <16 x i8>* [[SRC_GEP_CAST]], align 16			; CHECK-NEXT: [[LOAD:%.*]] = load <16 x i8>, <16 x i8>* [[SRC_GEP_CAST]], align 16
	; CHECK-NEXT: [[EXT:%.*]] = zext <16 x i8> [[LOAD]] to <16 x i32>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> [[LOAD]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <64 x i32> <i32 0, i32 16, i32 16, i32 16, i32 1, i32 16, i32 16, i32 16, i32 2, i32 16, i32 16, i32 16, i32 3, i32 16, i32 16, i32 16, i32 4, i32 16, i32 16, i32 16, i32 5, i32 16, i32 16, i32 16, i32 6, i32 16, i32 16, i32 16, i32 7, i32 16, i32 16, i32 16, i32 8, i32 16, i32 16, i32 16, i32 9, i32 16, i32 16, i32 16, i32 10, i32 16, i32 16, i32 16, i32 11, i32 16, i32 16, i32 16, i32 12, i32 16, i32 16, i32 16, i32 13, i32 16, i32 16, i32 16, i32 14, i32 16, i32 16, i32 16, i32 15, i32 16, i32 16, i32 16>
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <64 x i8> [[TMP0]] to <16 x i32>
	; CHECK-NEXT: [[DST_GEP:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[IV]]			; CHECK-NEXT: [[DST_GEP:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[IV]]
	; CHECK-NEXT: [[DST_GEP_CAST:%.*]] = bitcast i32* [[DST_GEP]] to <16 x i32>*			; CHECK-NEXT: [[DST_GEP_CAST:%.*]] = bitcast i32* [[DST_GEP]] to <16 x i32>*
	; CHECK-NEXT: store <16 x i32> [[EXT]], <16 x i32>* [[DST_GEP_CAST]], align 64			; CHECK-NEXT: store <16 x i32> [[TMP1]], <16 x i32>* [[DST_GEP_CAST]], align 64
	; CHECK-NEXT: [[IV_NEXT]] = add nuw i64 [[IV]], 16			; CHECK-NEXT: [[IV_NEXT]] = add nuw i64 [[IV]], 16
	; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 128			; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 128
	; CHECK-NEXT: br i1 [[EC]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[EC]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop			br label %loop
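As a sanity check on the expected IR above, the `shufflevector` plus `bitcast` pair can be modelled outside LLVM. The sketch below is not the CodeGenPrepare implementation; it is a small standalone C++ model with hypothetical helper names (`buildZExtMask`, `shuffle`, `bitcastLE`, `zextViaShuffle`) that rebuilds the 64-entry mask from the CHECK line and evaluates it, assuming a little-endian lane layout as on the `arm64-apple-ios` target used by the tests.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the 64-entry mask from the CHECK line: for source lane i the group is
// {i, 16, 16, 16}. Index 16 names element 0 of the second shuffle operand,
// which is the only non-poison element of that operand and holds 0.
std::array<int, 64> buildZExtMask() {
  std::array<int, 64> mask{};
  for (int i = 0; i < 16; ++i) {
    mask[4 * i] = i;
    mask[4 * i + 1] = mask[4 * i + 2] = mask[4 * i + 3] = 16;
  }
  return mask;
}

// Models shufflevector on two <16 x i8> operands: mask values 0..15 select
// from `a`, values 16..31 select from `b`.
std::array<std::uint8_t, 64> shuffle(const std::array<std::uint8_t, 16> &a,
                                     const std::array<std::uint8_t, 16> &b,
                                     const std::array<int, 64> &mask) {
  std::array<std::uint8_t, 64> out{};
  for (int i = 0; i < 64; ++i)
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  return out;
}

// Models `bitcast <64 x i8> to <16 x i32>` for a little-endian target by
// assembling each 32-bit lane from four consecutive bytes, lowest byte first.
std::array<std::uint32_t, 16> bitcastLE(const std::array<std::uint8_t, 64> &bytes) {
  std::array<std::uint32_t, 16> out{};
  for (int lane = 0; lane < 16; ++lane)
    for (int b = 0; b < 4; ++b)
      out[lane] |= static_cast<std::uint32_t>(bytes[4 * lane + b]) << (8 * b);
  return out;
}

// Shuffle followed by bitcast, mirroring [[TMP0]] and [[TMP1]] in the test.
std::array<std::uint32_t, 16> zextViaShuffle(const std::array<std::uint8_t, 16> &src) {
  // Only element 0 of the second operand is ever read; the remaining elements
  // are poison in the IR and are modelled as 0 here.
  const std::array<std::uint8_t, 16> second{};
  return bitcastLE(shuffle(src, second, buildZExtMask()));
}
```

Placing the source byte first in each group of four is what makes the bitcast produce the zero-extended value on a little-endian target; a big-endian layout would need the source byte last in each group, which this sketch does not cover.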

This is an archive of the discontinued LLVM Phabricator instance.

[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
ClosedPublic

Revision Contents

Diff 411604

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/CodeGenPrepare.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/vselect-ext.ll

llvm/test/CodeGen/AArch64/zext-to-tbl.ll

llvm/test/Transforms/CodeGenPrepare/AArch64/zext-to-shuffle.ll
