This is an archive of the discontinued LLVM Phabricator instance.

Differential D125918

[LV] Improve register pressure estimate at high VFs
Closed, Public

Authored by peterwaller-arm on May 18 2022, 12:04 PM.

Details

Reviewers

fhahn
RKSimon
efriedma
nikic
dmgreen
sdesmalen
paulwalker-arm

Commits

rGade47bdc317b: [LV] Improve register pressure estimate at high VFs

Summary

Previously, getRegUsageForType was implemented using
getTypeLegalizationCost. getRegUsageForType is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type. However, getTypeLegalizationCost currently only appears to
understand splitting and not scalarization, so it significantly
underestimates the register requirements.

Instead, use getNumRegisters, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ended up costing a v128i1 as 2 v64i*
registers when it actually occupies 128 i32 registers.
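
To make the gap concrete, here is a toy model of the two estimates. It is not LLVM's legalization logic: `ToyTarget`, `splitOnlyEstimate`, and `scalarizationAwareEstimate` are hypothetical names, and the rule "an i1 vector wider than the widest legal vector is broken into one scalar register per element" is an assumption chosen only to reproduce the v128i1 numbers quoted above.

```cpp
#include <cassert>

// Hypothetical description of a target; not an LLVM type.
struct ToyTarget {
  unsigned MaxLegalLanes; // lane count of the widest legal vector (64 here)
};

// Split-only view: keep halving the vector until it fits a legal register.
// Assumes NumElts is a power of two.
unsigned splitOnlyEstimate(unsigned NumElts, const ToyTarget &T) {
  unsigned Parts = 1;
  while (NumElts > T.MaxLegalLanes) {
    NumElts /= 2;
    Parts *= 2;
  }
  return Parts;
}

// Scalarization-aware view (assumed rule): an over-wide i1 vector is broken
// into scalars, one promoted scalar register per element.
unsigned scalarizationAwareEstimate(unsigned NumElts, const ToyTarget &T) {
  return NumElts > T.MaxLegalLanes ? NumElts : 1;
}
```

With `ToyTarget{64}`, the split-only estimate for 128 elements is 2 while the scalarization-aware estimate is 128, which is the size of the underestimate the summary describes.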

I'm sending this patch early for comment; I'm still doing some sanity checking
with LNT. I note that getRegisterClassForType appears to return VectorRC even
though the types in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

peterwaller-arm created this revision. May 18 2022, 12:04 PM

Herald added a project: Restricted Project. May 18 2022, 12:04 PM

Herald added a subscriber: ctetreau.

peterwaller-arm requested review of this revision. May 18 2022, 12:04 PM

Herald added a project: Restricted Project. May 18 2022, 12:04 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165186: Diff 430471. May 18 2022, 12:54 PM

paulwalker-arm added inline comments. May 18 2022, 3:56 PM

llvm/include/llvm/CodeGen/BasicTTIImpl.h
384–385	I lack some historical knowledge here but I agree it does look like the current implementation is answering the wrong question here. Assuming others agree with the intent of the change I'm thinking the function definition should also be changed. Returning `InstructionCost` seems wrong and likely just the result of the original call to `getTypeLegalizationCost()`. I think `unsigned` is more representative of the function's intent.

Address Paul's suggestion: make getRegUsageForType return unsigned and simplify.
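
A small sketch of why a plain count is easier on callers than a cost wrapper. `ToyCost` is a hypothetical stand-in, not LLVM's `InstructionCost`; it only models "a value that might be invalid", which is enough to show the unwrap-and-range-check step that the `unsigned` return type makes unnecessary.

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Toy stand-in for a cost type that may be invalid; not LLVM's InstructionCost.
struct ToyCost {
  std::optional<long long> Value;
};

// Before: the register count travels inside a cost object.
ToyCost regUsageAsCost(unsigned NumRegs) {
  return ToyCost{static_cast<long long>(NumRegs)};
}

// Every caller then has to unwrap the value and check that it fits.
unsigned unwrapRegUsage(const ToyCost &C) {
  assert(C.Value.has_value() && "cost must be valid");
  long long V = *C.Value;
  assert(V >= 0 && V <= std::numeric_limits<unsigned>::max());
  return static_cast<unsigned>(V);
}

// After: the function returns the count directly.
unsigned regUsageAsCount(unsigned NumRegs) { return NumRegs; }
```

Both paths yield the same number; the second simply has no invalid state for the caller to handle.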

Herald added subscribers: pcwang-thead, luke957, frasercrmck and 23 others. May 19 2022, 2:24 AM

peterwaller-arm added a child revision: D125956: [NOT YET FOR REVIEW][AArch64][LV] Implement AArch64TTIImpl::getRegisterClassForType. May 19 2022, 2:37 AM

peterwaller-arm added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
36	I took a look at making this report the correct RC in D125956.

This change looks right to me.

I do think there is a related issue exposed by your test. The cost-model for the PHI node is not accurate, because it suggests the cost of the PHI itself is 0 but because the copy is being scalarized in CopyToReg/CopyFromReg the cost is actually really high. Perhaps we can update getCFInstrCost with some checks to see if the vector type requires Promotion + Splitting and if so, return a scalarization cost for the PHI node instead. We can also just fix the codegen itself of course, but I suspect that may be more work.
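
The suggested check could look roughly like the sketch below. This is purely illustrative and was not part of this patch: `ToyLegalizeKind`, `toyPhiCost`, and the per-element copy cost are hypothetical, and the real getCFInstrCost has a different signature.

```cpp
#include <cassert>

// Hypothetical legalization outcomes for a vector type; not an LLVM enum.
enum class ToyLegalizeKind { Legal, Split, PromoteAndSplit };

// Sketch of the suggested check: a PHI stays free unless its type would be
// promoted and split, in which case charge one copy per element.
unsigned toyPhiCost(ToyLegalizeKind Kind, unsigned NumElts,
                    unsigned PerElementCopyCost) {
  if (Kind == ToyLegalizeKind::PromoteAndSplit)
    return NumElts * PerElementCopyCost;
  return 0;
}
```

Under these assumptions a legal PHI still costs 0, while a 128-element promoted-and-split PHI with unit copy cost is charged 128.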

This revision is now accepted and ready to land. May 19 2022, 2:55 AM

Harbormaster completed remote builds in B165278: Diff 430610. May 19 2022, 3:07 AM

I think for the case of D118979 it makes sense to prevent maximizing the vector bandwidth for fixed-length sve. The larger vectors will already be wide enough and as far as I understand they don't benefit from the wider types in the same way that NEON does.

It sounds like this will be useful in either case though. The same thing could potentially happen with 128xi1 types. And it looks like it is used in the interleaving factor calculations.
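
As a rough illustration of how the register estimate can feed an interleaving decision, the sketch below divides the registers left after loop invariants by the peak usage and rounds down to a power of two. The function names and the clamping to 1 are assumptions for this example; the vectorizer's actual calculation has more inputs.

```cpp
#include <cassert>

// Largest power of two <= X, or 0 when X == 0.
unsigned powerOf2Floor(unsigned X) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Simplified interleave count: how many copies of the loop body fit in the
// registers that remain after loop-invariant values are accounted for.
unsigned toyInterleaveCount(unsigned TargetNumRegs, unsigned LoopInvariantRegs,
                            unsigned MaxLocalUsers) {
  if (MaxLocalUsers == 0)
    MaxLocalUsers = 1; // avoid dividing by zero
  unsigned Avail =
      TargetNumRegs > LoopInvariantRegs ? TargetNumRegs - LoopInvariantRegs : 0;
  unsigned IC = powerOf2Floor(Avail / MaxLocalUsers);
  return IC == 0 ? 1 : IC; // never go below one copy
}
```

With 32 registers and a peak usage of 2, this model allows an interleave count of 16; with a peak usage of 136 it falls back to 1, so an underestimated usage directly inflates the result.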

llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
15	I don't think this needs the target-feature=+neon

paulwalker-arm accepted this revision. May 19 2022, 10:04 AM

jaykang10 added a subscriber: jaykang10. May 20 2022, 4:40 AM

Remove target-features test attribute per review and rebase.

I'm in the process of final-check-and-submit now, but may get interrupted.

This revision was landed with ongoing or failed builds. May 23 2022, 1:01 AM

Closed by commit rGade47bdc317b: [LV] Improve register pressure estimate at high VFs (authored by peterwaller-arm).

This revision was automatically updated to reflect the committed changes.

peterwaller-arm added a commit: rGade47bdc317b: [LV] Improve register pressure estimate at high VFs.

Harbormaster completed remote builds in B165786: Diff 431297. May 23 2022, 1:39 AM

jaykang10 mentioned this in D118979: [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth. May 23 2022, 8:01 AM

jaykang10 mentioned this in rGbb82f746129f: Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"". May 23 2022, 8:18 AM

it introduced a regression here:

https://github.com/llvm/llvm-project/issues/56374

In D125918#3628323, @sylvestre.ledru wrote:

it introduced a regression here:

https://github.com/llvm/llvm-project/issues/56374

Thanks for the report. This should be fixed in rGc146af3f469adde04f7adb126e7d7b7b7047c88c; please ping again if not, as I don't have gcc-6 to hand to test with.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	6 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	7 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	2 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	2 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	10 lines
llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll	57 lines
llvm/test/Transforms/LoopVectorize/X86/i1-reg-usage.ll	32 lines

Diff 431298

llvm/include/llvm/Analysis/TargetTransformInfo.h

[724 lines not shown; context: public:]
   bool isProfitableToHoist(Instruction *I) const;

   bool useAA() const;

   /// Return true if this type is legal.
   bool isTypeLegal(Type *Ty) const;

   /// Returns the estimated number of registers required to represent \p Ty.
-  InstructionCost getRegUsageForType(Type *Ty) const;
+  unsigned getRegUsageForType(Type *Ty) const;

   /// Return true if switches should be turned into lookup tables for the
   /// target.
   bool shouldBuildLookupTables() const;

   /// Return true if switches should be turned into lookup tables
   /// containing this constant value for the target.
   bool shouldBuildLookupTablesForConstant(Constant *C) const;
[846 lines not shown]
   virtual InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                                int64_t BaseOffset,
                                                bool HasBaseReg, int64_t Scale,
                                                unsigned AddrSpace) = 0;
   virtual bool LSRWithInstrQueries() = 0;
   virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
   virtual bool isProfitableToHoist(Instruction *I) = 0;
   virtual bool useAA() = 0;
   virtual bool isTypeLegal(Type *Ty) = 0;
-  virtual InstructionCost getRegUsageForType(Type *Ty) = 0;
+  virtual unsigned getRegUsageForType(Type *Ty) = 0;
   virtual bool shouldBuildLookupTables() = 0;
   virtual bool shouldBuildLookupTablesForConstant(Constant *C) = 0;
   virtual bool shouldBuildRelLookupTables() = 0;
   virtual bool useColdCCForColdCall(Function &F) = 0;
   virtual InstructionCost getScalarizationOverhead(VectorType *Ty,
                                                    const APInt &DemandedElts,
                                                    bool Insert,
                                                    bool Extract) = 0;
[422 lines not shown; context: public:]
   bool isTruncateFree(Type *Ty1, Type *Ty2) override {
     return Impl.isTruncateFree(Ty1, Ty2);
   }
   bool isProfitableToHoist(Instruction *I) override {
     return Impl.isProfitableToHoist(I);
   }
   bool useAA() override { return Impl.useAA(); }
   bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }
-  InstructionCost getRegUsageForType(Type *Ty) override {
+  unsigned getRegUsageForType(Type *Ty) override {
     return Impl.getRegUsageForType(Ty);
   }
   bool shouldBuildLookupTables() override {
     return Impl.shouldBuildLookupTables();
   }
   bool shouldBuildLookupTablesForConstant(Constant *C) override {
     return Impl.shouldBuildLookupTablesForConstant(C);
   }
[499 lines not shown]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[306 lines not shown; context: public:]
   bool isTruncateFree(Type *Ty1, Type *Ty2) const { return false; }

   bool isProfitableToHoist(Instruction *I) const { return true; }

   bool useAA() const { return false; }

   bool isTypeLegal(Type *Ty) const { return false; }

-  InstructionCost getRegUsageForType(Type *Ty) const { return 1; }
+  unsigned getRegUsageForType(Type *Ty) const { return 1; }

   bool shouldBuildLookupTables() const { return true; }

   bool shouldBuildLookupTablesForConstant(Constant *C) const { return true; }

   bool shouldBuildRelLookupTables() const { return false; }

   bool useColdCCForColdCall(Function &F) const { return false; }
[939 lines not shown]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[375 lines not shown; context: public:]
   }

   bool useAA() const { return getST()->useAA(); }

   bool isTypeLegal(Type *Ty) {
     EVT VT = getTLI()->getValueType(DL, Ty);
     return getTLI()->isTypeLegal(VT);
   }

-  InstructionCost getRegUsageForType(Type *Ty) {
-    InstructionCost Val = getTLI()->getTypeLegalizationCost(DL, Ty).first;
-    assert(Val >= 0 && "Negative cost!");
-    return Val;
+  unsigned getRegUsageForType(Type *Ty) {
+    EVT ETy = getTLI()->getValueType(DL, Ty);
+    return getTLI()->getNumRegisters(Ty->getContext(), ETy);
   }

   InstructionCost getGEPCost(Type *PointeeType, const Value *Ptr,
                              ArrayRef<const Value *> Operands,
                              TTI::TargetCostKind CostKind) {
     return BaseT::getGEPCost(PointeeType, Ptr, Operands, CostKind);
   }

[1,936 lines not shown]

llvm/lib/Analysis/TargetTransformInfo.cpp

[467 lines not shown]
 }

 bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }

 bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
   return TTIImpl->isTypeLegal(Ty);
 }

-InstructionCost TargetTransformInfo::getRegUsageForType(Type *Ty) const {
+unsigned TargetTransformInfo::getRegUsageForType(Type *Ty) const {
   return TTIImpl->getRegUsageForType(Ty);
 }

 bool TargetTransformInfo::shouldBuildLookupTables() const {
   return TTIImpl->shouldBuildLookupTables();
 }

 bool TargetTransformInfo::shouldBuildLookupTablesForConstant(
[732 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[54 lines not shown; context: public:]
   TargetTransformInfo::PopcntSupportKind getPopcntSupport(unsigned TyWidth);

   bool shouldExpandReduction(const IntrinsicInst *II) const;
   bool supportsScalableVectors() const { return ST->hasVInstructions(); }
   Optional<unsigned> getMaxVScale() const;

   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const;

-  InstructionCost getRegUsageForType(Type *Ty);
+  unsigned getRegUsageForType(Type *Ty);

   InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                         Align Alignment, unsigned AddressSpace,
                                         TTI::TargetCostKind CostKind);

   void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
                                TTI::UnrollingPreferences &UP,
                                OptimizationRemarkEmitter *ORE);
[151 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

[423 lines not shown]
   if (Cost < 12)
     UP.Force = true;
 }

 void RISCVTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE,
                                          TTI::PeelingPreferences &PP) {
   BaseT::getPeelingPreferences(L, SE, PP);
 }

-InstructionCost RISCVTTIImpl::getRegUsageForType(Type *Ty) {
+unsigned RISCVTTIImpl::getRegUsageForType(Type *Ty) {
   TypeSize Size = Ty->getPrimitiveSizeInBits();
   if (Ty->isVectorTy()) {
     if (Size.isScalable() && ST->hasVInstructions())
       return divideCeil(Size.getKnownMinValue(), RISCV::RVVBitsPerBlock);

     if (ST->useRVVForFixedLengthVectors())
       return divideCeil(Size, ST->getMinRVVVectorSizeInBits());
   }

   return BaseT::getRegUsageForType(Ty);
 }
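
The RISC-V override is a ceiling division of the type's size by a register-sized unit. A minimal standalone sketch of that arithmetic follows; `divideCeilToy` and `ToyBitsPerBlock` are names made up for this example, and the value 64 for the block size is an assumption rather than something this diff states.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division: how many Den-sized units are needed to hold Num bits.
constexpr uint64_t divideCeilToy(uint64_t Num, uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// Assumed block size for this example; the diff uses RISCV::RVVBitsPerBlock.
constexpr uint64_t ToyBitsPerBlock = 64;
```

Under that assumption, a scalable type with a known minimum size of 128 bits maps to 2 register units, and a 512-bit fixed-length vector with a 128-bit minimum vector size maps to 4.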

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[5,981 lines not shown]
   for (auto &Interval : EndPoint)
     TransposeEnds[Interval.second].push_back(Interval.first);

   SmallPtrSet<Instruction *, 8> OpenIntervals;
   SmallVector<RegisterUsage, 8> RUs(VFs.size());
   SmallVector<SmallMapVector<unsigned, unsigned, 4>, 8> MaxUsages(VFs.size());

   LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");

-  // A lambda that gets the register usage for the given type and VF.
-  const auto &TTICapture = TTI;
-  auto GetRegUsage = [&TTICapture](Type *Ty, ElementCount VF) -> unsigned {
+  auto GetRegUsage = [&TTI = TTI](Type *Ty, ElementCount VF) -> unsigned {
     if (Ty->isTokenTy() || !VectorType::isValidElementType(Ty))
       return 0;
-    InstructionCost::CostType RegUsage =
-        *TTICapture.getRegUsageForType(VectorType::get(Ty, VF)).getValue();
-    assert(RegUsage >= 0 && RegUsage <= std::numeric_limits<unsigned>::max() &&
-           "Nonsensical values for register usage.");
-    return RegUsage;
+    return TTI.getRegUsageForType(VectorType::get(Ty, VF));
   };

   for (unsigned int i = 0, s = IdxToInstr.size(); i < s; ++i) {
     Instruction *I = IdxToInstr[i];

     // Remove all of the instructions that end at this location.
     InstrList &List = TransposeEnds[i];
     for (Instruction *ToRemove : List)
[4,836 lines not shown]
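
The new lambda uses a C++14 init-capture, `[&TTI = TTI]`, to bind a lambda-local reference to a class member instead of going through a separately named local. The standalone sketch below shows the same pattern outside LLVM; `ToyTTI` and `ToyCostModel` are hypothetical types, and the element count stands in for a real vector type.

```cpp
#include <cassert>

// Hypothetical stand-in for the target interface; not LLVM's TargetTransformInfo.
struct ToyTTI {
  unsigned getRegUsageForType(unsigned NumElts) const { return NumElts; }
};

struct ToyCostModel {
  const ToyTTI &TTI;

  unsigned usage(unsigned NumElts, bool Valid) const {
    // Init-capture: the lambda gets its own reference named TTI, initialized
    // from the member this->TTI, so it does not need to capture `this`.
    auto GetRegUsage = [&TTI = TTI](unsigned N, bool IsValid) -> unsigned {
      if (!IsValid)
        return 0; // mirrors the early return for types that cannot be vectorized
      return TTI.getRegUsageForType(N);
    };
    return GetRegUsage(NumElts, Valid);
  }
};
```

The captured name shadows the member inside the lambda body, which keeps the call site reading the same as it would in the enclosing function.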

llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll

This file was added.

				; RUN: opt -loop-vectorize -debug-only=loop-vectorize -disable-output 2>&1 < %s \| FileCheck %s
				; REQUIRES: asserts

				target triple = "aarch64"

				; Test that shows how many registers the loop vectorizer thinks an illegal <VF x i1> will consume.

				; CHECK-LABEL: LV: Checking a loop in 'or_reduction_neon' from <stdin>
				; CHECK: LV(REG): VF = 32
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 72 registers
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 1 registers

				define i1 @or_reduction_neon(i32 %arg, ptr %ptr) {
				entry:
				br label %loop
				exit:
				ret i1 %reduction_next
				loop:
				%induction = phi i32 [ 0, %entry ], [ %induction_next, %loop ]
				%reduction = phi i1 [ 0, %entry ], [ %reduction_next, %loop ]
				%gep = getelementptr inbounds i32, ptr %ptr, i32 %induction
				%loaded = load i32, ptr %gep
				%i1 = icmp eq i32 %loaded, %induction
				%reduction_next = or i1 %i1, %reduction
				%induction_next = add nuw i32 %induction, 1
				%cond = icmp eq i32 %induction_next, %arg
				br i1 %cond, label %exit, label %loop, !llvm.loop !32
				}

				; CHECK-LABEL: LV: Checking a loop in 'or_reduction_sve'
				; CHECK: LV(REG): VF = 64
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 136 registers
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 1 registers

				define i1 @or_reduction_sve(i32 %arg, ptr %ptr) vscale_range(2,2) "target-features"="+sve" {
				entry:
				br label %loop
				exit:
				ret i1 %reduction_next
				loop:
				%induction = phi i32 [ 0, %entry ], [ %induction_next, %loop ]
				%reduction = phi i1 [ true, %entry ], [ %reduction_next, %loop ]
				%gep = getelementptr inbounds i32, ptr %ptr, i32 %induction
				%loaded = load i32, ptr %gep
				%i1 = icmp eq i32 %loaded, %induction
				%reduction_next = or i1 %i1, %reduction
				%induction_next = add nuw i32 %induction, 1
				%cond = icmp eq i32 %induction_next, %arg
				br i1 %cond, label %exit, label %loop, !llvm.loop !64
				}

				!32 = distinct !{!32, !33}
				!33 = !{!"llvm.loop.vectorize.width", i32 32}
				!64 = distinct !{!64, !65}
				!65 = !{!"llvm.loop.vectorize.width", i32 64}
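
One decomposition that reproduces the VectorRC counts in the CHECK lines above is sketched here. It rests on assumptions the test output does not state: that two scalarized <VF x i1> values (VF registers each) and one <VF x i32> value are live at the peak, with 128-bit vector registers for the NEON case and 256-bit registers for the vscale_range(2,2) SVE case. The function names are made up for this example.

```cpp
#include <cassert>

// Registers for a <VF x i32> vector split across VecBits-wide registers.
constexpr unsigned i32VectorRegs(unsigned VF, unsigned VecBits) {
  return (VF * 32 + VecBits - 1) / VecBits;
}

// Assumed peak: two scalarized <VF x i1> values plus one <VF x i32> value
// live at the same time.
constexpr unsigned assumedPeakUsage(unsigned VF, unsigned VecBits) {
  return 2 * VF + i32VectorRegs(VF, VecBits);
}
```

Under those assumptions, VF = 32 with 128-bit registers gives 2 * 32 + 8 = 72 and VF = 64 with 256-bit registers gives 2 * 64 + 8 = 136, matching the expected counts for or_reduction_neon and or_reduction_sve.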

llvm/test/Transforms/LoopVectorize/X86/i1-reg-usage.ll

This file was added.

				; RUN: opt -loop-vectorize -debug-only=loop-vectorize -disable-output 2>&1 < %s \| FileCheck %s
				; REQUIRES: asserts

				target triple = "x86_64"

				; Test that shows how many registers the loop vectorizer thinks an illegal <VF x i1> will consume.

				; CHECK-LABEL: LV: Checking a loop in 'or_reduction_avx' from <stdin>
				; CHECK: LV(REG): VF = 64
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 136 registers
				; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 1 registers

				define i1 @or_reduction_avx(i32 %arg, ptr %ptr) "target-features"="+avx" {
				entry:
				br label %loop
				exit:
				ret i1 %reduction_next
				loop:
				%induction = phi i32 [ 0, %entry ], [ %induction_next, %loop ]
				%reduction = phi i1 [ 0, %entry ], [ %reduction_next, %loop ]
				%gep = getelementptr inbounds i32, ptr %ptr, i32 %induction
				%loaded = load i32, ptr %gep
				%i1 = icmp eq i32 %loaded, %induction
				%reduction_next = or i1 %i1, %reduction
				%induction_next = add nuw i32 %induction, 1
				%cond = icmp eq i32 %induction_next, %arg
				br i1 %cond, label %exit, label %loop, !llvm.loop !64
				}

				!64 = distinct !{!64, !65}
				!65 = !{!"llvm.loop.vectorize.width", i32 64}
