This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] VPTransformState::get() can always return lane 0 for uniforms.
Needs Revision · Public

Authored by fhahn on Nov 15 2020, 9:31 AM.


Details

Reviewers

Ayal
gilr
rengolin
reames

Summary

When requesting a scalar value for a uniform VPDef, we can always return
lane 0. This can avoid inserting some unnecessary instructions
to duplicate the uniform value across lanes.
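As a rough illustration of the idea in the summary (not the actual VPlan code), the sketch below models per-part, per-lane scalar storage. `ScalarState`, `Iteration`, `markUniform` and the string-keyed defs are hypothetical simplifications; `Iteration` only plays the role of `VPIteration`. For a def marked uniform, `get()` redirects any lane request to lane 0, so only one scalar per part ever has to be materialized.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for VPIteration: which unrolled part and which lane.
struct Iteration {
  unsigned Part;
  unsigned Lane;
};

// Hypothetical model of per-part, per-lane scalar storage; not LLVM code.
class ScalarState {
public:
  explicit ScalarState(unsigned VF) : VF(VF) {}

  // Mark a def whose value is identical in all lanes.
  void markUniform(const std::string &Def) { Uniform[Def] = true; }

  // Record the scalar generated for (Def, Part, Lane).
  void set(const std::string &Def, Iteration I, int V) {
    auto &Lanes = Values[{Def, I.Part}];
    Lanes.resize(VF); // no-op once this part already has VF slots
    Lanes.at(I.Lane) = V;
  }

  // Like the patched get(): the iteration is taken by value, so for a
  // uniform def the requested lane can be redirected to lane 0.
  int get(const std::string &Def, Iteration I) const {
    auto It = Uniform.find(Def);
    if (It != Uniform.end() && It->second)
      I.Lane = 0;
    const std::optional<int> &Slot = Values.at({Def, I.Part}).at(I.Lane);
    assert(Slot && "scalar for this lane was never generated");
    return *Slot;
  }

private:
  unsigned VF;
  std::map<std::string, bool> Uniform;
  std::map<std::pair<std::string, unsigned>, std::vector<std::optional<int>>>
      Values;
};
```

Taking the iteration by value, as the patch does for `VPTransformState::get()`, lets the function override the lane locally without affecting the caller's `VPIteration`.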

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Nov 15 2020, 9:31 AM

Herald added a project: Restricted Project. Nov 15 2020, 9:31 AM

Herald added subscribers: psnobl, rogfer01, bollu, hiraditya.

fhahn requested review of this revision. Nov 15 2020, 9:31 AM

Herald added a subscriber: vkmr. Nov 15 2020, 9:31 AM

Harbormaster completed remote builds in B78890: Diff 305370. Nov 15 2020, 9:36 AM

fhahn mentioned this in D91398: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE). Nov 15 2020, 9:54 AM

LGTM, though please keep in mind that I'm not fully familiar with this code. You may want to wait for another reviewer.

This revision is now accepted and ready to land. Nov 22 2020, 4:32 PM

Actually, LGTM revoked. This apparently depends on D91500 (which isn't marked in the metadata), and as I commented there, I don't have the context.

This revision now requires changes to proceed. Nov 22 2020, 4:44 PM

Not blocking this review, but I think it's bug-prone to mix lane 0 of scalarized divergent values with truly uniform values that can be kept in a single scalar. Possible examples:

top-loop:
  if %iv % VF != 0:
    inner-loop:
      %iv = [ 0, inner.ph ], [ %iv.next, inner.latch ] ; Uniform, but lane 0 doesn't make much sense since it is masked out.
      ...
      divergent exit condition

bb:
  %sel = select i1 %divergent, 42, %divergent.def ; divergent in general
  use %sel
  br i1 %divergent, label %uni.use.bb, label %bb2

uni.use.bb:
  %uni.phi = phi [ %sel, %bb ] ; "Conditionally" uniform - all active lanes have the same uniform value
  ; Long compute chain based on %uni.phi that we'd like to keep on a single scalar

In the latter case, the uniform value must be extracted from the first *active* lane, not from lane 0. I believe it's very easy to make a mistake if the same storage is used both for the scalarized parts of divergent values and for truly uniform values that should be kept in a single scalar def/register.
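To make the point concrete, here is a minimal model of a "conditionally uniform" value such as `%uni.phi`. The helpers `firstActiveLane` and `extractUniform` and the `LaneValues` alias are hypothetical, not LLVM API: masked-out lanes hold no meaningful value (modeled as `std::nullopt`, standing in for undef), while every active lane holds the same value. Reading the first active lane is correct; a fixed lane 0 read can land on a masked-out lane.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One entry per lane; std::nullopt models a masked-out lane whose
// contents are meaningless (e.g. undef).
using LaneValues = std::vector<std::optional<int>>;

// Index of the first lane whose mask bit is set, if any.
std::optional<std::size_t> firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t L = 0; L < Mask.size(); ++L)
    if (Mask[L])
      return L;
  return std::nullopt;
}

// Extract a conditionally uniform value: all active lanes agree, so the
// first active lane is a correct source, whereas lane 0 may be inactive.
std::optional<int> extractUniform(const LaneValues &V,
                                  const std::vector<bool> &Mask) {
  std::optional<std::size_t> L = firstActiveLane(Mask);
  if (!L)
    return std::nullopt; // no active lanes: nothing meaningful to extract
  return V.at(*L);
}
```

With lanes `{nullopt, 42, nullopt, 42}` and mask `{false, true, false, true}`, lane 0 holds nothing usable, while `extractUniform` returns 42 from lane 1.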

To summarize: I think it's possible to implement everything correctly by repurposing the lane 0 storage for uniform values, but in the future, once we try to implement more complex optimizations, it might lead to confusion and omissions that cause silent miscompiles (e.g. extracting undef values from lane 0 instead of extracting the required uniform value from the first active lane).
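One way to read this suggestion is to keep truly uniform values in storage of their own instead of reusing the lane 0 slot. The sketch below is a hypothetical design (`SplitValueState` and its methods are invented for illustration, not LLVM code): a uniform def has exactly one scalar slot, divergent defs have per-lane slots, and mixing the two for one def is reported as an error instead of silently falling back to lane 0.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical design sketch: uniform values and per-lane values of
// scalarized divergent defs are kept in separate maps.
class SplitValueState {
public:
  explicit SplitValueState(unsigned VF) : VF(VF) {}

  void setUniform(const std::string &Def, int V) {
    if (PerLane.count(Def))
      throw std::logic_error("def already has per-lane values");
    Uniform[Def] = V;
  }

  void setLane(const std::string &Def, unsigned Lane, int V) {
    if (Uniform.count(Def))
      throw std::logic_error("def is uniform; it has no per-lane values");
    auto &Lanes = PerLane[Def];
    Lanes.resize(VF); // no-op once the def already has VF slots
    Lanes.at(Lane) = V;
  }

  // Never consults lane storage; map::at throws std::out_of_range for a
  // def that was not recorded as uniform.
  int getUniform(const std::string &Def) const { return Uniform.at(Def); }

  int getLane(const std::string &Def, unsigned Lane) const {
    const std::optional<int> &Slot = PerLane.at(Def).at(Lane);
    if (!Slot)
      throw std::logic_error("lane was never generated");
    return *Slot;
  }

private:
  unsigned VF;
  std::map<std::string, int> Uniform;
  std::map<std::string, std::vector<std::optional<int>>> PerLane;
};
```

Because a uniform value can only be read through `getUniform`, asking for it on a divergent def fails loudly rather than returning whatever lane 0 happens to hold.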

Herald added a subscriber: tschuett. Jan 28 2021, 1:18 PM

david-arm added a subscriber: david-arm. Jan 29 2021, 12:18 AM

fhahn mentioned this in D116654: [LV] Use VPReplicateRecipe::isUniform instead isUniformAfterVec (NFCI). Jan 5 2022, 5:13 AM

Revision Contents

Path                                       Size
llvm/lib/Transforms/Vectorize/VPlan.h      20 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp    25 lines

Diff 305370

llvm/lib/Transforms/Vectorize/VPlan.h

@@ 280 lines not shown @@ Value *get(VPValue *Def, unsigned Part) {
     // If Values have been set for this Def return the one relevant for \p Part.
     if (Data.PerPartOutput.count(Def))
       return Data.PerPartOutput[Def][Part];
     // Def is managed by ILV: bring the Values from ValueMap.
     return Callback.getOrCreateVectorValues(VPValue2Value[Def], Part);
   }

   /// Get the generated Value for a given VPValue and given Part and Lane.
-  Value *get(VPValue *Def, const VPIteration &Instance) {
-    // If the Def is managed directly by VPTransformState, extract the lane from
-    // the relevant part. Note that currently only VPInstructions and external
-    // defs are managed by VPTransformState. Other Defs are still created by ILV
-    // and managed in its ValueMap. For those this method currently just
-    // delegates the call to ILV below.
-    if (Data.PerPartOutput.count(Def)) {
-      auto *VecPart = Data.PerPartOutput[Def][Instance.Part];
-      if (!VecPart->getType()->isVectorTy()) {
-        assert(Instance.Lane == 0 && "cannot get lane > 0 for scalar");
-        return VecPart;
-      }
-      // TODO: Cache created scalar values.
-      return Builder.CreateExtractElement(VecPart,
-                                          Builder.getInt32(Instance.Lane));
-    }
-
-    return Callback.getOrCreateScalarValue(VPValue2Value[Def], Instance);
-  }
+  Value *get(VPValue *Def, VPIteration Instance);

   /// Set the generated Value for a given VPValue and a given Part.
   void set(VPValue *Def, Value *V, unsigned Part) {
     if (!Data.PerPartOutput.count(Def)) {
       DataState::PerPartValuesTy Entry(UF);
       Data.PerPartOutput[Def] = Entry;
     }
     Data.PerPartOutput[Def][Part] = V;
@@ 1,805 lines not shown @@

llvm/lib/Transforms/Vectorize/VPlan.cpp

@@ 231 lines not shown @@ void VPBlockBase::deleteCFG(VPBlockBase *Entry) {

   for (VPBlockBase *Block : Blocks)
     delete Block;
 }

 VPBasicBlock::iterator VPBasicBlock::getFirstNonPhi() {
   iterator It = begin();
   while (It != end() && (isa<VPWidenPHIRecipe>(&*It) ||
                          isa<VPWidenIntOrFpInductionRecipe>(&*It) ||
                          isa<VPPredInstPHIRecipe>(&*It) ||
                          isa<VPWidenCanonicalIVRecipe>(&*It)))
     It++;
   return It;
 }

+Value *VPTransformState::get(VPValue *Def, VPIteration Instance) {
+  // For uniform definitions, all lanes produce the same value, so we can always
+  // return the first lane.
+  if (auto *ReplicateR = dyn_cast<VPReplicateRecipe>(Def))
+    if (ReplicateR->isUniform())
+      Instance.Lane = 0;
+  // If the Def is managed directly by VPTransformState, extract the lane from
+  // the relevant part. Note that currently only VPInstructions and external
+  // defs are managed by VPTransformState. Other Defs are still created by ILV
+  // and managed in its ValueMap. For those this method currently just
+  // delegates the call to ILV below.
+  if (Data.PerPartOutput.count(Def)) {
+    auto *VecPart = Data.PerPartOutput[Def][Instance.Part];
+    if (!VecPart->getType()->isVectorTy()) {
+      assert(Instance.Lane == 0 && "cannot get lane > 0 for scalar");
+      return VecPart;
+    }
+    // TODO: Cache created scalar values.
+    return Builder.CreateExtractElement(VecPart,
+                                        Builder.getInt32(Instance.Lane));
+  }
+
+  return Callback.getOrCreateScalarValue(VPValue2Value[Def], Instance);
+}
+
 BasicBlock *
 VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
   // BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
   // Pred stands for Predessor. Prev stands for Previous - last visited/created.
   BasicBlock *PrevBB = CFG.PrevBB;
   BasicBlock *NewBB = BasicBlock::Create(PrevBB->getContext(), getName(),
                                         PrevBB->getParent(), CFG.LastBB);
   LLVM_DEBUG(dbgs() << "LV: created " << NewBB->getName() << '\n');
@@ 900 lines not shown @@

Inline lint comment (pre-merge checks, clang-tidy): error: no member named 'isUniform' in 'llvm::VPReplicateRecipe' [clang-diagnostic-error]