llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1362	These variables are only used in the if block and should be declared there.
1363	This variable is only used once - the expression can be used directly instead.
1367	The Instruction pointer I is optional but is dereferenced unconditionally. This is likely to segfault. There is a dyn_cast, but the result is not checked for nullptr, so there may also be a segfault if I is something else. If a valid VP intrinsic pointer is expected for this target implementation, then there should be an assert to check the dyn_cast result.
1367	Having the entire middle of the function occupied by a path that one probably never wants to execute is slightly painful. I would suggest handling the shouldDoNothing path first instead.
1402	If this is made an or condition with not pwr9 then the pwr10 case can return here so there aren't two return LT.first lines.
1405	Missing vectorCostAdjustment to account for P9.
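The 1367 comment asks that the optional `I` never be dereferenced through an unchecked cast. Below is a minimal standalone sketch of the null-tolerant pattern. `Instruction` and `VPIntrinsicLike` are hypothetical stand-ins, not LLVM classes. Standard `dynamic_cast` takes the place of LLVM's `dyn_cast_or_null`, which likewise accepts a null pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::Instruction and llvm::VPIntrinsic.
struct Instruction {
  virtual ~Instruction() = default;
};
struct VPIntrinsicLike : Instruction {
  bool NeedsLegalization = false;
};

// Answers "can this VP access be used as-is?" without ever dereferencing a
// null or wrongly typed pointer. dynamic_cast of a null pointer yields null,
// so a missing instruction and a non-VP instruction take the same
// conservative path.
bool isDirectlyLegalVPAccess(const Instruction *I) {
  const auto *VPI = dynamic_cast<const VPIntrinsicLike *>(I);
  if (!VPI)
    return false; // Conservative: assume legalization is required.
  return !VPI->NeedsLegalization;
}
```

As the later 1367 reply explains, the revision ultimately avoided depending on `I` at all, because the VP intrinsics may not exist yet when cost analysis runs.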

This revision now requires changes to proceed. Oct 20 2021, 12:02 PM

Address all but one of Roland's comments.

Harbormaster completed remote builds in B129978: Diff 381314. Oct 21 2021, 10:38 AM

bmahjour added inline comments. Oct 21 2021, 10:43 AM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1367	Actually, `getVPLegalizationStrategy` doesn't seem like the right interface to use, because we may not have the VP intrinsics available at the time we do cost analysis. What we really need is something that can tell us whether the opcode/type combination is considered legal on a target (i.e., PPC). I think I asked this question before, and someone told me to use `hasActiveVectorLength()`. However, I think we need to change that interface to take in more pieces of information. I'm going to look into that next.

Implement and use hasActiveVectorLength() for PPC target.

Harbormaster completed remote builds in B130139: Diff 381544. Oct 22 2021, 7:45 AM

RolandF added inline comments. Oct 28 2021, 12:12 PM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395	I don't follow this reasoning. Maybe 64-bit data is aligned half the time, but how is that true for char data for instance?
1431	The scalar loads have scalar type results. I assume if the code is being vectorized and the function of a vector load is being replaced, the results have to go in a vector. I think there is some insert cost here.

bmahjour added inline comments. Nov 25 2021, 1:00 PM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395	Does this sound better? `return P9PipelineFlushEstimate / ((Alignment/8) + 1);` It would be equivalent to `return ((Alignment == 8) ? P9PipelineFlushEstimate / 2 : P9PipelineFlushEstimate);` unless we cast the values to float in that formula. Any other suggestions?
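The equivalence claimed in the 1395 reply can be checked directly. The sketch below evaluates the proposed integer expression, the ternary form, and the same expression in floating point. It assumes, as in the diff, that only alignments below 16 bytes reach this code, because `Alignment >= 16` returns early. The function names are hypothetical.

```cpp
#include <cassert>

// Value taken from the diff (P9PipelineFlushEstimate = 80).
constexpr long long P9PipelineFlushEstimate = 80;

// Integer form proposed in the comment.
long long flushCostInteger(long long AlignBytes) {
  assert(AlignBytes > 0 && AlignBytes < 16);
  return P9PipelineFlushEstimate / ((AlignBytes / 8) + 1);
}

// The ternary form the comment says it is equivalent to.
long long flushCostTernary(long long AlignBytes) {
  return AlignBytes == 8 ? P9PipelineFlushEstimate / 2
                         : P9PipelineFlushEstimate;
}

// The same formula in floating point: 1-, 2- and 4-byte alignments now get
// distinct, progressively smaller discounts instead of none.
double flushCostFloat(long long AlignBytes) {
  return static_cast<double>(P9PipelineFlushEstimate) /
         (static_cast<double>(AlignBytes) / 8.0 + 1.0);
}
```

With integer division, `AlignBytes / 8` is 0 for 1-, 2- and 4-byte alignment, so every case except 8 collapses to the full 80. That is why the two integer forms agree, and why a float variant changes the result (for example, 64 for 2-byte alignment).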

bmahjour added inline comments. Nov 26 2021, 11:51 AM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1431	The call to `getScalarizationOverhead` is supposed to account for insert vs. extract cost (notice we pass `IsLoad` vs. `!IsLoad` to indicate whether we need insert or extract cost, respectively). There is an existing function, `getCommonMaskedMemoryOpCost()`, which computes cost in a very similar manner and is called by `getMaskedMemoryOpCost()`. I'm thinking of replacing lines 1412-1437 with a call to that function instead. Any thoughts, @RolandF?
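To make the insert-versus-extract point concrete, here is a toy model of the sum returned by the scalarization path in Diff 381314: `ScalarMemOpCost + ValueSplitCost + MaskSplitCost + MaskCmpCost`. The per-operation values in `UnitCosts` are hypothetical placeholders. In LLVM, those values would come from TTI hooks such as `getMemoryOpCost`, `getScalarizationOverhead`, `getCmpSelInstrCost` and `getCFInstrCost`. This is not LLVM's implementation.

```cpp
#include <cassert>

// Hypothetical per-element costs standing in for the values TTI hooks return.
struct UnitCosts {
  int ScalarMemOp = 1;    // one scalar load or store
  int InsertElement = 1;  // put a loaded scalar into the result vector
  int ExtractElement = 1; // pull a lane out of a vector
  int Compare = 1;        // test the mask byte
  int Branch = 1;         // branch around the access
};

// Mirrors the four terms summed at the end of the scalarization path.
int scalarizedVPMemOpCost(int NumElems, bool IsLoad, const UnitCosts &C) {
  assert(NumElems > 0 && "expected a non-empty vector");
  int ScalarMemOpCost = NumElems * C.ScalarMemOp;
  // getScalarizationOverhead(SrcVTy, IsLoad, !IsLoad): a load pays inserts to
  // rebuild the vector, a store pays extracts to take it apart.
  int ValueSplitCost =
      NumElems * (IsLoad ? C.InsertElement : C.ExtractElement);
  // One mask byte is extracted per element.
  int MaskSplitCost = NumElems * C.ExtractElement;
  // Each element is guarded by a compare and a branch.
  int MaskCmpCost = NumElems * (C.Compare + C.Branch);
  return ScalarMemOpCost + ValueSplitCost + MaskSplitCost + MaskCmpCost;
}
```

Raising only `InsertElement` changes the load cost but not the store cost. This is the asymmetry that the `IsLoad` / `!IsLoad` arguments encode.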

Made the following changes:

- Made hasActiveVectorLength() return false for anything other than loads/stores.
- Only halve the cost of pipeline flush for 8-byte aligned accesses. For smaller alignments we consider the full cost.
- Replaced the calculation of the scalarization cost with getMaskedMemoryOpCost().
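The first change narrows the legality query. The sketch below is a rough standalone model of the question it answers. Only the load/store restriction comes from this update note. The Power9/Power10 and "length but no mask" conditions are assumptions drawn from the diff comment "Currently we can only lower intrinsics with evl but no mask, on Power 9/10". The enums and `supportsLengthMemOp` are hypothetical and do not match the real `hasActiveVectorLength()` signature.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not LLVM's opcode or subtarget types.
enum class Opcode { Load, Store, Add };
enum class Cpu { Pwr8, Pwr9, Pwr10 };

// Models the legality question described in this revision: only loads and
// stores qualify, and (per the comment in the diff) only on Power9/Power10
// when the access has an explicit vector length but no mask.
bool supportsLengthMemOp(Opcode Op, Cpu Target, bool HasMask) {
  if (Op != Opcode::Load && Op != Opcode::Store)
    return false;
  if (Target != Cpu::Pwr9 && Target != Cpu::Pwr10)
    return false;
  return !HasMask;
}
```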

Harbormaster completed remote builds in B136278: Diff 390113. Nov 26 2021, 1:31 PM

RolandF added inline comments. Nov 29 2021, 2:51 PM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1395	I think that it makes sense to use float. Something like `Misaligned = (NumElements - 1) / NumElements` and `Result = Misaligned * P9PipelineFlushEstimate + (1 - Misaligned) * Cost`. The present calculation, I think, implies that the aligned cost is zero.
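Below is a sketch of the blended expected cost suggested in this 1395 comment. It takes `Misaligned = (NumElements - 1) / NumElements` as written there and computes in `double`, as the comment suggests. `blendedCost` and its parameters are hypothetical names.

```cpp
#include <cassert>

constexpr double P9PipelineFlushEstimate = 80.0; // value from the diff

// Expected cost when a fraction of accesses is misaligned: misaligned
// accesses pay the flush estimate, aligned ones pay the normal vector cost.
double blendedCost(unsigned NumElements, double AlignedCost) {
  assert(NumElements > 0);
  double Misaligned =
      static_cast<double>(NumElements - 1) / NumElements;
  return Misaligned * P9PipelineFlushEstimate +
         (1.0 - Misaligned) * AlignedCost;
}
```

With `NumElements = 2` and an aligned cost of 2, the blend gives 41 rather than the 40 of `P9PipelineFlushEstimate / 2`. The difference is exactly the aligned half's cost, which the earlier expression treated as zero.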

Addressed comments.

Harbormaster completed remote builds in B136714: Diff 390734. Nov 30 2021, 8:58 AM

RolandF added inline comments. Nov 30 2021, 1:14 PM

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1400	I like this approach better. But I think the AlignmentProb calculation is incorrect. This would calculate 0% aligned for align 1 and 7/8 aligned for align 8 instead of 50%. If you want to key off alignment rather than number of elements then I think you want to do size / alignment to figure out the number of possible offsets and use that to calculate probability.

bmahjour updated this revision to Diff 391027. Dec 1 2021, 7:50 AM

bmahjour added inline comments.

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
1400	The accesses are consecutive and the minimum alignment of the address is specified on the load/store instruction, which is passed to this function, so I don't think the element size or the size of the vector matters. I've changed the alignment probability calculation to simply divide the specified alignment by the desired alignment. This way we'll get 50% probability for 8-byte aligned accesses, 25% probability for 4-byte aligned accesses and so forth.
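The probability described in this reply is the specified alignment divided by the desired 16-byte alignment. It also matches the "number of possible offsets" reading in the 1400 comment: an 8-byte-aligned address has 16 / 8 = 2 possible offsets, one of which is aligned. The sketch below combines that probability with the blend from the 1395 discussion. The function names, and modelling the aligned cost as a plain parameter, are assumptions rather than the committed code.

```cpp
#include <cassert>

constexpr double P9PipelineFlushEstimate = 80.0; // value from the diff
constexpr unsigned DesiredAlignment = 16;        // 16-byte VSX access

// Probability that an address known only to be AlignBytes-aligned is actually
// 16-byte aligned: one of DesiredAlignment / AlignBytes equally likely offsets.
double alignmentProbability(unsigned AlignBytes) {
  assert(AlignBytes > 0 && DesiredAlignment % AlignBytes == 0);
  return static_cast<double>(AlignBytes) / DesiredAlignment;
}

// Aligned accesses cost the normal vector cost; misaligned ones pay the flush.
double expectedP9Cost(unsigned AlignBytes, double AlignedCost) {
  double P = alignmentProbability(AlignBytes);
  return (1.0 - P) * P9PipelineFlushEstimate + P * AlignedCost;
}
```

With an aligned cost of 1, 8-byte alignment gives 40.5, 4-byte alignment gives 60.25, and 1-byte alignment gives 75.0625. The estimate approaches the full flush as the known alignment shrinks.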

Harbormaster completed remote builds in B136923: Diff 391027. Dec 1 2021, 8:01 AM

LGTM

This revision is now accepted and ready to land. Dec 1 2021, 12:02 PM

This revision was landed with ongoing or failed builds. Dec 7 2021, 11:20 AM

Closed by commit rG8aee78336691: [VP] Cost model for VPMemory operations on PowerPC. (authored by bmahjour).

This revision was automatically updated to reflect the committed changes.

bmahjour added a commit: rG8aee78336691: [VP] Cost model for VPMemory operations on PowerPC.

Diff 381314

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ … @@ public:

   /// \return The cost of Load and Store instructions.
   InstructionCost
   getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                   unsigned AddressSpace,
                   TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
                   const Instruction *I = nullptr) const;

+  /// \return The cost of VP Load and Store instructions.
+  InstructionCost
+  getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
+                    unsigned AddressSpace,
+                    TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
+                    const Instruction *I = nullptr) const;

   /// \return The cost of masked Load and Store instructions.
   InstructionCost getMaskedMemoryOpCost(
       unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;

   /// \return The cost of Gather or Scatter operation
   /// \p Opcode - is a type of memory access Load or Store
   /// \p DataTy - a vector type of the data to be loaded or stored
@@ … @@ virtual InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                                              const Instruction *I) = 0;
   virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
                                              unsigned Index) = 0;
   virtual InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,
                                           Align Alignment,
                                           unsigned AddressSpace,
                                           TTI::TargetCostKind CostKind,
                                           const Instruction *I) = 0;
+  virtual InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src,
+                                            Align Alignment,
+                                            unsigned AddressSpace,
+                                            TTI::TargetCostKind CostKind,
+                                            const Instruction *I) = 0;
   virtual InstructionCost
   getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                         unsigned AddressSpace,
                         TTI::TargetCostKind CostKind) = 0;
   virtual InstructionCost
   getGatherScatterOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
                          bool VariableMask, Align Alignment,
                          TTI::TargetCostKind CostKind,
@@ … @@ public:
   }
   InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                                   unsigned AddressSpace,
                                   TTI::TargetCostKind CostKind,
                                   const Instruction *I) override {
     return Impl.getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
                                 CostKind, I);
   }
+  InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
+                                    unsigned AddressSpace,
+                                    TTI::TargetCostKind CostKind,
+                                    const Instruction *I) override {
+    return Impl.getVPMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
+                                  CostKind, I);
+  }
   InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                         Align Alignment, unsigned AddressSpace,
                                         TTI::TargetCostKind CostKind) override {
     return Impl.getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
                                       CostKind);
   }
   InstructionCost
   getGatherScatterOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
@@ … @@

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ … @@ public:

   InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                                   unsigned AddressSpace,
                                   TTI::TargetCostKind CostKind,
                                   const Instruction *I) const {
     return 1;
   }

+  InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
+                                    unsigned AddressSpace,
+                                    TTI::TargetCostKind CostKind,
+                                    const Instruction *I) const {
+    return 1;
+  }

   InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                         Align Alignment, unsigned AddressSpace,
                                         TTI::TargetCostKind CostKind) const {
     return 1;
   }

   InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
                                          const Value *Ptr, bool VariableMask,
@@ … @@

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

@@ … @@ InstructionCost getInterleavedMemoryOpCost(
       unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
       Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
       bool UseMaskForCond = false, bool UseMaskForGaps = false);
   InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                         TTI::TargetCostKind CostKind);
   bool areFunctionArgsABICompatible(const Function *Caller,
                                     const Function *Callee,
                                     SmallPtrSetImpl<Argument *> &Args) const;
+  InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
+                                    unsigned AddressSpace,
+                                    TTI::TargetCostKind CostKind,
+                                    const Instruction *I = nullptr);

(Inline comment by bmahjour: make this a static const member variable and include "P9" in the name.)
+private:
+  // The following constant is used for estimating costs on power9.
+  static const InstructionCost::CostType P9PipelineFlushEstimate = 80;

   /// @}
 };

 } // end namespace llvm

 #endif

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

@@ … @@ case Intrinsic::ppc_vsx_stxvp: {
     return true;
   }
   default:
     break;
   }

   return false;
 }

+InstructionCost PPCTTIImpl::getVPMemoryOpCost(unsigned Opcode, Type *Src,
+                                              Align Alignment,
+                                              unsigned AddressSpace,
+                                              TTI::TargetCostKind CostKind,
+                                              const Instruction *I) {
+  InstructionCost Cost = BaseT::getVPMemoryOpCost(Opcode, Src, Alignment,
+                                                  AddressSpace, CostKind, I);
+  if (TLI->getValueType(DL, Src, true) == MVT::Other)
+    return Cost;
+  // TODO: Handle other cost kinds.
+  if (CostKind != TTI::TCK_RecipThroughput)
+    return Cost;
+
+  assert((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
+         "Invalid Opcode");
+  bool IsLoad = (Opcode == Instruction::Load);
+
+  auto *SrcVTy = dyn_cast<FixedVectorType>(Src);
+  assert(SrcVTy && "Expected a vector type for VP memory operations");
+
+  if (getVPLegalizationStrategy(*dyn_cast<VPIntrinsic>(I)).shouldDoNothing()) {
+    std::pair<InstructionCost, MVT> LT =
+        TLI->getTypeLegalizationCost(DL, SrcVTy);
+    InstructionCost Cost = vectorCostAdjustment(LT.first, Opcode, Src, nullptr);
+
+    // On P9 but not on P10, if the op is misaligned then it will cause a
+    // pipeline flush. Otherwise the VSX masked memops cost the same as unmasked
+    // ones.
+    if (Alignment >= 16 || ST->getCPUDirective() != PPC::DIR_PWR9)
+      return Cost;
+
+    // We assume the average case: that ops with alignment <= 128
+    // will flush a full pipeline about half the time.
+    // The cost when this happens is about 80 cycles.
+    return P9PipelineFlushEstimate / 2;
+  }
+
+  // Usually we should not get to this point, but the following is an attempt to
+  // model the cost of legalization. Currently we can only lower intrinsics with
+  // evl but no mask, on Power 9/10. Otherwise, we must scalarize. We need to
+  // extract (from the mask) the most/least significant byte of all halfwords
+  // aligned with vector elements, and do an access predicated on its 0th bit.
+  // We make the simplifying assumption that byte-extraction costs are
+  // stride-invariant, so we model the extraction as scalarizing a load of
+  // <NumElems x i8>.
+
+  // VSX masks have lanes per bit, but the predication is per halfword.
+  unsigned NumElems = SrcVTy->getNumElements();
+  auto *MaskI8Ty = Type::getInt8Ty(SrcVTy->getContext());
+  InstructionCost MaskSplitCost = getScalarizationOverhead(
+      FixedVectorType::get(MaskI8Ty, NumElems), false, true);
+  const InstructionCost ScalarCompareCostInstrCost =
+      getCmpSelInstrCost(Instruction::ICmp, MaskI8Ty, nullptr,
+                         CmpInst::BAD_ICMP_PREDICATE, CostKind);
+
+  assert(ScalarCompareCostInstrCost.isValid() &&
+         "Expected valid instruction cost");
+  int ScalarCompareCost = *(ScalarCompareCostInstrCost.getValue());
+
+  const InstructionCost BranchInstrCost =
+      getCFInstrCost(Instruction::Br, CostKind);
+  assert(BranchInstrCost.isValid() && "Expected valid instruction cost");
+  int BranchCost = *BranchInstrCost.getValue();
+  int MaskCmpCost = NumElems * (BranchCost + ScalarCompareCost);
+
+  InstructionCost ValueSplitCost =
+      getScalarizationOverhead(SrcVTy, IsLoad, !IsLoad);
+  const InstructionCost ScalarMemOpInstrCost =
+      NumElems * BaseT::getMemoryOpCost(Opcode, SrcVTy->getScalarType(),
+                                        Alignment, AddressSpace, CostKind);
+  assert(ScalarMemOpInstrCost.isValid() && "Expected valid instruction cost");
+  int ScalarMemOpCost = *(ScalarMemOpInstrCost.getValue());
+  return ScalarMemOpCost + ValueSplitCost + MaskSplitCost + MaskCmpCost;
+}

This is an archive of the discontinued LLVM Phabricator instance.

Cost model for VPMemory operations on PowerPC.
Closed, Public

Revision Contents

Diff 381314

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
