This is an archive of the discontinued LLVM Phabricator instance.

[System Model] [TTI] Add TTI interfaces for write-combining buffers
Needs Review · Public

Authored by greened on Oct 10 2019, 8:22 AM.

Details

Reviewers

kparzysz
MatzeB
jdoerfert
simoll
Meinersbur
hfinkel
andreadb
rengolin
steleman
ajasty-cavium
joelkevinjones
anemet
grosser
arsenm

Summary

Add interfaces for subtargets to return the number of write-combining buffers available. Also provide TTI interfaces that delegate to the subtarget interface.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

greened created this revision. Oct 10 2019, 8:22 AM

Herald added subscribers: llvm-commits, hiraditya, wdng. Oct 10 2019, 8:22 AM

How do you imagine that we'd use this? Do we need some kind of size to go along with this?

In D68793#1704266, @hfinkel wrote:

How do you imagine that we'd use this? Do we need some kind of size to go along with this?

See the Intel optimization guide, section 3.6.9.

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

Basically, this information can be used to inform loop transformations as well as use of non-temporal instructions. A write-combining buffer is not the same as a store buffer. A write-combining buffer is always one cache line in size, so I don't think we need size information.

In D68793#1704496, @greened wrote:

See the Intel optimization guide, section 3.6.9.

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

Basically, this information can be used to inform loop transformations as well as use of non-temporal instructions. A write-combining buffer is not the same as a store buffer. A write-combining buffer is always one cache line in size, so I don't think we need size information.

Alright, thanks. First, we should document this in the interface. Instead of just saying:

\return the number of write-combining buffers.

we might say something like:

\return the number of write-combining buffers. A write-combining buffer is a per-core resource used for collecting writes to a particular cache line before further processing those writes using other parts of the memory subsystem.

We already have getCacheLineSize(), so we know how big that is, but we don't currently have a way to account for the number of hardware threads per core, right? Don't we need that to estimate how many write-combining buffers we get for the current hardware thread? (Presumably, we'd want the same thing to use the total-cache-size functions too, because we need to generate code assuming a working-set size per thread?)

The Intel optimization guide talks about using this number to drive loop distribution, where we don't update more arrays (cache lines) at a time than can fit into the thread's WC buffers. Is this what you had in mind?
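The loop-distribution idea in this comment can be sketched roughly as follows. This is a minimal standalone illustration, not code from the patch: `partitionStoreStreams` and its `Budget` parameter are hypothetical names, and treating a budget of 0 as "no information" is an assumption, not something the review specifies.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: split the arrays (store streams) written by one loop,
// numbered 0..NumStoreStreams-1, into groups of at most Budget streams. Each
// group would become one distributed loop, so that no loop writes to more
// cache lines at a time than the thread has write-combining buffers.
std::vector<std::vector<std::size_t>>
partitionStoreStreams(std::size_t NumStoreStreams, unsigned Budget) {
  std::vector<std::vector<std::size_t>> Groups;
  if (NumStoreStreams == 0)
    return Groups;

  // Assumption: a budget of 0 means the target gave no information, so the
  // loop is left as a single group.
  std::size_t GroupSize = Budget == 0 ? NumStoreStreams : Budget;

  for (std::size_t Begin = 0; Begin < NumStoreStreams; Begin += GroupSize) {
    std::size_t End = std::min(NumStoreStreams, Begin + GroupSize);
    std::vector<std::size_t> Group;
    for (std::size_t I = Begin; I < End; ++I)
      Group.push_back(I);
    Groups.push_back(Group);
  }
  return Groups;
}
```

With six written arrays and a budget of four buffers, this yields two groups of four and two streams. A real transformation would also have to respect data dependences between the statements it separates, which this sketch ignores.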

we might say something like:

\return the number of write-combining buffers. A write-combining buffer is a per-core resource used for collecting writes to a particular cache line before further processing those writes using other parts of the memory subsystem.

Will do.

We already have getCacheLineSize(), so we know how big that is, but we don't currently have a way to account for the number of hardware threads per core, right? Don't we need that to estimate how many write-combining buffers we get for the current hardware thread? (Presumably, we'd want the same thing to use the total-cache-size functions too, because we need to generate code assuming a working-set size per thread?)

This is something that will become available once more bits of the system model are implemented. The model can specify things like number of cores and threads per core. The subtarget will be able to examine its execution resource configuration and return an appropriate number. After this patch makes it through I will be in a place where I can start posting the TableGen changes to generate models and then post the TableGen model classes after that. At that point targets can define their own models and away we go.
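As a rough sketch of what "examine its execution resource configuration and return an appropriate number" might look like, assuming a model that records buffers per core and threads per core. `CoreResources` and `SketchSubtarget` are hypothetical names for illustration only; the actual system model classes are not part of this patch, and dividing evenly among hardware threads is just one possible policy.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of one core's resources.
struct CoreResources {
  unsigned WriteCombiningBuffers; // per core; 0 means "not modeled"
  unsigned HardwareThreads;       // hardware threads sharing the core
};

// Hypothetical subtarget that derives a per-thread answer from the model.
class SketchSubtarget {
public:
  explicit SketchSubtarget(CoreResources Resources) : Core(Resources) {
    assert(Core.HardwareThreads > 0 && "a core runs at least one thread");
  }

  // Assumed policy: split the core's buffers evenly among its hardware
  // threads, but never report fewer than one buffer when any are modeled.
  unsigned getNumWriteCombiningBuffers() const {
    if (Core.WriteCombiningBuffers == 0)
      return 0;
    return std::max(1u, Core.WriteCombiningBuffers / Core.HardwareThreads);
  }

private:
  CoreResources Core;
};
```

For example, a modeled core with 8 buffers and 2 hardware threads would report 4 under this policy. The figures are made up and do not describe any real processor.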

The Intel optimization guide talks about using this number to drive loop distribution, where we don't update more arrays (cache lines) at a time than can fit into the thread's WC buffers. Is this what you had in mind?

Yes. It's useful for anything that cares about performance of writes to memory.

Added comment explaining what a write-combining buffer is and one possibility of how to use this information.

Ok, I think I've addressed all the comments so this is ready for another review. Thanks!

Ping. This is ready for another review.

Seems fine to me, @hfinkel?

Updated to latest master.

arsenm resigned from this revision. Feb 13 2020, 4:53 PM

@jdoerfert @hfinkel should I consider Johannes' comment an LGTM? This has been waiting for two months now.

Matt added a subscriber: Matt. Apr 20 2021, 8:22 AM

Revision Contents

Path                                                  Size
llvm/include/llvm/Analysis/TargetTransformInfo.h      16 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h  5 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h              5 lines
llvm/include/llvm/MC/MCSubtargetInfo.h                3 lines
llvm/lib/Analysis/TargetTransformInfo.cpp             4 lines
llvm/lib/MC/MCSubtargetInfo.cpp                       4 lines

Diff 234081

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ public:
   /// i.e. prefetch with any stride.
   unsigned getMinPrefetchStride() const;

   /// \return The maximum number of iterations to prefetch ahead. If
   /// the required number of iterations is more than this number, no
   /// prefetching is performed.
   unsigned getMaxPrefetchIterationsAhead() const;

+  /// \return the number of write-combining buffers. A write-combining buffer is
+  /// a per-core resource used for collecting writes to a particular cache line
+  /// before further processing those writes using other parts of the memory
+  /// subsystem. Knowledge of the number of write-combining buffers available
+  /// can help the optimizer manage those resources, for example fissioning
+  /// loops to avoid oversubscribing write-combining buffers.
+  unsigned getNumWriteCombiningBuffers() const;

   /// \return The maximum interleave factor that any transform should try to
   /// perform for this target. This number depends on the level of parallelism
   /// and the number of execution units in the CPU.
   unsigned getMaxInterleaveFactor(unsigned VF) const;

   /// Collect properties of V used in cost analysis, e.g. OP_PowerOf2.
   static OperandValueKind getOperandInfo(Value *V,
                                          OperandValueProperties &OpProps);
@@ ... @@ public:
   /// i.e. prefetch with any stride.
   virtual unsigned getMinPrefetchStride() const = 0;

   /// \return The maximum number of iterations to prefetch ahead. If
   /// the required number of iterations is more than this number, no
   /// prefetching is performed.
   virtual unsigned getMaxPrefetchIterationsAhead() const = 0;

+  /// \return the number of write-combining buffers.
+  virtual unsigned getNumWriteCombiningBuffers() const = 0;

   virtual unsigned getMaxInterleaveFactor(unsigned VF) = 0;
   virtual unsigned getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
       OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
       OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,
       const Instruction *CxtI = nullptr) = 0;
   virtual int getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,
                              Type *SubTp) = 0;
@@ ... @@ public:

   /// Return the maximum prefetch distance in terms of loop
   /// iterations.
   ///
   unsigned getMaxPrefetchIterationsAhead() const override {
     return Impl.getMaxPrefetchIterationsAhead();
   }

+  /// Return the number of write-combining buffers.
+  unsigned getNumWriteCombiningBuffers() const override {
+    return Impl.getNumWriteCombiningBuffers();
+  }

   unsigned getMaxInterleaveFactor(unsigned VF) override {
     return Impl.getMaxInterleaveFactor(VF);
   }
   unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
                                             unsigned &JTSize,
                                             ProfileSummaryInfo *PSI,
                                             BlockFrequencyInfo *BFI) override {
     return Impl.getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
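The three hunks above add the same query at three layers: the public wrapper, the abstract Concept, and the Model that forwards to a concrete implementation. A reduced standalone sketch of that forwarding pattern might look like the following. `CostInfo`, `NoInfoImpl`, and `TenBufferImpl` are hypothetical stand-ins rather than LLVM classes, and the value 10 is made up.

```cpp
#include <memory>
#include <utility>

// Stand-in for the public wrapper class. It hides the concrete
// implementation type behind an abstract interface (type erasure).
class CostInfo {
public:
  template <typename T>
  explicit CostInfo(T TargetImpl)
      : Impl(new Model<T>(std::move(TargetImpl))) {}

  // Public query: forwards to whichever implementation was supplied.
  unsigned getNumWriteCombiningBuffers() const {
    return Impl->getNumWriteCombiningBuffers();
  }

private:
  // Abstract interface, one pure virtual function per query.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getNumWriteCombiningBuffers() const = 0;
  };

  // Adapter that implements the interface by calling into T.
  template <typename T> struct Model final : Concept {
    explicit Model(T TargetImpl) : Impl(std::move(TargetImpl)) {}
    unsigned getNumWriteCombiningBuffers() const override {
      return Impl.getNumWriteCombiningBuffers();
    }
    T Impl;
  };

  std::unique_ptr<const Concept> Impl;
};

// Hypothetical implementation with no information (reports 0).
struct NoInfoImpl {
  unsigned getNumWriteCombiningBuffers() const { return 0; }
};

// Hypothetical target-specific implementation; the number is made up.
struct TenBufferImpl {
  unsigned getNumWriteCombiningBuffers() const { return 10; }
};
```

Constructing `CostInfo` from `TenBufferImpl{}` and calling `getNumWriteCombiningBuffers()` returns 10 through the Model, while `NoInfoImpl{}` yields 0. The implementation types only need a member function with the matching name; they do not inherit from anything.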

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ llvm::Optional<unsigned> getCacheAssociativity(

     llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
   }

   unsigned getPrefetchDistance() const { return 0; }
   unsigned getMinPrefetchStride() const { return 1; }
   unsigned getMaxPrefetchIterationsAhead() const { return UINT_MAX; }

+  /// Return the default number of write-combining buffers.
+  unsigned getNumWriteCombiningBuffers() const {
+    return 0;
+  }

   unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }

   unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
                                   TTI::OperandValueKind Opd1Info,
                                   TTI::OperandValueKind Opd2Info,
                                   TTI::OperandValueProperties Opd1PropInfo,
                                   TTI::OperandValueProperties Opd2PropInfo,
                                   ArrayRef<const Value *> Args,

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ ... @@ public:
   virtual unsigned getMinPrefetchStride() const {
     return getST()->getMinPrefetchStride();
   }

   virtual unsigned getMaxPrefetchIterationsAhead() const {
     return getST()->getMaxPrefetchIterationsAhead();
   }

+  /// Return the number of write-combining buffers.
+  unsigned getNumWriteCombiningBuffers() const {
+    return getST()->getNumWriteCombiningBuffers();
+  }

   /// @}

   /// \name Vector TTI Implementations
   /// @{

   unsigned getRegisterBitWidth(bool Vector) const { return 32; }

   /// Estimate the overhead of scalarizing an instruction. Insert and Extract

llvm/include/llvm/MC/MCSubtargetInfo.h

@@ ... @@ public:
   /// iterations.
   ///
   virtual unsigned getMaxPrefetchIterationsAhead() const;

   /// Return the minimum stride necessary to trigger software
   /// prefetching.
   ///
   virtual unsigned getMinPrefetchStride() const;

+  /// Return the number of write-combining buffers.
+  virtual unsigned getNumWriteCombiningBuffers() const;
 };

 } // end namespace llvm

 #endif // LLVM_MC_MCSUBTARGETINFO_H

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 unsigned TargetTransformInfo::getMinPrefetchStride() const {
   return TTIImpl->getMinPrefetchStride();
 }

 unsigned TargetTransformInfo::getMaxPrefetchIterationsAhead() const {
   return TTIImpl->getMaxPrefetchIterationsAhead();
 }

+unsigned TargetTransformInfo::getNumWriteCombiningBuffers() const {
+  return TTIImpl->getNumWriteCombiningBuffers();
+}

 unsigned TargetTransformInfo::getMaxInterleaveFactor(unsigned VF) const {
   return TTIImpl->getMaxInterleaveFactor(VF);
 }

 TargetTransformInfo::OperandValueKind
 TargetTransformInfo::getOperandInfo(Value *V, OperandValueProperties &OpProps) {
   OperandValueKind OpInfo = OK_AnyValue;
   OpProps = OP_None;

llvm/lib/MC/MCSubtargetInfo.cpp

@@ ... @@

 unsigned MCSubtargetInfo::getMaxPrefetchIterationsAhead() const {
   return UINT_MAX;
 }

 unsigned MCSubtargetInfo::getMinPrefetchStride() const {
   return 1;
 }

+unsigned MCSubtargetInfo::getNumWriteCombiningBuffers() const {
+  return 0;
+}
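
Because the base implementation returns 0, a client presumably needs to decide what 0 means. The sketch below assumes it means "no information" and declines to act in that case; the review does not state this, so treat it as one possible reading. `SubtargetBase`, `TunedSubtarget`, and `oversubscribesWCBuffers` are hypothetical names, and the value 6 is made up.

```cpp
#include <cstddef>

// Stand-in for the subtarget base class: the default answer is 0.
class SubtargetBase {
public:
  virtual ~SubtargetBase() = default;
  virtual unsigned getNumWriteCombiningBuffers() const { return 0; }
};

// Hypothetical target that overrides the default with a made-up value.
class TunedSubtarget : public SubtargetBase {
public:
  unsigned getNumWriteCombiningBuffers() const override { return 6; }
};

// Hypothetical client check: does a loop writing NumStoreStreams distinct
// cache lines per iteration exceed the available write-combining buffers?
bool oversubscribesWCBuffers(const SubtargetBase &ST,
                             std::size_t NumStoreStreams) {
  unsigned NumBuffers = ST.getNumWriteCombiningBuffers();
  // Assumption: 0 means the target supplied no information, so the client
  // reports "no" instead of treating every loop as oversubscribed.
  if (NumBuffers == 0)
    return false;
  return NumStoreStreams > NumBuffers;
}
```

Under this reading, targets that never override the hook see no change in behavior, and only targets that report a nonzero count opt in to any transformation driven by it.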