This is an archive of the discontinued LLVM Phabricator instance.


Differential D32245

Add an IR expansion pass for the experimental reductions
Closed · Public

Authored by aemerson on Apr 19 2017, 2:34 PM.


Details

Reviewers

mkuper
delena
rengolin

Commits

rG836b0f48c116: Add a late IR expansion pass for the experimental reduction intrinsics.
rL302631: Add a late IR expansion pass for the experimental reduction intrinsics.

Summary

This is an IR expansion pass intended to allow targets to opt-in to using the experimental reduction intrinsics introduced in D30086.

Its purpose is to see the effects of switching to the intrinsics in the IR, so this pass should be added to a target's pass config late, just before codegen. The expansion should result in the same shufflevector sequence form that targets currently expect reductions to be in.
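To make the "shufflevector sequence form" concrete, here is a minimal scalar model of a log2 halving reduction. It is only a sketch: `shuffleReduceModel` is a hypothetical helper operating on a `std::vector`, not LLVM code, and it assumes each step folds the upper half of the live lanes onto the lower half, which is what a shufflevector followed by a binary op would do on a real vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Scalar model of a log2 "shuffle reduction": at each step the upper half of
// the live lanes is combined into the lower half. Hypothetical helper for
// illustration only; it does not emit or inspect any IR.
template <typename T, typename BinOp>
T shuffleReduceModel(std::vector<T> Lanes, BinOp Op) {
  const std::size_t VF = Lanes.size();
  // Assumption: like the expansion discussed here, only power-of-two
  // element counts are handled.
  assert(VF != 0 && (VF & (VF - 1)) == 0 && "pow2 vectors only");
  for (std::size_t Half = VF / 2; Half != 0; Half /= 2)
    for (std::size_t I = 0; I != Half; ++I)
      Lanes[I] = Op(Lanes[I], Lanes[I + Half]);
  // The reduced value ends up in lane 0 (the final extractelement).
  return Lanes[0];
}
```

An 8-lane vector needs three halving steps under this model; calling it with `std::plus<int>()` models an add reduction and `std::bit_and<int>()` an and reduction.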

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. Apr 19 2017, 2:34 PM

Herald added a subscriber: mgorny. Apr 19 2017, 2:34 PM

aemerson added a parent revision: D30086: Add generic IR vector reductions. Apr 19 2017, 2:35 PM

tschuett added a subscriber: tschuett. Apr 19 2017, 11:05 PM

Adding some target people - I think they ought to care about this more than I do. :-)

lib/CodeGen/ExpandReductions.cpp
83	I'm not a huge fan of this - I would prefer not to rely on the invalidation semantics. Maybe collect all relevant instructions into a vector first, then do the replacement? (But if others disagree, this is fine.)
85	How do you expect this to happen?
93	What do we expect to happen in this case?
97	Please annotate the fallthrough here. (And perhaps it would be better to rewrite this to avoid it)
106	What about the internal instructions?
118	I'd expect a target query somewhere, regarding whether the intrinsic needs to be expanded.
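The "collect first, then replace" suggestion in the line 83 comment can be sketched without any LLVM types. Everything below is hypothetical: `ToyInst` and `expandAll` are toy stand-ins, and a `std::list` plays the role of the function body. The point is only that the container is never mutated while it is being walked.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Toy stand-in for an instruction; hypothetical, not an LLVM type.
struct ToyInst {
  int Id;
  bool IsReduction;
};

// Phase 1 collects the candidates; phase 2 rewrites them. The traversal in
// phase 1 therefore never observes a partially rewritten body.
inline unsigned expandAll(std::list<ToyInst> &Body) {
  std::vector<std::list<ToyInst>::iterator> Worklist;
  for (auto It = Body.begin(), E = Body.end(); It != E; ++It)
    if (It->IsReduction)
      Worklist.push_back(It);

  for (auto It : Worklist) {
    // "Expand": insert two replacement instructions before the candidate,
    // then erase the candidate itself. std::list keeps the other worklist
    // iterators valid across these operations.
    Body.insert(It, ToyInst{It->Id * 10, false});
    Body.insert(It, ToyInst{It->Id * 10 + 1, false});
    Body.erase(It);
  }
  return static_cast<unsigned>(Worklist.size());
}
```

The trade-off is one extra container and a second loop in exchange for not having to reason about iterator invalidation during the walk.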

aemerson marked 4 inline comments as done. Apr 26 2017, 3:18 AM

aemerson added a subscriber: RKSimon.

aemerson added inline comments.

lib/CodeGen/ExpandReductions.cpp
83	I can do that, but I've seen this kind of thing done before in other places.
85	Sorry, not entirely sure what you mean? This is an early exit if the given instruction isn't an intrinsic call.
93	As no in-tree target currently supports ordered reductions, and given that for SVE we want to enable support completely without using this expansion pass, I decided against trying to handle ordered reductions here. We just skip the intrinsic if we find it's an ordered reduction. If other targets want to experiment with ordered I think they can implement expansion via some scalarization method here.
118	My expectation was that targets wouldn't need, at least at first, that level of granularity. @RKSimon what do you think about this?

Please add full context to the diff, especially as it's dependent on another (in progress) patch.

lib/CodeGen/ExpandReductions.cpp
118	Are we guaranteeing that the reductions will match the ones supported by TargetTransformInfo::getReductionCost ?
test/CodeGen/Generic/expand-experimental-reductions.ll
2	I'd prefer to see the full reduction codegen here - regenerate with utils\update_test_checks.py ?

aemerson marked 3 inline comments as done. Apr 26 2017, 5:47 AM

aemerson added inline comments.

lib/CodeGen/ExpandReductions.cpp
118	Do you mean useReductionIntrinsic()? If so, I suppose it comes down to the exact use case of this expansion. Michael originally asked for this so that targets could check the effect of using the intrinsics at the IR level only, and at a very late stage converting them into the shuffle form we have now. For that, I don't see why you would care about which individual intrinsics are expanded, rather than a simple on/off decision. If however there might be more uses of this, for example in future, if we want to enable intrinsic forms for all targets as a canonical form, and then use this pass with TTI to make a target dependent decision on which codegen-level form is preferred, then I think a TTI hook would make sense. I can add a hook anyway, perhaps defaulting to "expand all intrinsics" unless the target overrides it.
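The "expand all intrinsics unless the target overrides it" idea from the comment above can be sketched as a default hook that a target shadows. This is an illustration under assumptions, not the TTI implementation: `ReductionKind`, `TargetHooksBase`, `GenericTarget` and `NativeReductionTarget` are hypothetical names, and the CRTP dispatch is just one way to let a derived class replace the default.

```cpp
#include <cassert>

// Hypothetical reduction kinds, for illustration only.
enum class ReductionKind { Add, FAddOrdered };

// Default policy: expand everything unless the target says otherwise.
template <typename Derived> struct TargetHooksBase {
  bool shouldExpand(ReductionKind) const { return true; }

  // Dispatch through the derived class so that a shadowing definition of
  // shouldExpand in the target is the one that gets called.
  bool query(ReductionKind K) const {
    return static_cast<const Derived *>(this)->shouldExpand(K);
  }
};

// A target that does nothing special inherits the "expand all" default.
struct GenericTarget : TargetHooksBase<GenericTarget> {};

// A target with native reduction support opts out of the expansion.
struct NativeReductionTarget : TargetHooksBase<NativeReductionTarget> {
  bool shouldExpand(ReductionKind) const { return false; }
};
```

With this shape a simple on/off decision needs one overriding line in the target, and a finer-grained decision can later switch on the argument without changing callers.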

RKSimon added inline comments. Apr 26 2017, 6:35 AM

lib/CodeGen/ExpandReductions.cpp
97	You've marked it as done but LLVM_FALLTHROUGH is still missing from these - you will get warnings on some buildbots.

aemerson added inline comments. Apr 26 2017, 6:39 AM

lib/CodeGen/ExpandReductions.cpp
97	I might be misunderstanding what the "Done" means. I used it to mean I'll address this in the next patch when I upload it. I haven't got around to that yet.

mkuper added inline comments. Apr 26 2017, 10:45 AM

lib/CodeGen/ExpandReductions.cpp
83	I'd suggest getting another reviewer's opinion on this.
85	Sorry, I misread this as dyn_cast<Instruction>, ignore.
93	The issue is that target-independent intrinsics are, by definition, supposed to be handled by any target. I shouldn't see a backend crash if I write IR that has the ordered intrinsic, and try to compile it for x86. Having said that - this is fine for now, but if we ever want to make these intrinsics non-experimental, this will have to be dealt with somehow. Please add a TODO.
118	"I can add a hook anyway, perhaps defaulting to 'expand all intrinsics' unless the target overrides it." That's exactly what I'd expect, thanks.
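The line 93 thread turns on why an ordered floating-point reduction cannot simply be rewritten as a shuffle sequence. A minimal sketch, using hypothetical helpers on plain `double` values rather than IR: floating-point addition is not associative, so a strict left-to-right sum and a tree-shaped sum can disagree on the same input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered reduction: strictly left to right, starting from an accumulator.
// Hypothetical helper, not LLVM code.
inline double orderedFAdd(double Start, const std::vector<double> &V) {
  double Acc = Start;
  for (double X : V)
    Acc += X;
  return Acc;
}

// Tree-shaped reduction, modelling the halving steps of a shuffle sequence.
// Assumption: power-of-two element count, upper half folded onto lower half.
inline double treeFAdd(std::vector<double> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 && "pow2 only");
  for (std::size_t Half = V.size() / 2; Half != 0; Half /= 2)
    for (std::size_t I = 0; I != Half; ++I)
      V[I] += V[I + Half];
  return V[0];
}
```

For the input `{1e100, 1.0, -1e100, 1.0}` the ordered sum absorbs the first `1.0` into `1e100` and yields `1.0`, while the tree first cancels `1e100` against `-1e100` and yields `2.0`. Reassociating is only acceptable when fast-math flags permit it, which is the condition the pass checks before expanding fadd and fmul.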

Addressed review comments and rewrote the pass a bit to be somewhat neater. D30086 is now committed, so this is ready to go if it looks OK.

Herald added a subscriber: javed.absar. May 9 2017, 7:56 AM

LGTM, with a nit.

lib/CodeGen/ExpandReductions.cpp
137	I don't believe you should ever be in a situation where you don't have a TTI here. So it should be safe to just do: const auto *TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

This revision is now accepted and ready to land. May 9 2017, 3:38 PM

Thanks, I'll make that change and commit.

Closed by commit rL302631: Add a late IR expansion pass for the experimental reduction intrinsics. (authored by aemerson). May 10 2017, 2:56 AM

This revision was automatically updated to reflect the committed changes.

ZhangKang marked an inline comment as done. Sep 25 2019, 12:46 AM

ZhangKang added a subscriber: ZhangKang.

ZhangKang added inline comments.

llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h

466 ↗

(On Diff #98420)

Hello @aemerson,
Here you set the function shouldExpandReduction to return true.
Consider the test case below:

declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
  %b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
  ret i8 %b
}

If I build the above test case for PPC, I get the following error:

llc error_case.ll -mtriple=powerpc64-unknown-linux-gnu
llc: /home/shkzhang/llvm/llvm/lib/Transforms/Utils/LoopUtils.cpp:828: llvm::Value *llvm::getShuffleReduction(IRBuilder<> &, llvm::Value *, unsigned int, RecurrenceDescriptor::MinMaxRecurrenceKind, ArrayRef<llvm::Value *>): Assertion `isPowerOf2_32(VF) && "Reduction emission only supported for pow2 vectors!"' failed.
Stack dump:
0.	Program arguments: llc error_case.ll -mtriple=powerpc64-unknown-linux-gnu
1.	Running pass 'Function Pass Manager' on module 'error_case.ll'.
2.	Running pass 'Expand reduction intrinsics' on function '@test_v3i8'
 #0 0x000000001244d094 PrintStackTraceSignalHandler(void*) (/home/shkzhang/llvm/build/bin/llc+0x1244d094)
 #1 0x000000001244a348 llvm::sys::RunSignalHandlers() (/home/shkzhang/llvm/build/bin/llc+0x1244a348)
 #2 0x000000001244d6cc SignalHandler(int) (/home/shkzhang/llvm/build/bin/llc+0x1244d6cc)
 #3 0x00007869689104d8 (linux-vdso64.so.1+0x4d8)
 #4 0x00007869681ee98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
 #5 0x00007869681ee98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
 #6 0x00007869681f0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
 #7 0x00007869681dbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
 #8 0x00007869681dbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
 #9 0x00000000124e036c llvm::getShuffleReduction(llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>&, llvm::Value*, unsigned int, llvm::RecurrenceDescriptor::MinMaxRecurrenceKind, llvm::ArrayRef<llvm::Value*>) (/home/shkzhang/llvm/build/bin/llc+0x124e036c)
#10 0x000000001175de9c (anonymous namespace)::expandReductions(llvm::Function&, llvm::TargetTransformInfo const*) (/home/shkzhang/llvm/build/bin/llc+0x1175de9c)
#11 0x0000000011cc9700 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/shkzhang/llvm/build/bin/llc+0x11cc9700)
#12 0x0000000011cc9b90 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cc9b90)
#13 0x0000000011cca354 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cca354)
#14 0x0000000011cca9ec llvm::legacy::PassManager::run(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cca9ec)
#15 0x0000000010377408 compileModule(char**, llvm::LLVMContext&) (/home/shkzhang/llvm/build/bin/llc+0x10377408)
#16 0x0000000010374a3c main (/home/shkzhang/llvm/build/bin/llc+0x10374a3c)
#17 0x00007869681c441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
#18 0x00007869681c4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0
Aborted (core dumped)

This happens because the test uses v3i8, whose element count is not a power of two. On targets such as AArch64 the same test passes, because shouldExpandReduction returns false there.

My question is whether we should fix this error. For example, if the number of elements is not a power of two, should we not call shouldExpandReduction?
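One possible direction for the question above, sketched outside LLVM: check the element count before choosing the halving sequence, and fall back to a lane-by-lane scalar fold otherwise. This is only an illustration of the guard, not the fix that was or should be adopted; `isPow2` and `reduceAnd` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True for 1, 2, 4, 8, ...; false for 0 and every other count.
inline bool isPow2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

// AND-reduce a non-empty vector. Power-of-two sizes use the halving pattern;
// any other size (such as the 3 lanes of v3i8) uses a scalar fold instead of
// tripping an assertion. Hypothetical helper, not LLVM code.
inline std::uint8_t reduceAnd(std::vector<std::uint8_t> V) {
  assert(!V.empty() && "nothing to reduce");
  if (!isPow2(V.size())) {
    std::uint8_t Acc = V[0];
    for (std::size_t I = 1; I != V.size(); ++I)
      Acc &= V[I];
    return Acc;
  }
  for (std::size_t Half = V.size() / 2; Half != 0; Half /= 2)
    for (std::size_t I = 0; I != Half; ++I)
      V[I] &= V[I + Half];
  return V[0];
}
```

Because AND is associative and commutative, both paths give the same value; the guard only changes which sequence of operations is used.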

Herald added a project: Restricted Project. Sep 25 2019, 12:46 AM

Revision Contents

Path                                                      Size
include/llvm/Analysis/TargetTransformInfo.h               7 lines
include/llvm/Analysis/TargetTransformInfoImpl.h           4 lines
include/llvm/CodeGen/ExpandReductions.h                   24 lines
include/llvm/CodeGen/Passes.h                             4 lines
include/llvm/InitializePasses.h                           1 line
include/llvm/Transforms/Utils/LoopUtils.h                 6 lines
lib/Analysis/TargetTransformInfo.cpp                      3 lines
lib/CodeGen/CMakeLists.txt                                1 line
lib/CodeGen/ExpandReductions.cpp                          167 lines
lib/CodeGen/TargetPassConfig.cpp                          3 lines
lib/Target/AArch64/AArch64TargetTransformInfo.h           4 lines
lib/Transforms/Utils/LoopUtils.cpp                        9 lines
test/CodeGen/Generic/expand-experimental-reductions.ll    210 lines
tools/llc/llc.cpp                                         1 line
tools/opt/opt.cpp                                         1 line

Diff 98285

include/llvm/Analysis/TargetTransformInfo.h

@@ 747 lines not shown @@ struct ReductionFlags {
     bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.
   };

   /// \returns True if the target wants to handle the given reduction idiom in
   /// the intrinsics form instead of the shuffle form.
   bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
                              ReductionFlags Flags) const;

+  /// \returns True if the target wants to expand the given reduction intrinsic
+  /// into a shuffle sequence.
+  bool shouldExpandReduction(const IntrinsicInst *II) const;
   /// @}

 private:
   /// \brief The abstract base class used to type erase specific TTI
   /// implementations.
   class Concept;

   /// \brief The template model for the base class which wraps a concrete
@@ 141 lines not shown @@ public:
   virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
                                        unsigned ChainSizeInBytes,
                                        VectorType *VecTy) const = 0;
   virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
                                         unsigned ChainSizeInBytes,
                                         VectorType *VecTy) const = 0;
   virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
                                      ReductionFlags) const = 0;
+  virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
 };

 template <typename T>
 class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   T Impl;

 public:
   Model(T Impl) : Impl(std::move(Impl)) {}
@@ 293 lines not shown @@ unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
                                 unsigned ChainSizeInBytes,
                                 VectorType *VecTy) const override {
     return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
   }
   bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
                              ReductionFlags Flags) const override {
     return Impl.useReductionIntrinsic(Opcode, Ty, Flags);
   }
+  bool shouldExpandReduction(const IntrinsicInst *II) const override {
+    return Impl.shouldExpandReduction(II);
+  }
 };

 template <typename T>
 TargetTransformInfo::TargetTransformInfo(T Impl)
     : TTIImpl(new Model<T>(Impl)) {}

 /// \brief Analysis pass providing the \c TargetTransformInfo.
 ///
@@ 94 lines not shown @@

include/llvm/Analysis/TargetTransformInfoImpl.h

@@ 456 lines not shown @@ unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
     return VF;
   }

   bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
                              TTI::ReductionFlags Flags) const {
     return false;
   }

+  bool shouldExpandReduction(const IntrinsicInst *II) const {
+    return true;
+  }
+
 protected:
   // Obtain the minimum required size to hold the value (without the sign)
   // In case of a vector it returns the min required size for one element.
   unsigned minRequiredElementSize(const Value* Val, bool &isSigned) {
     if (isa<ConstantDataVector>(Val) || isa<ConstantVector>(Val)) {
       const auto* VectorValue = cast<Constant>(Val);

       // In case of a vector need to pick the max between the min
@@ 225 lines not shown @@

include/llvm/CodeGen/ExpandReductions.h

This file was added.

//===----- ExpandReductions.h - Expand experimental reduction intrinsics --===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_EXPANDREDUCTIONS_H
#define LLVM_CODEGEN_EXPANDREDUCTIONS_H

#include "llvm/IR/PassManager.h"

namespace llvm {

class ExpandReductionsPass
    : public PassInfoMixin<ExpandReductionsPass> {
public:
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
};
} // end namespace llvm

#endif // LLVM_CODEGEN_EXPANDREDUCTIONS_H

include/llvm/CodeGen/Passes.h

@@ 399 lines not shown @@ /// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.

   /// This pass combine basic blocks guarded by the same branch.
   extern char &BranchCoalescingID;

   /// This pass performs outlining on machine instructions directly before
   /// printing assembly.
   ModulePass *createMachineOutlinerPass();

+  /// This pass expands the experimental reduction intrinsics into sequences of
+  /// shuffles.
+  FunctionPass *createExpandReductionsPass();
+
 } // End llvm namespace

 /// Target machine pass initializer for passes with dependencies. Use with
 /// INITIALIZE_TM_PASS_END.
 #define INITIALIZE_TM_PASS_BEGIN INITIALIZE_PASS_BEGIN

 /// Target machine pass initializer for passes with dependencies. Use with
 /// INITIALIZE_TM_PASS_BEGIN.
@@ 24 lines not shown @@

include/llvm/InitializePasses.h

@@ 124 lines not shown @@
 void initializeEarlyCSELegacyPassPass(PassRegistry&);
 void initializeEarlyCSEMemSSALegacyPassPass(PassRegistry&);
 void initializeEarlyIfConverterPass(PassRegistry&);
 void initializeEdgeBundlesPass(PassRegistry&);
 void initializeEfficiencySanitizerPass(PassRegistry&);
 void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry&);
 void initializeExpandISelPseudosPass(PassRegistry&);
 void initializeExpandPostRAPass(PassRegistry&);
+void initializeExpandReductionsPass(PassRegistry&);
 void initializeExternalAAWrapperPassPass(PassRegistry&);
 void initializeFEntryInserterPass(PassRegistry&);
 void initializeFinalizeMachineBundlesPass(PassRegistry&);
 void initializeFlattenCFGPassPass(PassRegistry&);
 void initializeFloat2IntLegacyPassPass(PassRegistry&);
 void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);
 void initializeForwardControlFlowIntegrityPass(PassRegistry&);
 void initializeFuncletLayoutPass(PassRegistry&);
@@ 234 lines not shown @@

include/llvm/Transforms/Utils/LoopUtils.h

@@ 485 lines not shown @@
 /// instructions from loop body to preheader/exit. Check if the instruction
 /// can execute speculatively.
 /// If \p ORE is set use it to emit optimization remarks.
 bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                         Loop *CurLoop, AliasSetTracker *CurAST,
                         LoopSafetyInfo *SafetyInfo,
                         OptimizationRemarkEmitter *ORE = nullptr);

+/// Generates a vector reduction using shufflevectors to reduce the value.
+Value *getShuffleReduction(IRBuilder<> &Builder, Value *Src, unsigned Op,
+                           RecurrenceDescriptor::MinMaxRecurrenceKind
+                               MinMaxKind = RecurrenceDescriptor::MRK_Invalid,
+                           ArrayRef<Value *> RedOps = ArrayRef<Value *>());
+
 /// Create a target reduction of the given vector. The reduction operation
 /// is described by the \p Opcode parameter. min/max reductions require
 /// additional information supplied in \p Flags.
 /// The target is queried to determine if intrinsics or shuffle sequences are
 /// required to implement the reduction.
 Value *
 createSimpleTargetReduction(IRBuilder<> &B, const TargetTransformInfo *TTI,
                             unsigned Opcode, Value *Src,
@@ 19 lines not shown @@

lib/Analysis/TargetTransformInfo.cpp

@@ 499 lines not shown @@ unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
   return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
 }

 bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,
                                                 Type *Ty, ReductionFlags Flags) const {
   return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
 }

+bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
+  return TTIImpl->shouldExpandReduction(II);
+}

 TargetTransformInfo::Concept::~Concept() {}

 TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}

 TargetIRAnalysis::TargetIRAnalysis(
     std::function<Result(const Function &)> TTICallback)
     : TTICallback(std::move(TTICallback)) {}
@@ 42 lines not shown @@

lib/CodeGen/CMakeLists.txt

@@ 17 lines not shown @@ add_llvm_library(LLVMCodeGen
   DetectDeadLanes.cpp
   DFAPacketizer.cpp
   DwarfEHPrepare.cpp
   EarlyIfConversion.cpp
   EdgeBundles.cpp
   ExecutionDepsFix.cpp
   ExpandISelPseudos.cpp
   ExpandPostRAPseudos.cpp
+  ExpandReductions.cpp
   FaultMaps.cpp
   FEntryInserter.cpp
   FuncletLayout.cpp
   GCMetadata.cpp
   GCMetadataPrinter.cpp
   GCRootLowering.cpp
   GCStrategy.cpp
   GlobalMerge.cpp
@@ 133 lines not shown @@

lib/CodeGen/ExpandReductions.cpp

This file was added.

				//===--- ExpandReductions.cpp - Expand experimental reduction intrinsics --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass implements IR expansion for reduction intrinsics, allowing targets
				// to enable the experimental intrinsics until just before codegen.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/CodeGen/ExpandReductions.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Pass.h"

				using namespace llvm;

				namespace {

				unsigned getOpcode(Intrinsic::ID ID) {
				switch (ID) {
				case Intrinsic::experimental_vector_reduce_fadd:
				return Instruction::FAdd;
				case Intrinsic::experimental_vector_reduce_fmul:
				return Instruction::FMul;
				case Intrinsic::experimental_vector_reduce_add:
				return Instruction::Add;
				case Intrinsic::experimental_vector_reduce_mul:
				return Instruction::Mul;
				case Intrinsic::experimental_vector_reduce_and:
				return Instruction::And;
				case Intrinsic::experimental_vector_reduce_or:
				return Instruction::Or;
				case Intrinsic::experimental_vector_reduce_xor:
				return Instruction::Xor;
				case Intrinsic::experimental_vector_reduce_smax:
				case Intrinsic::experimental_vector_reduce_smin:
				case Intrinsic::experimental_vector_reduce_umax:
				case Intrinsic::experimental_vector_reduce_umin:
				return Instruction::ICmp;
				case Intrinsic::experimental_vector_reduce_fmax:
				case Intrinsic::experimental_vector_reduce_fmin:
				return Instruction::FCmp;
				default:
				llvm_unreachable("Unexpected ID");
				}
				}

				RecurrenceDescriptor::MinMaxRecurrenceKind getMRK(Intrinsic::ID ID) {
				switch (ID) {
				case Intrinsic::experimental_vector_reduce_smax:
				return RecurrenceDescriptor::MRK_SIntMax;
				case Intrinsic::experimental_vector_reduce_smin:
				return RecurrenceDescriptor::MRK_SIntMin;
				case Intrinsic::experimental_vector_reduce_umax:
				return RecurrenceDescriptor::MRK_UIntMax;
				case Intrinsic::experimental_vector_reduce_umin:
				return RecurrenceDescriptor::MRK_UIntMin;
				case Intrinsic::experimental_vector_reduce_fmax:
				return RecurrenceDescriptor::MRK_FloatMax;
				case Intrinsic::experimental_vector_reduce_fmin:
				return RecurrenceDescriptor::MRK_FloatMin;
				default:
				return RecurrenceDescriptor::MRK_Invalid;
				}
				}

				bool expandReductions(Function &F, const TargetTransformInfo *TTI) {
				bool Changed = false;
				SmallVector<IntrinsicInst*, 4> Worklist;
				for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
				if (auto II = dyn_cast<IntrinsicInst>(&*I))
				mkuperUnsubmitted Done Reply Inline Actions I'm not a huge fan of this - I would prefer not to rely on the invalidation semantics. Maybe collect all relevant instructions into a vector first, then do the replacement? (But if others disagree, this is fine.) mkuper: I'm not a huge fan of this - I would prefer not to rely on the invalidation semantics. Maybe…
				aemersonAuthorUnsubmitted Not Done Reply Inline Actions I can do that, but I've seen this kind of thing done before in other places. aemerson: I can do that, but I've seen this kind of thing done before in other places.
				mkuperUnsubmitted Not Done Reply Inline Actions I'd suggest getting another reviewer's opinion on this. mkuper: I'd suggest getting another reviewer's opinion on this.
				Worklist.push_back(II);

				mkuperUnsubmitted Not Done Reply Inline Actions How do you expect this to happen? mkuper: How do you expect this to happen?
				aemersonAuthorUnsubmitted Not Done Reply Inline Actions Sorry, not entirely sure what you mean? This is an early exit if the given instruction isn't an intrinsic call. aemerson: Sorry, not entirely sure what you mean? This is an early exit if the given instruction isn't an…
				mkuperUnsubmitted Not Done Reply Inline Actions Sorry, I misread this is dyn_cast<Instruction>, ignore. mkuper: Sorry, I misread this is dyn_cast<Instruction>, ignore.
				for (auto *II : Worklist) {
				IRBuilder<> Builder(II);
				Value *Vec = nullptr;
				auto ID = II->getIntrinsicID();
				auto MRK = RecurrenceDescriptor::MRK_Invalid;
				switch (ID) {
				case Intrinsic::experimental_vector_reduce_fadd:
				case Intrinsic::experimental_vector_reduce_fmul:
				mkuperUnsubmitted Not Done Reply Inline Actions What do we expect to happen in this case? mkuper: What do we expect to happen in this case?
				aemersonAuthorUnsubmitted Not Done Reply Inline Actions As no in-tree target currently supports ordered reductions, and given that for SVE we want to enable support completely without using this expansion pass, I decided against trying to handle ordered reductions here. We just skip the intrinsic if we find it's an ordered reduction. If other targets want to experiment with ordered I think they can implement expansion via some scalarization method here. aemerson: As no in-tree target currently supports ordered reductions, and given that for SVE we want to…
				mkuperUnsubmitted Not Done Reply Inline Actions The issue is that target-independent intrinsics are, by definition, supposed to be handled by any target. I shouldn't see a backend crash if I write IR that has the ordered intrinsic, and try to compile it for x86. Having said that - this is fine for now, but if we ever want to make these intrinsics non-experimental, this will have to be dealt with somehow. Please add a TODO. mkuper: The issue is that target-independent intrinsics are, by definition, supposed to be handled by…
				// FMFs must be attached to the call, otherwise it's an ordered reduction
				// and it can't be handled by generating this shuffle sequence.
				// TODO: Implement scalarization of ordered reductions here for targets
				// without native support.
				mkuperUnsubmitted Done Reply Inline Actions Please annotate the fallthrough here. (And perhaps it would be better to rewrite this to avoid it) mkuper: Please annotate the fallthrough here. (And perhaps it would be better to rewrite this to avoid…
				RKSimonUnsubmitted Not Done Reply Inline Actions You've marked it as done by LLVM_FALLTHROUGH is still missing from these - you will get warnings on some buildbots RKSimon: You've marked it as done by LLVM_FALLTHROUGH is still missing from these - you will get…
				aemersonAuthorUnsubmitted Not Done Reply Inline Actions I might be misunderstanding what the "Done" means. I used it to mean I'll address this in the next patch when I upload it. I haven't got around to that yet. aemerson: I might be misunderstanding what the "Done" means. I used it to mean I'll address this in the…
				if (!II->getFastMathFlags().unsafeAlgebra())
				continue;
				Vec = II->getArgOperand(1);
				break;
				case Intrinsic::experimental_vector_reduce_add:
				case Intrinsic::experimental_vector_reduce_mul:
				case Intrinsic::experimental_vector_reduce_and:
				case Intrinsic::experimental_vector_reduce_or:
				case Intrinsic::experimental_vector_reduce_xor:
				mkuper (inline comment, marked Done): What about the internal instructions?
				case Intrinsic::experimental_vector_reduce_smax:
				case Intrinsic::experimental_vector_reduce_smin:
				case Intrinsic::experimental_vector_reduce_umax:
				case Intrinsic::experimental_vector_reduce_umin:
				case Intrinsic::experimental_vector_reduce_fmax:
				case Intrinsic::experimental_vector_reduce_fmin:
				Vec = II->getArgOperand(0);
				MRK = getMRK(ID);
				break;
				default:
				continue;
				}
				mkuper (inline comment, marked Done): I'd expect a target query somewhere, regarding whether the intrinsic needs to be expanded.
				aemerson (author, inline comment): My expectation was that targets wouldn't need, at least at first, that level of granularity. @RKSimon what do you think about this?
				RKSimon (inline comment): Are we guaranteeing that the reductions will match the ones supported by TargetTransformInfo::getReductionCost?
				aemerson (author, inline comment): Do you mean useReductionIntrinsic()? If so, I suppose it comes down to the exact use case of this expansion. Michael originally asked for this so that targets could check the effect of using the intrinsics at the IR level only, and at a very late stage converting them into the shuffle form we have now. For that, I don't see why you would care about which individual intrinsics are expanded, rather than a simple on/off decision. If, however, there might be more uses of this in future (for example, if we want to enable intrinsic forms for all targets as a canonical form, and then use this pass with TTI to make a target-dependent decision on which codegen-level form is preferred), then I think a TTI hook would make sense. I can add a hook anyway, perhaps defaulting to "expand all intrinsics" unless the target overrides it.
				mkuper (inline comment): > I can add a hook anyway, perhaps defaulting to "expand all intrinsics" unless the target overrides it.
				That's exactly what I'd expect, thanks.
				if (!TTI || !TTI->shouldExpandReduction(II))
				continue;
				auto Rdx = getShuffleReduction(Builder, Vec, getOpcode(ID), MRK);
				II->replaceAllUsesWith(Rdx);
				II->eraseFromParent();
				Changed = true;
				}
				return Changed;
				}

				class ExpandReductions : public FunctionPass {
				public:
				static char ID;
				ExpandReductions() : FunctionPass(ID) {
				initializeExpandReductionsPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override {
				auto *TTIP = getAnalysisIfAvailable<TargetTransformInfoWrapperPass>();
				mkuper (inline comment, marked Done): I don't believe you should ever be in the situation where you don't have a TTI here. So it should be safe to just do: const auto TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
				const TargetTransformInfo *TTI = TTIP ? &TTIP->getTTI(F) : nullptr;
				return expandReductions(F, TTI);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				}
				};
				}

				char ExpandReductions::ID;
				INITIALIZE_PASS_BEGIN(ExpandReductions, "expand-reductions",
				"Expand reduction intrinsics", false, false)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(ExpandReductions, "expand-reductions",
				"Expand reduction intrinsics", false, false)

				FunctionPass *llvm::createExpandReductionsPass() {
				return new ExpandReductions();
				}

				PreservedAnalyses ExpandReductionsPass::run(Function &F,
				FunctionAnalysisManager &AM) {
				const auto &TTI = AM.getResult<TargetIRAnalysis>(F);
				if (!expandReductions(F, &TTI))
				return PreservedAnalyses::all();
				PreservedAnalyses PA;
				PA.preserveSet<CFGAnalyses>();
				return PA;
				}
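The review thread on ordered reductions can be made concrete with a small standalone model. The sketch below is not LLVM code: `orderedFAdd` and `shuffleFAdd` are hypothetical helpers over `std::vector<float>` that mimic, respectively, a strict left-to-right reduction and the halving pattern this pass emits. It assumes a non-empty, power-of-two lane count and a build without fast-math compiler flags.

```cpp
#include <cstddef>
#include <vector>

// Ordered reduction (hypothetical model): strict left-to-right accumulation,
// starting from the scalar accumulator operand.
float orderedFAdd(float Acc, const std::vector<float> &Vec) {
  for (float V : Vec)
    Acc += V;
  return Acc;
}

// Shuffle-style reduction (hypothetical model): each round adds the upper
// half of the live lanes onto the lower half, so the additions are
// reassociated. Assumes Vec is non-empty and its size is a power of two.
float shuffleFAdd(std::vector<float> Vec) {
  for (std::size_t Width = Vec.size(); Width > 1; Width /= 2)
    for (std::size_t Lane = 0; Lane != Width / 2; ++Lane)
      Vec[Lane] += Vec[Lane + Width / 2];
  return Vec[0];
}
```

With lanes `{1e20f, 1.0f, -1e20f, 1.0f}`, the ordered form absorbs the first `1.0f` into `1e20f` and ends at `1.0f`, while the halving form cancels `1e20f` against `-1e20f` first and ends at `2.0f`. That reassociation is the reason the pass only expands `fadd`/`fmul` reductions whose call carries fast-math flags.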

lib/CodeGen/TargetPassConfig.cpp

void TargetPassConfig::addIRPasses() {
if (getOptLevel() != CodeGenOpt::None && !DisableConstantHoisting)		if (getOptLevel() != CodeGenOpt::None && !DisableConstantHoisting)
addPass(createConstantHoistingPass());		addPass(createConstantHoistingPass());

if (getOptLevel() != CodeGenOpt::None && !DisablePartialLibcallInlining)		if (getOptLevel() != CodeGenOpt::None && !DisablePartialLibcallInlining)
addPass(createPartiallyInlineLibCallsPass());		addPass(createPartiallyInlineLibCallsPass());

// Insert calls to mcount-like functions.		// Insert calls to mcount-like functions.
addPass(createCountingFunctionInserterPass());		addPass(createCountingFunctionInserterPass());

		// Expand reduction intrinsics into shuffle sequences if the target wants to.
		addPass(createExpandReductionsPass());
}		}

/// Turn exception handling constructs into something the code generators can		/// Turn exception handling constructs into something the code generators can
/// handle.		/// handle.
void TargetPassConfig::addPassesToHandleExceptions() {		void TargetPassConfig::addPassesToHandleExceptions() {
const MCAsmInfo *MCAI = TM->getMCAsmInfo();		const MCAsmInfo *MCAI = TM->getMCAsmInfo();
assert(MCAI && "No MCAsmInfo");		assert(MCAI && "No MCAsmInfo");
switch (MCAI->getExceptionHandlingType()) {		switch (MCAI->getExceptionHandlingType()) {

lib/Target/AArch64/AArch64TargetTransformInfo.h

public:

unsigned getCacheLineSize();		unsigned getCacheLineSize();

unsigned getPrefetchDistance();		unsigned getPrefetchDistance();

unsigned getMinPrefetchStride();		unsigned getMinPrefetchStride();

unsigned getMaxPrefetchIterationsAhead();		unsigned getMaxPrefetchIterationsAhead();

		bool shouldExpandReduction(const IntrinsicInst *II) const {
		return false;
		}
/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif
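The hook agreed on in the review (expand everything by default, let a target opt out) can be sketched outside LLVM as follows. All names here are hypothetical stand-ins: `ReductionCall` is not `llvm::IntrinsicInst`, and the virtual-dispatch classes only approximate how `TargetTransformInfo` forwards to a target implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a reduction intrinsic call.
struct ReductionCall {
  std::string Name;
};

// Default policy: expand every reduction unless the target overrides it.
struct BaseTargetInfo {
  virtual ~BaseTargetInfo() = default;
  virtual bool shouldExpandReduction(const ReductionCall &) const {
    return true;
  }
};

// A target with native reduction support opts out, as the AArch64 hunk does.
struct NativeReductionTargetInfo : BaseTargetInfo {
  bool shouldExpandReduction(const ReductionCall &) const override {
    return false;
  }
};

// Counts how many calls a pass following this policy would rewrite.
unsigned countExpanded(const BaseTargetInfo &TTI,
                       const std::vector<ReductionCall> &Calls) {
  unsigned Expanded = 0;
  for (const ReductionCall &Call : Calls)
    if (TTI.shouldExpandReduction(Call))
      ++Expanded;
  return Expanded;
}
```

Passing the call to the hook, rather than using a single on/off flag, leaves room for a target to expand only some reductions later, which is the extra granularity discussed in the thread.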

lib/Transforms/Utils/LoopUtils.cpp

if (isa<FPMathOperator>(V)) {
FastMathFlags Flags;		FastMathFlags Flags;
Flags.setUnsafeAlgebra();		Flags.setUnsafeAlgebra();
cast<Instruction>(V)->setFastMathFlags(Flags);		cast<Instruction>(V)->setFastMathFlags(Flags);
}		}
return V;		return V;
}		}

// Helper to generate a log2 shuffle reduction.		// Helper to generate a log2 shuffle reduction.
static Value *		Value *
getShuffleReduction(IRBuilder<> &Builder, Value *Src, unsigned Op,		llvm::getShuffleReduction(IRBuilder<> &Builder, Value *Src, unsigned Op,
RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind =		RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind,
RecurrenceDescriptor::MRK_Invalid,		ArrayRef<Value *> RedOps) {
ArrayRef<Value *> RedOps = ArrayRef<Value *>()) {
unsigned VF = Src->getType()->getVectorNumElements();		unsigned VF = Src->getType()->getVectorNumElements();
// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles		// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
// and vector ops, reducing the set of values being computed by half each		// and vector ops, reducing the set of values being computed by half each
// round.		// round.
assert(isPowerOf2_32(VF) &&		assert(isPowerOf2_32(VF) &&
"Reduction emission only supported for pow2 vectors!");		"Reduction emission only supported for pow2 vectors!");
Value *TmpVec = Src;		Value *TmpVec = Src;
SmallVector<Constant *, 32> ShuffleMask(VF, nullptr);		SmallVector<Constant *, 32> ShuffleMask(VF, nullptr);
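The body of the helper is cut off in this view, so the following is only a sketch of the mask pattern implied by its comment and by the CHECK lines in the test file. `shuffleMasks` and `UndefLane` are hypothetical names; `-1` stands in for an `undef` mask element, and `VF` is assumed to be a power of two.

```cpp
#include <vector>

// Hypothetical marker for an undef shuffle-mask element.
constexpr int UndefLane = -1;

// Returns one shuffle mask per reduction round. In each round the upper half
// of the still-live lanes is moved onto the lower half; every other mask
// element is left undef. Assumes VF is a power of two.
std::vector<std::vector<int>> shuffleMasks(unsigned VF) {
  std::vector<std::vector<int>> Rounds;
  for (unsigned Width = VF; Width > 1; Width >>= 1) {
    std::vector<int> Mask(VF, UndefLane);
    for (unsigned Lane = 0; Lane != Width / 2; ++Lane)
      Mask[Lane] = static_cast<int>(Width / 2 + Lane);
    Rounds.push_back(Mask);
  }
  return Rounds;
}
```

For `VF = 4` this yields `<2, 3, undef, undef>` followed by `<1, undef, undef, undef>`, matching the two `shufflevector` masks checked in the `fadd_f32` test, and for `VF = 2` the single mask `<1, undef>`.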

test/CodeGen/Generic/expand-experimental-reductions.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -expand-reductions -S | FileCheck %s
				RKSimon (inline comment, marked Done): I'd prefer to see the full reduction codegen here. Regenerate with utils/update_test_checks.py?
				; Tests without a target which should expand all reductions
				declare i64 @llvm.experimental.vector.reduce.add.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.mul.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.and.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.or.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.xor.i64.v2i64(<2 x i64>)

				declare float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float, <4 x float>)
				declare float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float, <4 x float>)

				declare i64 @llvm.experimental.vector.reduce.smax.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.smin.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.umax.i64.v2i64(<2 x i64>)
				declare i64 @llvm.experimental.vector.reduce.umin.i64.v2i64(<2 x i64>)

				declare double @llvm.experimental.vector.reduce.fmax.f64.v2f64(<2 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.f64.v2f64(<2 x double>)


				define i64 @add_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @add_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[BIN_RDX]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.add.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @mul_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @mul_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = mul <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[BIN_RDX]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.mul.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @and_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @and_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = and <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[BIN_RDX]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.and.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @or_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @or_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = or <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[BIN_RDX]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.or.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @xor_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @xor_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[BIN_RDX]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.xor.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define float @fadd_f32(<4 x float> %vec) {
				; CHECK-LABEL: @fadd_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[VEC:%.*]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
				; CHECK-NEXT: ret float [[TMP0]]
				;
				entry:
				%r = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %vec)
				ret float %r
				}

				define float @fadd_f32_strict(<4 x float> %vec) {
				; CHECK-LABEL: @fadd_f32_strict(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[R:%.*]] = call float @llvm.experimental.vector.reduce.fadd.f32.f32.v4f32(float undef, <4 x float> [[VEC:%.*]])
				; CHECK-NEXT: ret float [[R]]
				;
				entry:
				%r = call float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %vec)
				ret float %r
				}

				define float @fmul_f32(<4 x float> %vec) {
				; CHECK-LABEL: @fmul_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[VEC:%.*]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = fmul fast <4 x float> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX2:%.*]] = fmul fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
				; CHECK-NEXT: ret float [[TMP0]]
				;
				entry:
				%r = call fast float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float undef, <4 x float> %vec)
				ret float %r
				}

				define i64 @smax_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @smax_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x i64> [[VEC]], <2 x i64> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.smax.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @smin_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @smin_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x i64> [[VEC]], <2 x i64> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.smin.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @umax_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @umax_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ugt <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x i64> [[VEC]], <2 x i64> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.umax.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define i64 @umin_i64(<2 x i64> %vec) {
				; CHECK-LABEL: @umin_i64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[VEC:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <2 x i64> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x i64> [[VEC]], <2 x i64> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i64> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret i64 [[TMP0]]
				;
				entry:
				%r = call i64 @llvm.experimental.vector.reduce.umin.i64.v2i64(<2 x i64> %vec)
				ret i64 %r
				}

				define double @fmax_f64(<2 x double> %vec) {
				; CHECK-LABEL: @fmax_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x double> [[VEC:%.*]], <2 x double> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <2 x double> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x double> [[VEC]], <2 x double> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x double> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret double [[TMP0]]
				;
				entry:
				%r = call double @llvm.experimental.vector.reduce.fmax.f64.v2f64(<2 x double> %vec)
				ret double %r
				}

				define double @fmin_f64(<2 x double> %vec) {
				; CHECK-LABEL: @fmin_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x double> [[VEC:%.*]], <2 x double> undef, <2 x i32> <i32 1, i32 undef>
				; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast olt <2 x double> [[VEC]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x double> [[VEC]], <2 x double> [[RDX_SHUF]]
				; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x double> [[RDX_MINMAX_SELECT]], i32 0
				; CHECK-NEXT: ret double [[TMP0]]
				;
				entry:
				%r = call double @llvm.experimental.vector.reduce.fmin.f64.v2f64(<2 x double> %vec)
				ret double %r
				}
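The min/max tests above expand to a compare followed by a select, and the only difference between the signed and unsigned variants is the compare predicate. A minimal standalone model, assuming a non-empty power-of-two lane count, shows why that distinction matters. `shuffleMax` is a hypothetical helper, not part of the patch; instantiating it with a signed or unsigned element type plays the role of choosing `icmp sgt` or `icmp ugt`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare-and-select shuffle reduction (hypothetical model). The comparison
// stands in for icmp sgt/ugt depending on whether T is signed or unsigned;
// the conditional expression stands in for the select.
template <typename T>
T shuffleMax(std::vector<T> Vec) {
  for (std::size_t Width = Vec.size(); Width > 1; Width /= 2)
    for (std::size_t Lane = 0; Lane != Width / 2; ++Lane) {
      T Lhs = Vec[Lane];
      T Rhs = Vec[Lane + Width / 2];
      Vec[Lane] = (Lhs > Rhs) ? Lhs : Rhs;
    }
  return Vec[0];
}
```

Given the same two 64-bit lane patterns, all-ones and 5, the signed instantiation treats all-ones as -1 and returns 5, whereas the unsigned instantiation returns the all-ones value. That is why `smax` and `umax` (and likewise `smin` and `umin`) need separate expansions even though the shuffle sequence is identical.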

tools/llc/llc.cpp

int main(int argc, char **argv) {
initializeCodeGen(*Registry);		initializeCodeGen(*Registry);
initializeLoopStrengthReducePass(*Registry);		initializeLoopStrengthReducePass(*Registry);
initializeLowerIntrinsicsPass(*Registry);		initializeLowerIntrinsicsPass(*Registry);
initializeCountingFunctionInserterPass(*Registry);		initializeCountingFunctionInserterPass(*Registry);
initializeUnreachableBlockElimLegacyPassPass(*Registry);		initializeUnreachableBlockElimLegacyPassPass(*Registry);
initializeConstantHoistingLegacyPassPass(*Registry);		initializeConstantHoistingLegacyPassPass(*Registry);
initializeScalarOpts(*Registry);		initializeScalarOpts(*Registry);
initializeVectorization(*Registry);		initializeVectorization(*Registry);
		initializeExpandReductionsPass(*Registry);

// Register the target printer for --version.		// Register the target printer for --version.
cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);		cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);

cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");		cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");

Context.setDiscardValueNames(DiscardValueNames);		Context.setDiscardValueNames(DiscardValueNames);


tools/opt/opt.cpp

int main(int argc, char **argv) {
initializeDwarfEHPreparePass(Registry);		initializeDwarfEHPreparePass(Registry);
initializeSafeStackPass(Registry);		initializeSafeStackPass(Registry);
initializeSjLjEHPreparePass(Registry);		initializeSjLjEHPreparePass(Registry);
initializePreISelIntrinsicLoweringLegacyPassPass(Registry);		initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
initializeGlobalMergePass(Registry);		initializeGlobalMergePass(Registry);
initializeInterleavedAccessPass(Registry);		initializeInterleavedAccessPass(Registry);
initializeCountingFunctionInserterPass(Registry);		initializeCountingFunctionInserterPass(Registry);
initializeUnreachableBlockElimLegacyPassPass(Registry);		initializeUnreachableBlockElimLegacyPassPass(Registry);
		initializeExpandReductionsPass(Registry);

#ifdef LINK_POLLY_INTO_TOOLS		#ifdef LINK_POLLY_INTO_TOOLS
polly::initializePollyPasses(Registry);		polly::initializePollyPasses(Registry);
#endif		#endif

cl::ParseCommandLineOptions(argc, argv,		cl::ParseCommandLineOptions(argc, argv,
"llvm .bc -> .bc modular optimizer and analysis printer\n");		"llvm .bc -> .bc modular optimizer and analysis printer\n");


Revision Contents

Diff 98285
