This is an archive of the discontinued LLVM Phabricator instance.

[TTI] New PPC target hook enableUncondDivisionSpeculation
Abandoned · Public

Authored by alexgatea on Oct 7 2022, 8:39 AM.

Details

Summary

Created a PPC-specific TTI hook for enabling unconditional speculative execution of division operations in llvm::isSafeToSpeculativelyExecute(). Integer division by 0 does not cause an exception on PPC, so it should be safe to speculate divide operations there. This allows the compiler to optimize more aggressively, which benefits other passes.
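The diff itself is not preserved in this archive, so the following is only a rough, self-contained C model of the pattern being proposed (everything except the hook name enableUncondDivisionSpeculation is hypothetical): a per-target flag, in the spirit of TTI, that a generic "is it safe to speculate?" query consults before allowing division to be hoisted.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-target info, standing in for TTI. A PPC-like target would
   answer true because its integer divide does not trap on a zero divisor. */
struct TargetInfo {
  bool enableUncondDivisionSpeculation;
};

/* Generic query, modeled loosely after isSafeToSpeculativelyExecute():
   division is only considered speculatable when the target opts in. */
static bool isSafeToSpeculateDiv(const struct TargetInfo *TI) {
  return TI->enableUncondDivisionSpeculation;
}

int main(void) {
  struct TargetInfo PPC = { .enableUncondDivisionSpeculation = true };
  struct TargetInfo Other = { .enableUncondDivisionSpeculation = false };
  printf("speculate div: PPC=%d other=%d\n",
         isSafeToSpeculateDiv(&PPC), isSafeToSpeculateDiv(&Other));
  return 0;
}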

For example, consider the following code optimized with clang -O3

void foo (int m);
int fcn (int x, int y){
  int n = 0;
  for(int i = 0; i < 1000; i++){
    foo(0);
    n += x/y;
  }
  return n;
}

With speculative execution off, the final assembly effectively does the following:

int n = 0;
int tmp = x/y;
for(int i = 0; i < 1000; i++){
    foo(0);
    n += tmp;
}
return n;

However, with speculative execution on, LICM hoists the division early in the pipeline, which allows IndVarSimplifyPass to turn n += tmp in the loop into a single multiplication at the end:

for(int i = 0; i < 1000; i++)
    foo(0);
return (x/y) * 1000;

Diff Detail

Event Timeline

alexgatea created this revision. Oct 7 2022, 8:39 AM
Herald added a project: Restricted Project. Oct 7 2022, 8:39 AM
alexgatea requested review of this revision. Oct 7 2022, 8:39 AM
fhahn added a subscriber: fhahn.

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

If that's the case, this sounds problematic, as the semantics of LLVM IR instructions are target-independent. Adding some other reviewers who might have additional thoughts.

alexgatea added a comment (edited). Oct 7 2022, 9:20 AM

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined; it's just that we allow further optimizations to happen that we previously bailed out of. The example I gave shows exactly this: without the speculative execution the div is still hoisted to the preheader, but this is done much later in the pipeline by MachineLICM, so we do not optimize it fully (because IndVarSimplifyPass occurs earlier).

Adding some other reviewers who might have additional thoughts.

Of course, I'd like to get other reviewers' thoughts as well.

fhahn added a comment. Oct 7 2022, 9:25 AM

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined; it's just that we allow further optimizations to happen that we previously bailed out of. The example I gave shows exactly this: without the speculative execution the div is still hoisted to the preheader, but this is done much later in the pipeline by MachineLICM, so we do not optimize it fully (because IndVarSimplifyPass occurs earlier).

Right, but the reason MachineLICM can do this today is that at this point we are dealing with machine instructions.

In terms of LLVM IR semantics, this change would allow hoisting an instruction that may produce UB into a path that didn't have UB before AFAICT.

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined,

Division by zero is currently undefined behavior at the IR level; if your program would execute it, it has no meaning at all. So hoisting a divide will interact badly with other optimizations; for example, instcombine will currently turn a divide by zero into "unreachable". This is different from instructions that return poison.

If you want a version of division that returns a poison value, you need to modify the semantics in LangRef.


In the original example, instead of trying to make the divide hoistable, you could teach LLVM to peel the first iteration of the loop, then CSE the divide.

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined; it's just that we allow further optimizations to happen that we previously bailed out of. The example I gave shows exactly this: without the speculative execution the div is still hoisted to the preheader, but this is done much later in the pipeline by MachineLICM, so we do not optimize it fully (because IndVarSimplifyPass occurs earlier).

Right, but the reason MachineLICM can do this today is that at this point we are dealing with machine instructions.

In terms of LLVM IR semantics, this change would allow hoisting an instruction that may produce UB into a path that didn't have UB before AFAICT.

I see what you mean. It's definitely a valid point; let's see what others think as well.

nikic requested changes to this revision. Oct 7 2022, 2:31 PM

Marking this as changes requested per above comments. I agree that this is not legal under current LLVM IR semantics.

This revision now requires changes to proceed. Oct 7 2022, 2:31 PM

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined,

Division by zero is currently undefined behavior at the IR level; if your program would execute it, it has no meaning at all. So hoisting a divide will interact badly with other optimizations; for example, instcombine will currently turn a divide by zero into "unreachable". This is different from instructions that return poison.

If you want a version of division that returns a poison value, you need to modify the semantics in LangRef.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86). LLVM seems to distinguish the concept of "undefined behaviour" from "undefined value", treating the former as more consequential than the latter. It currently treats div-by-zero as undefined behaviour, but that may be an overly pessimistic treatment, as demonstrated in this review. Baking assumptions about the source language or target hardware into the LLVM IR gets us into the situation where we have to sacrifice performance for some combinations to ensure functional correctness for others. To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI) or separate the undefined behaviour/value semantics from the udiv/urem instructions (eg. using an instruction flag). I think this revision takes the first approach and what you are suggesting is the second. I agree the second approach is cleaner and might be necessary given the historic assumptions made in this regard, although it would be a larger effort.

In the original example, instead of trying to make the divide hoistable, you could teach LLVM to peel the first iteration of the loop, then CSE the divide.

I don't think that would work in general, for example if the loop had unknown bounds, because peeling in such cases would still require the peeled iteration to be conditionally executed.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86).

The LLVM IR rule is more driven by the behavior of the instructions on various targets. If a target only has a trapping divide, we'd need to wrap it in control flow to implement a non-trapping divide. And particularly for signed divide, that check isn't cheap. We tend to prefer poison where it makes sense (for example, out-of-bounds shifts).
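For a sense of what that wrapping costs, here is an illustrative C sketch (not taken from LLVM) of a non-trapping signed divide on a target whose hardware divide traps: both the zero divisor and the INT_MIN / -1 overflow case need a branch.

#include <limits.h>

/* Illustrative only: the guard a trapping-divide target would need to
   implement a non-trapping sdiv. The chosen fallback results are arbitrary. */
static int nontrapping_sdiv(int x, int y) {
  if (y == 0)
    return 0;                    /* divide by zero: pick some defined value */
  if (x == INT_MIN && y == -1)
    return INT_MIN;              /* overflow case that also traps on e.g. x86 */
  return x / y;                  /* safe: the hardware divide cannot fault here */
}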

Frontends can always use control flow to get whatever user-visible behavior they want. (For example, the Rust divide operator panics if you divide by zero.)

To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI)

We try to avoid making core IR semantics depend on TTI. Not that we can completely ignore target differences when writing IR optimizations, but we want to keep IR understandable without reference to target-specific semantics.

I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwidth of the operation/current moon cycle". But I don't want to go there. If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

In the original example, instead of trying to make the divide hoistable, you could teach LLVM to peel the first iteration of the loop, then CSE the divide.

I don't think that would work in general, for example if the loop had unknown bounds, because peeling in such cases would still require the peeled iteration to be conditionally executed.

Any loop can be peeled (as long as the body doesn't contain some exotic construct that inhibits cloning); it's basically just cloning the loop body. And if the divide dominates the latch before peeling, it will dominate the peeled loop after peeling.

The "general" problem is really the case where peeling is too expensive.

nlopes added a comment. Oct 8 2022, 1:37 AM

I would also prefer not to go down this route.
Alternatives include using predicated builtins, sticking to pre-headers, or introducing a new intrinsic that yields poison on division by zero and/or INT_MIN/-1.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86).

The LLVM IR rule is more driven by the behavior of the instructions on various targets. If a target only has a trapping divide, we'd need to wrap it in control flow to implement a non-trapping divide. And particularly for signed divide, that check isn't cheap. We tend to prefer poison where it makes sense (for example, out-of-bounds shifts).

Frontends can always use control flow to get whatever user-visible behavior they want. (For example, the Rust divide operator panics if you divide by zero.)

To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI)

We try to avoid making core IR semantics depend on TTI. Not that we can completely ignore target differences when writing IR optimizations, but we want to keep IR understandable without reference to target-specific semantics.

I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwidth of the operation/current moon cycle". But I don't want to go there. If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

I'm sorry, but I don't quite follow the logic that tries to justify the current design. On the one hand the IR rules are driven by target requirements, yet at the same time you avoid making IR semantics dependent on TTI (which is the proper way to query target requirements). That seems like a recipe for the exact problem we are dealing with here, i.e. there are target-specific assumptions baked into the IR that are not configurable.

In the original example, instead of trying to make the divide hoistable, you could teach LLVM to peel the first iteration of the loop, then CSE the divide.

I don't think that would work in general, for example if the loop had unknown bounds, because peeling in such cases would still require the peeled iteration to be conditionally executed.

Any loop can be peeled (as long as the body doesn't contain some exotic construct that inhibits cloning); it's basically just cloning the loop body. And if the divide dominates the latch before peeling, it will dominate the peeled loop after peeling.

The "general" problem is really the case where peeling is too expensive.

consider this loop:

for (i = 0; i < n; i++) {
  v += x/y;
}

after peeling:

if (n > 0) {
  v += x/y;
}
for (i = 1; i < n; i++) {
  v += x/y;
}

The peeled divide that is guarded by the if (n > 0) conditional does not dominate the divide that's in the loop body. Even if we try to consider control flow equivalence between that guard and the loop guard, there could still be cases where the dominance cannot be safely determined (eg. non-affine loops).

fhahn added a comment. Oct 11 2022, 8:21 AM

I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwidth of the operation/current moon cycle". But I don't want to go there. If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

I'm sorry, but I don't quite follow the logic that tries to justify the current design. On the one hand the IR rules are driven by target requirements, yet at the same time you avoid making IR semantics dependent on TTI (which is the proper way to query target requirements). That seems like a recipe for the exact problem we are dealing with here, i.e. there are target-specific assumptions baked into the IR that are not configurable.

A practical consequence of changing the semantics to be target-dependent is that you would also have to audit each piece of code that reasons about udiv to make sure it properly considers both semantics.

TTI is at the moment only used for queries that inform cost decisions AFAICT. Changing it to impact semantics of LLVM IR would be a big change IMO.

I don't think that would work in general, for example if the loop had unknown bounds, because peeling in such cases would still require the peeled iteration to be conditionally executed.

Any loop can be peeled (as long as the body doesn't contain some exotic construct that inhibits cloning); it's basically just cloning the loop body. And if the divide dominates the latch before peeling, it will dominate the peeled loop after peeling.

The "general" problem is really the case where peeling is too expensive.

consider this loop:

for (i = 0; i < n; i++) {
  v += x/y;
}

after peeling:

if (n > 0) {
  v += x/y;
}
for (i = 1; i < n; i++) {
  v += x/y;
}

The peeled divide that is guarded by the if (n > 0) conditional does not dominate the divide that's in the loop body. Even if we try to consider control flow equivalence between that guard and the loop guard, there could still be cases where the dominance cannot be safely determined (eg. non-affine loops).

I'm not sure I follow here. The loop peeling implementation in LLVM should make sure that the remainder loop is dominated by the peeled iterations. For the motivating example, peeling by 1 iteration seems to have the desired effect: https://clang.godbolt.org/z/rhvsdvEjG
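For reference, the shape produced by peeling one iteration is roughly the following (C-level sketch, not copied from the Godbolt output): the remainder loop sits under the same guard as the peeled iteration, so the peeled divide dominates the one inside the loop and the two can be CSE'd.

if (n > 0) {
  v += x / y;                    /* peeled iteration 0 */
  for (int i = 1; i < n; i++)    /* remainder loop: only reached after the */
    v += x / y;                  /* peeled divide, so that divide dominates it */
}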

Peeling is used in a similar fashion to turn loads in a loop into dereferenceable loads; see D108114.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86).

The LLVM IR rule is more driven by the behavior of the instructions on various targets. If a target only has a trapping divide, we'd need to wrap it in control flow to implement a non-trapping divide. And particularly for signed divide, that check isn't cheap. We tend to prefer poison where it makes sense (for example, out-of-bounds shifts).

Frontends can always use control flow to get whatever user-visible behavior they want. (For example, the Rust divide operator panics if you divide by zero.)

To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI)

We try to avoid making core IR semantics depend on TTI. Not that we can completely ignore target differences when writing IR optimizations, but we want to keep IR understandable without reference to target-specific semantics.

I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwidth of the operation/current moon cycle". But I don't want to go there. If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

I'm sorry, but I don't quite follow the logic that tries to justify the current design. On the one hand the IR rules are driven by target requirements, yet at the same time you avoid making IR semantics dependent on TTI (which is the proper way to query target requirements). That seems like a recipe for the exact problem we are dealing with here, i.e. there are target-specific assumptions baked into the IR that are not configurable.

The design is influenced by all the ISAs we want to support, which means it won't be optimal for some (or any), but that's life. That doesn't imply the semantics of the IR must be parameterizable on TTI information.

Changing something fundamental like the semantics of the division instruction has far-reaching effects. Even if we wanted to do it, your patch is very incomplete. We would need to audit every single optimization/analysis that touches divisions and make sure they would still be correct with your proposed semantics. Removing UB from division (as you propose) makes some optimizations wrong, as they can no longer assume that the RHS is non-zero, for example.
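To make that concrete, here is an illustrative C-level sketch (not from the review) of the kind of deduction that relies on division by zero being UB: once x / y has executed, the optimizer may assume y is non-zero and fold away later checks.

static int f(int x, int y) {
  int q = x / y;   /* under current IR semantics, reaching this implies y != 0 */
  if (y == 0)      /* ...so this branch is provably dead and can be deleted,  */
    return -1;     /* a transform that breaks if div-by-zero stops being UB   */
  return q;
}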

In summary, changing semantics of div is not something we are willing to do. The risk is too high and the benefits seem low. The best way forward for you is to investigate existing intrinsics or create a new one.

arsenm added a subscriber: arsenm. Oct 11 2022, 8:34 AM

Could you instead insert a clamp of the divisor and then pattern match that out during selection?

Could you instead insert a clamp of the divisor and then pattern match that out during selection?

Hmm not sure what you mean by selection. Could you please elaborate (perhaps with an example)?

alexgatea abandoned this revision. Oct 20 2022, 10:13 AM

Thank you for all the comments and suggestions! Closing this PR since the consensus is that using a TTI hook to create target-specific IR semantics is undesirable.

Could you instead insert a clamp of the divisor and then pattern match that out during selection?

Hmm not sure what you mean by selection. Could you please elaborate (perhaps with an example)?

If you speculate sdiv x, y, replace it with sdiv x, (y == 0 ? 1 : y), or whatever behavior you get for this case. Then, in your backend, pattern match the divide-by-0 check when selecting to the instruction.
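For illustration only (not part of the patch), the suggested clamp looks roughly like this at the source level; the select keeps the division free of UB so it can be speculated, and a backend on which divide-by-zero is harmless could pattern-match the clamp back out during instruction selection.

static int speculatedDiv(int x, int y) {
  int safeY = (y == 0) ? 1 : y;  /* clamp the divisor: the division can never fault */
  return x / safeY;              /* hoistable without introducing UB */
}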

Could you instead insert a clamp of the divisor and then pattern match that out during selection?

Hmm not sure what you mean by selection. Could you please elaborate (perhaps with an example)?

If you speculate sdiv x, y, replace it with sdiv x, (y == 0 ? 1 : y), or whatever behavior you get for this case. Then, in your backend, pattern match the divide-by-0 check when selecting to the instruction.

I see what you mean. But what if the optimizer in the meantime changes/removes the select instruction? E.g. if y = foo() where foo() always returns 0, then the optimizer will at some point replace (y == 0 ? 1 : y) with 1. Also, how can we know in the backend that the original instruction was sdiv x, y and not actually sdiv x, (y == 0 ? 1 : y)?