This is an archive of the discontinued LLVM Phabricator instance.

[Codegen Prepare] Swap commutative binops before splitting branch condition.
AbandonedPublic

Authored by bmakam on Jun 13 2016, 9:43 AM.

Details

Reviewers

sebpop
rengolin
t.p.northover
majnemer
jmolloy
mcrosier

Summary

We generically canonicalize commutative binary operation nodes so that if
only one operand is a constant, it will be on the RHS. However, if both
operands are comparisons against constants, it is good to move the most
likely taken condition to the RHS if the binary operation is Instruction::And
and move the less likely taken condition to the RHS if the binary operation
is Instruction::Or.
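
As a rough illustration of why operand order matters once the condition is split into two branches, the sketch below counts the expected number of comparisons executed. It assumes the two conditions are independent and that the first operand is tested first; the function names and the probabilities in the usage note are hypothetical and are not part of the patch.

```cpp
// Expected number of comparisons executed when `first && second` is lowered
// to two branches: the second comparison runs only if the first one is true.
double expectedTestsAnd(double pFirstTrue) { return 1.0 + pFirstTrue; }

// For `first || second` the second comparison runs only if the first one is
// false.
double expectedTestsOr(double pFirstTrue) { return 1.0 + (1.0 - pFirstTrue); }
```

With an illustrative 0.1 / 0.9 split, testing the unlikely condition first in an And costs about 1.1 comparisons on average versus 1.9 the other way round, and an Or behaves symmetrically. This says nothing about how a given branch predictor treats either layout.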

Diff Detail

Event Timeline

bmakam updated this revision to Diff 60551. Jun 13 2016, 9:43 AM

bmakam retitled this revision to [Codegen Prepare] Swap commutative binops before splitting branch condition.

bmakam updated this object.

bmakam added reviewers: jmolloy, majnemer, rengolin, t.p.northover, mcrosier.

bmakam added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Jun 13 2016, 9:43 AM

bmakam added a child revision: D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block. Jun 13 2016, 9:45 AM

bmakam mentioned this in D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block. Jun 13 2016, 11:16 AM

update lit tests.

Gentle ping. Fixes PR21600

cleanup code. NFCI.

flyingforyou added a subscriber: flyingforyou. Jun 16 2016, 2:34 AM

The code itself looks good, does what it says on the tin. :)

But I wonder if this generic assumption works well (or better) for all targets...

Can you share a bit of the motivation behind this? Like benchmark numbers, etc?

Thanks Renato,

The main motivation for this patch is to fix the regressions we see in spec2006/mcf when D20030 is enabled. I have so far tested this patch in combination with D20030 on spec2006 on Kryo and overall the results are positive (better) for LTO config:

Benchmark                          Diff
---------------------------------------------
spec2006/astar:ref/BigLakes2048.cfg   0.47%
spec2006/gcc:ref/g23.in               0.41%
spec2006/gobmk:ref/nngs.tst          -0.33%
spec2006/gobmk:ref/score2.tst        -1.14%
spec2006/gobmk:ref/trevorc.tst       -0.50%
spec2006/gobmk:ref/trevord.tst       -0.47%
spec2006/mcf:ref                      8.14%
spec2006/omnetpp:ref                 -1.75%
spec2006/perlbench:ref/checkspam.pl  -2.34%
spec2006/perlbench:ref/splitmail.pl  -1.89%
spec2006/povray:ref                   1.02%
spec2006/sjeng:ref                    0.65%
spec2006/xalancbmk:ref                2.07%

In D21299#459899, @bmakam wrote:

The main motivation for this patch is to fix the regressions we see in spec2006/mcf when D20030 is enabled. I have so far tested this patch in combination with D20030 on spec2006 on Kryo and overall the results are positive (better) for LTO config:

Right, but this is a change in the generic codegen prepare, not even AArch64 specific. So, testing only on Kryo seems very dangerous.

I encourage you to run the same benchmarks on other architectures (AArch64 A57, ARMv7, x86_64 at least) before concluding that this is an overall winning strategy.

Or, you lower this piece of the code into the AArch64-specific part, and make sure to wrap around a target feature flag if it's only beneficial to Kryo.

cheers,
--renato

In principle, this change is target independent because it reassociates binary operands to simplify branches. The reassociation pass is designed for transformations that will help down the line optimizations such as constant propagation, GCSE, LICM, PRE, etc., so I moved it down to CGP.
I can certainly verify for A57 and know for a fact that it improves spec2006/mcf on A57 as well. However, I am uncertain of reliably testing and verifying on other targets.

If we want to move this to the AArch64 backend only, it needs to be done at the pre-ISel stage. AArch64PromoteConstantPass and AArch64AddressTypePromotionPass are the only pre-ISel passes in the AArch64 backend, but their purpose is different from what this change tries to accomplish. I'm not sure if it is reasonable to create another pass just to do this transformation.

Although this is target independent, I have added a feature flag to guard this change. It is currently enabled only for Kryo because I tested only on this target. If this is profitable for other targets, we can add the feature flag to those targets.

Renato, is this along the lines of what you were suggesting?

In D21299#461003, @bmakam wrote:

In principle, this change is target independent because it reassociates binary operands to simplify branches. The reassociation pass is designed for transformations that will help down the line optimizations such as constant propagation, GCSE, LICM, PRE, etc., so I moved it down to CGP.

The re-association is target independent, but guessing which branch will be taken probably isn't, as it depends on the branch predictor, which differs wildly across targets and workloads.

I can certainly verify for A57 and know for a fact that it improves spec2006/mcf on A57 as well. However, I am uncertain of reliably testing and verifying on other targets.

At least for A57 would be nice.

The new flag should suffice. It can also allow other target maintainers to test on their arches by adding the feature and benchmarking, and then commit the change if profitable. Only then, if this is universally true, we could remove the flag and make it a generic pass.

cheers,
--renato

In D21299#461547, @rengolin wrote:

In D21299#461003, @bmakam wrote:

In principle, this change is target independent because it reassociates binary operands to simplify branches. The reassociation pass is designed for transformations that will help down the line optimizations such as constant propagation, GCSE, LICM, PRE, etc., so I moved it down to CGP.

The re-association is target independent, but guessing which branch will be taken probably isn't, as it depends on the branch predictor, which differs wildly across targets and workloads.

I can certainly verify for A57 and know for a fact that it improves spec2006/mcf on A57 as well. However, I am uncertain of reliably testing and verifying on other targets.

At least for A57 would be nice.

The new flag should suffice. It can also allow other target maintainers to test on their arches by adding the feature and benchmarking, and then commit the change if profitable. Only then, if this is universally true, we could remove the flag and make it a generic pass.

Thanks Renato,
Performance runs on A57 are ongoing, and I will update the results once I get them.

The only clear performance differences on A57 are:

Benchmark                                           Diff
----------------------------------------------------------------
spec2006/mcf:ref                                    3.21
spec2006/hmmer:ref                                  1.22
spec2006/sjeng:ref                                  0.87
spec2006/sphinx3:ref                                0.71
spec2006/perlbench:ref                              -2.51
spec2006/povray:ref                                 -2.44

Overall this has a minor impact on performance on A57. Is it reasonable to turn it on for A57 too?

rebased.

It seems like there is no objection to turning this on for A57, but there is no official approval yet. If it helps to persuade the community, I have completed running tests on A53, and this change is performance neutral there, with no regressions or gains beyond noise. IMHO this seems to be good for AArch64 targets, but I am inclined to leave it enabled only for Kryo because there is no approval from the community. Any thoughts?

bmakam added a reviewer: sebpop. Jul 11 2016, 10:46 AM

sebpop added a subscriber: evandro. Jul 11 2016, 11:10 AM

If all cores that have FeaturePredictableSelectIsExpensive can also have the new flag, and it makes sense, it could be coalesced into a single flag?

It does look a bit redundant as it is, but I haven't really looked closely how the two features are linked.

@t.p.northover, any comments?

In D21299#480901, @rengolin wrote:

If all cores that have FeaturePredictableSelectIsExpensive can also have the new flag, and it makes sense, it could be coalesced into a single flag?

FWIW, PredictableSelectIsExpensive is also set in X86ISelLowering.cpp: PredictableSelectIsExpensive = Subtarget.getSchedModel().isOutOfOrder()
I created a new flag because I could not verify the profitability of this patch on x86 target. I agree if it makes sense to enable it for x86, we could coalesce into a single flag.

Yuck, I hate heuristics like this. It's not even particularly clear that "range size" correlates well with probability in real code, let alone with what any given branch predictor thinks of that probability.

When you remove the 'mcf' test that this patch was specifically written to target, it turns into a net regression on geomean even on Kryo. I'm not entirely sure how fair that is (I avoided statistics like the plague), but isn't using separate data to derive and test code/hypotheses generally considered a good thing?
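
To make the geomean point concrete, here is a small sketch that recomputes the geometric mean of the Kryo rows posted earlier in this thread, with and without mcf. It assumes a positive "Diff" means faster, converts each percentage to a ratio as 1 + diff/100, and treats every row (including multiple inputs of the same benchmark) as an independent sample, which may not match how the geomean quoted above was computed. The helper names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of speedup ratios, computed in log space.
// Precondition: `ratios` is non-empty and every entry is positive.
double geomean(const std::vector<double> &ratios) {
  double logSum = 0.0;
  for (double r : ratios)
    logSum += std::log(r);
  return std::exp(logSum / static_cast<double>(ratios.size()));
}

// Kryo percentage diffs in the order posted in the thread; mcf is row 6.
// Assumption: positive means faster, so 8.14 becomes a ratio of 1.0814.
std::vector<double> kryoRatios(bool includeMcf) {
  const double diffs[] = {0.47,  0.41,  -0.33, -1.14, -0.50, -0.47, 8.14,
                          -1.75, -2.34, -1.89, 1.02,  0.65,  2.07};
  const std::size_t mcfIndex = 6;
  std::vector<double> out;
  for (std::size_t i = 0; i < sizeof(diffs) / sizeof(diffs[0]); ++i)
    if (includeMcf || i != mcfIndex)
      out.push_back(1.0 + diffs[i] / 100.0);
  return out;
}
```

Under these assumptions, geomean(kryoRatios(true)) comes out above 1 and geomean(kryoRatios(false)) below 1, which matches the observation that the overall gain rests on mcf.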

In D21299#481076, @t.p.northover wrote:

Yuck, I hate heuristics like this. It's not even particularly clear that "range size" correlates well with probability in real code, let alone with what any given branch predictor thinks of that probability.

With all due respect, I think there are some facts that need some clarification. First, for targets which have cheap jump instructions, currently LLVM splits a conditional branch like:

%0 = icmp ne i32 %a, 0
%1 = icmp eq i32 %b, 2
%or.cond = and i1 %0, %1
br i1 %or.cond, label %TrueBB, label %FalseBB

into multiple branch instructions like:

BB1:
%0 = icmp ne i32 %a, 0
br i1 %0, label %SplitBB, label %FalseBB
SplitBB:
%1 = icmp eq i32 %b, 2
br i1 %1, label %TrueBB, label %FalseBB

This requires creation of SplitBB after BB1 and currently we update branch weights taking liberty under the assumption that

FalseProb for BB1 == TrueProb for BB1 * FalseProb for SplitBB.

This is very fragile because it assumes that source order correlates well with probability in real code, which is not true, as the mcf case here shows. Furthermore, codegen doesn't always end up with source order, because earlier transformations such as Reassociation and JumpThreading sometimes commute the binary operands, resulting in big swings in performance for reasons unrelated to the original change, as seen here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141117/244920.html
What this patch tries to achieve is to address this issue by ranking the commutative binary operands based on their range sizes, so that we get consistent codegen whose performance doesn't fluctuate with minor changes in earlier optimization passes and which at the same time gives better performance overall.
Second, it is very hard to correlate something with probability in real code. Even when using PGO for SPEC, we generate the profile using the train input and use it for the ref input, although for most benchmarks the train input does not correlate with the ref input. FWIW, this heuristic is not assigning any branch probabilities based on the range size, it only ranks the commutative binary operands based on the generic assumption that if we have two conditions a != 0 and b == 2, it is more likely that a != 0 than b == 2, because over all the values 'a' and 'b' can take, a value is more likely to differ from a given constant than to equal one. To be conservative, I scaled the comparison logarithmically so that small differences in range sizes are ignored. One situation where this assumption might not hold is when 'a' or 'b' are enums or macros, although I haven't checked whether this is the cause of some of the regressions we see.
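
The ranking described above can be modelled outside LLVM roughly as follows. This is a simplified sketch, not the patch itself: it handles only eq/ne compares against a constant on values of 1 to 32 bits, and the type and function names are hypothetical. The weights mirror the ones in the posted diff (ceil(log2(range size)) of one compare plus the bit width of the other).

```cpp
#include <cstdint>

enum class Pred { EQ, NE };      // only these two predicates are modelled
enum class LogicOp { And, Or };

// A compare of a `bits`-wide integer against a constant (1 <= bits <= 32).
struct Cmp {
  Pred pred;
  unsigned bits;
};

// Number of values for which the compare is true.
uint64_t setSize(const Cmp &c) {
  const uint64_t all = uint64_t{1} << c.bits;
  return c.pred == Pred::EQ ? 1 : all - 1;
}

// Smallest k such that 2^k >= n, for n >= 1.
unsigned ceilLog2(uint64_t n) {
  unsigned k = 0;
  while ((uint64_t{1} << k) < n)
    ++k;
  return k;
}

// True when the operands of `lhs op rhs` should be swapped: for And the
// more likely compare goes to the right, for Or the less likely one does.
bool shouldSwap(LogicOp op, const Cmp &lhs, const Cmp &rhs) {
  const unsigned lhsWeight = ceilLog2(setSize(lhs)) + rhs.bits;
  const unsigned rhsWeight = ceilLog2(setSize(rhs)) + lhs.bits;
  if (lhsWeight == rhsWeight)
    return false; // keep source order on ties
  const bool lhsMoreLikely = lhsWeight > rhsWeight;
  return op == LogicOp::And ? lhsMoreLikely : !lhsMoreLikely;
}
```

For `a != 0 && b == 2` on 32-bit values the weights are 64 and 32, so the operands are swapped and `a != 0` ends up on the right; for `a != 0 || b == 2` the source order is kept. Because the weights are logarithms rounded up, ranges of nearly equal size tie and are left alone.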

When you remove the 'mcf' test that this patch was specifically written to target, it turns into a net regression on geomean even on Kryo. I'm not entirely sure how fair that is (I avoided statistics like the plague), but isn't using separate data to derive and test code/hypotheses generally considered a good thing?

Earnestly, the regressions were minor (except for perlbench which was -2.5%) compared to the gains that I did not spend time looking for the reason. I am sorry that you found this heuristic distasteful. One alternative I can think of to fix this issue is to canonicalize icmp ne to the RHS of the AND expression or to the LHS of the OR expression, which might be less yuck but is still strange.

FWIW, this heuristic is not assigning any branch probabilities based on the range size, it only ranks the commutative binary operands based on the generic assumption that if we have two conditions a != 0 and b == 2, it is more likely that a != 0 than b == 2

And I still think that's not obviously true. Integers actually used often take a very limited number of values, and this seems like a common idiom to me:

int res = some_func();
if (res < 0)
  llvm_unreachable("WTF happened");
else if (res == 0)
  [...]

Earnestly, the regressions were minor (except for perlbench which was -2.5%) compared to the gains that I did not spend time looking for the reason.

I don't mind that you didn't investigate the other regressions, I worry about the experimental soundness of including mcf in the analysis deciding whether this actually is an optimization. For example https://en.wikipedia.org/wiki/Test_set.

<rant>
But this is actually why I try to stay well clear of benchmarking discussions (unless asked, as here). I strongly suspect most of what we do around data gathering is fundamentally unprincipled and probably unsound, and would make actual statisticians and experimental designers claw their eyes out with a rusty spoon.

Unfortunately, I don't know enough to say precisely where we're going wrong or how to fix it. So I mostly just pretend it's not happening and keep out of those areas.
</rant>

To be clear, I'm not really trying to block this. I was asked what I thought and replied.

In D21299#482177, @t.p.northover wrote:
FWIW, this heuristic is not assigning any branch probabilities based on the range size, it only ranks the commutative binary operands based on the generic assumption that if we have two conditions a != 0 and b == 2, it is more likely that a != 0 than b == 2

And I still think that's not obviously true. Integers actually used often take a very limited number of values, and this seems like a common idiom to me:
int res = some_func();
if (res < 0)
  llvm_unreachable("WTF happened");
else if (res == 0)
  [...]

This is exactly the idiom I am trying to clarify. This change does not influence the branch direction for an idiom like the one above at all. All this change targets is code like

if (cond1 && cond2)
  do_something

if (cond1 || cond2)
  do_something

Currently, the codegen decides to split the condition into branches for targets which have cheap jump instructions and there is some flexibility in the order of the blocks containing cond1 and cond2. This patch is trying to rank cond1 and cond2 in such a way that it is profitable in most cases and is always consistent independent of earlier optimizations.

Thanks,
Balaram

For the giggles (really only that, as you might guess I view all numbers in this thread and any other LLVM benchmarking attempts with the deepest suspicion, I'm certainly not an experimental designer), I ran 3 tests on a Cyclone-like processor:

1. Enable the PredictableSelectIsExpensive feature (this seems reasonable: as on Kryo, predictable branches are very, very cheap on Cyclone).
2. That plus the suggested heuristic.
3. The first with exactly the opposite of the suggested heuristic.

The SPEC2006 results are below (> 1 => improvement)

Benchmark                       suggested speedup   opposite speedup
---------------------------------------------------------------------
433.milc/433.milc               1.00235569837       0.996451239064
444.namd/444.namd               1.00085232996       1.00266512107
447.dealII/447.dealII           1.00232774224       1.00139244117
450.soplex/450.soplex           0.991283676704      1.0
470.lbm/470.lbm                 1.00033869904       0.992413122292
400.perlbench/400.perlbench     1.0124059534        1.00875815688
401.bzip2/401.bzip2             0.996946748407      0.999879439635
403.gcc/403.gcc                 1.00149565217       1.01734859727
429.mcf/429.mcf                 1.00949811937       0.997803188194
445.gobmk/445.gobmk             0.973701955496      0.996549344375
456.hmmer/456.hmmer             1.00178618018       1.00164546294
458.sjeng/458.sjeng             1.00405505452       0.994367699255
462.libquantum/462.libquantum   0.993502343417      0.989287229529
464.h264ref/464.h264ref         0.996045116438      1.00229694506
471.omnetpp/471.omnetpp         0.998019595581      1.00524934383
473.astar/473.astar             1.00208689727       0.998415807973
483.xalancbmk/483.xalancbmk     1.00291218638       0.987646150452

Geomeans were 0.99935572086 for the patch, 0.99951568293 for the complete opposite. Both worse than the status-quo, but whether in a statistically significant way, who knows?

I think the only conclusion we can really draw from this is that the LLVM project really needs to hire an actual scientist who specializes in designing experiments (not a computer scientist, not a mathematician who dabbles in programming) and give them some clout. We shouldn't be making these kinds of decisions based on ad-hoc runs of a handful of 10 year old benchmarks on ${RANDOM_HARDWARE}.

As for this patch, meh.

Tim.

Thanks for testing this patch out, Tim.

Now that I have lured you into benchmarking :) I will go out on a limb and claim that if you run this patch along with its dependent patch D20030 and enable the feature by turning on the flag "-aarch64-ccmp-disable-triangle-latch" you will see the numbers that will be comparable to the numbers I have shown on this thread. This patch by itself is not very interesting and it will get interesting once D20030 is enabled. All the numbers I have presented in this thread are using this suggested heuristic in combination with D20030 enabled. Appreciate your help.

Tim, both sets of numbers are statistically equivalent (and both move down to 0.999...), so the change is probably completely innocuous at run time, and surely adds some compile time.

As Balaram said, this helps D20030, with which Kristof has seen some improvements, so there could be some merit underneath it all.

My takeaway points from this discussion are:

1. The two feature flags are very close to each other, and I think more investigation is needed to make sure you must create a new flag instead of using the current one.
2. The choice does look arbitrary, but I think Balaram's argument to make it more predictable is interesting. I have no idea how this could play out on the vastly different (sub-)architectures out there, but it shouldn't be worse, on average, than a non-predictable output.
3. Maybe what we need is to merge these two patches and test on all architectures that have the FeaturePredictableSelectIsExpensive to make sure this is not the same effect showing up in different ways.

If the predictability makes it converge to a reasonable scenario, then I think the change is positive. If not, it's probably just noise.

Renato,

I agree with all your points. FWIW, the idea of using value range for static branch prediction is not new. The idea was first introduced by Jason Patterson in his SIGPLAN'95 paper: "Accurate Static Branch Prediction by Value Range Propagation". So I think there is definitely some value in this. However, I have dropped this from my plate because I have already spent a lot of time trying to improve this over the past several months, so I would rather spend my efforts elsewhere. Thanks for the review.

bmakam removed a child revision: D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block. Aug 18 2016, 3:23 PM

Revision Contents

Path                                                           Size
---------------------------------------------------------------------
include/llvm/Target/TargetLowering.h                           10 lines
lib/CodeGen/CodeGenPrepare.cpp                                 73 lines
lib/CodeGen/TargetLoweringBase.cpp                             1 line
lib/Target/AArch64/AArch64.td                                  7 lines
lib/Target/AArch64/AArch64ISelLowering.cpp                     2 lines
lib/Target/AArch64/AArch64Subtarget.h                          4 lines
test/CodeGen/AArch64/aarch64-codegen-prepare-constant-cmp.ll   31 lines

Diff 61137

include/llvm/Target/TargetLowering.h

[256 lines not shown] public:
   bool isJumpExpensive() const { return JumpIsExpensive; }

   /// Return true if selects are only cheaper than branches if the branch is
   /// unlikely to be predicted right.
   bool isPredictableSelectExpensive() const {
     return PredictableSelectIsExpensive;
   }

+  /// Return true if conditional compares are only cheaper than branches if the
+  /// branch is unlikely to be predicted right.
+  bool isPredictableConditionalCompareExpensive() const {
+    return PredictableConditionalCompareIsExpensive;
+  }
+
   /// If a branch or a select condition is skewed in one direction by more than
   /// this factor, it is very likely to be predicted correctly.
   virtual BranchProbability getPredictableBranchThreshold() const;

   /// isLoadBitCastBeneficial() - Return true if the following transform
   /// is beneficial.
   /// fold (conv (load x)) -> (load (conv*)x)
   /// On architectures that don't natively support some vector loads
[1,849 lines not shown] protected:
   /// Maximum number of store instructions that may be substituted for a call to
   /// memmove, used for functions with OptSize attribute.
   unsigned MaxStoresPerMemmoveOptSize;

   /// Tells the code generator that select is more expensive than a branch if
   /// the branch is usually predicted right.
   bool PredictableSelectIsExpensive;

+  /// Tells the code generator that conditional compare is more expensive than a
+  /// branch if the branch is usually predicted right.
+  bool PredictableConditionalCompareIsExpensive;
+
   /// MaskAndBranchFoldingIsLegal - Indicates if the target supports folding
   /// a mask of a single bit, a compare, and a branch into a single instruction.
   bool MaskAndBranchFoldingIsLegal;

   /// \see enableExtLdPromotion.
   bool EnableExtLdPromotion;

 protected:
[918 lines not shown]

lib/CodeGen/CodeGenPrepare.cpp

[191 lines not shown] private:
     bool optimizeExtractElementInst(Instruction *Inst);
     bool dupRetToEnableTailCallOpts(BasicBlock *BB);
     bool placeDbgValues(Function &F);
     bool sinkAndCmp(Function &F);
     bool extLdPromotion(TypePromotionTransaction &TPT, LoadInst *&LI,
                         Instruction *&Inst,
                         const SmallVectorImpl<Instruction *> &Exts,
                         unsigned CreatedInstCost);
+    bool swapConstantCmp(Function &F);
     bool splitBranchCondition(Function &F);
     bool simplifyOffsetableRelocate(Instruction &I);
     void stripInvariantGroupMetadata(Instruction &I);
   };
 }

 char CodeGenPrepare::ID = 0;
 INITIALIZE_TM_PASS(CodeGenPrepare, "codegenprepare",
[47 lines not shown] bool CodeGenPrepare::runOnFunction(Function &F) {
   EverMadeChange |= placeDbgValues(F);

   // If there is a mask, compare against zero, and branch that can be combined
   // into a single target instruction, push the mask and compare into branch
   // users. Do this before OptimizeBlock -> OptimizeInst ->
   // OptimizeCmpExpression, which perturbs the pattern being searched for.
   if (!DisableBranchOpts) {
     EverMadeChange |= sinkAndCmp(F);
+    EverMadeChange |= swapConstantCmp(F);
     EverMadeChange |= splitBranchCondition(F);
   }

   bool MadeChange = true;
   while (MadeChange) {
     MadeChange = false;
     for (Function::iterator I = F.begin(); I != F.end(); ) {
       BasicBlock *BB = &*I++;
[5,177 lines not shown]
 /// \brief Scale down both weights to fit into uint32_t.
 static void scaleWeights(uint64_t &NewTrue, uint64_t &NewFalse) {
   uint64_t NewMax = (NewTrue > NewFalse) ? NewTrue : NewFalse;
   uint32_t Scale = (NewMax / UINT32_MAX) + 1;
   NewTrue = NewTrue / Scale;
   NewFalse = NewFalse / Scale;
 }

+/// \brief If there is a sequence that branches based on two constant
+/// comparisons like:
+/// \code
+///   %0 = icmp ne i32 %a, 0
+///   %1 = icmp eq i32 %b, 2
+///   %or.cond = and i1 %0, %1
+///   br i1 %or.cond, label %TrueBB, label %FalseBB
+/// \endcode
+/// Swap the order of comparisons based on their constant ranges like:
+/// \code
+///   %0 = icmp ne i32 %a, 0
+///   %1 = icmp eq i32 %b, 2
+///   %and.cond = and i1 %1, %0
+///   br i1 %and.cond, label %TrueBB, label %FalseBB
+/// \endcode
+/// This will shorten the branch decision by moving the most likely
+/// condition to the right if the binary operation is Instruction::And
+/// If the binary operation is Instruction::Or we move the less likely
+/// condition to the right.
+bool CodeGenPrepare::swapConstantCmp(Function &F) {
+  if (!TLI || !TLI->isPredictableConditionalCompareExpensive())
+    return false;
+
+  bool MadeChange = false;
+  for (auto &BB : F) {
+    // Does this BB end with the following?
+    //   %cond1 = icmp pred, X, C1
+    //   %cond2 = icmp pred, Y, C2
+    //   %cond.or = or|and i1 %cond1, cond2
+    //   br i1 %cond.or label %dest1, label %dest2"
+    BinaryOperator *LogicOp;
+    BasicBlock *TBB, *FBB;
+    if (!match(BB.getTerminator(), m_Br(m_OneUse(m_BinOp(LogicOp)), TBB, FBB)))
+      continue;
+
+    unsigned Opc;
+    Value *Cond1, *Cond2;
+    if (match(LogicOp,
+              m_And(m_OneUse(m_Value(Cond1)), m_OneUse(m_Value(Cond2)))))
+      Opc = Instruction::And;
+    else if (match(LogicOp,
+                   m_Or(m_OneUse(m_Value(Cond1)), m_OneUse(m_Value(Cond2)))))
+      Opc = Instruction::Or;
+    else
+      continue;
+
+    ConstantInt *RHS1, *RHS2;
+    ICmpInst::Predicate Pred1, Pred2;
+    if (!match(Cond1, m_ICmp(Pred1, m_Value(), m_ConstantInt(RHS1))) ||
+        !match(Cond2, m_ICmp(Pred2, m_Value(), m_ConstantInt(RHS2))))
+      continue;
+
+    ConstantRange CR1 =
+        ConstantRange::makeExactICmpRegion(Pred1, RHS1->getValue());
+    ConstantRange CR2 =
+        ConstantRange::makeExactICmpRegion(Pred2, RHS2->getValue());
+    uint32_t TrueWeight = CR1.getSetSize().ceilLogBase2() + CR2.getBitWidth();
+    uint32_t FalseWeight = CR2.getSetSize().ceilLogBase2() + CR1.getBitWidth();
+    // Keep the source order if weights are the same or profitable.
+    if (TrueWeight == FalseWeight)
+      continue;
+    if ((Opc == Instruction::And) != (TrueWeight > FalseWeight))
+      continue;
+
+    // Swap the operands.
+    LogicOp->swapOperands();
+    MadeChange = true;
+  }
+  return MadeChange;
+}
+
 /// \brief Some targets prefer to split a conditional branch like:
 /// \code
 ///   %0 = icmp ne i32 %a, 0
 ///   %1 = icmp ne i32 %b, 0
 ///   %or.cond = or i1 %0, %1
 ///   br i1 %or.cond, label %TrueBB, label %FalseBB
 /// \endcode
 /// into multiple branch instructions like:
[193 lines not shown]

lib/CodeGen/TargetLoweringBase.cpp

[804 lines not shown] TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {
   UseUnderscoreSetJmp = false;
   UseUnderscoreLongJmp = false;
   SelectIsExpensive = false;
   HasMultipleConditionRegisters = false;
   HasExtractBitsInsn = false;
   FsqrtIsCheap = false;
   JumpIsExpensive = JumpIsExpensiveOverride;
   PredictableSelectIsExpensive = false;
+  PredictableConditionalCompareIsExpensive = false;
   MaskAndBranchFoldingIsLegal = false;
   EnableExtLdPromotion = false;
   HasFloatingPointExceptions = true;
   StackPointerRegisterToSaveRestore = 0;
   BooleanContents = UndefinedBooleanContent;
   BooleanFloatContents = UndefinedBooleanContent;
   BooleanVectorContents = UndefinedBooleanContent;
   SchedPreferenceInfo = Sched::ILP;
[1,019 lines not shown]

lib/Target/AArch64/AArch64.td

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
def FeatureBalanceFPOps : SubtargetFeature<"balance-fp-ops", "BalanceFPOps",		def FeatureBalanceFPOps : SubtargetFeature<"balance-fp-ops", "BalanceFPOps",
"true",		"true",
"balance mix of odd and even D-registers for fp multiply(-accumulate) ops">;		"balance mix of odd and even D-registers for fp multiply(-accumulate) ops">;

def FeaturePredictableSelectIsExpensive : SubtargetFeature<		def FeaturePredictableSelectIsExpensive : SubtargetFeature<
"predictable-select-expensive", "PredictableSelectIsExpensive", "true",		"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
"Prefer likely predicted branches over selects">;		"Prefer likely predicted branches over selects">;

		def FeaturePredictableConditionalCompareIsExpensive : SubtargetFeature<
		"predictable-ccmp-expensive", "PredictableConditionalCompareIsExpensive", "true",
		"Prefer likely predicted branches over conditional compares">;

def FeatureCustomCheapAsMoveHandling : SubtargetFeature<"custom-cheap-as-move",		def FeatureCustomCheapAsMoveHandling : SubtargetFeature<"custom-cheap-as-move",
"CustomAsCheapAsMove", "true",		"CustomAsCheapAsMove", "true",
"Use custom code for TargetInstrInfo::isAsCheapAsAMove()">;		"Use custom code for TargetInstrInfo::isAsCheapAsAMove()">;

def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",		def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
"UsePostRAScheduler", "true", "Schedule again after register allocation">;		"UsePostRAScheduler", "true", "Schedule again after register allocation">;

def FeatureSlowMisaligned128Store : SubtargetFeature<"slow-misaligned-128store",		def FeatureSlowMisaligned128Store : SubtargetFeature<"slow-misaligned-128store",
... (120 lines not shown) ...
def ProcKryo : SubtargetFeature<"kryo", "ARMProcFamily", "Kryo",
FeatureCRC,		FeatureCRC,
FeatureCrypto,		FeatureCrypto,
FeatureCustomCheapAsMoveHandling,		FeatureCustomCheapAsMoveHandling,
FeatureFPARMv8,		FeatureFPARMv8,
FeatureMergeNarrowLd,		FeatureMergeNarrowLd,
FeatureNEON,		FeatureNEON,
FeaturePerfMon,		FeaturePerfMon,
FeaturePostRAScheduler,		FeaturePostRAScheduler,
FeaturePredictableSelectIsExpensive		FeaturePredictableSelectIsExpensive,
		FeaturePredictableConditionalCompareIsExpensive
]>;		]>;

def : ProcessorModel<"generic", NoSchedModel, [		def : ProcessorModel<"generic", NoSchedModel, [
FeatureCRC,		FeatureCRC,
FeatureFPARMv8,		FeatureFPARMv8,
FeatureNEON,		FeatureNEON,
FeaturePerfMon,		FeaturePerfMon,
FeaturePostRAScheduler		FeaturePostRAScheduler

lib/Target/AArch64/AArch64ISelLowering.cpp


for (MVT Ty : {MVT::v2f32, MVT::v4f32, MVT::v2f64}) {
setOperationAction(ISD::FCEIL, Ty, Legal);		setOperationAction(ISD::FCEIL, Ty, Legal);
setOperationAction(ISD::FRINT, Ty, Legal);		setOperationAction(ISD::FRINT, Ty, Legal);
setOperationAction(ISD::FTRUNC, Ty, Legal);		setOperationAction(ISD::FTRUNC, Ty, Legal);
setOperationAction(ISD::FROUND, Ty, Legal);		setOperationAction(ISD::FROUND, Ty, Legal);
}		}
}		}

PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();		PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
		PredictableConditionalCompareIsExpensive =
		Subtarget->predictableConditionalCompareIsExpensive();
}		}

void AArch64TargetLowering::addTypeForNEON(MVT VT, MVT PromotedBitwiseVT) {		void AArch64TargetLowering::addTypeForNEON(MVT VT, MVT PromotedBitwiseVT) {
if (VT == MVT::v2f32 || VT == MVT::v4f16) {		if (VT == MVT::v2f32 || VT == MVT::v4f16) {
setOperationAction(ISD::LOAD, VT, Promote);		setOperationAction(ISD::LOAD, VT, Promote);
AddPromotedToType(ISD::LOAD, VT, MVT::v2i32);		AddPromotedToType(ISD::LOAD, VT, MVT::v2i32);

setOperationAction(ISD::STORE, VT, Promote);		setOperationAction(ISD::STORE, VT, Promote);

lib/Target/AArch64/AArch64Subtarget.h

protected:
// HasZeroCycleZeroing - Has zero-cycle zeroing instructions.		// HasZeroCycleZeroing - Has zero-cycle zeroing instructions.
bool HasZeroCycleZeroing = false;		bool HasZeroCycleZeroing = false;

// StrictAlign - Disallow unaligned memory accesses.		// StrictAlign - Disallow unaligned memory accesses.
bool StrictAlign = false;		bool StrictAlign = false;
bool MergeNarrowLoads = false;		bool MergeNarrowLoads = false;
bool UseAA = false;		bool UseAA = false;
bool PredictableSelectIsExpensive = false;		bool PredictableSelectIsExpensive = false;
		bool PredictableConditionalCompareIsExpensive = false;
bool BalanceFPOps = false;		bool BalanceFPOps = false;
bool CustomAsCheapAsMove = false;		bool CustomAsCheapAsMove = false;
bool UsePostRAScheduler = false;		bool UsePostRAScheduler = false;
bool Misaligned128StoreIsSlow = false;		bool Misaligned128StoreIsSlow = false;
bool AvoidQuadLdStPairs = false;		bool AvoidQuadLdStPairs = false;
bool UseAlternateSExtLoadCVTF32Pattern = false;		bool UseAlternateSExtLoadCVTF32Pattern = false;
bool HasMacroOpFusion = false;		bool HasMacroOpFusion = false;
bool DisableLatencySchedHeuristic = false;		bool DisableLatencySchedHeuristic = false;
... (92 lines not shown) ...
public:
bool hasCrypto() const { return HasCrypto; }		bool hasCrypto() const { return HasCrypto; }
bool hasCRC() const { return HasCRC; }		bool hasCRC() const { return HasCRC; }
bool hasRAS() const { return HasRAS; }		bool hasRAS() const { return HasRAS; }
bool mergeNarrowLoads() const { return MergeNarrowLoads; }		bool mergeNarrowLoads() const { return MergeNarrowLoads; }
bool balanceFPOps() const { return BalanceFPOps; }		bool balanceFPOps() const { return BalanceFPOps; }
bool predictableSelectIsExpensive() const {		bool predictableSelectIsExpensive() const {
return PredictableSelectIsExpensive;		return PredictableSelectIsExpensive;
}		}
		bool predictableConditionalCompareIsExpensive() const {
		return PredictableConditionalCompareIsExpensive;
		}
bool hasCustomCheapAsMoveHandling() const { return CustomAsCheapAsMove; }		bool hasCustomCheapAsMoveHandling() const { return CustomAsCheapAsMove; }
bool isMisaligned128StoreSlow() const { return Misaligned128StoreIsSlow; }		bool isMisaligned128StoreSlow() const { return Misaligned128StoreIsSlow; }
bool avoidQuadLdStPairs() const { return AvoidQuadLdStPairs; }		bool avoidQuadLdStPairs() const { return AvoidQuadLdStPairs; }
bool useAlternateSExtLoadCVTF32Pattern() const {		bool useAlternateSExtLoadCVTF32Pattern() const {
return UseAlternateSExtLoadCVTF32Pattern;		return UseAlternateSExtLoadCVTF32Pattern;
}		}
bool hasMacroOpFusion() const { return HasMacroOpFusion; }		bool hasMacroOpFusion() const { return HasMacroOpFusion; }
bool useRSqrt() const { return UseRSqrt; }		bool useRSqrt() const { return UseRSqrt; }

test/CodeGen/AArch64/aarch64-codegen-prepare-constant-cmp.ll

This file was added.

				; RUN: opt -codegenprepare -mtriple=aarch64 -mcpu=kryo -S -o - %s | FileCheck %s
				; CHECK-LABEL: @constant_cmp_and
				define i32 @constant_cmp_and(i32 %a, i32 %b) {
				entry:
				; CHECK: [[Cond1:%.*]] = icmp ne i32 %a, 0
				; CHECK-NEXT: [[Cond2:%.*]] = icmp eq i32 %b, 2
				; CHECK-NEXT: and i1 [[Cond2]], [[Cond1]]
				%cmp2.i = icmp ne i32 %a, 0
				%cmp4.i = icmp eq i32 %b, 2
				%or.cond47 = and i1 %cmp2.i, %cmp4.i
				br i1 %or.cond47, label %true, label %false
				true:
				ret i32 42
				false:
				ret i32 0
				}
				; CHECK-LABEL: @constant_cmp_or
				define i32 @constant_cmp_or(i32 %a, i32 %b) {
				entry:
				; CHECK: [[Cond1:%.*]] = icmp eq i32 %a, 2
				; CHECK-NEXT: [[Cond2:%.*]] = icmp ne i32 %b, 0
				; CHECK-NEXT: or i1 [[Cond2]], [[Cond1]]
				%cmp2.i = icmp eq i32 %a, 2
				%cmp4.i = icmp ne i32 %b, 0
				%or.cond47 = or i1 %cmp2.i, %cmp4.i
				br i1 %or.cond47, label %true, label %false
				true:
				ret i32 42
				false:
				ret i32 0
				}
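
For readers skimming the two tests above, here is a rough standalone C++ sketch of the operand ordering they appear to expect. It is not the CodeGenPrepare code from this patch and uses no LLVM types. `Pred`, `BinOp`, `CmpWithConst`, `likelyTrue` and `shouldSwap` are hypothetical names, and the rule that `x != C` is likely true while `x == C` is likely false is an assumption inferred from the test expectations, not something the diff states.

```cpp
#include <cassert>

// Hypothetical stand-ins for IR concepts; none of these are LLVM types.
enum class Pred { EQ, NE };
enum class BinOp { And, Or };

// A comparison of some value against a constant, e.g. "icmp ne i32 %a, 0".
struct CmpWithConst {
  Pred pred;
  int constant;
};

// Assumed heuristic: "x != C" usually holds, "x == C" usually does not.
constexpr bool likelyTrue(CmpWithConst c) { return c.pred == Pred::NE; }

// Returns true when the operands of `lhs op rhs` should be swapped so that
//   And: the more likely taken compare ends up on the RHS,
//   Or:  the less likely taken compare ends up on the RHS.
// Operands with the same estimate are left in place.
constexpr bool shouldSwap(BinOp op, CmpWithConst lhs, CmpWithConst rhs) {
  return likelyTrue(lhs) != likelyTrue(rhs) &&
         (op == BinOp::And ? likelyTrue(lhs) : !likelyTrue(lhs));
}

// Mirrors @constant_cmp_and: (a != 0) and (b == 2) gets swapped.
static_assert(shouldSwap(BinOp::And, {Pred::NE, 0}, {Pred::EQ, 2}),
              "likely-true compare should move to the RHS of And");
// Mirrors @constant_cmp_or: (a == 2) or (b != 0) gets swapped.
static_assert(shouldSwap(BinOp::Or, {Pred::EQ, 2}, {Pred::NE, 0}),
              "likely-false compare should move to the RHS of Or");
```

Under this reading, both test functions start with their operands in the "wrong" order, which is why the expected `and`/`or` lines list `[[Cond2]]` before `[[Cond1]]`.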