This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] accumulate bonus insts cost
ClosedPublic

Authored by yaxunl on Aug 22 2022, 12:59 PM.

Details

Summary

SimplifyCFG folds

bool foo() {
if (cond1) return false;
if (cond2) return false;
return true;
}

as

bool foo() {
  if (cond1 | cond2) return false;
  return true;
}

Instructions like 'cond2' are called 'bonus insts' in branch folding because they introduce
overhead: the original CFG could exit early, whereas the folded CFG always executes
them. SimplifyCFG calculates the cost of the 'bonus insts' incurred by folding a BB into
a predecessor BB that shares its destination. If the cost is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor, and cond2 will then always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only considers the 'bonus insts'
of the current BB being folded. This causes issues for unrolled loops
whose blocks share a destination, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if (a[0] > 0) return false;
  if (a[1] > 0) return false;
  // ...
  if (a[31] > 0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB
and end up with 32 'bonus insts' that are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a DenseMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Diff Detail

Event Timeline

yaxunl created this revision.Aug 22 2022, 12:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 12:59 PM
yaxunl requested review of this revision.Aug 22 2022, 12:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 12:59 PM
Herald added a subscriber: wdng. · View Herald Transcript
yaxunl retitled this revision from [SimplifyCFG] accumuate bonus insts cost to [SimplifyCFG] accumulate bonus insts cost.Aug 22 2022, 1:03 PM
yaxunl edited the summary of this revision. (Show Details)
tra added a comment.Aug 22 2022, 2:12 PM
bool foo() {
  if (cond1 || cond2) return false
  return true;
}

Nit. I'd use arithmetic |. || would imply that short-circuiting would make both examples *exactly* the same -- cond2 would not be evaluated if cond1 is true.
Or just use pseudo-code that is distinctly not C/C++ to illustrate proposed transformation.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
263–264

Do we ever need to clear the map carried by SimplifyCFGOpt?
Unlike the maps instantiated as a local variable in other places, the lifetime of SimplifyCFGOpt is less certain.

If we do need to clear them, should we be doing that to the map given to us by the user, or only to the local one?

yaxunl edited the summary of this revision. (Show Details)Aug 22 2022, 2:16 PM
yaxunl marked an inline comment as done.Aug 22 2022, 2:46 PM
yaxunl added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
263–264

Do we ever need to clear the map carried by SimplifyCFGOpt?
Unlike the maps instantiated as a local variable in other places, the lifetime of SimplifyCFGOpt is less certain.

If we do need to clear them, should we be doing that to the map given to us by the user, or only to the local one?

SimplifyCFGOpt is used by higher-level passes which work on functions or loops. The pass which uses SimplifyCFGOpt repeatedly is responsible for creating NumBonusInsts and passing it to SimplifyCFGOpt. NumBonusInsts is valid for the life cycle of the higher-level pass. It is only updated when new BB's are folded. Once folded, the folded BB's are gone and will not be double counted in the future. Therefore NumBonusInsts need not be emptied.

The LocalNumBonusInsts is always empty when SimplifyCFGOpt. It is used for situations where SimplifyCFG is used to simplify CFG's which do not fold branches or are not concerned with accumulated bonus insts. It degenerates to the original behaviour, i.e. only considering the bonus insts of the current BB.

tra added inline comments.Aug 22 2022, 3:03 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
263–264

Once folded, the folded BB's are gone and will not be double counted in the future. Therefore NumBonusInsts need not be emptied.

OK.

Further question -- is there a possibility that a BB we've recorded in the map gets deleted (e.g. after merging into another BB), and then another BB gets created, reusing the same pointer? That would result in erroneously counting in the old instance's bonus instructions.

It may not be likely to happen and the effect would be that SimplifyCfg would hit the bonus threshold sooner than it should have normally, so it's probably not fatal.
On the other hand, it may make compilation nondeterministic -- if/when we hit such a pointer reuse we'll end up generating different code and that's more of a problem.

The LocalNumBonusInsts is always empty when SimplifyCFGOpt.

Looks like something got lost during editing. I assume you meant "when SimplifyCFGOpt is constructed with user-supplied map".

yaxunl marked an inline comment as done.Aug 24 2022, 9:28 AM
yaxunl added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
263–264

Further question -- is there a possibility that a BB we've recorded in the map gets deleted (e.g. after merging into another BB), and then another BB gets created, reusing the same pointer? That would result in erroneously counting in the old instance's bonus instructions.

It may not be likely to happen and the effect would be that SimplifyCfg would hit the bonus threshold sooner than it should have normally, so it's probably not fatal.
On the other hand, it may make compilation nondeterministic -- if/when we hit such a pointer reuse we'll end up generating different code and that's more of a problem.

I can use CallbackVH, which will automatically update the map when the BB is deleted or replaced.

yaxunl added inline comments.Aug 25 2022, 2:32 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
263–264

Further question -- is there a possibility that a BB we've recorded in the map gets deleted (e.g. after merging into another BB), and then another BB gets created, reusing the same pointer? That would result in erroneously counting in the old instance's bonus instructions.

It may not be likely to happen and the effect would be that SimplifyCfg would hit the bonus threshold sooner than it should have normally, so it's probably not fatal.
On the other hand, it may make compilation nondeterministic -- if/when we hit such a pointer reuse we'll end up generating different code and that's more of a problem.

I can use CallbackVH, which will automatically update the map when the BB is deleted or replaced.

Actually, I will use ValueMap, which uses CallbackVH to remove the BB from the map when it is deleted. It is simpler.

yaxunl updated this revision to Diff 455717.Aug 25 2022, 2:35 PM

use ValueMap to track BasicBlock

yaxunl updated this revision to Diff 455899.Aug 26 2022, 7:04 AM

update tests

tra added a comment.Aug 26 2022, 10:52 AM

Nice. I like the consolidated CostTracker.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
7283

This needs to be renamed, too.

yaxunl marked 2 inline comments as done.Aug 29 2022, 8:39 AM
yaxunl added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
7283

will do

yaxunl updated this revision to Diff 456356.Aug 29 2022, 8:39 AM
yaxunl marked an inline comment as done.

fix variable names

tra accepted this revision.Aug 29 2022, 10:53 AM
tra added inline comments.
llvm/test/Transforms/SimplifyCFG/branch-fold-unrolled-loop.ll
8 ↗(On Diff #456356)

It's not quite clear what's the purpose of this test. I.e. what are the important pieces of the checks below?

I'd add a comment describing expected transformation.

This revision is now accepted and ready to land.Aug 29 2022, 10:53 AM
fhahn added a subscriber: fhahn.Aug 29 2022, 10:58 AM
fhahn added inline comments.
llvm/test/Transforms/SimplifyCFG/branch-fold-unrolled-loop.ll
3 ↗(On Diff #456356)

Does the test depend on the target triple? If not, it should be removed. If it does, it needs REQUIRES: to only run when the AMDGPU backend is built.

Also, can you add the test in a separate patch and only include the changes caused by the patch in the diff so it is easier to see what's going on?

It also seems like the test coverage is quite limited. Are there some edge cases that should be covered by extra tests?

nikic added a subscriber: nikic.

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
217

insert returns the iterator, no need to call find again.

225

You can use lookup() here, which will return 0 if not found.

yaxunl marked 4 inline comments as done.Aug 29 2022, 7:45 PM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small: it only allows folding small basic blocks which contain just one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. when a fully unrolled loop with a large trip count ends up with a large number of foldable basic blocks) does the performance penalty become significant.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
217

insert returns the iterator, no need to call find again.

will do

225

You can use lookup() here, which will return 0 if not found.

will do

llvm/test/Transforms/SimplifyCFG/branch-fold-unrolled-loop.ll
3 ↗(On Diff #456356)

Does the test depend on the target triple? If not, it should be removed. If it does, it needs REQUIRES: to only run when the AMDGPU backend is built.

The test depends on the target triple. Will add REQUIRES.

Also, can you add the test in a separate patch and only include the changes caused by the patch in the diff so it is easier to see what's going on?

Will add this test in a separate patch and include the change only.

The difference is that before this patch, 3 BB's are folded into the first BB, accumulating 3 bonus insts; after this patch, only 2 BB's are folded, each adding 1 bonus inst.

It also seems like the test coverage is quite limited. Are there some edge cases that should be covered by extra tests?

Will add more tests for edge cases.

8 ↗(On Diff #456356)

It's not quite clear what's the purpose of this test. I.e. what are the important pieces of the checks below?

I'd add a comment describing expected transformation.

will add a comment about expected transformations.

yaxunl updated this revision to Diff 456532.Aug 29 2022, 8:27 PM
yaxunl marked 4 inline comments as done.

revised per Florian's, Nikita's, and Artem's comments.

yaxunl added a comment.Sep 6 2022, 8:34 AM

ping. Any further concerns? Thanks.

fhahn added a comment.Sep 6 2022, 8:36 AM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small. It only allows folding small basic blocks which only contain one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. a fully unrolled loop of large loop counts ends up with a large number of foldable basic blocks) the performance penalty becomes significant.

I am not sure what typical HIP programs mean here. Did you measure the impact on general CPU benchmarks?

yaxunl added a comment.Sep 6 2022, 8:58 AM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small. It only allows folding small basic blocks which only contain one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. a fully unrolled loop of large loop counts ends up with a large number of foldable basic blocks) the performance penalty becomes significant.

I am not sure what typical HIP programs mean here. Did you measure the impact on general CPU benchmarks?

No. Can you recommend a free CPU benchmark that is suitable for this purpose? Thanks.

yaxunl added a comment.Sep 6 2022, 2:57 PM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small. It only allows folding small basic blocks which only contain one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. a fully unrolled loop of large loop counts ends up with a large number of foldable basic blocks) the performance penalty becomes significant.

I am not sure what typical HIP programs mean here. Did you measure the impact on general CPU benchmarks?

No. Can you recommend a free CPU benchmark that is suitable for this purpose? Thanks.

I will try running the SPEC CPU benchmark. There are many sub-tests. Do you recommend any sub-tests that are most relevant to this patch? Thanks.

fhahn added a comment.Sep 7 2022, 12:38 AM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small. It only allows folding small basic blocks which only contain one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. a fully unrolled loop of large loop counts ends up with a large number of foldable basic blocks) the performance penalty becomes significant.

I am not sure what typical HIP programs mean here. Did you measure the impact on general CPU benchmarks?

No. Can you recommend a free CPU benchmark that is suitable for this purpose? Thanks.

I will try running the SPEC CPU benchmark. There are many sub-tests. Do you recommend any sub-tests that are most relevant to this patch? Thanks.

For SPEC, I usually run SPEC2017rate (both integer and fp subsets).

ronlieb added a subscriber: ronlieb.Sep 7 2022, 6:42 AM

Do you have any numbers for the impact of this change? I suspect (but haven't checked) that it may be quite significant, because the current tiny bonus inst threshold (1) is tuned for the current implementation, and this patch will result in much less common dest folding than would be desirable.

For a synthetic benchmark geared towards loops with small workloads, we saw around 16% performance gain due to avoiding always executing 31 extra load instructions by not folding 31 basic blocks coming from an unrolled loop.

For typical HIP programs, I did not see significant performance differences before and after this change. This is because the original bonus instruction threshold 2 is very small. It only allows folding small basic blocks which only contain one instruction other than the cmp/branch instructions. In practical applications, such basic blocks are rare and usually are not performance bottlenecks. Only in special situations (e.g. a fully unrolled loop of large loop counts ends up with a large number of foldable basic blocks) the performance penalty becomes significant.

I am not sure what typical HIP programs mean here. Did you measure the impact on general CPU benchmarks?

No. Can you recommend a free CPU benchmark that is suitable for this purpose? Thanks.

I will try running the SPEC CPU benchmark. There are many sub-tests. Do you recommend any sub-tests that are most relevant to this patch? Thanks.

For SPEC, I usually run SPEC2017rate (both integer and fp subsets).

Helping our author here: overnight, I ran CPU 2017 int/fp speed rather than rate, with the patch and without. Would that suffice?
Sharing a summary of the results; if you really want rate, I can run rate.
The speed fp was 0.34% faster on GeoMean, and speed int was 0.21% slower.
605.mcf and 631.deepsjeng were above the 1.75% threshold:
605.mcf -3.41%
631.deepsjeng 2.32%

yaxunl added a comment.Sep 8 2022, 1:53 PM

Helping our author here: overnight, I ran CPU 2017 int/fp speed rather than rate, with the patch and without. Would that suffice?
Sharing a summary of the results; if you really want rate, I can run rate.
The speed fp was 0.34% faster on GeoMean, and speed int was 0.21% slower.
605.mcf and 631.deepsjeng were above the 1.75% threshold:
605.mcf -3.41%
631.deepsjeng 2.32%

The following are the detailed results:

fp speed          before   after    delta
603.bwaves_s      525.95   527.75    0.34%
607.cactuBSSN_s   242.86   246.16    1.36%
619.lbm_s          67.54    67.55    0.01%
621.wrf_s          88.79    89.74    1.07%
627.cam4_s        124.71   123.57   -0.91%
628.pop2_s         52.73    53.06    0.63%
638.imagick_s     238.67   240.62    0.82%
644.nab_s         370.06   367.73   -0.63%
649.fotonik3d_s    82.96    83.76    0.96%
654.roms_s        268.70   268.22   -0.18%
geomean           158.36   158.90    0.34%

int speed         before   after    delta    rerun 3 before  rerun 3 after  rerun 3 delta
600.perlbench_s     4.34     4.32   -0.46%
602.gcc_s           8.58     8.33   -2.91%    8.51            8.51            0.00%
605.mcf_s           6.73     6.50   -3.42%    6.74            6.51           -3.41%
620.omnetpp_s       3.53     3.46   -1.98%    3.54            3.51           -0.85%
623.xalancbmk_s     8.24     8.22   -0.24%
625.x264_s          9.97     9.96   -0.10%
631.deepsjeng_s     3.06     3.26    6.54%    3.02            3.09            2.32%
641.leela_s         3.79     3.83    1.06%
648.exchange2_s     7.37     7.43    0.81%
657.xz_s           19.65    19.45   -1.02%
geomean             6.44     6.42   -0.21%

@fhahn Is this acceptable? Thanks.

arsenm added inline comments.Sep 9 2022, 6:50 AM
llvm/include/llvm/Transforms/Utils/Local.h
210

Shouldn't need llvm::

yaxunl marked an inline comment as done.Sep 12 2022, 10:28 AM
yaxunl added inline comments.
llvm/include/llvm/Transforms/Utils/Local.h
210

will remove

yaxunl updated this revision to Diff 459509.Sep 12 2022, 10:29 AM
yaxunl marked an inline comment as done.

remove redundant llvm::

ping. Any further concerns? Thanks.

yaxunl updated this revision to Diff 460440.Sep 15 2022, 9:22 AM

revise the test to be target independent

This revision was landed with ongoing or failed builds.Sep 18 2022, 5:24 PM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Sep 19 2022, 5:52 AM

I've reverted this change due to large compile-time regressions, see http://llvm-compile-time-tracker.com/compare.php?from=930315f6aa587ac962183708844eb2390d5ba55e&to=e5581df60a35fffb0c69589777e4e126c849405f&stat=instructions. I don't immediately see why it would be this expensive.

I've reverted this change due to large compile-time regressions, see http://llvm-compile-time-tracker.com/compare.php?from=930315f6aa587ac962183708844eb2390d5ba55e&to=e5581df60a35fffb0c69589777e4e126c849405f&stat=instructions. I don't immediately see why it would be this expensive.

Thanks. I will take a look.

This revision is now accepted and ready to land.Oct 24 2022, 2:14 PM

Since there are different opinions about this approach, I reverted this patch and opened an RFC for further discussion:

https://discourse.llvm.org/t/simplifycfg-track-branch-fold-costs-across-transformations/66191

Matt added a subscriber: Matt.Oct 25 2022, 11:28 AM