This is an archive of the discontinued LLVM Phabricator instance.

Differential D129352

[CodeGen] Limit building time for CodeGenPrepare
ClosedPublic

Authored by xiangzhangllvm on Jul 8 2022, 1:00 AM.

Details

Reviewers

craig.topper
RKSimon
pengfei
LuoYuanke
yubing
nikic
efriedma
skan
echristo
rahmanl
MaskRay
hoy
spatel

Commits

rG16743c953441: [CodeGen] Limit building time in CodeGenPrepare for huge function

Summary

Currently CodeGenPrepare is very time consuming when handling big functions.

Old algorithm:
It iterates over each BB in the function and handles every instruction in the BB.
Because some instruction optimizations may affect the dominator tree of the BBs,
the old logic restarts the iteration and tries to optimize every BB again.

Suppose we have a big function with 20000 BBs. If handling the last BB
fine-tunes the dominator tree, we need to re-iterate and try to optimize
all 20000 BBs from the beginning.

The complexity is close to N!

We really do encounter some big tests (> 20000 BBs) that spend more than 30 minutes in this pass.
(A debug build of the compiler spends 2 hours here.)

What does this patch do for huge functions?
It mainly changes how the optimization iterates.
1 First we do optimizeBlock for each BB (the same as the old way).
Meanwhile, if a BB is changed/updated during the optimization, it is put into FreshBBs (so that optimizeBlock is tried on it again).
BBs newly created in the previous iteration are also put into FreshBBs.

2 BBs that were not updated in the previous iteration are skipped directly.

Strictly speaking, this may miss some opportunities, but I think the probability is very small.
(In my local performance tests on SPEC and some other perf tests, there is no harm to performance.)

3 For the instructions in a single BB, we do optimizeInst for each instruction.
If optimizeInst changes the instruction dominator in this BB, rather than breaking and going back to optimize the first BB (the old way),
we directly iterate over the instructions of this updated BB again to do optimizeInst (the new way).

What does this patch do for small/normal (not huge) functions?
It is the same as the old algorithm. (NFC)
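
Steps 1 and 2 of the summary can be illustrated with a small toy model. This is only a sketch and not the code in CodeGenPrepare.cpp: `Block`, `OptimizeFn`, `runSweeps` and `demoCalls` are hypothetical names, and only the names `FreshBBs` and `FuncIterated` are borrowed from the review. It assumes the per-block optimizer reports every other block it touched by inserting it into the set.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Toy stand-in for a basic block; this is not llvm::BasicBlock.
struct Block {
  int Id;
};

// Hypothetical per-block optimizer: returns true if it changed BB, and may
// insert any other block it touched into Fresh.
using OptimizeFn = std::function<bool(Block &BB, std::set<Block *> &Fresh)>;

// Runs the sweep loop and returns how many times Optimize was called.
std::size_t runSweeps(std::vector<Block> &Func, const OptimizeFn &Optimize) {
  std::set<Block *> FreshBBs;
  bool FuncIterated = false; // becomes true after the first full sweep
  bool MadeChange = true;
  std::size_t Calls = 0;
  while (MadeChange) {
    MadeChange = false;
    for (Block &BB : Func) {
      // Step 2: after the first sweep, skip blocks nobody touched.
      if (FuncIterated && FreshBBs.count(&BB) == 0)
        continue;
      ++Calls;
      bool Changed = Optimize(BB, FreshBBs);
      MadeChange |= Changed;
      // Step 1: a changed block gets another try; an unchanged one is done.
      if (Changed)
        FreshBBs.insert(&BB);
      else
        FreshBBs.erase(&BB);
    }
    FuncIterated = true;
  }
  return Calls;
}

// Five blocks; optimizing the last one changes it once and touches block 1.
std::size_t demoCalls() {
  std::vector<Block> Func = {{0}, {1}, {2}, {3}, {4}};
  bool Done = false;
  return runSweeps(Func, [&](Block &BB, std::set<Block *> &Fresh) {
    if (BB.Id != 4 || Done)
      return false;
    Done = true;
    Fresh.insert(&Func[1]);
    return true;
  });
}
```

In this toy run the first sweep visits all five blocks and the second sweep revisits only the two blocks in the set, so the optimizer is called 7 times instead of once per block per sweep.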

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

xiangzhangllvm created this revision. Jul 8 2022, 1:00 AM

Herald added a project: Restricted Project. Jul 8 2022, 1:00 AM

Herald added a subscriber: hiraditya.

xiangzhangllvm requested review of this revision. Jul 8 2022, 1:00 AM

Herald added a project: Restricted Project. Jul 8 2022, 1:00 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B174324: Diff 443160. Jul 8 2022, 1:50 AM

optimizeGatherScatterInst is a mandatory procedure in which CodeGenPrepare converts GEP(splat base ptr) into GEP(scalar base ptr), since SelectionDAGBuilder only handles GEP(scalar base ptr). Please take a look at the comments on optimizeGatherScatterInst.

xiangzhangllvm added reviewers: craig.topper, RKSimon, pengfei, LuoYuanke.Jul 11 2022, 2:06 AM

Herald added a subscriber: StephenFan. Jul 11 2022, 2:06 AM

xiangzhangllvm added a reviewer: yubing.Jul 11 2022, 2:06 AM

In D129352#3641747, @yubing wrote:

optimizeGatherScatterInst is a mandatory procedure in which CodeGenPrepare converts GEP(splat base ptr) into GEP(scalar base ptr), since SelectionDAGBuilder only handles GEP(scalar base ptr). Please take a look at the comments on optimizeGatherScatterInst.

Thanks a lot! But this conversion has already been done the first time. We just skip BBs that have already been optimized.

nikic added a subscriber: nikic.Jul 11 2022, 2:16 AM

nikic added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
585–586	Before introducing limits, can we try to remove this case? That is, continue simplifying the rest of the blocks before we start a new iteration? I expect that the current implementation looks like this to avoid some iterator invalidation issues, but there's probably a way to avoid that with a bit more effort.

In D129352#3641942, @xiangzhangllvm wrote:

Thanks a lot! But this conversion has already been done the first time. We just skip BBs that have already been optimized.

If the GEP and the gather/scatter are not in the same BB, we don't optimize the GEP in the first round. If in the second round the GEP and the gather/scatter have been put into the same BB, your code will prevent the GEP optimization from happening.

// If the GEP and the gather/scatter aren't in the same BB, don't optimize.
// FIXME: We should support this by sinking the GEP.
if (MemoryInst->getParent() != GEP->getParent())
  return false;

In D129352#3642318, @yubing wrote:
If the GEP and the gather/scatter are not in the same BB, we don't optimize the GEP in the first round. If in the second round the GEP and the gather/scatter have been put into the same BB, your code will prevent the GEP optimization from happening.
// If the GEP and the gather/scatter aren't in the same BB, don't optimize.
// FIXME: We should support this by sinking the GEP.
if (MemoryInst->getParent() != GEP->getParent())
  return false;

Let me try to find a test case. (I enable BuildTimeLimit by default locally and haven't hit any failing compile yet.)
I think we should look further into whether "the GEP optimization is mandatory" makes sense here.

llvm/lib/CodeGen/CodeGenPrepare.cpp
585–586	Thanks a lot for reviewing this patch! But if the CFG has changed, we have to restart the iteration, or the iteration may run into problems.

In D129352#3643924, @xiangzhangllvm wrote:
Let me try to find a test case. (I enable BuildTimeLimit by default locally and haven't hit any failing compile yet.)
I think we should look further into whether "the GEP optimization is mandatory" makes sense here.

The GEP optimization isn't mandatory. It will only generate worse code. CodeGenPrepare doesn't run at -O0 so nothing in it can be mandatory.

In D129352#3643924, @xiangzhangllvm wrote:

Let me try to find a test case. (I enable BuildTimeLimit by default locally and haven't hit any failing compile yet.)
I think we should look further into whether "the GEP optimization is mandatory" makes sense here.

I didn't encounter any gather/scatter build failures.

I did a local experiment:
Even when I totally disable optimizeBlock, I only hit one build failure, in ISel, for test/CodeGen/Thumb2/mve-pred-vselect.ll
(affected by optimizeSelectInst).

-  bool MadeChange = true;
+  bool MadeChange = false;      // do not run the optimization at all
   while (MadeChange) {
     MadeChange = false;
     DT.reset();
     for (BasicBlock &BB : llvm::make_early_inc_range(F)) {
       bool ModifiedDTOnIteration = false;
       MadeChange |= optimizeBlock(BB, ModifiedDTOnIteration);

I am not familiar with Thumb2's ISel, but I don't think it makes sense for its correctness to depend on such an optimization.
Pinging the test contributor @david-arm.
(Anyway, this patch doesn't affect it.)

In D129352#3644205, @craig.topper wrote:

The GEP optimization isn't mandatory. It will only generate worse code. CodeGenPrepare doesn't run at -O0 so nothing in it can be mandatory.

Yes, and with this patch I think the code may not even get worse. Only a few instruction optimizations affect (just fine-tune) the CFG, and once a BB has been optimized there is little chance to optimize it again. I tested SPEC performance and there is almost no effect.
If we don't limit the build time, it hurts JIT compilers badly. (In fact, it is our DPC++ users who complained about this compile time.)

xiangzhangllvm edited the summary of this revision. (Show Details)Jul 11 2022, 10:16 PM

nikic added inline comments.Jul 12 2022, 1:23 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
585–586	This is what we currently do, but the current implementation has a lot of room for improvement. For example, combineToUAddWithOverflow() will set ModifiedDT even though it does not modify control flow at all! It only does it to avoid iterating over a dead instruction in the same block. This can be solved by either only restarting the iteration in the BB (rather than the whole function), or by passing in the iterator to optimizeInst() so it can be moved in such case, or similar. For cases that actually do modify the CFG, we could use DTU instead of full DT recomputation, and we could maybe switch to a BB worklist, so new blocks can be queued, and deleted blocks removed from the worklist. Nobody has bothered to do this yet, because it never came up as a problem. Now that it did, things should get fixed properly.
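
One of the options mentioned above, passing the iterator into optimizeInst() so the callee can move it past an instruction it deleted, can be sketched on a toy instruction list. This is only an illustration under stated assumptions: `Block` is a `std::list` of opcode strings rather than an LLVM basic block, and the "add followed by cmp becomes uaddo" rewrite is a made-up stand-in for what combineToUAddWithOverflow() does.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy basic block: a list of opcode names, not llvm::BasicBlock.
using Block = std::list<std::string>;

// Hypothetical rewrite: an "add" immediately followed by "cmp" is fused into
// "uaddo". The callee erases the dead "cmp" and leaves It on the next live
// instruction, so the caller never steps onto an erased node.
bool optimizeInst(Block &BB, Block::iterator &It) {
  auto Next = std::next(It);
  if (*It == "add" && Next != BB.end() && *Next == "cmp") {
    *It = "uaddo";
    It = BB.erase(Next); // erase returns the iterator after the erased node
    return true;
  }
  ++It;
  return false;
}

// Walks the block once; no restart is needed when an instruction is erased.
bool optimizeBlock(Block &BB) {
  bool Changed = false;
  for (auto It = BB.begin(); It != BB.end();)
    Changed |= optimizeInst(BB, It);
  return Changed;
}
```

Because the callee owns the iterator update, the caller does not need a "modified" flag at all for this kind of purely local change.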

xiangzhangllvm added inline comments.Jul 12 2022, 1:46 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
585–586	"and we could maybe switch to a BB worklist, so new blocks can be queued, and deleted blocks removed from the worklist." Oh, that is a very good suggestion! Let's first break the N! algorithm complexity. "For cases that actually do modify the CFG, we could use DTU instead of full DT recomputation." According to my observation, the DT recomputation doesn't consume much time. Anyway, we can refine it if one day we encounter such a case. Thank you so much!

xiangzhangllvm added a reviewer: nikic.Jul 12 2022, 1:47 AM

I am implementing the refinement, but it's more complicated than we thought.

The biggest trouble is that
optimizeInst can affect other BBs even when it does not change the dominator tree.

So, in addition to collecting newly created BBs and erasing deleted BBs (for the worklist),
we still need to track each optimization to check whether it affects other BBs, and collect those BBs (into the worklist) to re-optimize them.

Refine:
1 Add SetVector FreshBBs to track the BBs that need to be optimized.
2 Add enum ModifyDT { NotModifyDT, ModifyBBDT, ModifyInstDT } to split out the iteration over instructions in a single BB.
3 Modify some optimizations so that updated BBs are re-optimized.

(All lit tests pass, and my local big case still works well: 30 mins --> 5 mins)
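
Item 2 of the refinement, using a three-state result so that a change local to one BB only restarts the walk over that BB, can be sketched as follows. This is a toy model and not the patch itself: the enumerator names come from the comment above, but `Block`, the "dead" opcode and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ModifyDT { NotModifyDT, ModifyBBDT, ModifyInstDT };

// Toy basic block: a vector of opcode names, not llvm::BasicBlock.
using Block = std::vector<std::string>;

// Hypothetical per-instruction step: drops a "dead" instruction and reports
// that the instruction order inside this block changed.
bool optimizeInst(Block &BB, std::size_t Idx, ModifyDT &Modified) {
  if (BB[Idx] != "dead")
    return false;
  BB.erase(BB.begin() + static_cast<std::ptrdiff_t>(Idx));
  Modified = ModifyDT::ModifyInstDT;
  return true;
}

// Re-walks only this block after a local change and returns the number of
// walks. A ModifyBBDT result would instead be handed back to the caller.
int optimizeBlock(Block &BB) {
  int Walks = 0;
  bool Restart = true;
  while (Restart) {
    Restart = false;
    ++Walks;
    for (std::size_t I = 0; I < BB.size(); ++I) {
      ModifyDT Modified = ModifyDT::NotModifyDT;
      optimizeInst(BB, I, Modified);
      if (Modified == ModifyDT::ModifyInstDT) {
        Restart = true; // restart inside the block, not the whole function
        break;
      }
    }
  }
  return Walks;
}
```

The cost of a local change is then bounded by the size of one block rather than by the number of blocks in the function.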

Harbormaster completed remote builds in B176403: Diff 446014.Jul 19 2022, 8:04 PM

RKSimon added inline comments.Jul 22 2022, 3:01 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
281	Please can you add doc comments to describe the enum values?
341	comment

Address Simon's comments. Thank you for reviewing!

xiangzhangllvm marked 2 inline comments as done.Jul 24 2022, 8:19 PM

Harbormaster completed remote builds in B177272: Diff 447184.Jul 24 2022, 8:54 PM

xiangzhangllvm edited the summary of this revision. (Show Details)Jul 25 2022, 6:00 PM

xiangzhangllvm added inline comments.Jul 26 2022, 6:09 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1035	This hurts performance, because it can't guarantee the order in which BBs are optimized. Let me update it.

The order in which updated BBs are inserted may be irregular, which hurts performance.
So this update makes the BB iteration happen in order.
(In my local testing, it still reduces build time well for big/huge test cases.)

Herald added a subscriber: kristof.beyls. Jul 26 2022, 7:10 PM

Harbormaster completed remote builds in B177776: Diff 447907.Jul 26 2022, 8:13 PM

ping :)

Hello @nikic, your suggestions are good. I have updated the patch.
Could you help review it? Many thanks!

In the latest update I removed the worklist, because I found it hard to control the order of the updated BBs when pushing them into the worklist,
and this caused a performance regression locally. So I replaced it with a membership check against the set FreshBBs.

xiangzhangllvm added a reviewer: efriedma.Aug 3 2022, 5:45 PM

xiangzhangllvm added a reviewer: skan.Aug 7 2022, 10:10 PM

ping : )

Hi reviewers, I think you have concerns about this new logic.
Compared with the old logic, it really may, in theory, miss some optimization opportunities.
But build time is still a big problem now.
So I plan to enable this new logic only for very large cases.
If you still have any concerns or good ideas, please let me know. Thanks!

Split the logic for small/normal functions (NFC) and huge functions (which take build time into account).

xiangzhangllvm added reviewers: echristo, rahmanl, MaskRay, hoy.Aug 24 2022, 6:45 PM

xiangzhangllvm edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B183284: Diff 455447.Aug 24 2022, 9:12 PM

LuoYuanke added inline comments.Aug 28 2022, 6:18 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
601	The BB that an instruction is sunk into is also changed, and there may be several such BBs. Do we record all of them in FreshBBs?
605	indent.

xiangzhangllvm added inline comments.Aug 29 2022, 5:52 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
601	Yes, all updated BBs are put into FreshBBs. Most of them are inserted in the updated replaceAllUsesWith. BBs that are not updated are optimized only once.
605	good catch!

Apply clang-format. (I added an independent commit that clang-formats this whole file: a808ac2e42a94f4e440ebd21fd8b759dae0b6a05.)

xiangzhangllvm marked an inline comment as done.Aug 30 2022, 2:01 AM

LuoYuanke added inline comments.Aug 30 2022, 2:38 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
263	Please add test cases for the patch with HugeFuncThresholdInCGPP set to a small number.
582–583	As @nikic suggested, would it be better to have a worklist for the unvisited or changed BBs? It seems we may need to avoid pushing a BB twice, though.
1031	OldI (old instruction)?
1059	How can we remind developers that FreshBBs should be updated by their new code?
2158–2160	Ditto.
2436	Why is this change needed? Isn't it covered by line 2431?
7106	The instruction is cloned. Why do the defs of its operands need to be sunk?
7136	Should we insert the BB into FreshBBs?
8220	This would be buggy if the following instructions in the BB depend on the dominator tree.

Harbormaster completed remote builds in B184092: Diff 456569.Aug 30 2022, 2:56 AM

xiangzhangllvm added inline comments.Aug 30 2022, 6:38 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
582–583	FreshBBs is that worklist. For a huge function, FreshBBs collects all changed BBs. (All unvisited BBs are visited in the first iteration; then we set FuncIterated = true.)
1031	Let me refine it, thanks.
1059	Let me add the requirement to the comment on FreshBBs.
2436	The BB passed in may have no terminator, so BB->getTerminator() may be nullptr, and dyn_cast cannot handle nullptr.
7106	When we copy an instruction (to a new place) to do an optimization, we have in effect updated an instruction. The defs of this instruction's operands may have a new opportunity to sink to the optimized new instruction.
7136	Strictly speaking we should. But this instruction has been erased, so every opportunity related to this "updated" instruction has disappeared.
8220	The break will re-iterate that BB. Do you mean the BB may be erased here? I think optimizeInst cannot erase the BB. If it splits the BB, it splits it into the BB plus a new BB, so iterating over that BB still works.

LuoYuanke added inline comments.Aug 30 2022, 8:39 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
8220	I mean the dominator tree has changed but has not been updated yet. We can't invoke `DT.dominates(...)` before updating the dominator tree.

LuoYuanke added inline comments.Aug 30 2022, 8:41 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
7106	Here we just copy the instruction. We only know it has been sunk when it is erased at line 7135.

xiangzhangllvm added inline comments.Aug 30 2022, 10:48 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
263	Sure
7106	Let me try to give an example soon.
8220	Makes sense. Let me reset the DT here, thanks!

Do we need the IsHugeFunc flag? Can't we use the worklist approach unconditionally? This is what we usually do. Limiting it to IsHugeFunc means that it received essentially zero testing.

In D129352#3760370, @nikic wrote:

Do we need the IsHugeFunc flag? Can't we use the worklist approach unconditionally? This is what we usually do. Limiting it to IsHugeFunc means that it received essentially zero testing.

I think IsHugeFunc can help avoid runtime performance regressions in real applications. We can set the IsHugeFunc threshold to 0 in test cases, then shrink the default threshold after this patch lands, and finally remove IsHugeFunc.

In D129352#3760376, @LuoYuanke wrote:

I think IsHugeFunc can help avoid runtime performance regressions in real applications. We can set the IsHugeFunc threshold to 0 in test cases, then shrink the default threshold after this patch lands, and finally remove IsHugeFunc.

Is there any reason to expect this to materially affect runtime performance?

In D129352#3760394, @nikic wrote:

Is there any reason to expect this to materially affect runtime performance?

I think it is more conservative, and we can move forward by tuning the IsHugeFunc size step by step. Otherwise we may have to revert the whole patch if someone reports a regression long after the patch has landed. Ideally, I agree with removing IsHugeFunc.

In D129352#3760426, @LuoYuanke wrote:

I think it is more conservative, and we can move forward by tuning the IsHugeFunc size step by step. Otherwise we may have to revert the whole patch if someone reports a regression long after the patch has landed. Ideally, I agree with removing IsHugeFunc.

IMHO, if it causes issues we want to find out early and revert the whole patch, rather than just exposing these issues in very rare circumstances.

xiangzhangllvm added inline comments.Aug 31 2022, 2:39 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp

7106

I found an ARM case for this. It is not about sinking; it is about copying twice:

1st time: copy the insertelement from entry to vector.body

entry:
  ...
1  %l0 = trunc i16 %x to i8
2  %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
  ...
3  br label %vector.body

vector.body:                                      ; preds = %vector.body, %entry
  ...
4  %0 = insertelement <8 x i8> undef, i8 %l0, i32 0       // copy from line 2
5  %1 = shufflevector <8 x i8> %0, <8 x i8> undef, <8 x i32> zeroinitializer
6  %l9 = mul <8 x i8> %1, %l8

2nd time: copy the insertelement's operand, the trunc

  ...
1  %l0 = trunc i16 %x to i8 
2  %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
  ...
3  br label %vector.body

vector.body:                                      ; preds = %vector.body, %entry
  ...
  %0 = trunc i16 %x to i8                           // 2nd copy optimization

  %1 = insertelement <8 x i8> undef, i8 %0, i32 0    // use 2nd copy
  %2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer
  %l9 = mul <8 x i8> %2, %l8

In D129352#3760370, @nikic wrote:

Do we need the IsHugeFunc flag? Can't we use the worklist approach unconditionally? This is what we usually do. Limiting it to IsHugeFunc means that it received essentially zero testing.

As a first step I tend to add this IsHugeFunc flag, because in theory the new logic may miss some opportunities compared with the old logic.
To save build time, the new logic only re-optimizes updated BBs. In fact, BBs that were not updated may still have opportunities to sink/combine into updated BBs.

LuoYuanke added inline comments.Aug 31 2022, 2:48 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
7106	Is the `insertelement` in entry BB erased for the first time?

xiangzhangllvm added inline comments.Aug 31 2022, 2:51 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
7106	It is not erased, because it still has user in entry BB. But it is possible to erase.

In D129352#3760693, @xiangzhangllvm wrote:

As a first step I tend to add this IsHugeFunc flag, because in theory the new logic may miss some opportunities compared with the old logic.
To save build time, the new logic only re-optimizes updated BBs. In fact, BBs that were not updated may still have opportunities to sink/combine into updated BBs.

In addition, the performance of the new logic is in fact very close to that of the old logic now.
Even with the IsHugeFunc flag removed, no lit test changes.
I also ran SPEC tests (x86), and there is no drop locally.

add option -cgpp-huge-func to tests

Harbormaster completed remote builds in B184335: Diff 456921.Aug 31 2022, 5:09 AM

xiangzhangllvm updated this revision to Diff 457173.Aug 31 2022, 10:06 PM

Address Yuanke's comments

Harbormaster completed remote builds in B184512: Diff 457173.Aug 31 2022, 10:47 PM

The failing test msan_debug_info.ll is unrelated to this pass. The failure must be spurious, because the test only runs MSan-related passes (it does not run the CodeGenPrepare pass at all).

It can be reproduced with (on Windows):

$ "c:\work\tests\llvm-premerge-tests\llvm-project\build\bin\opt.exe" "-passes=module(msan-module),function(msan)" "-msan-instrumentation-with-call-threshold=0" "-msan-track-origins=1" "-S" "./test/Instrumentation/MemorySanitizer/msan_debug_info.ll" -o  t.s --print-after-all &> pp

I'll set the default value of cgpp-huge-func back to 10000 for the first stage.

xiangzhangllvm updated this revision to Diff 457900.Sep 5 2022, 1:32 AM

LuoYuanke added inline comments.Sep 5 2022, 1:34 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
8221	It doesn't seem necessary to call getDT immediately after `DT.reset`. The lazy `getDT` should help compile time.

Harbormaster completed remote builds in B185044: Diff 457900.Sep 5 2022, 2:23 AM

xiangzhangllvm added inline comments.Sep 6 2022, 12:14 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
8221	If optimizeInst uses the DT directly, it will fail. I hit some test failures here before, so I added the getDT here.
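
The trade-off discussed for lines 8220 and 8221 (a dominator tree that is reset after a change and rebuilt lazily on the next request) can be sketched with a toy cache. This is not the LLVM code: `ToyDomTree`, `LazyDomTree` and the integer "CFG version" are hypothetical stand-ins, used only to show what happens when the cached tree is read without a reset in between.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for an expensive analysis result; not llvm::DominatorTree.
struct ToyDomTree {
  int BuiltFromVersion; // which state of the CFG this tree describes
};

class LazyDomTree {
  std::unique_ptr<ToyDomTree> DT;
  int Builds = 0;

public:
  // Drop the cached tree after the CFG changed.
  void reset() { DT.reset(); }

  // Rebuild only on the first use after a reset; otherwise reuse the cache.
  ToyDomTree &get(int CFGVersion) {
    if (!DT) {
      DT = std::make_unique<ToyDomTree>(ToyDomTree{CFGVersion});
      ++Builds;
    }
    return *DT;
  }

  int builds() const { return Builds; }
};
```

A caller that changes the CFG and forgets reset() keeps reading the tree built for the old CFG, which is the staleness concern raised above; calling reset() and letting the next get() rebuild keeps the rebuild count down to the number of uses that actually follow a change.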

LGTM.

This revision is now accepted and ready to land.Sep 6 2022, 12:37 AM

This revision was landed with ongoing or failed builds.Sep 6 2022, 7:10 PM

Closed by commit rG16743c953441: [CodeGen] Limit building time in CodeGenPrepare for huge function (authored by xiangzhangllvm).

This revision was automatically updated to reflect the committed changes.

xiangzhangllvm added a commit: rG16743c953441: [CodeGen] Limit building time in CodeGenPrepare for huge function.

Revision Contents

Path	Size
llvm/lib/CodeGen/CodeGenPrepare.cpp	255 lines
llvm/test/CodeGen/AArch64/and-sink.ll	1 line
llvm/test/CodeGen/X86/and-sink.ll	1 line
llvm/test/Transforms/CodeGenPrepare/ARM/sinkchain-inseltpoison.ll	1 line
llvm/test/Transforms/CodeGenPrepare/ARM/sinkchain.ll	1 line
llvm/test/Transforms/CodeGenPrepare/X86/gather-scatter-opt-inseltpoison.ll	1 line

Diff 458344

llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ ... @@
 static cl::opt<bool>
     VerifyBFIUpdates("cgp-verify-bfi-updates", cl::Hidden, cl::init(false),
                      cl::desc("Enable BFI update verification for "
                               "CodeGenPrepare."));

 static cl::opt<bool>
     OptimizePhiTypes("cgp-optimize-phi-types", cl::Hidden, cl::init(false),
                      cl::desc("Enable converting phi types in CodeGenPrepare"));

+static cl::opt<unsigned>
+    HugeFuncThresholdInCGPP("cgpp-huge-func", cl::init(10000), cl::Hidden,
+                            cl::desc("Least BB number of huge function."));

 namespace {

 enum ExtType {
   ZeroExtension, // Zero extension has been seen.
   SignExtension, // Sign extension has been seen.
   BothExtension  // This extension type is used if we saw sext after
                  // ZeroExtension had been set, or if we saw zext after
                  // SignExtension had been set. It makes the type
                  // information of a promoted instruction invalid.
 };

+enum ModifyDT {
+  NotModifyDT, // Not Modify any DT.
+  ModifyBBDT,  // Modify the Basic Block Dominator Tree.
+  ModifyInstDT // Modify the Instruction Dominator in a Basic Block,
+               // This usually means we move/delete/insert instruction
+               // in a Basic Block. So we should re-iterate instructions
+               // in such Basic Block.
+};
+
 using SetOfInstrs = SmallPtrSet<Instruction *, 16>;
 using TypeIsSExt = PointerIntPair<Type *, 2, ExtType>;
 using InstrToOrigTy = DenseMap<Instruction *, TypeIsSExt>;
 using SExts = SmallVector<Instruction *, 16>;
 using ValueToSExts = DenseMap<Value *, SExts>;

 class TypePromotionTransaction;

@@ ... @@ class CodeGenPrepare : public FunctionPass {
   /// size.
   MapVector<AssertingVH<Value>,
             SmallVector<std::pair<AssertingVH<GetElementPtrInst>, int64_t>, 32>>
       LargeOffsetGEPMap;

   /// Keep track of new GEP base after splitting the GEPs having large offset.
   SmallSet<AssertingVH<Value>, 2> NewGEPBases;

   /// Map serial numbers to Large offset GEPs.
   DenseMap<AssertingVH<GetElementPtrInst>, int> LargeOffsetGEPID;

   /// Keep track of SExt promoted.
   ValueToSExts ValToSExtendedUses;

   /// True if the function has the OptSize attribute.
   bool OptSize;

   /// DataLayout for the Function being processed.
   const DataLayout *DL = nullptr;

   /// Building the dominator tree can be expensive, so we only build it
   /// lazily and update it when required.
   std::unique_ptr<DominatorTree> DT;

 public:
+  /// If encounter huge function, we need to limit the build time.
+  bool IsHugeFunc = false;
+
+  /// FreshBBs is like worklist, it collected the updated BBs which need
+  /// to be optimized again.
+  /// Note: Consider building time in this pass, when a BB updated, we need
+  /// to insert such BB into FreshBBs for huge function.
+  SmallSet<BasicBlock *, 32> FreshBBs;
+
   static char ID; // Pass identification, replacement for typeid

   CodeGenPrepare() : FunctionPass(ID) {
     initializeCodeGenPreparePass(*PassRegistry::getPassRegistry());
   }

   bool runOnFunction(Function &F) override;

Show All 40 Lines	private:
bool eliminateFallThrough(Function &F);		bool eliminateFallThrough(Function &F);
bool eliminateMostlyEmptyBlocks(Function &F);		bool eliminateMostlyEmptyBlocks(Function &F);
BasicBlock findDestBlockOfMergeableEmptyBlock(BasicBlock BB);		BasicBlock findDestBlockOfMergeableEmptyBlock(BasicBlock BB);
bool canMergeBlocks(const BasicBlock BB, const BasicBlock DestBB) const;		bool canMergeBlocks(const BasicBlock BB, const BasicBlock DestBB) const;
void eliminateMostlyEmptyBlock(BasicBlock *BB);		void eliminateMostlyEmptyBlock(BasicBlock *BB);
bool isMergingEmptyBlockProfitable(BasicBlock BB, BasicBlock DestBB,		bool isMergingEmptyBlockProfitable(BasicBlock BB, BasicBlock DestBB,
bool isPreheader);		bool isPreheader);
bool makeBitReverse(Instruction &I);		bool makeBitReverse(Instruction &I);
bool optimizeBlock(BasicBlock &BB, bool &ModifiedDT);		bool optimizeBlock(BasicBlock &BB, ModifyDT &ModifiedDT);
bool optimizeInst(Instruction *I, bool &ModifiedDT);		bool optimizeInst(Instruction *I, ModifyDT &ModifiedDT);
bool optimizeMemoryInst(Instruction MemoryInst, Value Addr, Type *AccessTy,		bool optimizeMemoryInst(Instruction MemoryInst, Value Addr, Type *AccessTy,
unsigned AddrSpace);		unsigned AddrSpace);
bool optimizeGatherScatterInst(Instruction MemoryInst, Value Ptr);		bool optimizeGatherScatterInst(Instruction MemoryInst, Value Ptr);
bool optimizeInlineAsmInst(CallInst *CS);		bool optimizeInlineAsmInst(CallInst *CS);
bool optimizeCallInst(CallInst *CI, bool &ModifiedDT);		bool optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT);
bool optimizeExt(Instruction *&I);		bool optimizeExt(Instruction *&I);
bool optimizeExtUses(Instruction *I);		bool optimizeExtUses(Instruction *I);
bool optimizeLoadExt(LoadInst *Load);		bool optimizeLoadExt(LoadInst *Load);
bool optimizeShiftInst(BinaryOperator *BO);		bool optimizeShiftInst(BinaryOperator *BO);
bool optimizeFunnelShift(IntrinsicInst *Fsh);		bool optimizeFunnelShift(IntrinsicInst *Fsh);
bool optimizeSelectInst(SelectInst *SI);		bool optimizeSelectInst(SelectInst *SI);
bool optimizeShuffleVectorInst(ShuffleVectorInst *SVI);		bool optimizeShuffleVectorInst(ShuffleVectorInst *SVI);
bool optimizeSwitchType(SwitchInst *SI);		bool optimizeSwitchType(SwitchInst *SI);
bool optimizeSwitchPhiConstants(SwitchInst *SI);		bool optimizeSwitchPhiConstants(SwitchInst *SI);
bool optimizeSwitchInst(SwitchInst *SI);		bool optimizeSwitchInst(SwitchInst *SI);
bool optimizeExtractElementInst(Instruction *Inst);		bool optimizeExtractElementInst(Instruction *Inst);
bool dupRetToEnableTailCallOpts(BasicBlock *BB, bool &ModifiedDT);		bool dupRetToEnableTailCallOpts(BasicBlock *BB, ModifyDT &ModifiedDT);
bool fixupDbgValue(Instruction *I);		bool fixupDbgValue(Instruction *I);
bool placeDbgValues(Function &F);		bool placeDbgValues(Function &F);
bool placePseudoProbes(Function &F);		bool placePseudoProbes(Function &F);
bool canFormExtLd(const SmallVectorImpl<Instruction *> &MovedExts,		bool canFormExtLd(const SmallVectorImpl<Instruction *> &MovedExts,
LoadInst &LI, Instruction &Inst, bool HasPromoted);		LoadInst &LI, Instruction &Inst, bool HasPromoted);
bool tryToPromoteExts(TypePromotionTransaction &TPT,		bool tryToPromoteExts(TypePromotionTransaction &TPT,
const SmallVectorImpl<Instruction *> &Exts,		const SmallVectorImpl<Instruction *> &Exts,
SmallVectorImpl<Instruction *> &ProfitablyMovedExts,		SmallVectorImpl<Instruction *> &ProfitablyMovedExts,
unsigned CreatedInstsCost = 0);		unsigned CreatedInstsCost = 0);
bool mergeSExts(Function &F);		bool mergeSExts(Function &F);
bool splitLargeGEPOffsets();		bool splitLargeGEPOffsets();
bool optimizePhiType(PHINode *Inst, SmallPtrSetImpl<PHINode *> &Visited,		bool optimizePhiType(PHINode *Inst, SmallPtrSetImpl<PHINode *> &Visited,
SmallPtrSetImpl<Instruction *> &DeletedInstrs);		SmallPtrSetImpl<Instruction *> &DeletedInstrs);
bool optimizePhiTypes(Function &F);		bool optimizePhiTypes(Function &F);
bool performAddressTypePromotion(		bool performAddressTypePromotion(
Instruction *&Inst, bool AllowPromotionWithoutCommonHeader,		Instruction *&Inst, bool AllowPromotionWithoutCommonHeader,
bool HasPromoted, TypePromotionTransaction &TPT,		bool HasPromoted, TypePromotionTransaction &TPT,
SmallVectorImpl<Instruction *> &SpeculativelyMovedExts);		SmallVectorImpl<Instruction *> &SpeculativelyMovedExts);
bool splitBranchCondition(Function &F, bool &ModifiedDT);		bool splitBranchCondition(Function &F, ModifyDT &ModifiedDT);
bool simplifyOffsetableRelocate(GCStatepointInst &I);		bool simplifyOffsetableRelocate(GCStatepointInst &I);

bool tryToSinkFreeOperands(Instruction *I);		bool tryToSinkFreeOperands(Instruction *I);
bool replaceMathCmpWithIntrinsic(BinaryOperator *BO, Value *Arg0, Value *Arg1,		bool replaceMathCmpWithIntrinsic(BinaryOperator *BO, Value *Arg0, Value *Arg1,
CmpInst *Cmp, Intrinsic::ID IID);		CmpInst *Cmp, Intrinsic::ID IID);
bool optimizeCmp(CmpInst *Cmp, bool &ModifiedDT);		bool optimizeCmp(CmpInst *Cmp, ModifyDT &ModifiedDT);
bool combineToUSubWithOverflow(CmpInst *Cmp, bool &ModifiedDT);		bool combineToUSubWithOverflow(CmpInst *Cmp, ModifyDT &ModifiedDT);
bool combineToUAddWithOverflow(CmpInst *Cmp, bool &ModifiedDT);		bool combineToUAddWithOverflow(CmpInst *Cmp, ModifyDT &ModifiedDT);
void verifyBFIUpdates(Function &F);		void verifyBFIUpdates(Function &F);
};		};

} // end anonymous namespace		} // end anonymous namespace

char CodeGenPrepare::ID = 0;		char CodeGenPrepare::ID = 0;

INITIALIZE_PASS_BEGIN(CodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(CodeGenPrepare, DEBUG_TYPE,
[... 14 unchanged lines not shown; context: if (skipFunction(F)) ...]
return false;		return false;

DL = &F.getParent()->getDataLayout();		DL = &F.getParent()->getDataLayout();

bool EverMadeChange = false;		bool EverMadeChange = false;
// Clear per function information.		// Clear per function information.
InsertedInsts.clear();		InsertedInsts.clear();
PromotedInsts.clear();		PromotedInsts.clear();
		FreshBBs.clear();

TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();		TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
SubtargetInfo = TM->getSubtargetImpl(F);		SubtargetInfo = TM->getSubtargetImpl(F);
TLI = SubtargetInfo->getTargetLowering();		TLI = SubtargetInfo->getTargetLowering();
TRI = SubtargetInfo->getRegisterInfo();		TRI = SubtargetInfo->getRegisterInfo();
TLInfo = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		TLInfo = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);		TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
[... 47 unchanged lines not shown; context: bool CodeGenPrepare::runOnFunction(Function &F) { ...]
// blocks, since there might be blocks that only contain @llvm.assume calls		// blocks, since there might be blocks that only contain @llvm.assume calls
// (plus arguments that we can get rid of).		// (plus arguments that we can get rid of).
EverMadeChange |= eliminateAssumptions(F);		EverMadeChange |= eliminateAssumptions(F);

// Eliminate blocks that contain only PHI nodes and an		// Eliminate blocks that contain only PHI nodes and an
// unconditional branch.		// unconditional branch.
EverMadeChange |= eliminateMostlyEmptyBlocks(F);		EverMadeChange |= eliminateMostlyEmptyBlocks(F);

bool ModifiedDT = false;		ModifyDT ModifiedDT = ModifyDT::NotModifyDT;
if (!DisableBranchOpts)		if (!DisableBranchOpts)
EverMadeChange |= splitBranchCondition(F, ModifiedDT);		EverMadeChange |= splitBranchCondition(F, ModifiedDT);

// Split some critical edges where one of the sources is an indirect branch,		// Split some critical edges where one of the sources is an indirect branch,
// to help generate sane code for PHIs involving such edges.		// to help generate sane code for PHIs involving such edges.
EverMadeChange |=		EverMadeChange |=
SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/true);		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/true);

		// If we are optimizing a huge function, we need to consider the build time,
		// because the basic algorithm's complexity is near O(N!).
		IsHugeFunc = F.size() > HugeFuncThresholdInCGPP;

bool MadeChange = true;		bool MadeChange = true;
		bool FuncIterated = false;
while (MadeChange) {		while (MadeChange) {
MadeChange = false;		MadeChange = false;
DT.reset();		DT.reset();

for (BasicBlock &BB : llvm::make_early_inc_range(F)) {		for (BasicBlock &BB : llvm::make_early_inc_range(F)) {
bool ModifiedDTOnIteration = false;		if (FuncIterated && !FreshBBs.contains(&BB))
		LuoYuanke: As @nikic suggested, would it be better to have a worklist for the unvisited or changed BBs? It seems we may need to avoid pushing a BB twice, though.
		xiangzhangllvm (author): FreshBBs is that worklist. For a huge function, FreshBBs collects all changed BBs. (All unvisited BBs are visited in the first iteration; FuncIterated is then set to true.)
MadeChange |= optimizeBlock(BB, ModifiedDTOnIteration);		continue;

		ModifyDT ModifiedDTOnIteration = ModifyDT::NotModifyDT;
		nikic: Before introducing limits, can we try to remove this case? That is, continue simplifying the rest of the blocks before we start a new iteration? I expect that the current implementation looks like this to avoid some iterator invalidation issues, but there's probably a way to avoid that with a bit more effort.
		xiangzhangllvm (author): Thanks a lot for reviewing this patch! But if the CFG changed, we should restart the iteration, or the iteration may run into problems.
		nikic: This is what we currently do, but the current implementation has a lot of room for improvement. For example, combineToUAddWithOverflow() will set ModifiedDT even though it does not modify control flow at all! It only does it to avoid iterating over a dead instruction in the same block. This can be solved by either only restarting the iteration in the BB (rather than the whole function), or by passing in the iterator to optimizeInst() so it can be moved in such a case, or similar. For cases that actually do modify the CFG, we could use DTU instead of full DT recomputation, and we could maybe switch to a BB worklist, so new blocks can be queued, and deleted blocks removed from the worklist. Nobody has bothered to do this yet, because it never came up as a problem. Now that it did, things should get fixed properly.
		xiangzhangllvm (author):
		> and we could maybe switch to a BB worklist, so new blocks can be queued, and deleted blocks removed from the worklist.
		Oh, that is a very good suggestion! Let's first break the N! algorithm complexity.
		> For cases that actually do modify the CFG, we could use DTU instead of full DT recomputation,
		According to my observation, the DT recomputation didn't consume too much time. Anyway, we can refine it if one day we encounter such a case. Thank you so much!
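		Editorial sketch: the skip-unless-fresh iteration discussed in this thread can be modeled outside LLVM roughly as follows. This is a minimal illustration, not the patch itself: `BlockId`, `OptimizeFn`, `sweepWithFreshSet`, and `demoOptimizeCalls` are hypothetical names, blocks are plain indices, and the sketch only re-queues the block that reported a change, whereas the patch also adds other touched blocks (users of replaced values, newly split blocks) to FreshBBs.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for a basic block: an index into the function.
using BlockId = std::size_t;

// Hypothetical per-block optimizer. Returns true if the block changed.
using OptimizeFn = std::function<bool(BlockId)>;

// Sweeps the blocks until nothing changes. After the first full sweep, only
// blocks recorded in Fresh are revisited. Returns the number of Optimize calls.
std::size_t sweepWithFreshSet(std::size_t NumBlocks,
                              const OptimizeFn &Optimize) {
  std::set<BlockId> Fresh;
  bool FuncIterated = false; // true once every block has been seen once
  bool MadeChange = true;
  std::size_t Calls = 0;
  while (MadeChange) {
    MadeChange = false;
    for (BlockId BB = 0; BB < NumBlocks; ++BB) {
      if (FuncIterated && Fresh.count(BB) == 0)
        continue; // untouched since the last sweep: skip it
      ++Calls;
      bool Changed = Optimize(BB);
      MadeChange |= Changed;
      if (Changed)
        Fresh.insert(BB); // may still have work left
      else if (FuncIterated)
        Fresh.erase(BB); // settled: stop revisiting
    }
    FuncIterated = true;
  }
  return Calls;
}

// Toy workload: five blocks, only block 3 needs two rounds of work.
std::size_t demoOptimizeCalls() {
  std::vector<int> Work = {0, 0, 0, 2, 0};
  return sweepWithFreshSet(Work.size(), [&Work](BlockId BB) {
    if (Work[BB] == 0)
      return false;
    --Work[BB];
    return true;
  });
}
```

		In this toy workload the first sweep visits all five blocks and the two later sweeps visit only block 3, which is the behavior the FreshBBs set is meant to give huge functions.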
		bool Changed = optimizeBlock(BB, ModifiedDTOnIteration);

		MadeChange |= Changed;
		if (IsHugeFunc) {
		// If the BB is updated, it may still have a chance to be optimized.
		// This usually happens in the sink optimization.
		// For example:
		//
		// bb0:
		// %and = and i32 %a, 4
		// %cmp = icmp eq i32 %and, 0
		//
		// If the %cmp is sunk to another BB, the %and will have a chance to sink.
		if (Changed)
		FreshBBs.insert(&BB);
		LuoYuanke: The BB that a value is sunk into also changes, and there may be several such BBs. Do we record all of them in FreshBBs?
		xiangzhangllvm (author): Yes, all updated BBs are put into FreshBBs. Most of them are inserted by the updated replaceAllUsesWith. BBs that are not updated are optimized only once.
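		Editorial sketch: the idea in this reply, recording the block of every user before rewriting its uses, can be illustrated with a toy IR. Everything here is an assumption made for illustration: `ToyInst`, `replaceAllUsesAndMarkFresh`, and `demoFreshBlocks` are hypothetical and are not LLVM types or APIs; blocks are plain integer ids.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical miniature IR node; not an LLVM type.
struct ToyInst {
  int ParentBlock = 0;             // id of the block holding this instruction
  std::vector<ToyInst *> Operands; // values this instruction reads
  std::vector<ToyInst *> Users;    // instructions that read this value
};

// Rewrites every use of Old to New. When IsHuge is set, the block of each
// user is recorded in FreshBBs first, so a later sweep revisits it.
void replaceAllUsesAndMarkFresh(ToyInst &Old, ToyInst &New,
                                std::set<int> &FreshBBs, bool IsHuge) {
  for (ToyInst *User : Old.Users) {
    if (IsHuge)
      FreshBBs.insert(User->ParentBlock);
    std::replace(User->Operands.begin(), User->Operands.end(), &Old, &New);
    New.Users.push_back(User);
  }
  Old.Users.clear();
}

// Toy scenario: a value defined in block 0 is used in blocks 1 and 2.
std::set<int> demoFreshBlocks(bool IsHuge) {
  ToyInst Old, New, UserA, UserB;
  UserA.ParentBlock = 1;
  UserB.ParentBlock = 2;
  UserA.Operands = {&Old};
  UserB.Operands = {&Old};
  Old.Users = {&UserA, &UserB};
  std::set<int> FreshBBs;
  replaceAllUsesAndMarkFresh(Old, New, FreshBBs, IsHuge);
  return FreshBBs;
}
```

		With IsHuge set, the user blocks end up in the fresh set; with it cleared, the set stays empty and the helper behaves like a plain use replacement.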
		else if (FuncIterated)
		FreshBBs.erase(&BB);

// Restart BB iteration if the dominator tree of the Function was changed		if (ModifiedDTOnIteration == ModifyDT::ModifyBBDT)
		LuoYuanke: Indent.
		xiangzhangllvm (author): Good catch!
if (ModifiedDTOnIteration)		DT.reset();
		} else {
		// For small/normal functions, we restart BB iteration if the dominator
		// tree of the Function was changed.
		if (ModifiedDTOnIteration != ModifyDT::NotModifyDT)
break;		break;
}		}
		}
		// We have iterated over all the BBs in the function (only relevant for
		// huge functions).
		FuncIterated = IsHugeFunc;

if (EnableTypePromotionMerge && !ValToSExtendedUses.empty())		if (EnableTypePromotionMerge && !ValToSExtendedUses.empty())
MadeChange |= mergeSExts(F);		MadeChange |= mergeSExts(F);
if (!LargeOffsetGEPMap.empty())		if (!LargeOffsetGEPMap.empty())
MadeChange |= splitLargeGEPOffsets();		MadeChange |= splitLargeGEPOffsets();
MadeChange |= optimizePhiTypes(F);		MadeChange |= optimizePhiTypes(F);

if (MadeChange)		if (MadeChange)
eliminateFallThrough(F);		eliminateFallThrough(F);
[... 154 unchanged lines not shown; context: for (auto &Block : Blocks) { ...]
BranchInst *Term = dyn_cast<BranchInst>(SinglePred->getTerminator());		BranchInst *Term = dyn_cast<BranchInst>(SinglePred->getTerminator());
if (Term && !Term->isConditional()) {		if (Term && !Term->isConditional()) {
Changed = true;		Changed = true;
LLVM_DEBUG(dbgs() << "To merge:\n" << *BB << "\n\n\n");		LLVM_DEBUG(dbgs() << "To merge:\n" << *BB << "\n\n\n");

// Merge BB into SinglePred and delete it.		// Merge BB into SinglePred and delete it.
MergeBlockIntoPredecessor(BB);		MergeBlockIntoPredecessor(BB);
Preds.insert(SinglePred);		Preds.insert(SinglePred);

		if (IsHugeFunc) {
		// Update FreshBBs to optimize the merged BB.
		FreshBBs.insert(SinglePred);
		FreshBBs.erase(BB);
		}
}		}
}		}

// (Repeatedly) merging blocks into their predecessors can create redundant		// (Repeatedly) merging blocks into their predecessors can create redundant
// debug intrinsics.		// debug intrinsics.
for (const auto &Pred : Preds)		for (const auto &Pred : Preds)
if (auto *BB = cast_or_null<BasicBlock>(Pred))		if (auto *BB = cast_or_null<BasicBlock>(Pred))
RemoveRedundantDbgInstrs(BB);		RemoveRedundantDbgInstrs(BB);
[... 218 unchanged lines not shown; context: if (BBPreds.count(Pred)) { // Common predecessor? ...]
return false;		return false;
}		}
}		}
}		}

return true;		return true;
}		}

		/// Replace all old uses with new ones, and push the updated BBs into FreshBBs.
		static void replaceAllUsesWith(Value *Old, Value *New,
		SmallSet<BasicBlock *, 32> &FreshBBs,
		bool IsHuge) {
		auto *OldI = dyn_cast<Instruction>(Old);
		LuoYuanke: OldI (old instruction)?
		xiangzhangllvm (author): Let me refine it, thanks.
		if (OldI) {
		for (Value::user_iterator UI = OldI->user_begin(), E = OldI->user_end();
		UI != E; ++UI) {
		Instruction *User = cast<Instruction>(*UI);
		xiangzhangllvm (author): This hurts performance, because it cannot guarantee the order in which the BBs are optimized. Let me update it.
		if (IsHuge)
		FreshBBs.insert(User->getParent());
		}
		}
		Old->replaceAllUsesWith(New);
		}

/// Eliminate a basic block that has only phi's and an unconditional branch in		/// Eliminate a basic block that has only phi's and an unconditional branch in
/// it.		/// it.
void CodeGenPrepare::eliminateMostlyEmptyBlock(BasicBlock *BB) {		void CodeGenPrepare::eliminateMostlyEmptyBlock(BasicBlock *BB) {
BranchInst *BI = cast<BranchInst>(BB->getTerminator());		BranchInst *BI = cast<BranchInst>(BB->getTerminator());
BasicBlock *DestBB = BI->getSuccessor(0);		BasicBlock *DestBB = BI->getSuccessor(0);

LLVM_DEBUG(dbgs() << "MERGING MOSTLY EMPTY BLOCKS - BEFORE:\n"		LLVM_DEBUG(dbgs() << "MERGING MOSTLY EMPTY BLOCKS - BEFORE:\n"
<< *BB << *DestBB);		<< *BB << *DestBB);

// If the destination block has a single pred, then this is a trivial edge,		// If the destination block has a single pred, then this is a trivial edge,
// just collapse it.		// just collapse it.
if (BasicBlock *SinglePred = DestBB->getSinglePredecessor()) {		if (BasicBlock *SinglePred = DestBB->getSinglePredecessor()) {
if (SinglePred != DestBB) {		if (SinglePred != DestBB) {
assert(SinglePred == BB &&		assert(SinglePred == BB &&
"Single predecessor not the same as predecessor");		"Single predecessor not the same as predecessor");
// Merge DestBB into SinglePred/BB and delete it.		// Merge DestBB into SinglePred/BB and delete it.
MergeBlockIntoPredecessor(DestBB);		MergeBlockIntoPredecessor(DestBB);
		LuoYuanke: How can we remind developers that FreshBBs must be updated in their new code?
		xiangzhangllvm (author): Let me add that requirement to the comment on FreshBBs.
// Note: BB(=SinglePred) will not be deleted on this path.		// Note: BB(=SinglePred) will not be deleted on this path.
// DestBB(=its single successor) is the one that was deleted.		// DestBB(=its single successor) is the one that was deleted.
LLVM_DEBUG(dbgs() << "AFTER:\n" << *SinglePred << "\n\n\n");		LLVM_DEBUG(dbgs() << "AFTER:\n" << *SinglePred << "\n\n\n");

		if (IsHugeFunc) {
		// Update FreshBBs to optimize the merged BB.
		FreshBBs.insert(SinglePred);
		FreshBBs.erase(DestBB);
		}
return;		return;
}		}
}		}

// Otherwise, we have multiple predecessors of BB. Update the PHIs in DestBB		// Otherwise, we have multiple predecessors of BB. Update the PHIs in DestBB
// to handle the new incoming edges it is about to have.		// to handle the new incoming edges it is about to have.
for (PHINode &PN : DestBB->phis()) {		for (PHINode &PN : DestBB->phis()) {
// Remove the incoming value for BB, and remember it.		// Remove the incoming value for BB, and remember it.
[... 451 unchanged lines not shown; context: for (Instruction &Iter : *Cmp->getParent()) { ...]
}		}
}		}
assert(InsertPt != nullptr && "Parent block did not contain cmp or binop");		assert(InsertPt != nullptr && "Parent block did not contain cmp or binop");

IRBuilder<> Builder(InsertPt);		IRBuilder<> Builder(InsertPt);
Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);		Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
if (BO->getOpcode() != Instruction::Xor) {		if (BO->getOpcode() != Instruction::Xor) {
Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");		Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
BO->replaceAllUsesWith(Math);		replaceAllUsesWith(BO, Math, FreshBBs, IsHugeFunc);
} else		} else
assert(BO->hasOneUse() &&		assert(BO->hasOneUse() &&
"Patterns with XOr should use the BO only in the compare");		"Patterns with XOr should use the BO only in the compare");
Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");		Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
Cmp->replaceAllUsesWith(OV);		replaceAllUsesWith(Cmp, OV, FreshBBs, IsHugeFunc);
Cmp->eraseFromParent();		Cmp->eraseFromParent();
BO->eraseFromParent();		BO->eraseFromParent();
return true;		return true;
}		}

/// Match special-case patterns that check for unsigned add overflow.		/// Match special-case patterns that check for unsigned add overflow.
static bool matchUAddWithOverflowConstantEdgeCases(CmpInst *Cmp,		static bool matchUAddWithOverflowConstantEdgeCases(CmpInst *Cmp,
BinaryOperator *&Add) {		BinaryOperator *&Add) {
[... 21 unchanged lines not shown; context: if (match(U, m_Add(m_Specific(A), m_Specific(B)))) { ...]
return true;		return true;
}		}
}		}
return false;		return false;
}		}

/// Try to combine the compare into a call to the llvm.uadd.with.overflow		/// Try to combine the compare into a call to the llvm.uadd.with.overflow
/// intrinsic. Return true if any changes were made.		/// intrinsic. Return true if any changes were made.
bool CodeGenPrepare::combineToUAddWithOverflow(CmpInst *Cmp, bool &ModifiedDT) {		bool CodeGenPrepare::combineToUAddWithOverflow(CmpInst *Cmp,
		ModifyDT &ModifiedDT) {
Value *A, *B;		Value *A, *B;
BinaryOperator *Add;		BinaryOperator *Add;
if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add)))) {		if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add)))) {
if (!matchUAddWithOverflowConstantEdgeCases(Cmp, Add))		if (!matchUAddWithOverflowConstantEdgeCases(Cmp, Add))
return false;		return false;
// Set A and B in case we match matchUAddWithOverflowConstantEdgeCases.		// Set A and B in case we match matchUAddWithOverflowConstantEdgeCases.
A = Add->getOperand(0);		A = Add->getOperand(0);
B = Add->getOperand(1);		B = Add->getOperand(1);
[... 10 unchanged lines not shown; context: bool CodeGenPrepare::combineToUAddWithOverflow(CmpInst *Cmp, ...]
if (Add->getParent() != Cmp->getParent() && !Add->hasOneUse())		if (Add->getParent() != Cmp->getParent() && !Add->hasOneUse())
return false;		return false;

if (!replaceMathCmpWithIntrinsic(Add, A, B, Cmp,		if (!replaceMathCmpWithIntrinsic(Add, A, B, Cmp,
Intrinsic::uadd_with_overflow))		Intrinsic::uadd_with_overflow))
return false;		return false;

// Reset callers - do not crash by iterating over a dead instruction.		// Reset callers - do not crash by iterating over a dead instruction.
ModifiedDT = true;		ModifiedDT = ModifyDT::ModifyInstDT;
return true;		return true;
}		}

bool CodeGenPrepare::combineToUSubWithOverflow(CmpInst *Cmp, bool &ModifiedDT) {		bool CodeGenPrepare::combineToUSubWithOverflow(CmpInst *Cmp,
		ModifyDT &ModifiedDT) {
// We are not expecting non-canonical/degenerate code. Just bail out.		// We are not expecting non-canonical/degenerate code. Just bail out.
Value *A = Cmp->getOperand(0), *B = Cmp->getOperand(1);		Value *A = Cmp->getOperand(0), *B = Cmp->getOperand(1);
if (isa<Constant>(A) && isa<Constant>(B))		if (isa<Constant>(A) && isa<Constant>(B))
return false;		return false;

// Convert (A u> B) to (A u< B) to simplify pattern matching.		// Convert (A u> B) to (A u< B) to simplify pattern matching.
ICmpInst::Predicate Pred = Cmp->getPredicate();		ICmpInst::Predicate Pred = Cmp->getPredicate();
if (Pred == ICmpInst::ICMP_UGT) {		if (Pred == ICmpInst::ICMP_UGT) {
[... 41 unchanged lines not shown; context: if (!TLI->shouldFormOverflowOp(ISD::USUBO, ...]
Sub->hasNUsesOrMore(2)))		Sub->hasNUsesOrMore(2)))
return false;		return false;

if (!replaceMathCmpWithIntrinsic(Sub, Sub->getOperand(0), Sub->getOperand(1),		if (!replaceMathCmpWithIntrinsic(Sub, Sub->getOperand(0), Sub->getOperand(1),
Cmp, Intrinsic::usub_with_overflow))		Cmp, Intrinsic::usub_with_overflow))
return false;		return false;

// Reset callers - do not crash by iterating over a dead instruction.		// Reset callers - do not crash by iterating over a dead instruction.
ModifiedDT = true;		ModifiedDT = ModifyDT::ModifyInstDT;
return true;		return true;
}		}

/// Sink the given CmpInst into user blocks to reduce the number of virtual		/// Sink the given CmpInst into user blocks to reduce the number of virtual
/// registers that must be created and coalesced. This is a clear win except on		/// registers that must be created and coalesced. This is a clear win except on
/// targets with multiple condition code registers (PowerPC), where it might		/// targets with multiple condition code registers (PowerPC), where it might
/// lose; some adjustment may be wanted there.		/// lose; some adjustment may be wanted there.
///		///
[... 140 unchanged lines not shown; context: if (auto *SI = dyn_cast<SelectInst>(U)) { ...]
continue;		continue;
}		}
llvm_unreachable("Must be a branch or a select");		llvm_unreachable("Must be a branch or a select");
}		}
Cmp->setPredicate(CmpInst::getSwappedPredicate(DomPred));		Cmp->setPredicate(CmpInst::getSwappedPredicate(DomPred));
return true;		return true;
}		}

bool CodeGenPrepare::optimizeCmp(CmpInst *Cmp, bool &ModifiedDT) {		bool CodeGenPrepare::optimizeCmp(CmpInst *Cmp, ModifyDT &ModifiedDT) {
if (sinkCmpExpression(Cmp, *TLI))		if (sinkCmpExpression(Cmp, *TLI))
return true;		return true;

if (combineToUAddWithOverflow(Cmp, ModifiedDT))		if (combineToUAddWithOverflow(Cmp, ModifiedDT))
return true;		return true;

if (combineToUSubWithOverflow(Cmp, ModifiedDT))		if (combineToUSubWithOverflow(Cmp, ModifiedDT))
return true;		return true;
[... 288 unchanged lines not shown ...]
/// %z = call i64 @llvm.cttz.i64(i64 %A, i1 true)		/// %z = call i64 @llvm.cttz.i64(i64 %A, i1 true)
/// br label %cond.end		/// br label %cond.end
/// cond.end:		/// cond.end:
/// %ctz = phi i64 [ 64, %entry ], [ %z, %cond.false ]		/// %ctz = phi i64 [ 64, %entry ], [ %z, %cond.false ]
///		///
/// If the transform is performed, return true and set ModifiedDT to true.		/// If the transform is performed, return true and set ModifiedDT to true.
static bool despeculateCountZeros(IntrinsicInst *CountZeros,		static bool despeculateCountZeros(IntrinsicInst *CountZeros,
const TargetLowering *TLI,		const TargetLowering *TLI,
const DataLayout *DL, bool &ModifiedDT) {		const DataLayout *DL, ModifyDT &ModifiedDT,
		SmallSet<BasicBlock *, 32> &FreshBBs,
		bool IsHugeFunc) {
// If a zero input is undefined, it doesn't make sense to despeculate that.		// If a zero input is undefined, it doesn't make sense to despeculate that.
if (match(CountZeros->getOperand(1), m_One()))		if (match(CountZeros->getOperand(1), m_One()))
return false;		return false;

// If it's cheap to speculate, there's nothing to do.		// If it's cheap to speculate, there's nothing to do.
Type *Ty = CountZeros->getType();		Type *Ty = CountZeros->getType();
auto IntrinsicID = CountZeros->getIntrinsicID();		auto IntrinsicID = CountZeros->getIntrinsicID();
if ((IntrinsicID == Intrinsic::cttz && TLI->isCheapToSpeculateCttz(Ty)) ||		if ((IntrinsicID == Intrinsic::cttz && TLI->isCheapToSpeculateCttz(Ty)) ||
(IntrinsicID == Intrinsic::ctlz && TLI->isCheapToSpeculateCtlz(Ty)))		(IntrinsicID == Intrinsic::ctlz && TLI->isCheapToSpeculateCtlz(Ty)))
return false;		return false;

// Only handle legal scalar cases. Anything else requires too much work.		// Only handle legal scalar cases. Anything else requires too much work.
unsigned SizeInBits = Ty->getScalarSizeInBits();		unsigned SizeInBits = Ty->getScalarSizeInBits();
if (Ty->isVectorTy() || SizeInBits > DL->getLargestLegalIntTypeSizeInBits())		if (Ty->isVectorTy() || SizeInBits > DL->getLargestLegalIntTypeSizeInBits())
return false;		return false;

// Bail if the value is never zero.		// Bail if the value is never zero.
Use &Op = CountZeros->getOperandUse(0);		Use &Op = CountZeros->getOperandUse(0);
if (isKnownNonZero(Op, *DL))		if (isKnownNonZero(Op, *DL))
return false;		return false;

// The intrinsic will be sunk behind a compare against zero and branch.		// The intrinsic will be sunk behind a compare against zero and branch.
BasicBlock *StartBlock = CountZeros->getParent();		BasicBlock *StartBlock = CountZeros->getParent();
BasicBlock *CallBlock = StartBlock->splitBasicBlock(CountZeros, "cond.false");		BasicBlock *CallBlock = StartBlock->splitBasicBlock(CountZeros, "cond.false");
		if (IsHugeFunc)
		FreshBBs.insert(CallBlock);
		LuoYuanke: Ditto.

// Create another block after the count zero intrinsic. A PHI will be added		// Create another block after the count zero intrinsic. A PHI will be added
// in this block to select the result of the intrinsic or the bit-width		// in this block to select the result of the intrinsic or the bit-width
// constant if the input to the intrinsic is zero.		// constant if the input to the intrinsic is zero.
BasicBlock::iterator SplitPt = ++(BasicBlock::iterator(CountZeros));		BasicBlock::iterator SplitPt = ++(BasicBlock::iterator(CountZeros));
BasicBlock *EndBlock = CallBlock->splitBasicBlock(SplitPt, "cond.end");		BasicBlock *EndBlock = CallBlock->splitBasicBlock(SplitPt, "cond.end");
		if (IsHugeFunc)
		FreshBBs.insert(EndBlock);

// Set up a builder to create a compare, conditional branch, and PHI.		// Set up a builder to create a compare, conditional branch, and PHI.
IRBuilder<> Builder(CountZeros->getContext());		IRBuilder<> Builder(CountZeros->getContext());
Builder.SetInsertPoint(StartBlock->getTerminator());		Builder.SetInsertPoint(StartBlock->getTerminator());
Builder.SetCurrentDebugLocation(CountZeros->getDebugLoc());		Builder.SetCurrentDebugLocation(CountZeros->getDebugLoc());

// Replace the unconditional branch that was created by the first split with		// Replace the unconditional branch that was created by the first split with
// a compare against zero and a conditional branch.		// a compare against zero and a conditional branch.
Value *Zero = Constant::getNullValue(Ty);		Value *Zero = Constant::getNullValue(Ty);
// Avoid introducing branch on poison. This also replaces the ctz operand.		// Avoid introducing branch on poison. This also replaces the ctz operand.
if (!isGuaranteedNotToBeUndefOrPoison(Op))		if (!isGuaranteedNotToBeUndefOrPoison(Op))
Op = Builder.CreateFreeze(Op, Op->getName() + ".fr");		Op = Builder.CreateFreeze(Op, Op->getName() + ".fr");
Value *Cmp = Builder.CreateICmpEQ(Op, Zero, "cmpz");		Value *Cmp = Builder.CreateICmpEQ(Op, Zero, "cmpz");
Builder.CreateCondBr(Cmp, EndBlock, CallBlock);		Builder.CreateCondBr(Cmp, EndBlock, CallBlock);
StartBlock->getTerminator()->eraseFromParent();		StartBlock->getTerminator()->eraseFromParent();

// Create a PHI in the end block to select either the output of the intrinsic		// Create a PHI in the end block to select either the output of the intrinsic
// or the bit width of the operand.		// or the bit width of the operand.
Builder.SetInsertPoint(&EndBlock->front());		Builder.SetInsertPoint(&EndBlock->front());
PHINode *PN = Builder.CreatePHI(Ty, 2, "ctz");		PHINode *PN = Builder.CreatePHI(Ty, 2, "ctz");
CountZeros->replaceAllUsesWith(PN);		replaceAllUsesWith(CountZeros, PN, FreshBBs, IsHugeFunc);
Value *BitWidth = Builder.getInt(APInt(SizeInBits, SizeInBits));		Value *BitWidth = Builder.getInt(APInt(SizeInBits, SizeInBits));
PN->addIncoming(BitWidth, StartBlock);		PN->addIncoming(BitWidth, StartBlock);
PN->addIncoming(CountZeros, CallBlock);		PN->addIncoming(CountZeros, CallBlock);

// We are explicitly handling the zero case, so we can set the intrinsic's		// We are explicitly handling the zero case, so we can set the intrinsic's
// undefined zero argument to 'true'. This will also prevent reprocessing the		// undefined zero argument to 'true'. This will also prevent reprocessing the
// intrinsic; we only despeculate when a zero input is defined.		// intrinsic; we only despeculate when a zero input is defined.
CountZeros->setArgOperand(1, Builder.getTrue());		CountZeros->setArgOperand(1, Builder.getTrue());
ModifiedDT = true;		ModifiedDT = ModifyDT::ModifyBBDT;
return true;		return true;
}		}

bool CodeGenPrepare::optimizeCallInst(CallInst *CI, bool &ModifiedDT) {		bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {
BasicBlock *BB = CI->getParent();		BasicBlock *BB = CI->getParent();

// Lower inline assembly if we can.		// Lower inline assembly if we can.
// If we found an inline asm expession, and if the target knows how to		// If we found an inline asm expession, and if the target knows how to
// lower it to normal LLVM code, do so now.		// lower it to normal LLVM code, do so now.
if (CI->isInlineAsm()) {		if (CI->isInlineAsm()) {
if (TLI->ExpandInlineAsm(CI)) {		if (TLI->ExpandInlineAsm(CI)) {
// Avoid invalidating the iterator.		// Avoid invalidating the iterator.
[... 117 unchanged lines not shown; context: case Intrinsic::strip_invariant_group: { ...]
// Merge entries in LargeOffsetGEPMap to reflect the RAUW.		// Merge entries in LargeOffsetGEPMap to reflect the RAUW.
// Make sure not to have to deal with iterator invalidation		// Make sure not to have to deal with iterator invalidation
// after possibly adding ArgVal to LargeOffsetGEPMap.		// after possibly adding ArgVal to LargeOffsetGEPMap.
auto GEPs = std::move(it->second);		auto GEPs = std::move(it->second);
LargeOffsetGEPMap[ArgVal].append(GEPs.begin(), GEPs.end());		LargeOffsetGEPMap[ArgVal].append(GEPs.begin(), GEPs.end());
LargeOffsetGEPMap.erase(II);		LargeOffsetGEPMap.erase(II);
}		}

II->replaceAllUsesWith(ArgVal);		replaceAllUsesWith(II, ArgVal, FreshBBs, IsHugeFunc);
II->eraseFromParent();		II->eraseFromParent();
return true;		return true;
}		}
case Intrinsic::cttz:		case Intrinsic::cttz:
case Intrinsic::ctlz:		case Intrinsic::ctlz:
// If counting zeros is expensive, try to avoid it.		// If counting zeros is expensive, try to avoid it.
return despeculateCountZeros(II, TLI, DL, ModifiedDT);		return despeculateCountZeros(II, TLI, DL, ModifiedDT, FreshBBs,
		IsHugeFunc);
case Intrinsic::fshl:		case Intrinsic::fshl:
case Intrinsic::fshr:		case Intrinsic::fshr:
return optimizeFunnelShift(II);		return optimizeFunnelShift(II);
case Intrinsic::dbg_value:		case Intrinsic::dbg_value:
return fixupDbgValue(II);		return fixupDbgValue(II);
case Intrinsic::vscale: {		case Intrinsic::vscale: {
// If datalayout has no special restrictions on vector data layout,		// If datalayout has no special restrictions on vector data layout,
// replace `llvm.vscale` by an equivalent constant expression		// replace `llvm.vscale` by an equivalent constant expression
// to benefit from cheap constant propagation.		// to benefit from cheap constant propagation.
Type *ScalableVectorTy =		Type *ScalableVectorTy =
VectorType::get(Type::getInt8Ty(II->getContext()), 1, true);		VectorType::get(Type::getInt8Ty(II->getContext()), 1, true);
if (DL->getTypeAllocSize(ScalableVectorTy).getKnownMinSize() == 8) {		if (DL->getTypeAllocSize(ScalableVectorTy).getKnownMinSize() == 8) {
auto *Null = Constant::getNullValue(ScalableVectorTy->getPointerTo());		auto *Null = Constant::getNullValue(ScalableVectorTy->getPointerTo());
auto *One = ConstantInt::getSigned(II->getType(), 1);		auto *One = ConstantInt::getSigned(II->getType(), 1);
auto *CGep =		auto *CGep =
ConstantExpr::getGetElementPtr(ScalableVectorTy, Null, One);		ConstantExpr::getGetElementPtr(ScalableVectorTy, Null, One);
II->replaceAllUsesWith(ConstantExpr::getPtrToInt(CGep, II->getType()));		replaceAllUsesWith(II, ConstantExpr::getPtrToInt(CGep, II->getType()),
		FreshBBs, IsHugeFunc);
II->eraseFromParent();		II->eraseFromParent();
return true;		return true;
}		}
break;		break;
}		}
case Intrinsic::masked_gather:		case Intrinsic::masked_gather:
return optimizeGatherScatterInst(II, II->getArgOperand(0));		return optimizeGatherScatterInst(II, II->getArgOperand(0));
case Intrinsic::masked_scatter:		case Intrinsic::masked_scatter:
[17 lines not shown]	bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {

// Lower all default uses of _chk calls. This is very similar		// Lower all default uses of _chk calls. This is very similar
// to what InstCombineCalls does, but here we are only lowering calls		// to what InstCombineCalls does, but here we are only lowering calls
// to fortified library functions (e.g. __memcpy_chk) that have the default		// to fortified library functions (e.g. __memcpy_chk) that have the default
// "don't know" as the objectsize. Anything else should be left alone.		// "don't know" as the objectsize. Anything else should be left alone.
FortifiedLibCallSimplifier Simplifier(TLInfo, true);		FortifiedLibCallSimplifier Simplifier(TLInfo, true);
IRBuilder<> Builder(CI);		IRBuilder<> Builder(CI);
if (Value *V = Simplifier.optimizeCall(CI, Builder)) {		if (Value *V = Simplifier.optimizeCall(CI, Builder)) {
CI->replaceAllUsesWith(V);		replaceAllUsesWith(CI, V, FreshBBs, IsHugeFunc);
CI->eraseFromParent();		CI->eraseFromParent();
return true;		return true;
}		}

return false;		return false;
}		}
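
Each direct `X->replaceAllUsesWith(Y)` in the hunks above becomes a call to a free function that also receives `FreshBBs` and `IsHugeFunc`. That function's definition is outside the lines shown here, so the following is only a minimal sketch of what such a helper plausibly does, written against toy `Block` and `Inst` types rather than the LLVM classes. The name `replaceAllUsesWithTracked` and both structs are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-ins for llvm::BasicBlock and llvm::Instruction (hypothetical).
struct Block {
  int Id;
};

struct Inst {
  Block *Parent;                // block that contains this instruction
  std::vector<Inst *> Operands; // values this instruction reads
  std::vector<Inst *> Users;    // instructions that read this one
};

// Rewrite every use of Old to New. For huge functions only, also record the
// parent block of each rewritten user so that a later pass can revisit it.
void replaceAllUsesWithTracked(Inst *Old, Inst *New,
                               std::set<Block *> &FreshBBs, bool IsHugeFunc) {
  assert(Old != New && "cannot replace a value with itself");
  for (Inst *User : Old->Users) {
    for (Inst *&Op : User->Operands)
      if (Op == Old)
        Op = New;
    New->Users.push_back(User);
    if (IsHugeFunc)
      FreshBBs.insert(User->Parent);
  }
  Old->Users.clear();
}
```

With `IsHugeFunc` set to false the sketch behaves like a plain replace-all-uses, which matches the summary's statement that the new iteration scheme is meant for huge functions.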

/// Look for opportunities to duplicate return instructions to the predecessor		/// Look for opportunities to duplicate return instructions to the predecessor
[22 lines not shown]
/// bb1:		/// bb1:
/// %tmp1 = tail call i32 @f1()		/// %tmp1 = tail call i32 @f1()
/// ret i32 %tmp1		/// ret i32 %tmp1
/// bb2:		/// bb2:
/// %tmp2 = tail call i32 @f2()		/// %tmp2 = tail call i32 @f2()
/// ret i32 %tmp2		/// ret i32 %tmp2
/// @endcode		/// @endcode
bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,		bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
bool &ModifiedDT) {		ModifyDT &ModifiedDT) {
		if (!BB->getTerminator())
		LuoYuanke: Why is this change needed? Isn't it covered by line 2431?
		xiangzhangllvm (author): The BB passed in may have no terminator, so BB->getTerminator() may be nullptr, and dyn_cast cannot handle nullptr.
		return false;

ReturnInst *RetI = dyn_cast<ReturnInst>(BB->getTerminator());		ReturnInst *RetI = dyn_cast<ReturnInst>(BB->getTerminator());
if (!RetI)		if (!RetI)
return false;		return false;

PHINode *PN = nullptr;		PHINode *PN = nullptr;
ExtractValueInst *EVI = nullptr;		ExtractValueInst *EVI = nullptr;
BitCastInst *BCI = nullptr;		BitCastInst *BCI = nullptr;
Value *V = RetI->getReturnValue();		Value *V = RetI->getReturnValue();
[77 lines not shown]	for (auto const &TailCallBB : TailCallBBs) {

// Duplicate the return into TailCallBB.		// Duplicate the return into TailCallBB.
(void)FoldReturnIntoUncondBranch(RetI, BB, TailCallBB);		(void)FoldReturnIntoUncondBranch(RetI, BB, TailCallBB);
assert(!VerifyBFIUpdates ||		assert(!VerifyBFIUpdates ||
BFI->getBlockFreq(BB) >= BFI->getBlockFreq(TailCallBB));		BFI->getBlockFreq(BB) >= BFI->getBlockFreq(TailCallBB));
BFI->setBlockFreq(		BFI->setBlockFreq(
BB,		BB,
(BFI->getBlockFreq(BB) - BFI->getBlockFreq(TailCallBB)).getFrequency());		(BFI->getBlockFreq(BB) - BFI->getBlockFreq(TailCallBB)).getFrequency());
ModifiedDT = Changed = true;		ModifiedDT = ModifyDT::ModifyBBDT;
		Changed = true;
++NumRetsDup;		++NumRetsDup;
}		}

// If we eliminated all predecessors of the block, delete the block now.		// If we eliminated all predecessors of the block, delete the block now.
if (Changed && !BB->hasAddressTaken() && pred_empty(BB))		if (Changed && !BB->hasAddressTaken() && pred_empty(BB))
BB->eraseFromParent();		BB->eraseFromParent();

return Changed;		return Changed;
[3,385 lines not shown]	for (auto &Entry : ValToSExtendedUses) {
SExts CurPts;		SExts CurPts;
for (Instruction *Inst : Insts) {		for (Instruction *Inst : Insts) {
if (RemovedInsts.count(Inst) || !isa<SExtInst>(Inst) ||		if (RemovedInsts.count(Inst) || !isa<SExtInst>(Inst) ||
Inst->getOperand(0) != Entry.first)		Inst->getOperand(0) != Entry.first)
continue;		continue;
bool inserted = false;		bool inserted = false;
for (auto &Pt : CurPts) {		for (auto &Pt : CurPts) {
if (getDT(F).dominates(Inst, Pt)) {		if (getDT(F).dominates(Inst, Pt)) {
Pt->replaceAllUsesWith(Inst);		replaceAllUsesWith(Pt, Inst, FreshBBs, IsHugeFunc);
RemovedInsts.insert(Pt);		RemovedInsts.insert(Pt);
Pt->removeFromParent();		Pt->removeFromParent();
Pt = Inst;		Pt = Inst;
inserted = true;		inserted = true;
Changed = true;		Changed = true;
break;		break;
}		}
if (!getDT(F).dominates(Pt, Inst))		if (!getDT(F).dominates(Pt, Inst))
// Give up if we need to merge in a common dominator as the		// Give up if we need to merge in a common dominator as the
// experiments show it is not profitable.		// experiments show it is not profitable.
continue;		continue;
Inst->replaceAllUsesWith(Pt);		replaceAllUsesWith(Inst, Pt, FreshBBs, IsHugeFunc);
RemovedInsts.insert(Inst);		RemovedInsts.insert(Inst);
Inst->removeFromParent();		Inst->removeFromParent();
inserted = true;		inserted = true;
Changed = true;		Changed = true;
break;		break;
}		}
if (!inserted)		if (!inserted)
CurPts.push_back(Inst);		CurPts.push_back(Inst);
[135 lines not shown]	while (LargeOffsetGEP != LargeOffsetGEPs.end()) {
} else {		} else {
// Calculate the new offset for the new GEP.		// Calculate the new offset for the new GEP.
Value *Index = ConstantInt::get(IntPtrTy, Offset - BaseOffset);		Value *Index = ConstantInt::get(IntPtrTy, Offset - BaseOffset);
NewGEP = Builder.CreateGEP(I8Ty, NewBaseGEP, Index);		NewGEP = Builder.CreateGEP(I8Ty, NewBaseGEP, Index);

if (GEP->getType() != I8PtrTy)		if (GEP->getType() != I8PtrTy)
NewGEP = Builder.CreatePointerCast(NewGEP, GEP->getType());		NewGEP = Builder.CreatePointerCast(NewGEP, GEP->getType());
}		}
GEP->replaceAllUsesWith(NewGEP);		replaceAllUsesWith(GEP, NewGEP, FreshBBs, IsHugeFunc);
LargeOffsetGEPID.erase(GEP);		LargeOffsetGEPID.erase(GEP);
LargeOffsetGEP = LargeOffsetGEPs.erase(LargeOffsetGEP);		LargeOffsetGEP = LargeOffsetGEPs.erase(LargeOffsetGEP);
GEP->eraseFromParent();		GEP->eraseFromParent();
Changed = true;		Changed = true;
}		}
}		}
return Changed;		return Changed;
}		}
[120 lines not shown]	for (int i = 0, e = Phi->getNumIncomingValues(); i < e; i++)
NewPhi->addIncoming(ValMap[Phi->getIncomingValue(i)],		NewPhi->addIncoming(ValMap[Phi->getIncomingValue(i)],
Phi->getIncomingBlock(i));		Phi->getIncomingBlock(i));
Visited.insert(NewPhi);		Visited.insert(NewPhi);
}		}
// And finally pipe up the stores and bitcasts		// And finally pipe up the stores and bitcasts
for (Instruction *U : Uses) {		for (Instruction *U : Uses) {
if (isa<BitCastInst>(U)) {		if (isa<BitCastInst>(U)) {
DeletedInstrs.insert(U);		DeletedInstrs.insert(U);
U->replaceAllUsesWith(ValMap[U->getOperand(0)]);		replaceAllUsesWith(U, ValMap[U->getOperand(0)], FreshBBs, IsHugeFunc);
} else {		} else {
U->setOperand(0,		U->setOperand(0,
new BitCastInst(ValMap[U->getOperand(0)], PhiTy, "bc", U));		new BitCastInst(ValMap[U->getOperand(0)], PhiTy, "bc", U));
}		}
}		}

// Save the removed phis to be deleted later.		// Save the removed phis to be deleted later.
for (PHINode *Phi : PhiNodes)		for (PHINode *Phi : PhiNodes)
[11 lines not shown]	bool CodeGenPrepare::optimizePhiTypes(Function &F) {

// Attempt to optimize all the phis in the functions to the correct type.		// Attempt to optimize all the phis in the functions to the correct type.
for (auto &BB : F)		for (auto &BB : F)
for (auto &Phi : BB.phis())		for (auto &Phi : BB.phis())
Changed |= optimizePhiType(&Phi, Visited, DeletedInstrs);		Changed |= optimizePhiType(&Phi, Visited, DeletedInstrs);

// Remove any old phi's that have been converted.		// Remove any old phi's that have been converted.
for (auto *I : DeletedInstrs) {		for (auto *I : DeletedInstrs) {
I->replaceAllUsesWith(PoisonValue::get(I->getType()));		replaceAllUsesWith(I, PoisonValue::get(I->getType()), FreshBBs, IsHugeFunc);
I->eraseFromParent();		I->eraseFromParent();
}		}

return Changed;		return Changed;
}		}

/// Return true, if an ext(load) can be formed from an extension in		/// Return true, if an ext(load) can be formed from an extension in
/// \p MovedExts.		/// \p MovedExts.
[398 lines not shown]	bool CodeGenPrepare::optimizeLoadExt(LoadInst *Load) {
auto *NewAnd = cast<Instruction>(		auto *NewAnd = cast<Instruction>(
Builder.CreateAnd(Load, ConstantInt::get(Ctx, DemandBits)));		Builder.CreateAnd(Load, ConstantInt::get(Ctx, DemandBits)));
// Mark this instruction as "inserted by CGP", so that other		// Mark this instruction as "inserted by CGP", so that other
// optimizations don't touch it.		// optimizations don't touch it.
InsertedInsts.insert(NewAnd);		InsertedInsts.insert(NewAnd);

// Replace all uses of load with new and (except for the use of load in the		// Replace all uses of load with new and (except for the use of load in the
// new and itself).		// new and itself).
Load->replaceAllUsesWith(NewAnd);		replaceAllUsesWith(Load, NewAnd, FreshBBs, IsHugeFunc);
NewAnd->setOperand(0, Load);		NewAnd->setOperand(0, Load);

// Remove any and instructions that are now redundant.		// Remove any and instructions that are now redundant.
for (auto *And : AndsToMaybeRemove)		for (auto *And : AndsToMaybeRemove)
// Check that the and mask is the same as the one we decided to put on the		// Check that the and mask is the same as the one we decided to put on the
// new and.		// new and.
if (cast<ConstantInt>(And->getOperand(1))->getValue() == DemandBits) {		if (cast<ConstantInt>(And->getOperand(1))->getValue() == DemandBits) {
And->replaceAllUsesWith(NewAnd);		replaceAllUsesWith(And, NewAnd, FreshBBs, IsHugeFunc);
if (&*CurInstIterator == And)		if (&*CurInstIterator == And)
CurInstIterator = std::next(And->getIterator());		CurInstIterator = std::next(And->getIterator());
And->eraseFromParent();		And->eraseFromParent();
++NumAndUses;		++NumAndUses;
}		}

++NumAndsAdded;		++NumAndsAdded;
return true;		return true;
[93 lines not shown]	bool CodeGenPrepare::optimizeShiftInst(BinaryOperator *Shift) {
if (!isSplatValue(TVal) || !isSplatValue(FVal))		if (!isSplatValue(TVal) || !isSplatValue(FVal))
return false;		return false;

IRBuilder<> Builder(Shift);		IRBuilder<> Builder(Shift);
BinaryOperator::BinaryOps Opcode = Shift->getOpcode();		BinaryOperator::BinaryOps Opcode = Shift->getOpcode();
Value *NewTVal = Builder.CreateBinOp(Opcode, Shift->getOperand(0), TVal);		Value *NewTVal = Builder.CreateBinOp(Opcode, Shift->getOperand(0), TVal);
Value *NewFVal = Builder.CreateBinOp(Opcode, Shift->getOperand(0), FVal);		Value *NewFVal = Builder.CreateBinOp(Opcode, Shift->getOperand(0), FVal);
Value *NewSel = Builder.CreateSelect(Cond, NewTVal, NewFVal);		Value *NewSel = Builder.CreateSelect(Cond, NewTVal, NewFVal);
Shift->replaceAllUsesWith(NewSel);		replaceAllUsesWith(Shift, NewSel, FreshBBs, IsHugeFunc);
Shift->eraseFromParent();		Shift->eraseFromParent();
return true;		return true;
}		}

bool CodeGenPrepare::optimizeFunnelShift(IntrinsicInst *Fsh) {		bool CodeGenPrepare::optimizeFunnelShift(IntrinsicInst *Fsh) {
Intrinsic::ID Opcode = Fsh->getIntrinsicID();		Intrinsic::ID Opcode = Fsh->getIntrinsicID();
assert((Opcode == Intrinsic::fshl || Opcode == Intrinsic::fshr) &&		assert((Opcode == Intrinsic::fshl || Opcode == Intrinsic::fshr) &&
"Expected a funnel shift");		"Expected a funnel shift");
[18 lines not shown]	bool CodeGenPrepare::optimizeFunnelShift(IntrinsicInst *Fsh) {
if (!isSplatValue(TVal) || !isSplatValue(FVal))		if (!isSplatValue(TVal) || !isSplatValue(FVal))
return false;		return false;

IRBuilder<> Builder(Fsh);		IRBuilder<> Builder(Fsh);
Value *X = Fsh->getOperand(0), *Y = Fsh->getOperand(1);		Value *X = Fsh->getOperand(0), *Y = Fsh->getOperand(1);
Value *NewTVal = Builder.CreateIntrinsic(Opcode, Ty, {X, Y, TVal});		Value *NewTVal = Builder.CreateIntrinsic(Opcode, Ty, {X, Y, TVal});
Value *NewFVal = Builder.CreateIntrinsic(Opcode, Ty, {X, Y, FVal});		Value *NewFVal = Builder.CreateIntrinsic(Opcode, Ty, {X, Y, FVal});
Value *NewSel = Builder.CreateSelect(Cond, NewTVal, NewFVal);		Value *NewSel = Builder.CreateSelect(Cond, NewTVal, NewFVal);
Fsh->replaceAllUsesWith(NewSel);		replaceAllUsesWith(Fsh, NewSel, FreshBBs, IsHugeFunc);
Fsh->eraseFromParent();		Fsh->eraseFromParent();
return true;		return true;
}		}

/// If we have a SelectInst that will likely profit from branch prediction,		/// If we have a SelectInst that will likely profit from branch prediction,
/// turn it into a branch.		/// turn it into a branch.
bool CodeGenPrepare::optimizeSelectInst(SelectInst *SI) {		bool CodeGenPrepare::optimizeSelectInst(SelectInst *SI) {
if (DisableSelectToBranch)		if (DisableSelectToBranch)
[70 lines not shown]	bool CodeGenPrepare::optimizeSelectInst(SelectInst *SI) {
// block and its branch may be optimized away. In that case, one side of the		// block and its branch may be optimized away. In that case, one side of the
// first branch will point directly to select.end, and the corresponding PHI		// first branch will point directly to select.end, and the corresponding PHI
// predecessor block will be the start block.		// predecessor block will be the start block.

// First, we split the block containing the select into 2 blocks.		// First, we split the block containing the select into 2 blocks.
BasicBlock *StartBlock = SI->getParent();		BasicBlock *StartBlock = SI->getParent();
BasicBlock::iterator SplitPt = ++(BasicBlock::iterator(LastSI));		BasicBlock::iterator SplitPt = ++(BasicBlock::iterator(LastSI));
BasicBlock *EndBlock = StartBlock->splitBasicBlock(SplitPt, "select.end");		BasicBlock *EndBlock = StartBlock->splitBasicBlock(SplitPt, "select.end");
		if (IsHugeFunc)
		FreshBBs.insert(EndBlock);
BFI->setBlockFreq(EndBlock, BFI->getBlockFreq(StartBlock).getFrequency());		BFI->setBlockFreq(EndBlock, BFI->getBlockFreq(StartBlock).getFrequency());

// Delete the unconditional branch that was just created by the split.		// Delete the unconditional branch that was just created by the split.
StartBlock->getTerminator()->eraseFromParent();		StartBlock->getTerminator()->eraseFromParent();

// These are the new basic blocks for the conditional branch.		// These are the new basic blocks for the conditional branch.
// At least one will become an actual new basic block.		// At least one will become an actual new basic block.
BasicBlock *TrueBlock = nullptr;		BasicBlock *TrueBlock = nullptr;
BasicBlock *FalseBlock = nullptr;		BasicBlock *FalseBlock = nullptr;
BranchInst *TrueBranch = nullptr;		BranchInst *TrueBranch = nullptr;
BranchInst *FalseBranch = nullptr;		BranchInst *FalseBranch = nullptr;

// Sink expensive instructions into the conditional blocks to avoid executing		// Sink expensive instructions into the conditional blocks to avoid executing
// them speculatively.		// them speculatively.
for (SelectInst *SI : ASI) {		for (SelectInst *SI : ASI) {
if (sinkSelectOperand(TTI, SI->getTrueValue())) {		if (sinkSelectOperand(TTI, SI->getTrueValue())) {
if (TrueBlock == nullptr) {		if (TrueBlock == nullptr) {
TrueBlock = BasicBlock::Create(SI->getContext(), "select.true.sink",		TrueBlock = BasicBlock::Create(SI->getContext(), "select.true.sink",
EndBlock->getParent(), EndBlock);		EndBlock->getParent(), EndBlock);
TrueBranch = BranchInst::Create(EndBlock, TrueBlock);		TrueBranch = BranchInst::Create(EndBlock, TrueBlock);
		if (IsHugeFunc)
		FreshBBs.insert(TrueBlock);
TrueBranch->setDebugLoc(SI->getDebugLoc());		TrueBranch->setDebugLoc(SI->getDebugLoc());
}		}
auto *TrueInst = cast<Instruction>(SI->getTrueValue());		auto *TrueInst = cast<Instruction>(SI->getTrueValue());
TrueInst->moveBefore(TrueBranch);		TrueInst->moveBefore(TrueBranch);
}		}
if (sinkSelectOperand(TTI, SI->getFalseValue())) {		if (sinkSelectOperand(TTI, SI->getFalseValue())) {
if (FalseBlock == nullptr) {		if (FalseBlock == nullptr) {
FalseBlock = BasicBlock::Create(SI->getContext(), "select.false.sink",		FalseBlock = BasicBlock::Create(SI->getContext(), "select.false.sink",
EndBlock->getParent(), EndBlock);		EndBlock->getParent(), EndBlock);
		if (IsHugeFunc)
		FreshBBs.insert(FalseBlock);
FalseBranch = BranchInst::Create(EndBlock, FalseBlock);		FalseBranch = BranchInst::Create(EndBlock, FalseBlock);
FalseBranch->setDebugLoc(SI->getDebugLoc());		FalseBranch->setDebugLoc(SI->getDebugLoc());
}		}
auto *FalseInst = cast<Instruction>(SI->getFalseValue());		auto *FalseInst = cast<Instruction>(SI->getFalseValue());
FalseInst->moveBefore(FalseBranch);		FalseInst->moveBefore(FalseBranch);
}		}
}		}

// If there was nothing to sink, then arbitrarily choose the 'false' side		// If there was nothing to sink, then arbitrarily choose the 'false' side
// for a new input value to the PHI.		// for a new input value to the PHI.
if (TrueBlock == FalseBlock) {		if (TrueBlock == FalseBlock) {
assert(TrueBlock == nullptr &&		assert(TrueBlock == nullptr &&
"Unexpected basic block transform while optimizing select");		"Unexpected basic block transform while optimizing select");

FalseBlock = BasicBlock::Create(SI->getContext(), "select.false",		FalseBlock = BasicBlock::Create(SI->getContext(), "select.false",
EndBlock->getParent(), EndBlock);		EndBlock->getParent(), EndBlock);
		if (IsHugeFunc)
		FreshBBs.insert(FalseBlock);
auto *FalseBranch = BranchInst::Create(EndBlock, FalseBlock);		auto *FalseBranch = BranchInst::Create(EndBlock, FalseBlock);
FalseBranch->setDebugLoc(SI->getDebugLoc());		FalseBranch->setDebugLoc(SI->getDebugLoc());
}		}

// Insert the real conditional branch based on the original condition.		// Insert the real conditional branch based on the original condition.
// If we did not create a new block for one of the 'true' or 'false' paths		// If we did not create a new block for one of the 'true' or 'false' paths
// of the condition, it means that side of the branch goes to the end block		// of the condition, it means that side of the branch goes to the end block
// directly and the path originates from the start block from the point of		// directly and the path originates from the start block from the point of
[23 lines not shown]	bool CodeGenPrepare::optimizeSelectInst(SelectInst *SI) {
for (SelectInst *SI : llvm::reverse(ASI)) {		for (SelectInst *SI : llvm::reverse(ASI)) {
// The select itself is replaced with a PHI Node.		// The select itself is replaced with a PHI Node.
PHINode *PN = PHINode::Create(SI->getType(), 2, "", &EndBlock->front());		PHINode *PN = PHINode::Create(SI->getType(), 2, "", &EndBlock->front());
PN->takeName(SI);		PN->takeName(SI);
PN->addIncoming(getTrueOrFalseValue(SI, true, INS), TrueBlock);		PN->addIncoming(getTrueOrFalseValue(SI, true, INS), TrueBlock);
PN->addIncoming(getTrueOrFalseValue(SI, false, INS), FalseBlock);		PN->addIncoming(getTrueOrFalseValue(SI, false, INS), FalseBlock);
PN->setDebugLoc(SI->getDebugLoc());		PN->setDebugLoc(SI->getDebugLoc());

SI->replaceAllUsesWith(PN);		replaceAllUsesWith(SI, PN, FreshBBs, IsHugeFunc);
SI->eraseFromParent();		SI->eraseFromParent();
INS.erase(SI);		INS.erase(SI);
++NumSelectsExpanded;		++NumSelectsExpanded;
}		}

// Instruct OptimizeBlock to skip to the next block.		// Instruct OptimizeBlock to skip to the next block.
CurInstIterator = StartBlock->end();		CurInstIterator = StartBlock->end();
return true;		return true;
[21 lines not shown]	bool CodeGenPrepare::optimizeShuffleVectorInst(ShuffleVectorInst *SVI) {
// Create a bitcast (shuffle (insert (bitcast(..))))		// Create a bitcast (shuffle (insert (bitcast(..))))
IRBuilder<> Builder(SVI->getContext());		IRBuilder<> Builder(SVI->getContext());
Builder.SetInsertPoint(SVI);		Builder.SetInsertPoint(SVI);
Value *BC1 = Builder.CreateBitCast(		Value *BC1 = Builder.CreateBitCast(
cast<Instruction>(SVI->getOperand(0))->getOperand(1), NewType);		cast<Instruction>(SVI->getOperand(0))->getOperand(1), NewType);
Value *Shuffle = Builder.CreateVectorSplat(NewVecType->getNumElements(), BC1);		Value *Shuffle = Builder.CreateVectorSplat(NewVecType->getNumElements(), BC1);
Value *BC2 = Builder.CreateBitCast(Shuffle, SVIVecType);		Value *BC2 = Builder.CreateBitCast(Shuffle, SVIVecType);

SVI->replaceAllUsesWith(BC2);		replaceAllUsesWith(SVI, BC2, FreshBBs, IsHugeFunc);
RecursivelyDeleteTriviallyDeadInstructions(		RecursivelyDeleteTriviallyDeadInstructions(
SVI, TLInfo, nullptr,		SVI, TLInfo, nullptr,
[&](Value *V) { removeAllAssertingVHReferences(V); });		[&](Value *V) { removeAllAssertingVHReferences(V); });

// Also hoist the bitcast up to its operand if it they are not in the same		// Also hoist the bitcast up to its operand if it they are not in the same
// block.		// block.
if (auto *BCI = dyn_cast<Instruction>(BC1))		if (auto *BCI = dyn_cast<Instruction>(BC1))
if (auto *Op = dyn_cast<Instruction>(BCI->getOperand(0)))		if (auto *Op = dyn_cast<Instruction>(BCI->getOperand(0)))
[36 lines not shown]	for (Use *U : reverse(OpsToSink)) {
ToReplace.push_back(U);		ToReplace.push_back(U);
}		}

SetVector<Instruction *> MaybeDead;		SetVector<Instruction *> MaybeDead;
DenseMap<Instruction *, Instruction *> NewInstructions;		DenseMap<Instruction *, Instruction *> NewInstructions;
for (Use *U : ToReplace) {		for (Use *U : ToReplace) {
auto *UI = cast<Instruction>(U->get());		auto *UI = cast<Instruction>(U->get());
Instruction *NI = UI->clone();		Instruction *NI = UI->clone();

		if (IsHugeFunc) {
		// Now we clone an instruction, its operands' defs may sink to this BB
		// now. So we put the operands defs' BBs into FreshBBs to do optmization.
		for (unsigned I = 0; I < NI->getNumOperands(); ++I) {
		LuoYuanke: The instruction is cloned. Why do the defs of its operands need to be sunk?
		xiangzhangllvm (author): When we copy an instruction to a new place to optimize it, we have effectively updated that instruction. The defs of its operands may then have a new opportunity to sink to the new instruction.
		LuoYuanke: Here we just copy the instruction. We only know it is sunk when it is erased at line 7135.
		xiangzhangllvm (author): Let me try to give an example soon.
		xiangzhangllvm (author): I found an ARM case. It is not about sinking; it is about copying twice.
		  1st time: copy the insertelement from entry to vector.body
		    entry:
		      ...
		      1 %l0 = trunc i16 %x to i8
		      2 %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
		      ...
		      3 br label %vector.body
		    vector.body: ; preds = %vector.body, %entry
		      ...
		      4 %0 = insertelement <8 x i8> undef, i8 %l0, i32 0 // copy from line 2
		      5 %1 = shufflevector <8 x i8> %0, <8 x i8> undef, <8 x i32> zeroinitializer
		      6 %l9 = mul <8 x i8> %1, %l8
		  2nd time: copy the insertelement's operand, the trunc
		      ...
		      1 %l0 = trunc i16 %x to i8
		      2 %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
		      ...
		      3 br label %vector.body
		    vector.body: ; preds = %vector.body, %entry
		      ...
		      %0 = trunc i16 %x to i8 // 2nd copy optimization
		      %1 = insertelement <8 x i8> undef, i8 %0, i32 0 // use 2nd copy
		      %2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer
		      %l9 = mul <8 x i8> %2, %l8
		LuoYuanke: Is the `insertelement` in the entry BB erased the first time?
		xiangzhangllvm (author): It is not erased, because it still has a user in the entry BB. But it could be erased.
		auto *OpDef = dyn_cast<Instruction>(NI->getOperand(I));
		if (!OpDef)
		continue;
		FreshBBs.insert(OpDef->getParent());
		}
		}

NewInstructions[UI] = NI;		NewInstructions[UI] = NI;
MaybeDead.insert(UI);		MaybeDead.insert(UI);
LLVM_DEBUG(dbgs() << "Sinking " << *UI << " to user " << *I << "\n");		LLVM_DEBUG(dbgs() << "Sinking " << *UI << " to user " << *I << "\n");
NI->insertBefore(InsertPoint);		NI->insertBefore(InsertPoint);
InsertPoint = NI;		InsertPoint = NI;
InsertedInsts.insert(NI);		InsertedInsts.insert(NI);

// Update the use for the new instruction, making sure that we update the		// Update the use for the new instruction, making sure that we update the
// sunk instruction uses, if it is part of a chain that has already been		// sunk instruction uses, if it is part of a chain that has already been
// sunk.		// sunk.
Instruction *OldI = cast<Instruction>(U->getUser());		Instruction *OldI = cast<Instruction>(U->getUser());
if (NewInstructions.count(OldI))		if (NewInstructions.count(OldI))
NewInstructions[OldI]->setOperand(U->getOperandNo(), NI);		NewInstructions[OldI]->setOperand(U->getOperandNo(), NI);
else		else
U->set(NI);		U->set(NI);
Changed = true;		Changed = true;
}		}

// Remove instructions that are dead after sinking.		// Remove instructions that are dead after sinking.
for (auto *I : MaybeDead) {		for (auto *I : MaybeDead) {
if (!I->hasNUsesOrMore(1)) {		if (!I->hasNUsesOrMore(1)) {
LLVM_DEBUG(dbgs() << "Removing dead instruction: " << *I << "\n");		LLVM_DEBUG(dbgs() << "Removing dead instruction: " << *I << "\n");
I->eraseFromParent();		I->eraseFromParent();
		LuoYuanke: Should we insert the BB into FreshBBs?
		xiangzhangllvm (author): Strictly speaking, we should. But this instruction has been erased, so every opportunity tied to this "updated" instruction has disappeared.
}		}
}		}

return Changed;		return Changed;
}		}
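
The sinking loop above adds the blocks that define a cloned instruction's operands to `FreshBBs`, and the summary says that after a first full pass only such fresh blocks are revisited. The driver loop that consumes `FreshBBs` is not part of the lines shown here, so the following is a simplified sketch of that scheme under stated assumptions, not the patch's actual loop. `Block`, `runWithFreshBBs`, and the `Optimize` callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Toy stand-in for llvm::BasicBlock (hypothetical).
struct Block {
  int Id;
};

// Optimize returns true if it changed BB; it may also insert other blocks
// (for example, blocks defining operands of a cloned instruction) into the
// set it is handed. The first pass visits every block; later passes visit
// only blocks marked fresh by the previous pass. Returns the visit count.
int runWithFreshBBs(
    const std::vector<Block *> &Blocks,
    const std::function<bool(Block *, std::set<Block *> &)> &Optimize) {
  std::set<Block *> FreshBBs;
  bool FirstPass = true;
  bool Changed = true;
  int Visits = 0;
  while (Changed) {
    Changed = false;
    std::set<Block *> Current;
    Current.swap(FreshBBs); // FreshBBs now collects work for the next pass
    for (Block *BB : Blocks) {
      if (!FirstPass && Current.count(BB) == 0)
        continue; // untouched in the previous pass, so skip it
      ++Visits;
      if (Optimize(BB, FreshBBs)) {
        Changed = true;
        FreshBBs.insert(BB); // a changed block may be optimizable again
      }
    }
    FirstPass = false;
  }
  return Visits;
}
```

In this sketch a change made late in the function costs one extra visit per fresh block instead of a restart over every block, which is the trade-off the summary describes: a few optimization opportunities may be missed in exchange for bounded work.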

bool CodeGenPrepare::optimizeSwitchType(SwitchInst *SI) {		bool CodeGenPrepare::optimizeSwitchType(SwitchInst *SI) {
Value *Cond = SI->getCondition();		Value *Cond = SI->getCondition();
[800 lines not shown]	static bool tryUnmergingGEPsAcrossIndirectBr(GetElementPtrInst *GEPI,
assert(llvm::none_of(GEPIOp->users(),		assert(llvm::none_of(GEPIOp->users(),
[&](User *Usr) {		[&](User *Usr) {
return cast<Instruction>(Usr)->getParent() != SrcBlock;		return cast<Instruction>(Usr)->getParent() != SrcBlock;
}) &&		}) &&
"GEPIOp is used outside SrcBlock");		"GEPIOp is used outside SrcBlock");
return true;		return true;
}		}

static bool optimizeBranch(BranchInst *Branch, const TargetLowering &TLI) {		static bool optimizeBranch(BranchInst *Branch, const TargetLowering &TLI,
		SmallSet<BasicBlock *, 32> &FreshBBs,
		bool IsHugeFunc) {
// Try and convert		// Try and convert
// %c = icmp ult %x, 8		// %c = icmp ult %x, 8
// br %c, bla, blb		// br %c, bla, blb
// %tc = lshr %x, 3		// %tc = lshr %x, 3
// to		// to
// %tc = lshr %x, 3		// %tc = lshr %x, 3
// %c = icmp eq %tc, 0		// %c = icmp eq %tc, 0
// br %c, bla, blb		// br %c, bla, blb
[24 lines not shown]	if (CmpC.isPowerOf2() && Cmp->getPredicate() == ICmpInst::ICMP_ULT &&
match(UI, m_Shr(m_Specific(X), m_SpecificInt(CmpC.logBase2())))) {		match(UI, m_Shr(m_Specific(X), m_SpecificInt(CmpC.logBase2())))) {
IRBuilder<> Builder(Branch);		IRBuilder<> Builder(Branch);
if (UI->getParent() != Branch->getParent())		if (UI->getParent() != Branch->getParent())
UI->moveBefore(Branch);		UI->moveBefore(Branch);
Value *NewCmp = Builder.CreateCmp(ICmpInst::ICMP_EQ, UI,		Value *NewCmp = Builder.CreateCmp(ICmpInst::ICMP_EQ, UI,
ConstantInt::get(UI->getType(), 0));		ConstantInt::get(UI->getType(), 0));
LLVM_DEBUG(dbgs() << "Converting " << *Cmp << "\n");		LLVM_DEBUG(dbgs() << "Converting " << *Cmp << "\n");
LLVM_DEBUG(dbgs() << " to compare on zero: " << *NewCmp << "\n");		LLVM_DEBUG(dbgs() << " to compare on zero: " << *NewCmp << "\n");
Cmp->replaceAllUsesWith(NewCmp);		replaceAllUsesWith(Cmp, NewCmp, FreshBBs, IsHugeFunc);
return true;		return true;
}		}
if (Cmp->isEquality() &&		if (Cmp->isEquality() &&
(match(UI, m_Add(m_Specific(X), m_SpecificInt(-CmpC))) ||		(match(UI, m_Add(m_Specific(X), m_SpecificInt(-CmpC))) ||
match(UI, m_Sub(m_Specific(X), m_SpecificInt(CmpC))))) {		match(UI, m_Sub(m_Specific(X), m_SpecificInt(CmpC))))) {
IRBuilder<> Builder(Branch);		IRBuilder<> Builder(Branch);
if (UI->getParent() != Branch->getParent())		if (UI->getParent() != Branch->getParent())
UI->moveBefore(Branch);		UI->moveBefore(Branch);
Value *NewCmp = Builder.CreateCmp(Cmp->getPredicate(), UI,		Value *NewCmp = Builder.CreateCmp(Cmp->getPredicate(), UI,
ConstantInt::get(UI->getType(), 0));		ConstantInt::get(UI->getType(), 0));
LLVM_DEBUG(dbgs() << "Converting " << *Cmp << "\n");		LLVM_DEBUG(dbgs() << "Converting " << *Cmp << "\n");
LLVM_DEBUG(dbgs() << " to compare on zero: " << *NewCmp << "\n");		LLVM_DEBUG(dbgs() << " to compare on zero: " << *NewCmp << "\n");
Cmp->replaceAllUsesWith(NewCmp);		replaceAllUsesWith(Cmp, NewCmp, FreshBBs, IsHugeFunc);
return true;		return true;
}		}
}		}
return false;		return false;
}		}

bool CodeGenPrepare::optimizeInst(Instruction *I, bool &ModifiedDT) {		bool CodeGenPrepare::optimizeInst(Instruction *I, ModifyDT &ModifiedDT) {
// Bail out if we inserted the instruction to prevent optimizations from		// Bail out if we inserted the instruction to prevent optimizations from
// stepping on each other's toes.		// stepping on each other's toes.
if (InsertedInsts.count(I))		if (InsertedInsts.count(I))
return false;		return false;

// TODO: Move into the switch on opcode below here.		// TODO: Move into the switch on opcode below here.
if (PHINode *P = dyn_cast<PHINode>(I)) {		if (PHINode *P = dyn_cast<PHINode>(I)) {
// It is possible for very late stage optimizations (such as SimplifyCFG)		// It is possible for very late stage optimizations (such as SimplifyCFG)
// to introduce PHI nodes too late to be cleaned up. If we detect such a		// to introduce PHI nodes too late to be cleaned up. If we detect such a
// trivial PHI, go ahead and zap it here.		// trivial PHI, go ahead and zap it here.
if (Value *V = simplifyInstruction(P, {*DL, TLInfo})) {		if (Value *V = simplifyInstruction(P, {*DL, TLInfo})) {
LargeOffsetGEPMap.erase(P);		LargeOffsetGEPMap.erase(P);
P->replaceAllUsesWith(V);		replaceAllUsesWith(P, V, FreshBBs, IsHugeFunc);
P->eraseFromParent();		P->eraseFromParent();
++NumPHIsElim;		++NumPHIsElim;
return true;		return true;
}		}
return false;		return false;
}		}

if (CastInst *CI = dyn_cast<CastInst>(I)) {		if (CastInst *CI = dyn_cast<CastInst>(I)) {
[72 unchanged lines hidden; context: bool CodeGenPrepare::optimizeInst(Instruction *I, ModifyDT &ModifiedDT) {]
}		}

if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(I)) {		if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(I)) {
if (GEPI->hasAllZeroIndices()) {		if (GEPI->hasAllZeroIndices()) {
/// The GEP operand must be a pointer, so must its result -> BitCast		/// The GEP operand must be a pointer, so must its result -> BitCast
Instruction *NC = new BitCastInst(GEPI->getOperand(0), GEPI->getType(),		Instruction *NC = new BitCastInst(GEPI->getOperand(0), GEPI->getType(),
GEPI->getName(), GEPI);		GEPI->getName(), GEPI);
NC->setDebugLoc(GEPI->getDebugLoc());		NC->setDebugLoc(GEPI->getDebugLoc());
GEPI->replaceAllUsesWith(NC);		replaceAllUsesWith(GEPI, NC, FreshBBs, IsHugeFunc);
GEPI->eraseFromParent();		GEPI->eraseFromParent();
++NumGEPsElim;		++NumGEPsElim;
optimizeInst(NC, ModifiedDT);		optimizeInst(NC, ModifiedDT);
return true;		return true;
}		}
if (tryUnmergingGEPsAcrossIndirectBr(GEPI, TTI)) {		if (tryUnmergingGEPsAcrossIndirectBr(GEPI, TTI)) {
return true;		return true;
}		}
[16 unchanged lines hidden; context: if (CmpI && CmpI->hasOneUse()) {]
bool Const1 = isa<ConstantInt>(Op1) || isa<ConstantFP>(Op1) ||		bool Const1 = isa<ConstantInt>(Op1) || isa<ConstantFP>(Op1) ||
isa<ConstantPointerNull>(Op1);		isa<ConstantPointerNull>(Op1);
if (Const0 || Const1) {		if (Const0 || Const1) {
if (!Const0 || !Const1) {		if (!Const0 || !Const1) {
auto *F = new FreezeInst(Const0 ? Op1 : Op0, "", CmpI);		auto *F = new FreezeInst(Const0 ? Op1 : Op0, "", CmpI);
F->takeName(FI);		F->takeName(FI);
CmpI->setOperand(Const0 ? 1 : 0, F);		CmpI->setOperand(Const0 ? 1 : 0, F);
}		}
FI->replaceAllUsesWith(CmpI);		replaceAllUsesWith(FI, CmpI, FreshBBs, IsHugeFunc);
FI->eraseFromParent();		FI->eraseFromParent();
return true;		return true;
}		}
}		}
return false;		return false;
}		}

if (tryToSinkFreeOperands(I))		if (tryToSinkFreeOperands(I))
[10 unchanged lines hidden; context: case Instruction::Select:]
return optimizeSelectInst(cast<SelectInst>(I));		return optimizeSelectInst(cast<SelectInst>(I));
case Instruction::ShuffleVector:		case Instruction::ShuffleVector:
return optimizeShuffleVectorInst(cast<ShuffleVectorInst>(I));		return optimizeShuffleVectorInst(cast<ShuffleVectorInst>(I));
case Instruction::Switch:		case Instruction::Switch:
return optimizeSwitchInst(cast<SwitchInst>(I));		return optimizeSwitchInst(cast<SwitchInst>(I));
case Instruction::ExtractElement:		case Instruction::ExtractElement:
return optimizeExtractElementInst(cast<ExtractElementInst>(I));		return optimizeExtractElementInst(cast<ExtractElementInst>(I));
case Instruction::Br:		case Instruction::Br:
return optimizeBranch(cast<BranchInst>(I), *TLI);		return optimizeBranch(cast<BranchInst>(I), *TLI, FreshBBs, IsHugeFunc);
}		}

return false;		return false;
}		}
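
The hunks above replace direct `X->replaceAllUsesWith(Y)` calls with a helper that also takes `FreshBBs` and `IsHugeFunc`. The helper's body is not shown in this part of the diff, so the following is only a minimal sketch of what such a wrapper might do, assuming it records the parent block of every rewritten user. `Block`, `Inst` and `replaceAllUsesWithSketch` are hypothetical stand-ins, not LLVM types or functions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for llvm::BasicBlock and llvm::Instruction (hypothetical).
struct Block {
  std::string Name;
};

struct Inst {
  Block *Parent;                // block that contains this instruction
  std::vector<Inst *> Operands; // values this instruction uses
};

// Rewrites every use of Old to New among Insts. When IsHugeFunc is set, the
// parent block of each rewritten user is added to FreshBBs so that a later
// round can revisit it. This mirrors the call shape in the diff only; the
// real helper's body is an assumption here.
void replaceAllUsesWithSketch(Inst *Old, Inst *New,
                              const std::vector<Inst *> &Insts,
                              std::set<Block *> &FreshBBs, bool IsHugeFunc) {
  for (Inst *User : Insts) {
    bool Rewritten = false;
    for (Inst *&Op : User->Operands) {
      if (Op == Old) {
        Op = New;
        Rewritten = true;
      }
    }
    if (Rewritten && IsHugeFunc)
      FreshBBs.insert(User->Parent);
  }
}
```

With `IsHugeFunc` false the sketch degenerates to a plain use rewrite, which matches the intent that small functions keep the old behaviour.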

/// Given an OR instruction, check to see if this is a bitreverse		/// Given an OR instruction, check to see if this is a bitreverse
/// idiom. If so, insert the new intrinsic and return true.		/// idiom. If so, insert the new intrinsic and return true.
bool CodeGenPrepare::makeBitReverse(Instruction &I) {		bool CodeGenPrepare::makeBitReverse(Instruction &I) {
if (!I.getType()->isIntegerTy() ||		if (!I.getType()->isIntegerTy() ||
!TLI->isOperationLegalOrCustom(ISD::BITREVERSE,		!TLI->isOperationLegalOrCustom(ISD::BITREVERSE,
TLI->getValueType(*DL, I.getType(), true)))		TLI->getValueType(*DL, I.getType(), true)))
return false;		return false;

SmallVector<Instruction *, 4> Insts;		SmallVector<Instruction *, 4> Insts;
if (!recognizeBSwapOrBitReverseIdiom(&I, false, true, Insts))		if (!recognizeBSwapOrBitReverseIdiom(&I, false, true, Insts))
return false;		return false;
Instruction *LastInst = Insts.back();		Instruction *LastInst = Insts.back();
I.replaceAllUsesWith(LastInst);		replaceAllUsesWith(&I, LastInst, FreshBBs, IsHugeFunc);
RecursivelyDeleteTriviallyDeadInstructions(		RecursivelyDeleteTriviallyDeadInstructions(
&I, TLInfo, nullptr,		&I, TLInfo, nullptr,
[&](Value *V) { removeAllAssertingVHReferences(V); });		[&](Value *V) { removeAllAssertingVHReferences(V); });
return true;		return true;
}		}

// In this pass we look for GEP and cast instructions that are used		// In this pass we look for GEP and cast instructions that are used
// across basic blocks and rewrite them to improve basic-block-at-a-time		// across basic blocks and rewrite them to improve basic-block-at-a-time
// selection.		// selection.
bool CodeGenPrepare::optimizeBlock(BasicBlock &BB, bool &ModifiedDT) {		bool CodeGenPrepare::optimizeBlock(BasicBlock &BB, ModifyDT &ModifiedDT) {
SunkAddrs.clear();		SunkAddrs.clear();
bool MadeChange = false;		bool MadeChange = false;

		do {
CurInstIterator = BB.begin();		CurInstIterator = BB.begin();
		ModifiedDT = ModifyDT::NotModifyDT;
while (CurInstIterator != BB.end()) {		while (CurInstIterator != BB.end()) {
MadeChange |= optimizeInst(&*CurInstIterator++, ModifiedDT);		MadeChange |= optimizeInst(&*CurInstIterator++, ModifiedDT);
if (ModifiedDT)		if (ModifiedDT != ModifyDT::NotModifyDT) {
		// For a huge function we prefer to go quickly through the inner
		// optimization opportunities in the BB, so we go back to the BB head to
		// re-optimize each instruction instead of going back to the function head.
		if (IsHugeFunc) {
		DT.reset();
		[Inline comment] LuoYuanke: This looks buggy if the following instructions in the BB depend on the dominator tree.
		[Inline comment] xiangzhangllvm (author): The break re-iterates this BB. Do you mean the BB may be erased here? I think optimizeInst cannot erase the BB. If it splits the BB, it splits it into the BB plus a new BB, so iterating over this BB still works.
		[Inline comment] LuoYuanke: I mean the dominator tree has changed but has not been updated yet. We can't invoke `DT.dominates(...)` before updating the dominator tree.
		[Inline comment] xiangzhangllvm (author): Makes sense, let me reset the DT here, thanks!
		getDT(*BB.getParent());
		[Inline comment] LuoYuanke: It seems unnecessary to call getDT immediately after `DT.reset`. The lazy `getDT` should help compile time.
		[Inline comment] xiangzhangllvm (author): If optimizeInst uses the DT directly, it will fail. Some tests failed here before, so I added the getDT call.
		break;
		} else {
return true;		return true;
}		}
		}
		}
		} while (ModifiedDT == ModifyDT::ModifyInstDT);

bool MadeBitReverse = true;		bool MadeBitReverse = true;
while (MadeBitReverse) {		while (MadeBitReverse) {
MadeBitReverse = false;		MadeBitReverse = false;
for (auto &I : reverse(BB)) {		for (auto &I : reverse(BB)) {
if (makeBitReverse(I)) {		if (makeBitReverse(I)) {
MadeBitReverse = MadeChange = true;		MadeBitReverse = MadeChange = true;
break;		break;
[143 unchanged lines hidden]
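
The new `do { ... } while (ModifiedDT == ModifyDT::ModifyInstDT)` loop in `optimizeBlock` can be modelled in isolation. The sketch below is a toy under stated assumptions: instructions are plain integers, `OptimizeInstFn` is a hypothetical callback standing in for `optimizeInst`, and the `DTResets` counter stands in for the `DT.reset(); getDT(...)` pair discussed in the inline comments. Only the `ModifyDT` enumerator names come from the diff.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class ModifyDT { NotModifyDT, ModifyBBDT, ModifyInstDT };

// Hypothetical per-instruction callback: returns true if it changed anything
// and sets ModifiedDT when the change affects the dominator tree.
using OptimizeInstFn = std::function<bool(int Inst, ModifyDT &ModifiedDT)>;

struct BlockResult {
  bool MadeChange = false;
  bool ReturnedEarly = false; // non-huge path: caller restarts the function
  int DTResets = 0;           // how often the cached DT would be rebuilt
};

BlockResult optimizeBlockSketch(const std::vector<int> &Insts, bool IsHugeFunc,
                                const OptimizeInstFn &Opt) {
  BlockResult R;
  ModifyDT ModifiedDT = ModifyDT::NotModifyDT;
  do {
    ModifiedDT = ModifyDT::NotModifyDT;
    for (int I : Insts) {
      R.MadeChange |= Opt(I, ModifiedDT);
      if (ModifiedDT == ModifyDT::NotModifyDT)
        continue;
      if (!IsHugeFunc) {
        // Old behaviour: give up on this block and let the caller start over.
        R.ReturnedEarly = true;
        return R;
      }
      // Huge function: drop the stale tree and restart from the block head.
      ++R.DTResets;
      break;
    }
  } while (ModifiedDT == ModifyDT::ModifyInstDT);
  return R;
}
```

In this model a callback that reports `ModifyInstDT` once on the second of three instructions costs five callback invocations in huge mode (two, then a full pass of three), whereas the non-huge path returns after two and leaves the restart to the caller.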
/// br i1 %1, label %TrueBB, label %FalseBB		/// br i1 %1, label %TrueBB, label %FalseBB
/// \endcode		/// \endcode
/// This usually allows instruction selection to do even further optimizations		/// This usually allows instruction selection to do even further optimizations
/// and combine the compare with the branch instruction. Currently this is		/// and combine the compare with the branch instruction. Currently this is
/// applied for targets which have "cheap" jump instructions.		/// applied for targets which have "cheap" jump instructions.
///		///
/// FIXME: Remove the (equivalent?) implementation in SelectionDAG.		/// FIXME: Remove the (equivalent?) implementation in SelectionDAG.
///		///
bool CodeGenPrepare::splitBranchCondition(Function &F, bool &ModifiedDT) {		bool CodeGenPrepare::splitBranchCondition(Function &F, ModifyDT &ModifiedDT) {
if (!TM->Options.EnableFastISel || TLI->isJumpExpensive())		if (!TM->Options.EnableFastISel || TLI->isJumpExpensive())
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
for (auto &BB : F) {		for (auto &BB : F) {
// Does this BB end with the following?		// Does this BB end with the following?
// %cond1 = icmp|fcmp|binary instruction ...		// %cond1 = icmp|fcmp|binary instruction ...
// %cond2 = icmp|fcmp|binary instruction ...		// %cond2 = icmp|fcmp|binary instruction ...
[34 unchanged lines hidden; context: if (!IsGoodCond(Cond1) || !IsGoodCond(Cond2))]
continue;		continue;

LLVM_DEBUG(dbgs() << "Before branch condition splitting\n"; BB.dump());		LLVM_DEBUG(dbgs() << "Before branch condition splitting\n"; BB.dump());

// Create a new BB.		// Create a new BB.
auto *TmpBB =		auto *TmpBB =
BasicBlock::Create(BB.getContext(), BB.getName() + ".cond.split",		BasicBlock::Create(BB.getContext(), BB.getName() + ".cond.split",
BB.getParent(), BB.getNextNode());		BB.getParent(), BB.getNextNode());
		if (IsHugeFunc)
		FreshBBs.insert(TmpBB);

// Update original basic block by using the first condition directly by the		// Update original basic block by using the first condition directly by the
// branch instruction and removing the no longer needed and/or instruction.		// branch instruction and removing the no longer needed and/or instruction.
Br1->setCondition(Cond1);		Br1->setCondition(Cond1);
LogicOp->eraseFromParent();		LogicOp->eraseFromParent();

// Depending on the condition we have to either replace the true or the		// Depending on the condition we have to either replace the true or the
// false successor of the original branch instruction.		// false successor of the original branch instruction.
[100 unchanged lines hidden; context: if (Opc == Instruction::Or) {]
NewFalseWeight = FalseWeight;		NewFalseWeight = FalseWeight;
scaleWeights(NewTrueWeight, NewFalseWeight);		scaleWeights(NewTrueWeight, NewFalseWeight);
Br2->setMetadata(LLVMContext::MD_prof,		Br2->setMetadata(LLVMContext::MD_prof,
MDBuilder(Br2->getContext())		MDBuilder(Br2->getContext())
.createBranchWeights(TrueWeight, FalseWeight));		.createBranchWeights(TrueWeight, FalseWeight));
}		}
}		}

ModifiedDT = true;		ModifiedDT = ModifyDT::ModifyBBDT;
MadeChange = true;		MadeChange = true;

LLVM_DEBUG(dbgs() << "After branch condition splitting\n"; BB.dump();		LLVM_DEBUG(dbgs() << "After branch condition splitting\n"; BB.dump();
TmpBB->dump());		TmpBB->dump());
}		}
return MadeChange;		return MadeChange;
}		}
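
`splitBranchCondition` now registers the block it creates in `FreshBBs` when `IsHugeFunc` is set. The driver loop that consumes `FreshBBs` is not part of this section, so the sketch below is an assumption based on the revision summary: visit every block once, then revisit only blocks that were changed or created. Blocks are plain integer ids, and `OptimizeBlockFn` and `runHugeFunctionSketch` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical block optimizer: returns true if block BB changed, and may
// append the ids of other blocks it touched or created to Touched.
using OptimizeBlockFn = std::function<bool(int BB, std::vector<int> &Touched)>;

// Returns the number of block visits performed.
int runHugeFunctionSketch(int NumBlocks, const OptimizeBlockFn &Opt) {
  int Visits = 0;
  std::set<int> FreshBBs;

  // Visits one block and records everything that needs another look.
  auto Visit = [&](int BB) {
    std::vector<int> Touched;
    ++Visits;
    if (Opt(BB, Touched))
      FreshBBs.insert(BB);
    FreshBBs.insert(Touched.begin(), Touched.end());
  };

  // First round: every block is optimized once, as in the old algorithm.
  for (int BB = 0; BB < NumBlocks; ++BB)
    Visit(BB);

  // Later rounds: only blocks marked fresh are revisited; the rest are
  // skipped instead of restarting from the function head.
  while (!FreshBBs.empty()) {
    std::set<int> Work;
    Work.swap(FreshBBs);
    for (int BB : Work)
      Visit(BB);
  }
  return Visits;
}
```

For five blocks where only block 3 changes once and touches block 1, the sketch performs seven visits in total: five in the first round and two in the follow-up round.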

llvm/test/CodeGen/AArch64/and-sink.ll

	; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: opt -S -codegenprepare -mtriple=aarch64-linux %s | FileCheck --check-prefix=CHECK-CGP %s			; RUN: opt -S -codegenprepare -mtriple=aarch64-linux %s | FileCheck --check-prefix=CHECK-CGP %s
				; RUN: opt -S -codegenprepare -cgpp-huge-func=0 -mtriple=aarch64-linux %s | FileCheck --check-prefix=CHECK-CGP %s

	@A = dso_local global i32 zeroinitializer			@A = dso_local global i32 zeroinitializer
	@B = dso_local global i32 zeroinitializer			@B = dso_local global i32 zeroinitializer
	@C = dso_local global i32 zeroinitializer			@C = dso_local global i32 zeroinitializer

	; Test that and is sunk into cmp block to form tbz.			; Test that and is sunk into cmp block to form tbz.
	define dso_local i32 @and_sink1(i32 %a, i1 %c) {			define dso_local i32 @and_sink1(i32 %a, i1 %c) {
	; CHECK-LABEL: and_sink1:			; CHECK-LABEL: and_sink1:
	[80 unchanged lines hidden]

llvm/test/CodeGen/X86/and-sink.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=i686-unknown -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=i686-unknown -verify-machineinstrs < %s | FileCheck %s
	; RUN: opt < %s -codegenprepare -S -mtriple=x86_64-unknown-unknown | FileCheck --check-prefix=CHECK-CGP %s			; RUN: opt < %s -codegenprepare -S -mtriple=x86_64-unknown-unknown | FileCheck --check-prefix=CHECK-CGP %s
				; RUN: opt < %s -codegenprepare -cgpp-huge-func=0 -S -mtriple=x86_64-unknown-unknown | FileCheck --check-prefix=CHECK-CGP %s

	@A = global i32 zeroinitializer			@A = global i32 zeroinitializer
	@B = global i32 zeroinitializer			@B = global i32 zeroinitializer
	@C = global i32 zeroinitializer			@C = global i32 zeroinitializer

	; Test that 'and' is sunk into bb0.			; Test that 'and' is sunk into bb0.
	define i32 @and_sink1(i32 %a, i1 %c) {			define i32 @and_sink1(i32 %a, i1 %c) {
	; CHECK-LABEL: and_sink1:			; CHECK-LABEL: and_sink1:
	[225 unchanged lines hidden]

llvm/test/Transforms/CodeGenPrepare/ARM/sinkchain-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -S | FileCheck -check-prefix=CHECK %s			; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -S | FileCheck -check-prefix=CHECK %s
				; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -cgpp-huge-func=0 -S | FileCheck -check-prefix=CHECK %s

	; Sink the shufflevector/insertelement pair, followed by the trunc. The sunk instruction end up dead.			; Sink the shufflevector/insertelement pair, followed by the trunc. The sunk instruction end up dead.
	define signext i8 @dead(i16* noalias nocapture readonly %s1, i16 zeroext %x, i8* noalias nocapture %d, i32 %n) {			define signext i8 @dead(i16* noalias nocapture readonly %s1, i16 zeroext %x, i8* noalias nocapture %d, i32 %n) {
	; CHECK-LABEL: @dead(			; CHECK-LABEL: @dead(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N:%.*]], -8			; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N:%.*]], -8
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	[97 unchanged lines hidden]

llvm/test/Transforms/CodeGenPrepare/ARM/sinkchain.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -S | FileCheck -check-prefix=CHECK %s			; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -S | FileCheck -check-prefix=CHECK %s
				; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp < %s -codegenprepare -cgpp-huge-func=0 -S | FileCheck -check-prefix=CHECK %s

	; Sink the shufflevector/insertelement pair, followed by the trunc. The sunk instruction end up dead.			; Sink the shufflevector/insertelement pair, followed by the trunc. The sunk instruction end up dead.
	define signext i8 @dead(i16* noalias nocapture readonly %s1, i16 zeroext %x, i8* noalias nocapture %d, i32 %n) {			define signext i8 @dead(i16* noalias nocapture readonly %s1, i16 zeroext %x, i8* noalias nocapture %d, i32 %n) {
	; CHECK-LABEL: @dead(			; CHECK-LABEL: @dead(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N:%.*]], -8			; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N:%.*]], -8
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	[97 unchanged lines hidden]

llvm/test/Transforms/CodeGenPrepare/X86/gather-scatter-opt-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -codegenprepare < %s | FileCheck %s			; RUN: opt -S -codegenprepare < %s | FileCheck %s
				; RUN: opt -S -codegenprepare -cgpp-huge-func=0 < %s | FileCheck %s

	target datalayout =			target datalayout =
	"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%struct.a = type { i32, i32 }			%struct.a = type { i32, i32 }
	@c = external dso_local global %struct.a, align 4			@c = external dso_local global %struct.a, align 4
	@glob_array = internal unnamed_addr constant [16 x i32] [i32 1, i32 1, i32 2, i32 3, i32 5, i32 8, i32 13, i32 21, i32 34, i32 55, i32 89, i32 144, i32 233, i32 377, i32 610, i32 987], align 16			@glob_array = internal unnamed_addr constant [16 x i32] [i32 1, i32 1, i32 2, i32 3, i32 5, i32 8, i32 13, i32 21, i32 34, i32 55, i32 89, i32 144, i32 233, i32 377, i32 610, i32 987], align 16
	[103 unchanged lines hidden]


Diff 458344
