
Add jump-threading optimization for deterministic finite automata
Needs ReviewPublic

Authored by alexey.zhikhar on Mar 23 2021, 12:16 PM.

Details

Summary

The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements, resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

       sw.bb                        sw.bb
     /   |   \                    /   |   \
case1  case2  case3          case1  case2  case3
     \   |   /                /       |       \
     latch.bb             latch.2  latch.3  latch.1
      br sw.bb              /         |         \
                        sw.bb.2     sw.bb.3     sw.bb.1
                         br case2    br case3    br case1

Diff Detail

Event Timeline

cynecx added a subscriber: cynecx.Mar 23 2021, 3:29 PM
khchen added a subscriber: khchen.Mar 23 2021, 8:19 PM

Hi everyone, I wanted to give another progress update on this patch. We have gotten the transformation working on the Coremark opportunity, resulting in a speedup of over 20% on the benchmark. It also works on some smaller test cases such as std::merge. We are in the process of testing it more extensively and debugging issues that come up. There is also still some work left on cleaning up the codegen, refactoring, and writing test cases. An updated version will be submitted soon.

Hi everyone, I wanted to give another progress update on this patch. We have gotten the transformation working on the Coremark opportunity, resulting in a speedup of over 20% on the benchmark. It also works on some smaller test cases such as std::merge. We are in the process of testing it more extensively and debugging issues that come up. There is also still some work left on cleaning up the codegen, refactoring, and writing test cases. An updated version will be submitted soon.

Great news!

speedup of over 20%

Wonderful!

Fantastic! Consider adding them as new testcases.

jkreiner updated this revision to Diff 342832.May 4 2021, 12:47 PM
jkreiner retitled this revision from [WIP] Analysis for DFA jump-threading optimization to Add jump-threading optimization for deterministic finite automata.
jkreiner edited the summary of this revision. (Show Details)
jkreiner updated this revision to Diff 342841.May 4 2021, 1:15 PM

There were some formatting issues with the previous diff.

If we're going to be cloning basic blocks, we need some sort of cost computation. Some blocks aren't legal to clone, and some are expensive simply due to size. (Illegal to clone like this: convergent, noduplicate, indirectbr predecessors.) See llvm/Analysis/CodeMetrics.h.

If I'm understanding correctly, the goal here is to specifically handle cases where we can completely eliminate the switch. That's a bit narrow, but I guess it makes sense, at least as a first cut.

llvm/lib/CodeGen/DFAJumpThreading.cpp
373 ↗(On Diff #342832)

Checking FirstDef->getType()->isIntegerTy() isn't necessary, by the definition of SwitchInst.

453 ↗(On Diff #342832)

Don't think you can end up with a vector select here; you've already constrained the result to be an integer.

566 ↗(On Diff #342832)

Prefer llvm::DenseMap

643 ↗(On Diff #342832)

cast<PHINode>(Incoming)

If we're going to be cloning basic blocks, we need some sort of cost computation. Some blocks aren't legal to clone, and some are expensive simply due to size. (Illegal to clone like this: convergent, noduplicate, indirectbr predecessors.) See llvm/Analysis/CodeMetrics.h.

That is a good point, I agree we will need a cost model for the transformation. I have started working on one and gathering more performance stats. Also thanks for sharing CodeMetrics.h, I can add a check for those types of blocks that aren't legal to clone.

If I'm understanding correctly, the goal here is to specifically handle cases where we can completely eliminate the switch. That's a bit narrow, but I guess it makes sense, as least as a first cut.

This is correct for how it has been implemented, but the transformation should also work for the general case where the switch can be partially threaded. The analysis part will need to be changed a bit to identify the partial threading opportunities, but it is possible in the future.

In case anyone is interested, there is some context about this patch over here: https://reviews.llvm.org/D88307

jkreiner updated this revision to Diff 345215.May 13 2021, 10:51 AM

Implemented a cost model to avoid cloning too much code, and addressed reviewer comments.

jkreiner marked 4 inline comments as done.May 13 2021, 11:54 AM

I removed the unnecessary conditions, and used the llvm::DenseMap instead.

llvm/lib/CodeGen/DFAJumpThreading.cpp
453 ↗(On Diff #342832)

Yes, I agree the vector select check was unnecessary here.

740 ↗(On Diff #345215)

Here is the cost calculation we implemented. It could be more accurate by accounting for instructions and blocks that get simplified, but for now it at least prevents code explosion.

efriedma added inline comments.May 17 2021, 3:45 PM
llvm/lib/CodeGen/DFAJumpThreading.cpp
13 ↗(On Diff #345215)

Could probably use a brief outline of the overall algorithm here.

109 ↗(On Diff #345215)

Don't use std::list.

I think you can use SmallVector and pop_back(), assuming the iteration order doesn't matter. If you need FIFO order, use std::deque.

182 ↗(On Diff #345215)

I don't think this applyUpdates() call does anything? The DomTree doesn't care about unreachable edges.

213 ↗(On Diff #345215)

Can you merge together the two triangle codepaths? They appear nearly identical. Probably just need a couple std::swap calls.

375 ↗(On Diff #345215)

Do you check somewhere that there's exactly one ConstantInt associated with a given successor? In general, there could be any number.

450 ↗(On Diff #345215)

isa<Instruction>() if you're not going to use the pointer returned by dyn_cast<>.

690 ↗(On Diff #345215)

Probably want to ensure we're only calling collectEphemeralValues once per function.

740 ↗(On Diff #345215)

Okay, makes sense.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

This is kind of late to be running this pass. That might be useful in some cases: we might be making the code harder to analyze if there's irreducible control flow. On the other hand, we're skipping interesting optimizations if the CFG becomes significantly simpler. My instinct is that the advantages of running earlier outweigh the disadvantages.

llvm/test/CodeGen/AArch64/dfa-constant-propagation.ll
1 ↗(On Diff #345215)

Please put the tests in llvm/test/Transforms/DFAJumpThreading.

Matt added a subscriber: Matt.May 18 2021, 7:10 AM
jkreiner marked 2 inline comments as done.May 18 2021, 7:38 AM
jkreiner added inline comments.
llvm/lib/CodeGen/DFAJumpThreading.cpp
109 ↗(On Diff #345215)

Okay, there are a few places std::list was used, so I'll replace them all.

182 ↗(On Diff #345215)

Wouldn't it be needed since a new block is created? This edge is reached sometimes. I may be misunderstanding the usage of the DomTreeUpdater though.

213 ↗(On Diff #345215)

Sure I'll look into merging these two cases.

375 ↗(On Diff #345215)

Good point, this case isn't checked for so it would currently cause buggy behavior in this function.

Now that I think about it, this function can probably be removed since the return value stored as the EntryValue isn't used in the transformation.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

I agree that it may be beneficial to run the pass earlier. I'm not aware yet of which optimizations it could interfere with. Do you have any suggestions of where to run it instead?

dnsampaio added inline comments.Fri, May 21, 1:41 AM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

I would suggest trying it just before LLVM's original jump threading. For the implementation proposed in D88307, it actually produced code that LLVM's original jump threading would further optimize, whereas the other way around, the original jump threading would just not do anything. I'm running on llvm-12, so it may take me some time to test this patch.

xbolva00 added inline comments.Fri, May 21, 1:48 AM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

And should it be part of the standard -O2+ pipeline? Why is it AArch64-specific?

jkreiner added inline comments.Fri, May 21, 6:53 AM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

Thank you for the suggestion, I'll try it out before the original jump-threading then.

I only had it there since I was testing it on AArch64, but it could be moved to the general opt pipeline.

jkreiner added inline comments.Fri, May 21, 1:35 PM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
461 ↗(On Diff #345215)

This seems to be a better location for the pass. I've measured a performance gain 5% greater than before for Coremark by placing it before the original jump-threading.

jkreiner updated this revision to Diff 348040.Wed, May 26, 11:51 AM

Addressed reviewer comments, and moved the pass earlier in the pipeline.

jkreiner marked 10 inline comments as done.Wed, May 26, 11:55 AM
jkreiner updated this revision to Diff 348077.Wed, May 26, 1:40 PM

Fixed a failing opt pipeline test.

I've finally got time to test this.
For our downstream arch we're seeing gains up to 27%.
Very good, thanks for working on this.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
364–367

In release mode I get warnings that this is not used (we use -Werror):
llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:364:21: error: unused function 'operator<<' [-Werror,-Wunused-function]

Perhaps wrap it in #ifndef NDEBUG?

foad added a subscriber: foad.Thu, May 27, 5:09 AM

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

llvm/include/llvm/Transforms/Scalar/DFAJumpThreading.h
2

Typo ".cpp" :)

For our downstream arch we're seeing gains up to 27%.

That's great to hear!

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
364–367

I see, I'll add that #ifndef NDEBUG check.

jkreiner updated this revision to Diff 348343.Thu, May 27, 11:54 AM
jkreiner marked 2 inline comments as done.
foad added a comment.Fri, May 28, 1:15 AM

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

Naive follow-up question: why does this have to be a completely new implementation of jump threading? Would it be feasible to have a single implementation that takes a "don't create irreducible control flow" flag?

@jkreiner with both llvm 12 and head I get a compiler crash in this function for file F17082111. Just run opt --dfa-jump-threading. Sorry, I couldn't reduce it further, I just don't know how to get bugpoint to work with opt.
With an llvm-12 debug build I get:

opt --dfa-jump-threading -S < fail.ll
opt: /work1/dsampaio/csw/llvm-project/llvm/include/llvm/IR/Instructions.h:2767: llvm::Value *llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock *) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: opt --dfa-jump-threading -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'DFA Jump Threading' on function '@main'
 #0 0x00007f6a3415119a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x00007f6a3415136b PrintStackTraceSignalHandler(void*) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x00007f6a3414f95b llvm::sys::RunSignalHandlers() /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Signals.cpp:70:5
 #3 0x00007f6a34151ae1 SignalHandler(int) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f6a32d7a980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f6a32076fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f6a32078921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f6a3206848a __assert_fail_base /build/glibc-S9d2JN/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f6a32068502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)
 #9 0x00007f6a344eb474 llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const /work1/dsampaio/csw/llvm-project/llvm/include/llvm/IR/Instructions.h:2768:29
#10 0x00007f6a35a37265 (anonymous namespace)::AllSwitchPaths::run() /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:532:24
#11 0x00007f6a35a36826 (anonymous namespace)::DFAJumpThreading::run(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:1149:23
#12 0x00007f6a35a36c1f (anonymous namespace)::DFAJumpThreadingLegacyPass::runOnFunction(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:164:5
#13 0x00007f6a3440f84c llvm::FPPassManager::runOnFunction(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:23
#14 0x00007f6a34414ac5 llvm::FPPassManager::runOnModule(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:16
#15 0x00007f6a34410214 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:23
#16 0x00007f6a3440fd38 llvm::legacy::PassManagerImpl::run(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:16
#17 0x00007f6a34414dd1 llvm::legacy::PassManager::run(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1677:3
#18 0x0000000000472e17 main /work1/dsampaio/csw/llvm-project/llvm/tools/opt/opt.cpp:998:3
#19 0x00007f6a32059bf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#20 0x000000000043577a _start (/work1/dsampaio/csw/devimage/toolchain_default/toolroot/opt/kalray/accesscore/bin/opt+0x43577a)
Aborted (core dumped)

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

Naive follow-up question: why does this have to be a completely new implementation of jump threading? Would it be feasible to have a single implementation that takes a "don't create irreducible control flow" flag?

I just want to clarify that this is not a completely new implementation of jump threading; the cases that the two passes catch are mutually exclusive. There were some discussions about implementing a more aggressive jump threading versus a separate pass for specific uses (FSMs) that might answer your question over here: D88307

@jkreiner with both llvm 12 and head I get a compiler crash in this function for file F17082111. Just run opt --dfa-jump-threading. Sorry, I couldn't reduce it further, I just don't know how to get bugpoint to work with opt.

Thank you for providing the testcase, I reproduced it locally. It is the last day of my internship, so I won't be able to submit a fix for it. @alexey.zhikhar initially wrote some parts of the analysis, and will continue where I left off on the remaining work items for this patch.

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

It would be very hard for us to explain the difference without having access to both the downstream compiler and the hardware, but some things that come to mind are:

(1) were the two experiments performed on the same baseline compiler?
(2) 1% could be within noise and might be due to trivial changes in the generated code (change of alignment, change of branch addresses and branch target addresses, etc.)
(3) were both passes in the same place in the pipeline in both experiments?

From the information provided, it is not possible to understand the root cause of the difference.

ormris removed a subscriber: ormris.Mon, Jun 7, 4:08 PM

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

It would be very hard for us to explain the difference without having access to both the downstream compiler and the hardware, but some things that come to mind are:

(1) were the two experiments performed on the same baseline compiler?
(2) 1% could be within noise and might be due to trivial changes in the generated code (change of alignment, change of branch addresses and branch target addresses, etc.)
(3) were both passes in the same place in the pipeline in both experiments?

From the information provided, it is not possible to understand the root cause of the difference.

Thanks, and understood. I'll see if I can do some more digging here, but I suspect it may be noise. I wasn't sure if there was an obvious difference in the algorithms here that might account for a degradation. I'm pretty sure both passes are not in the same place in the pipeline (just taking the patch from the other review).

marksl added a subscriber: marksl.Tue, Jun 8, 2:15 PM
alexey.zhikhar commandeered this revision.Wed, Jun 9, 7:00 AM
alexey.zhikhar added a reviewer: jkreiner.

Fixed the bug submitted by @dnsampaio, test case added

Misc minor changes

Fix a clang-format warning

@efriedma Eli, I believe all your comments have been addressed, please take another look when you get a chance, thanks.