This is an archive of the discontinued LLVM Phabricator instance.


Differential D99205

Add jump-threading optimization for deterministic finite automata
Closed, Public

Authored by alexey.zhikhar on Mar 23 2021, 12:16 PM.

Details

Reviewers

alanphipps
xbolva00
efriedma
jkreiner
sebpop
SjoerdMeijer

Commits

rG02077da7e7a8: Add jump-threading optimization for deterministic finite automata

Summary

The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

       sw.bb                        sw.bb
     /   |   \                    /   |   \
case1  case2  case3          case1  case2  case3
     \   |   /                /       |       \
     latch.bb             latch.2  latch.3  latch.1
      br sw.bb              /         |         \
                        sw.bb.2     sw.bb.3     sw.bb.1
                         br case2    br case3    br case1
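
To make the pattern in the summary concrete, here is a minimal sketch of a state machine written both ways. The machine, its states, and the function names `runSwitch` and `runGoto` are made up for illustration and are not taken from Coremark. `runSwitch` re-dispatches through the switch on every iteration; `runGoto` is the hand-threaded form in which each state jumps straight to its successor, which is roughly the shape of the right-hand CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical three-state machine that counts visits to state 2.
// Transitions: 0 -> 1 on 'a', 1 -> 2 on 'b', 2 -> 0 on any input;
// any other input keeps the current state.

// Switch-in-a-loop form: every iteration goes back through the switch.
int runSwitch(const std::string &Input) {
  int State = 0, Hits = 0;
  for (char C : Input) {
    switch (State) {
    case 0:
      State = (C == 'a') ? 1 : 0;
      break;
    case 1:
      State = (C == 'b') ? 2 : 1;
      break;
    case 2:
      ++Hits;
      State = 0;
      break;
    default:
      assert(false && "unknown state");
      break;
    }
  }
  return Hits;
}

// Hand-threaded form: each state branches directly to the next state.
int runGoto(const std::string &Input) {
  int Hits = 0;
  std::size_t I = 0;
  const std::size_t N = Input.size();

state0:
  if (I == N)
    return Hits;
  if (Input[I++] == 'a')
    goto state1;
  goto state0;

state1:
  if (I == N)
    return Hits;
  if (Input[I++] == 'b')
    goto state2;
  goto state1;

state2:
  if (I == N)
    return Hits;
  ++I; // state 2 consumes one input character unconditionally
  ++Hits;
  goto state0;
}
```

Both functions compute the same result; the difference is only in how control reaches the next state.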

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

efriedma added inline comments. May 4 2021, 1:35 PM

llvm/lib/CodeGen/DFAJumpThreading.cpp
643	`cast<PHINode>(Incoming)`

In D99205#2737106, @efriedma wrote:

If we're going to be cloning basic blocks, we need some sort of cost computation. Some blocks aren't legal to clone, and some are expensive simply due to size. (Illegal to clone like this: convergent, noduplicate, indirectbr predecessors.) See llvm/Analysis/CodeMetrics.h.

That is a good point, I agree we will need a cost model for the transformation. I have started working on one and gathering more performance stats. Also thanks for sharing CodeMetrics.h, I can add a check for those types of blocks that aren't legal to clone.

If I'm understanding correctly, the goal here is to specifically handle cases where we can completely eliminate the switch. That's a bit narrow, but I guess it makes sense, at least as a first cut.

This is correct for how it has been implemented, but the transformation should also work for the general case where the switch can be partially threaded. The analysis part will need to be changed a bit to identify the partial threading opportunities, but it is possible in the future.

In case anyone is interested, there is some context about this patch over here: https://reviews.llvm.org/D88307

SjoerdMeijer added a subscriber: SjoerdMeijer. May 5 2021, 12:34 AM

Implemented a cost model to avoid cloning too much code, and addressed reviewer comments.

I removed the unnecessary conditions, and used the llvm::DenseMap instead.

llvm/lib/CodeGen/DFAJumpThreading.cpp
453	Yes, I agree the vector select check was unnecessary here.
741	Here is the cost calculation we implemented. It could be more accurate by accounting for instructions and blocks that get simplified, but for now it at least prevents code explosion.
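
As a rough illustration of the kind of gate discussed here, the sketch below refuses to clone blocks that are not legal to duplicate and caps the total amount of duplicated code. It is not the patch's cost model: `BlockInfo`, `worthCloning`, and the budget parameter are hypothetical stand-ins, and in LLVM the per-block facts would come from an analysis such as the one in llvm/Analysis/CodeMetrics.h rather than from hand-filled fields.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block summary (an assumption for this sketch).
struct BlockInfo {
  unsigned NumInsts;  // size of the block in instructions
  bool NotDuplicable; // e.g. contains a convergent or noduplicate call
};

// Returns true if cloning every block in Blocks NumClones times is legal
// and the total number of duplicated instructions stays within Budget.
bool worthCloning(const std::vector<BlockInfo> &Blocks, unsigned NumClones,
                  unsigned Budget) {
  assert(NumClones > 0 && "expected at least one clone");
  unsigned long long Total = 0;
  for (const BlockInfo &B : Blocks) {
    if (B.NotDuplicable)
      return false; // legality check comes before any size accounting
    Total += static_cast<unsigned long long>(B.NumInsts) * NumClones;
    if (Total > Budget)
      return false; // stop early once the budget is exceeded
  }
  return true;
}
```

A gate like this only prevents code explosion; as noted above, it does not account for instructions that are simplified away after threading.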

Harbormaster completed remote builds in B104335: Diff 345215. May 13 2021, 12:08 PM

efriedma added inline comments. May 17 2021, 3:45 PM

llvm/lib/CodeGen/DFAJumpThreading.cpp
14	Could probably use a brief outline of the overall algorithm here.
110	Don't use std::list. I think you can use SmallVector and pop_back(), assuming the iteration order doesn't matter. If you need LIFO order, use std::deque.
183	I don't think this applyUpdates() call does anything? The DomTree doesn't care about unreachable edges.
214	Can you merge together the two triangle codepaths? They appear nearly identical. Probably just need a couple std::swap calls.
376	Do you check somewhere that there's exactly one ConstantInt associated with a given successor? In general, there could be any number.
451	isa<Instruction>() if you're not going to use the pointer returned by dyn_cast<>.
691	Probably want to ensure we're only calling collectEphemeralValues once per function.
741	Okay, makes sense.
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	This is kind of late to be running this pass. That might be useful in some cases: we might be making the code harder to analyze if there's irreducible control flow. On the other hand, we're skipping interesting optimizations if the CFG becomes significantly simpler. My instinct is that the advantages of running earlier outweigh the disadvantages.
llvm/test/CodeGen/AArch64/dfa-constant-propagation.ll
2	Please put the tests in llvm/test/Transforms/DFAJumpThreading.
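
The `std::list` comment refers to the usual worklist idiom. A minimal sketch follows, using `std::vector` as a stand-in for `llvm::SmallVector` (both provide `push_back`, `back`, and `pop_back`). The graph representation and the `reachableFrom` helper are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Succs[N] lists the successors of node N (hypothetical CFG stand-in).
std::set<int> reachableFrom(const std::vector<std::vector<int>> &Succs,
                            int Start) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Succs.size());
  std::set<int> Visited;
  std::vector<int> Worklist;
  Worklist.push_back(Start);
  while (!Worklist.empty()) {
    // Popping from the back visits the most recently pushed node first.
    int N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // already processed
    for (int S : Succs[N])
      Worklist.push_back(S);
  }
  return Visited;
}
```

The result here is a set, so the visiting order does not affect it, which is the situation the comment assumes.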

Matt added a subscriber: Matt. May 18 2021, 7:10 AM

jkreiner marked 2 inline comments as done. May 18 2021, 7:38 AM

jkreiner added inline comments.

llvm/lib/CodeGen/DFAJumpThreading.cpp
110	Okay, there's a few places std::list was used so I'll replace them all.
183	Wouldn't it be needed since a new block is created? This edge is reached sometimes. I may be misunderstanding the usage of the DomTreeUpdater though.
214	Sure I'll look into merging these two cases.
376	Good point, this case isn't checked for so it would currently cause buggy behavior in this function. Now that I think about it, this function can probably be removed since the return value stored as the EntryValue isn't used in the transformation.
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	I agree that it may be beneficial to run the pass earlier. I'm not aware yet of which optimizations it could interfere with. Do you have any suggestions of where to run it instead?
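
The concern about line 376 is that several case values of a switch may share one successor. A small sketch of the check being asked about is shown below; the `Case` alias and `eachSuccessorHasOneCase` are hypothetical names, and block names stand in for `BasicBlock` pointers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (case value, successor block name); a stand-in for a switch's case list.
using Case = std::pair<int, std::string>;

// Returns true if no successor is the target of more than one case value.
bool eachSuccessorHasOneCase(const std::vector<Case> &Cases) {
  std::map<std::string, int> CasesPerSuccessor;
  for (const Case &C : Cases) {
    assert(!C.second.empty() && "successor must be named in this sketch");
    if (++CasesPerSuccessor[C.second] > 1)
      return false;
  }
  return true;
}
```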

dnsampaio added inline comments. May 21 2021, 1:41 AM

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	I would suggest trying just before LLVM's original jump-threading. The implementation proposed in D88307 actually produced code that LLVM's original jump-threading would further optimize, whereas the other way around, the original jump-threading would just not do anything. I'm running on llvm-12, so it may take me some time to test this patch.

xbolva00 added inline comments. May 21 2021, 1:48 AM

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	and part of standard opt 2+ pipeline? why aarch64 specific?

ChuanqiXu added a subscriber: ChuanqiXu. May 21 2021, 1:49 AM

jkreiner added inline comments. May 21 2021, 6:53 AM

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	Thank you for the suggestion, I'll try it out before the original jump-threading then. I only had it there since I was testing it on AArch64, but it could be moved to the general opt pipeline.

jkreiner added inline comments. May 21 2021, 1:35 PM

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
5	This seems to be a better location for the pass. I've measured a performance gain 5% greater than before for Coremark by placing it before the original jump-threading.

Addressed reviewer comments, and moved the pass earlier in the pipeline.

Herald added subscribers: wenlei, steven_wu. May 26 2021, 11:51 AM

jkreiner marked 10 inline comments as done. May 26 2021, 11:55 AM

Harbormaster completed remote builds in B106345: Diff 348040. May 26 2021, 12:18 PM

Fixed a failing opt pipeline test.

Herald added subscribers: kerbowa, nhaehnle, jvesely. May 26 2021, 1:40 PM

Harbormaster completed remote builds in B106367: Diff 348077. May 26 2021, 2:15 PM

I've finally got time to test this.
For our downstream arch we're seeing gains up to 27%.
Very good, thanks for working on this.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
363–366 ↗	(On Diff #348077)	In release mode I get a warning that this is unused (we build with -Werror): llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:364:21: error: unused function 'operator<<' [-Werror,-Wunused-function]. Perhaps wrap it in #ifndef NDEBUG?

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

llvm/include/llvm/Transforms/Scalar/DFAJumpThreading.h
1 ↗	(On Diff #348077)	Typo ".cpp" :)

In D99205#2784375, @dnsampaio wrote:

For our downstream arch we're seeing gains up to 27%.

That's great to hear!

In D99205#2784461, @foad wrote:

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
363–366 ↗	(On Diff #348077)	I see, I'll add that #ifndef NDEBUG check.
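
For readers unfamiliar with the warning, the sketch below shows the general shape of the fix: a printer that is only used for debug output is compiled only when `NDEBUG` is not defined, so release builds do not see an unused static function. `PathSummary` and `debugString` are hypothetical names for this example and are not the types in the patch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a type that is only printed in debug output.
struct PathSummary {
  int ExitState;
  std::string Determinator;
};

#ifndef NDEBUG
// Compiled only in builds with assertions enabled. Defining this
// unconditionally would leave an unused static function in release builds.
static std::ostream &operator<<(std::ostream &OS, const PathSummary &P) {
  return OS << "[ " << P.ExitState << ", " << P.Determinator << " ]";
}
#endif

// Returns a printable form in debug builds and an empty string otherwise.
std::string debugString(const PathSummary &P) {
  assert(!P.Determinator.empty() && "expected a determinator name");
#ifndef NDEBUG
  std::ostringstream OS;
  OS << P;
  return OS.str();
#else
  (void)P; // silence the unused-parameter warning in release builds
  return std::string();
#endif
}
```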

jkreiner updated this revision to Diff 348343. May 27 2021, 11:54 AM

jkreiner marked 2 inline comments as done.

Harbormaster completed remote builds in B106571: Diff 348343. May 27 2021, 12:17 PM

In D99205#2785323, @jkreiner wrote:

In D99205#2784461, @foad wrote:

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

Naive follow-up question: why does this have to be a complete new implementation of jump threading? Would it be feasible to have a single implementation that takes a "don't create irreducible control flow" flag?

@jkreiner with both llvm 12 and head I get a compiler crash in this function for file F17082111. Just run opt --dfa-jump-threading. Sorry, I couldn't reduce it further, I just don't know how to get bugpoint to work with opt.
With an llvm-12 debug build I get:

opt --dfa-jump-threading -S < fail.ll
opt: /work1/dsampaio/csw/llvm-project/llvm/include/llvm/IR/Instructions.h:2767: llvm::Value *llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock *) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: opt --dfa-jump-threading -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'DFA Jump Threading' on function '@main'
 #0 0x00007f6a3415119a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x00007f6a3415136b PrintStackTraceSignalHandler(void*) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x00007f6a3414f95b llvm::sys::RunSignalHandlers() /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Signals.cpp:70:5
 #3 0x00007f6a34151ae1 SignalHandler(int) /work1/dsampaio/csw/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f6a32d7a980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f6a32076fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f6a32078921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f6a3206848a __assert_fail_base /build/glibc-S9d2JN/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f6a32068502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)
 #9 0x00007f6a344eb474 llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const /work1/dsampaio/csw/llvm-project/llvm/include/llvm/IR/Instructions.h:2768:29
#10 0x00007f6a35a37265 (anonymous namespace)::AllSwitchPaths::run() /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:532:24
#11 0x00007f6a35a36826 (anonymous namespace)::DFAJumpThreading::run(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:1149:23
#12 0x00007f6a35a36c1f (anonymous namespace)::DFAJumpThreadingLegacyPass::runOnFunction(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp:164:5
#13 0x00007f6a3440f84c llvm::FPPassManager::runOnFunction(llvm::Function&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:23
#14 0x00007f6a34414ac5 llvm::FPPassManager::runOnModule(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:16
#15 0x00007f6a34410214 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:23
#16 0x00007f6a3440fd38 llvm::legacy::PassManagerImpl::run(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:16
#17 0x00007f6a34414dd1 llvm::legacy::PassManager::run(llvm::Module&) /work1/dsampaio/csw/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1677:3
#18 0x0000000000472e17 main /work1/dsampaio/csw/llvm-project/llvm/tools/opt/opt.cpp:998:3
#19 0x00007f6a32059bf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#20 0x000000000043577a _start (/work1/dsampaio/csw/devimage/toolchain_default/toolroot/opt/kalray/accesscore/bin/opt+0x43577a)
Aborted (core dumped)

In D99205#2786462, @foad wrote:

In D99205#2785323, @jkreiner wrote:

In D99205#2784461, @foad wrote:

Naive question: does your new pass "result in irreducible control flow that harms other optimizations"? From the ASCII art diagram in the commit message it looks like it does. Why is that OK?

It could produce irreducible control flow like that, but the pass is late enough in the pipeline that it won't have a negative impact. I experimented with putting it early in the pipeline and the gains I measured weren't great, but it seems to be profitable where it is right now.

Naive follow-up question: why does this have to be a complete new implementation of jump threading? Would it be feasible to have a single implementation that takes a "don't create irreducible control flow" flag?

I just want to clarify that this is not a complete new implementation of jump threading, the cases that both of the passes catch are mutually exclusive. There were some discussions about implementing a more aggressive jump threading versus a separate pass for specific uses (FSMs) that might answer your question over here: D88307

In D99205#2786565, @dnsampaio wrote:

@jkreiner with both llvm 12 and head I get a compiler crash in this function for file F17082111. Just run opt --dfa-jump-threading. Sorry, I couldn't reduce it further, I just don't know how to get bugpoint to work with opt.

Thank you for providing the testcase, I reproduced it locally. It is the last day of my internship, so I won't be able to submit a fix for it. @alexey.zhikhar initially wrote some parts of the analysis, and will continue where I left off on the remaining work items for this patch.

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

In D99205#2792215, @alanphipps wrote:

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

It would be very hard for us to explain the difference without having access to both the downstream compiler and the hardware but something that comes to mind is:

(1) were the two experiments performed on the same baseline compiler?
(2) 1% could be within noise and might be due to trivial changes in the generated code (change of alignment, change of branch addresses and branch target addresses, etc.)
(3) were both passes in the same place in the pipeline in both experiments?

From the information provided, it is not possible to understand the root cause of the difference.

Herald added a subscriber: ormris. Jun 7 2021, 8:45 AM

ormris removed a subscriber: ormris. Jun 7 2021, 4:08 PM

In D99205#2802936, @alexey.zhikhar wrote:

In D99205#2792215, @alanphipps wrote:

Thanks for all of the effort! On our downstream Cortex-R5 compiler, I'm seeing a 20.4% speedup on Coremark with this patch, which is good, however, the older patch (https://reviews.llvm.org/D88307) gave me a 21.6% speedup. Any idea what could account for the difference there?

It would be very hard for us to explain the difference without having access to both the downstream compiler and the hardware but something that comes to mind is:

(1) were the two experiments performed on the same baseline compiler?
(2) 1% could be within noise and might be due to trivial changes in the generated code (change of alignment, change of branch addresses and branch target addresses, etc.)
(3) were both passes in the same place in the pipeline in both experiments?

From the information provided, it is not possible to understand the root cause of the difference.

Thanks, and understood. I'll see if I can do some more digging here, but I suspect it may be noise. I wasn't sure if there was an obvious difference in the algorithms here that might account for a degradation. I'm pretty sure both passes are not in the same place in the pipeline (just taking the patch from the other review).

marksl added a subscriber: marksl. Jun 8 2021, 2:15 PM

alexey.zhikhar commandeered this revision. Jun 9 2021, 7:00 AM

alexey.zhikhar added a reviewer: jkreiner.

Fixed the bug submitted by @dnsampaio, test case added

Misc minor changes

Harbormaster completed remote builds in B108408: Diff 350880. Jun 9 2021, 8:08 AM

Fix a clang-format warning

Harbormaster completed remote builds in B108879: Diff 351540. Jun 11 2021, 1:54 PM

@efriedma Eli, I believe all your comments have been addressed, please take another look when you get a chance, thanks.

@efriedma ping.

We understand that you might be busy, so maybe you could recommend another reviewer that could make the final decision? Thank you.

@sebpop Do you mind reviewing this patch? This is jump threading for finite state machines. You may remember we had a discussion about it back at the 2018 and 2019 LLVM dev meetings. We didn't follow up as quickly as I hoped, but recently we got the opportunity to improve the code and post it here. As you can see, some review has been done, but it is currently waiting for further review or a green light to merge.

Earlier discussion that resulted in this patch is also here: https://reviews.llvm.org/D88307

amehsan added a reviewer: sebpop. Jul 8 2021, 9:21 AM

Did a first pass, see comments inline. Plan to look at this closer soon.

I think some tests are missing, mostly negative tests:

test for min code size,
test with blocks that cannot be cloned,
test with some unsupported instructions.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
102 ↗	(On Diff #351540)	Just a thought on "deploying" this. Perhaps easier to start having this off by default? In case problems are found after committing (correctness, or perhaps compile times), you don't need to revert the whole pass, but instead just toggle this and can keep the pass in tree.
llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
12 ↗	(On Diff #351540)	Nit: perhaps you can be more specific here: Currently this does not happen in LLVM jump threading and say this is JumpThreading.cpp?
24 ↗	(On Diff #351540)	Nit picking some words here. What do you mean by "switch variable" and what do you mean by "decided"?
29 ↗	(On Diff #351540)	Nit: runs -> triggers?
32 ↗	(On Diff #351540)	Just curious, is this algorithm described in literature? Can we include a reference?
185 ↗	(On Diff #351540)	Can you define what unfolding means in this context?
190 ↗	(On Diff #351540)	Do you mean that this is code that could be shared? Should this be a TODO?
321 ↗	(On Diff #351540)	// This data structure keeps track of all blocks that have been cloned in the // optimization. -> // This data structure keeps track of all blocks that have been cloned.
739 ↗	(On Diff #351540)	Could you include here, or wherever most appropriate, the definition of a threading path? Copied from a test case: A threadable path includes a list of basic blocks, the exit state, and the block that determines the next state. < path of BBs that form a cycle > [ state, determinator ]
llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-transform.ll
19 ↗	(On Diff #351540)	Perhaps check the full IR for clarity.

I am writing an AggressiveJumpThreading pass downstream to handle the state machine in Coremark.
I have ignored the problem that more aggressive jump threading can create irreducible control flow. I have heard that gcc has an implementation that overcomes this problem; it may be worthwhile for all of us to look into its details.
This patch looks like a big pattern match to me. It requires a switch statement whose condition is predictable, in other words, a DFA.
My first question: is it profitable to add a pass for one specific pattern? I feel that the DFA pattern is not common in real code. If the pass is turned off by default, I have no objection.

My second question: this pass collects all available paths in AllSwitchPaths and only considers the cost model when performing the transformation.
My concern is that it may waste a lot of time collecting useless paths whose cost exceeds their benefit.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
72 ↗	(On Diff #351540)	IsViewCFGBefore => WouldViewCFGBeforeTransform or ViewCFGBeforeTransform. Now the name looks a little bit confusing.
261–262 ↗	(On Diff #351540)	It may be suitable to use range based for.
381 ↗	(On Diff #351540)	Is it necessary to mark `~MainSwitch` as virtual? There is no class derived from MainSwitch.
722 ↗	(On Diff #351540)	What's the rationale to use ceil log base 2? Maybe there is a conclusion in math. But I guess it may be better to tell it.
1138 ↗	(On Diff #351540)	It may be better to comment why we don't use range-based for here.

jaykang10 added a subscriber: jaykang10. Jul 9 2021, 2:06 AM

In D99205#2866301, @ChuanqiXu wrote:

I am writing an AggressiveJumpThreading pass downstream to handle the state machine in Coremark.

We have 2 aggressive jump threading passes in upstream review: this work, and also D88307. If I understand this correctly, you're working on another downstream? If so, would it not be more efficient if we support this work? For me, this work has the most potential as it is under active development as opposed to D88307. Your remark about irreducible control flow is a valid one, which was made earlier too. Eli said this about that:

This is kind of late to be running this pass. That might be useful in some cases: we might be making the code harder to analyze if there's irreducible control flow. On the other hand, we're skipping interesting optimizations if the CFG becomes significantly simpler. My instinct is that the advantages of running earlier outweigh the disadvantages.

which is something that still needs to be figured out I think. I.e., it's run earlier now, but do we need more performance numbers to see if this doesn't regress other stuff? But @ChuanqiXu, if you're working on something similar and have ideas, please consider sharing them here.

I have ignored the problem that more aggressive jump threading can create irreducible control flow. I have heard that gcc has an implementation that overcomes this problem; it may be worthwhile for all of us to look into its details.
This patch looks like a big pattern match to me. It requires a switch statement whose condition is predictable, in other words, a DFA.
My first question: is it profitable to add a pass for one specific pattern? I feel that the DFA pattern is not common in real code. If the pass is turned off by default, I have no objection.

My colleague @jaykang10 found that this also triggers on Perlbench in SPEC.

I have seen a similar code pattern in perlbench in SPEC 2017. In that case, there are cleanups for lifetime.marker and the goto statements go through the cleanups. I have discussed it on cfe-dev: https://lists.llvm.org/pipermail/cfe-dev/2021-July/068478.html It looks like this patch resolves the issue with perlbench as well as Coremark.

alexey.zhikhar added inline comments. Jul 9 2021, 9:32 AM

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
32 ↗	(On Diff #351540)	This is the algorithm that we came up with so we cannot provide a reference. "Codasip" has a whitepaper with a more general algorithm for jump threading that covers DFA as well. Justin @jkreiner compared our approach with the one by Codasip, and found that we can cover all cases that Codasip can (for DFA) with one exception: currently our implementation is limited to cases in which the state variable of a switch is always predictable, but it can be extended to cover the case where the state variable is only sometimes predictable. Thank you very much for your comments, I'm working on updating the diff, will submit it shortly.

Addressed the inline comments, added a co-author.

alexey.zhikhar marked 14 inline comments as done. Jul 9 2021, 1:53 PM

alexey.zhikhar added inline comments.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
102 ↗	(On Diff #351540)	Good idea, done.
llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
72 ↗	(On Diff #351540)	I usually name all my boolean variables `IsSomething` but I agree that `IsView...` sounds awkward. Also, having a verb at the beginning makes it sound like a routine rather than a variable. I updated it to `ClViewCFGBefore`, `Cl` for command line.
381 ↗	(On Diff #351540)	Classes might derive from `MainSwitch` in the future, it's a good practice to declare destructors virtual.
722 ↗	(On Diff #351540)	Added a comment.
739 ↗	(On Diff #351540)	Added a description to the definition of the `ThreadingPath` class.
1138 ↗	(On Diff #351540)	No good reason, fixed.

Harbormaster completed remote builds in B113280: Diff 357619. Jul 9 2021, 3:30 PM

It's a lot of code, and I am still reading it, but here's a bunch of mostly nits if you don't mind that... (while I am going to continue reading it).

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
120 ↗	(On Diff #357619)	Nit: don't need the brackets.
133 ↗	(On Diff #357619)	Nit: don't need the brackets.
215 ↗	(On Diff #357619)	Create a helper function for this...
225 ↗	(On Diff #357619)	... so that we don't need to duplicate this here.
287 ↗	(On Diff #357619)	Nit: no need for all the curly brackets. (Sorry, I just find it a lot easier to read, and it's the coding style.)
313 ↗	(On Diff #357619)	Perhaps a comment here that State just corresponds to the value of a case statement?
337 ↗	(On Diff #357619)	Do we need BBName? Can we not do something like: OS << BB->hasName() ? BB->getName() : "...";
356 ↗	(On Diff #357619)	I was confused what exactly the determinator was, but I think it's the block that determines the next state. Think some comments about this would be beneficial too.
472 ↗	(On Diff #357619)	no curly brackets
479 ↗	(On Diff #357619)	Same
485 ↗	(On Diff #357619)	Same, and actually in a lot more cases, so will stop mentioning it from now on, but there's opportunities to get rid of a lot of curly brackets. :-)
1146 ↗	(On Diff #357619)	auto *SI = dyn_cast<SwitchInst>(BB.getTerminator()); if (!SI) continue;

alexey.zhikhar updated this revision to Diff 358059. Jul 12 2021, 1:53 PM

alexey.zhikhar marked 5 inline comments as done.

alexey.zhikhar marked 11 inline comments as done. Jul 12 2021, 2:02 PM

alexey.zhikhar added inline comments.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
337 ↗	(On Diff #357619)	The idea here is to print the basic block itself if it has no name, e.g.: 1: %2 = ... %3 = ... We can't say `OS << (BB->hasName() ? BB->getName() : *BB);` since both legs of a ternary operator must be of the same type.
485 ↗	(On Diff #357619)	Thanks, I went through the whole file looking for unnecessary curly braces, so now they should be no more. Please feel free to let me know if I missed something.
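
The point about the ternary operator can be shown with stand-in types. In the sketch below, `Block` is a hypothetical substitute for a basic block, and `printLabel` prints the name when there is one and the whole body otherwise. An explicit branch is used because the two operands would have unrelated types.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block.
struct Block {
  std::string Name;               // empty for unnamed blocks
  std::vector<std::string> Insts; // textual instructions
};

// Prints the whole block body, one instruction per line.
std::ostream &operator<<(std::ostream &OS, const Block &B) {
  for (const std::string &I : B.Insts)
    OS << I << '\n';
  return OS;
}

// Writing OS << (B.Name.empty() ? B : B.Name) does not compile here,
// because Block and std::string have no common type. Branch instead.
void printLabel(std::ostream &OS, const Block &B) {
  if (!B.Name.empty()) {
    OS << B.Name;
    return;
  }
  assert(!B.Insts.empty() && "an unnamed block needs a body to print");
  OS << B;
}
```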

Harbormaster completed remote builds in B113584: Diff 358059. Jul 12 2021, 3:12 PM

In D99205#2866668, @SjoerdMeijer wrote:

In D99205#2866301, @ChuanqiXu wrote:

I am writing an AggressiveJumpThreading pass downstream to handle the state machine in Coremark.

We have 2 aggressive jump threading passes in upstream review: this work, and also D88307. If I understand this correctly, you're working on another downstream? If so, would it not be more efficient if we support this work? For me, this work has the most potential as it is under active development as opposed to D88307. Your remark about irreducible control flow is a valid one, which was made earlier too. Eli said this about that:

This is kind of late to be running this pass. That might be useful in some cases: we might be making the code harder to analyze if there's irreducible control flow. On the other hand, we're skipping interesting optimizations if the CFG becomes significantly simpler. My instinct is that the advantages of running earlier outweigh the disadvantages.

which is something that still needs to be figured out I think. I.e., it's run earlier now, but do we need more performance numbers to see if this doesn't regress other stuff? But @ChuanqiXu, if you're working on something similar and have ideas, please consider sharing them here.

Since I only started looking at Coremark recently and you have much more experience, I am fine with this as long as no regression is found.

I have ignored the problem that more aggressive jump threading can create irreducible control flow. I have heard that gcc has an implementation that overcomes this problem; it may be worthwhile for all of us to look into its details.
This patch looks like a big pattern match to me. It requires a switch statement whose condition is predictable, in other words, a DFA.
My first question: is it profitable to add a pass for one specific pattern? I feel that the DFA pattern is not common in real code. If the pass is turned off by default, I have no objection.

My colleague @jaykang10 found that this also triggers on Perlbench in SPEC.

Cool, I will try to measure it.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
381 ↗	(On Diff #351540)	So it may be better to add `virtual ~MainSwitch` in the future.
722 ↗	(On Diff #351540)	Is this assumption stable? On the one hand, the motivating example in Coremark wouldn't be lowered into a binary tree. On the other hand, the switch statement may be lowered to a jump table.
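
The assumption under discussion is that a switch is lowered to a balanced binary tree of comparisons, in which case reaching one of N cases takes about ceil(log2(N)) conditional branches. The sketch below computes that estimate with integer arithmetic only. `estimatedCondBranches` is a hypothetical helper, not the code in the patch, and the estimate does not apply when the switch is lowered to a jump table.

```cpp
#include <cassert>

// Returns ceil(log2(NumCases)): the depth of a balanced binary decision
// tree that distinguishes NumCases cases.
unsigned estimatedCondBranches(unsigned NumCases) {
  assert(NumCases > 0 && "a switch needs at least one case");
  unsigned Depth = 0;
  // Find the smallest Depth such that 2^Depth >= NumCases.
  for (unsigned long long Reach = 1; Reach < NumCases; Reach *= 2)
    ++Depth;
  return Depth;
}
```

For example, 8 cases give a depth of 3, while 9 cases need a fourth level.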

Thanks for the updates.
If I am not mistaken, I still think tests are missing for cases when jump threading should *not* trigger:

I think some tests are missing, mostly negative tests:

test for min code size,

test with blocks that cannot be cloned,

test with some unsupported instructions

You're right that they're missing, I have not forgotten about the test cases, I'm working on them and will add them soon. Thanks

SjoerdMeijer added inline comments. Jul 14 2021, 1:43 AM

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
32 ↗	(On Diff #351540)	Thanks for this, and for addressing my previous nitpicks about definitions and terminology. I think it would be convenient to have most of them in one place, i.e. here, describing the high-level ideas and terminology used. Sketching a little bit, and copy-pasting from different places, I was thinking of something like this:

// Transform each threading path to effectively jump thread the FSM. For
// example the CFG below could be transformed as follows, where the cloned
// blocks unconditionally branch to the next correct case based on what is
// identified in the analysis.
//
//          sw.bb                        sw.bb
//        /   |   \                    /   |   \
//   case1  case2  case3          case1  case2  case3
//        \   |   /                 |      |      |
//       determinator            det.2   det.3  det.1
//        br sw.bb                /        |        \
//                          sw.bb.2     sw.bb.3     sw.bb.1
//                           br case2    br case3    br case1
//
// Definitions and Terminology:
//
// * Threadable path:
//   a list of basic blocks, the exit state, and the block that determines
//   the next state, for which the following notation will be used:
//   < path of BBs that form a cycle > [ state, determinator ]
//
// * Predictable switch:
//   The switch variable is always a known constant so that all conditional
//   jumps based on switch variable can be converted to unconditional jump.
//
// * Determinator:
//   The basic block that determines the next state of the DFA.
//
// ETC

And I think you can keep the comments at the different places inlined in the code; I don't think that duplication will harm.
78 ↗	(On Diff #358059)	I think this option is not tested.
83 ↗	(On Diff #358059)	And this one too, so need some more tests for this.
404 ↗	(On Diff #358059)	I don't think we have tests for cases that are not predictable.
458 ↗	(On Diff #358059)	This could probably do with a more descriptive function name, to make more explicit what kind of Value we expect.
495 ↗	(On Diff #358059)	Don't need the brackets here
500 ↗	(On Diff #358059)	and here.
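The path notation proposed in the comment on line 32, `< path of BBs that form a cycle > [ state, determinator ]`, can be illustrated with a small standalone sketch. The struct `ThreadingPath` and the helper `formatPath` below are hypothetical names for this illustration, not the classes used in the patch.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of one path: the blocks on the cycle, the state the
// switch variable has when the path exits, and the block that determined
// that state.
struct ThreadingPath {
  std::vector<std::string> blocks;
  int exitState;
  std::string determinator;
};

// Renders a path as: < bb1 bb2 ... > [ state, determinator ]
std::string formatPath(const ThreadingPath &p) {
  assert(!p.blocks.empty() && "a path needs at least one block");
  std::ostringstream os;
  os << "<";
  for (const std::string &bb : p.blocks)
    os << " " << bb;
  os << " > [ " << p.exitState << ", " << p.determinator << " ]";
  return os.str();
}
```

For the CFG in the comment, a path through `case1` that sets the next state to 2 would print as `< sw.bb case1 determinator > [ 2, determinator ]`.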

Add optimization remark emitter
Test for min code size
Test with blocks that cannot be cloned
Test with some unsupported instructions
Test unpredictable switch
Replace all mentions of FSM with DFA
Always use the term "Threading Path", instead of interchanging "threading" with "threadable"

alexey.zhikhar marked 7 inline comments as done. Jul 14 2021, 2:52 PM

alexey.zhikhar added inline comments.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
32 ↗	(On Diff #351540)	Thanks, Sjoerd, I added this to the file header. Generally I prefer to not duplicate documentation for the same reasons as duplicating code, so I removed the long comment from `TransformDFA::createAllExitPaths()` but I can put it back if you insist.
381 ↗	(On Diff #351540)	Let's see if Sjoerd @SjoerdMeijer has an opinion here.
78 ↗	(On Diff #358059)	A test case for `MaxPathDepth` is not ready yet, but duly noted. Thanks
404 ↗	(On Diff #358059)	Please see `negative4`

Harbormaster completed remote builds in B114100: Diff 358757. Jul 14 2021, 4:28 PM

SjoerdMeijer added inline comments. Jul 15 2021, 5:29 AM

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
381 ↗	(On Diff #351540)	No strong opinion. Looks fine as it is.
llvm/test/Transforms/DFAJumpThreading/negative.ll
2 ↗	(On Diff #358757)	Perhaps this works, but thought this should be: --check-prefix=REMARK
185 ↗	(On Diff #358757)	I don't think we want to jump thread a function that has a `minsize` attribute? For example: `define i32 @negative5( ) minsize {`. Need a test for that?

Add a test for the minsize attribute
Test MaxPathDepth with a test case where only a subset of paths is threaded
Reduce the test case for the cost model by passing the threshold through CLI
Fix a bug in how the NumTransforms statistic is incremented
Expand the cost model to include the case for jump tables

alexey.zhikhar marked 3 inline comments as done. Jul 22 2021, 2:46 PM

alexey.zhikhar added inline comments.

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
722 ↗	(On Diff #351540)	That's a good point, I added a separate clause for when a jump table is expected instead of binary search (the same hook is used in inlining).
78 ↗	(On Diff #358059)	Please see `max_path_length`
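The distinction made in the reply on line 722 (binary search versus jump table lowering) can be sketched as a toy cost function. This is only an assumed model for illustration: it supposes the cost of cloning is the number of duplicated instructions discounted by the branching work saved per iteration. The names `ceilLog2`, `duplicationCost` and `isProfitable` and all parameters are hypothetical and are not the patch's API.

```cpp
#include <cassert>

// ceil(log2(n)) for small n >= 1: the depth of a balanced binary search
// over n switch cases.
unsigned ceilLog2(unsigned n) {
  unsigned bits = 0;
  for (unsigned v = 1; v < n; v <<= 1)
    ++bits;
  return bits;
}

// Assumed model: with no jump table (jumpTableSize == 0) the switch is taken
// to be lowered as a binary search, so threading saves about log2(numCases)
// conditional branches; otherwise the saving is scaled by the table size.
unsigned duplicationCost(unsigned clonedInsts, unsigned numCases,
                         unsigned jumpTableSize) {
  assert(numCases >= 2 && "a switch with fewer than two cases is not modeled");
  unsigned saved = jumpTableSize == 0 ? ceilLog2(numCases) : jumpTableSize;
  return clonedInsts / saved;
}

// The transform would go ahead only when the cost stays under a threshold,
// which the update notes say can be passed on the command line.
bool isProfitable(unsigned cost, unsigned threshold) {
  return cost < threshold;
}
```

With 120 cloned instructions and 8 cases, this model gives a cost of 40 for binary search lowering and 15 when an 8-entry jump table is expected.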

Harbormaster completed remote builds in B115700: Diff 360979. Jul 22 2021, 5:05 PM

Thanks, looking good. But one last request, sorry, can you add tests for the optimisation remarks too?

No worries, Sjoerd, I appreciate your comments.

There are optimization remark tests in negative.ll, do you mean something in addition to that?

Ah, sorry, I had missed that, so ignore that comment.

Thanks for working on this. This LGTM as a first version that is off by default. Having this in tree makes testing and getting different (compile-time) numbers easier, which we need for the follow up to get this enabled by default.

This revision is now accepted and ready to land. Jul 23 2021, 7:08 AM

Rebase on top of the latest main

Harbormaster completed remote builds in B116303: Diff 361839. Jul 26 2021, 5:59 PM

Still LGTM :)

alexey.zhikhar edited the summary of this revision. Jul 27 2021, 7:38 AM

This revision was landed with ongoing or failed builds. Jul 27 2021, 11:36 AM

Closed by commit rG02077da7e7a8: Add jump-threading optimization for deterministic finite automata (authored by alexey.zhikhar, committed by dancgr).

This revision was automatically updated to reflect the committed changes.

dancgr added a commit: rG02077da7e7a8: Add jump-threading optimization for deterministic finite automata.

This appears to create a layering violation. llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp includes llvm/include/llvm/CodeGen/Passes.h, and llvm/lib/CodeGen/HardwareLoops.cpp includes llvm/include/llvm/Transforms/Scalar.h, creating a cycle between LLVMCodeGen and LLVMScalarOpts.

@chandlerc as the owner of layering

Is the include of llvm/CodeGen/Passes.h required? My build appears to succeed without it
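The cycle described here is the kind of thing a depth-first search over library dependencies will flag. The sketch below is a generic cycle check, not the tooling that actually reported the problem; the type `DepGraph` and the functions `visit` and `hasCycle` are hypothetical names, and only the two library names come from the comment above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// library name -> libraries it depends on (through its includes)
using DepGraph = std::map<std::string, std::set<std::string>>;

// Depth-first search; reaching a node that is still on the current DFS stack
// means we followed a back edge, i.e. found a dependency cycle.
static bool visit(const DepGraph &g, const std::string &n,
                  std::set<std::string> &onStack,
                  std::set<std::string> &done) {
  if (done.count(n))
    return false;
  if (!onStack.insert(n).second)
    return true;
  auto it = g.find(n);
  if (it != g.end())
    for (const std::string &dep : it->second)
      if (visit(g, dep, onStack, done))
        return true;
  onStack.erase(n);
  done.insert(n);
  return false;
}

bool hasCycle(const DepGraph &g) {
  std::set<std::string> onStack, done;
  for (const auto &entry : g)
    if (visit(g, entry.first, onStack, done))
      return true;
  return false;
}
```

With edges LLVMScalarOpts -> LLVMCodeGen and LLVMCodeGen -> LLVMScalarOpts, `hasCycle` returns true; dropping the first edge, which is what removing the unneeded include amounts to, makes it return false.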

Ah looks like @bkramer just fixed it with https://github.com/llvm/llvm-project/commit/05815c9f638c2a62e1ce9b28b26d74c7bea81f2e, thanks!

Thanks guys

Compile time statistics gathered on CTMark (no change, basically):

Compile Time
Test                                                           dfa TRUE   dfa FALSE
test-suite :: CTMark/7zip/7zip-benchmark.test                  83.2031    83.3182
test-suite :: CTMark/Bullet/bullet.test                        54.6959    54.7607
test-suite :: CTMark/ClamAV/clamscan.test                      32.4064    32.8996
test-suite :: CTMark/SPASS/SPASS.test                          30.1786    30.0229
test-suite :: CTMark/consumer-typeset/consumer-typeset.test    24.089     24.0834
test-suite :: CTMark/kimwitu++/kc.test                         33.7907    34.073
test-suite :: CTMark/lencod/lencod.test                        39.3401    39.3303
test-suite :: CTMark/mafft/pairlocalalign.test                 21.475     21.408
test-suite :: CTMark/sqlite3/sqlite3.test                      31.1729    31.1351
test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test                57.5609    56.1245
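One way to check the "no change, basically" reading is to compute the relative change per benchmark. The sketch below does that for rows of the compile-time table; the names `Row`, `relativeChange` and `largestChange` are hypothetical, and the numbers in the usage note are copied from the table.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

struct Row {
  std::string test;
  double withDfa;    // "dfa TRUE" column
  double withoutDfa; // "dfa FALSE" column
};

// Positive means the build got slower with the pass enabled.
double relativeChange(double withDfa, double withoutDfa) {
  assert(withoutDfa > 0.0);
  return (withDfa - withoutDfa) / withoutDfa;
}

// Returns the row with the largest absolute relative change.
const Row &largestChange(const std::vector<Row> &rows) {
  assert(!rows.empty());
  const Row *worst = &rows.front();
  for (const Row &r : rows)
    if (std::fabs(relativeChange(r.withDfa, r.withoutDfa)) >
        std::fabs(relativeChange(worst->withDfa, worst->withoutDfa)))
      worst = &r;
  return *worst;
}
```

For example, `relativeChange(57.5609, 56.1245)` (tramp3d-v4) is about +2.6%, while `relativeChange(32.4064, 32.8996)` (ClamAV) is about -1.5%, so the per-benchmark differences go in both directions.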

Number of transformed switch statements:

CTMark/kimwitu++/kimwl.stats:   "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/LzmaEnc.stats:      "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/DeflateDecoder.stats:       "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/GzHandler.stats:    "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/CabHandler.stats:   "dfa-jump-threading.NumTransforms": 3
CTMark/7zip/ShrinkDecoder.stats:        "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/Update.stats:       "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/List.stats: "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/Extract.stats:      "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/ZipHandlerOut.stats:        "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/TarHandler.stats:   "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/7zUpdate.stats:     "dfa-jump-threading.NumTransforms": 2
CTMark/7zip/XzDec.stats:        "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/OpenArchive.stats:  "dfa-jump-threading.NumTransforms": 1
CTMark/7zip/BwtSort.stats:      "dfa-jump-threading.NumTransforms": 1
CTMark/ClamAV/libclamav_untar.stats:    "dfa-jump-threading.NumTransforms": 1
CTMark/ClamAV/libclamav_message.stats:  "dfa-jump-threading.NumTransforms": 1
CTMark/sqlite3/sqlite3.stats:   "dfa-jump-threading.NumTransforms": 10
CTMark/SPASS/iascanner.stats:   "dfa-jump-threading.NumTransforms": 1
CTMark/SPASS/dfgscanner.stats:  "dfa-jump-threading.NumTransforms": 1
CTMark/consumer-typeset/z36.stats:      "dfa-jump-threading.NumTransforms": 1
CTMark/consumer-typeset/z49.stats:      "dfa-jump-threading.NumTransforms": 1
CTMark/consumer-typeset/z38.stats:      "dfa-jump-threading.NumTransforms": 1

Nice numbers, actually, more “hits” than expected. +1.

Can you cooperate with @nikic to get fresh compile-time data with the DFA pass enabled? Your numbers look fine, so it would be great to enable it by default.

Yes, that looks very promising. I think it is best to create a new patch for that, so that we can have and move this discussion there.

After fixing DT preservation under LegacyPM (https://github.com/llvm/llvm-project/commit/e97524cba2825ed412cd57a95a82c12ab439e171), I get these compile-time numbers: http://llvm-compile-time-tracker.com/compare.php?from=380b8a603c6e8997819726b15a76b8f6c94aa21a&to=abb759c879725b7bc09a466e92c9b9eca7f8f483&stat=instructions Assuming no pathological cases, that looks okay to me.

Thanks everybody.

Internally, we had had a few rounds of testing before posting this patch, but before enabling it by default, we'd like to have a few more to make sure that nothing unexpected will arise. I will submit a patch as soon as we're done with that.

Just a heads up. I have benchmarked this version against our downstream implementation of jump threading and see some regressions:

       Regression
core1  0.4%
core2  0.28%
core3  2.7%
core4  0.24%
core5  1.12%

Especially for the more capable cores "core3" and "core5" the difference is quite big, so we do leave some performance on the table.

I will probably add a reproducer as a regression test and raise a ticket for this, so that we can look into this.

cynecx removed a subscriber: cynecx. Aug 4 2021, 10:10 AM

A couple of comments/clarifications about irreducible CFGs that came up in earlier comments: This pass does not generate an irreducible CFG for Coremark. It does generate an irreducible CFG for https://bugs.llvm.org/show_bug.cgi?id=42313. We also checked GCC, and it seems that GCC also generates an irreducible CFG for PR42313.

The following observation could be helpful: the CFG of this pass's output (if we collapse some CFG nodes into one node) has the same structure as the DFA graph represented by the code. I have not thought about it in depth, but I suspect this generalizes to cases where we partially jump thread a switch statement. This observation can be used to detect an irreducible CFG in the analysis step and potentially disable the transformation, or convert the code to a reducible CFG using known techniques (cost analysis will be needed).
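If the DFA's transition graph predicts the shape of the output CFG, as suggested above, one way to act on that would be to test the graph for reducibility before transforming. The sketch below uses the classic T1/T2 reduction (remove self loops; merge a node into its unique predecessor) and is not something the patch implements. The names `Graph` and `isReducible` are hypothetical, and the sketch assumes every node appears as a key and is reachable from the entry.

```cpp
#include <cassert>
#include <map>
#include <set>

// node -> successors; every node must appear as a key, even with no edges.
using Graph = std::map<int, std::set<int>>;

// A flow graph is reducible iff repeatedly applying T1 and T2 collapses it
// to a single node. The graph is taken by value because it is consumed.
bool isReducible(Graph g, int entry) {
  assert(g.count(entry) && "entry must be a node of the graph");
  bool changed = true;
  while (changed) {
    changed = false;
    // T1: drop self loops.
    for (auto &[n, succs] : g)
      if (succs.erase(n))
        changed = true;
    // T2: merge one non-entry node that has exactly one predecessor.
    for (auto &[n, succs] : g) {
      if (n == entry)
        continue;
      int pred = -1;
      int numPreds = 0;
      for (auto &[m, ms] : g)
        if (ms.count(n)) {
          pred = m;
          ++numPreds;
        }
      if (numPreds != 1)
        continue;
      int victim = n;
      std::set<int> moved = succs;
      g[pred].erase(victim);
      g[pred].insert(moved.begin(), moved.end());
      g.erase(victim);
      changed = true;
      break; // iterators into g are stale now; rescan from the top
    }
  }
  return g.size() == 1;
}
```

A loop with a single entry, such as 0 -> 1, 1 -> 2, 2 -> 1, reduces to one node. A two-node cycle entered from both sides, 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 1, does not, which is the irreducible shape the earlier comments were worried about.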

Thanks Sjoerd for the careful review of this patch and feedback provided. It will be interesting to see if the root cause of the performance degradation is in the code generated by the pass, or not. If that is the case, we can investigate whether changing the code gen in this pass is the best way forward, or some extra clean up/optimization is needed somewhere else. We will look into it when more information is available. Thanks again for all the feedback.

No worries, and thanks for working on this!

I had a first look, and I think the output code is just different (*) for the 2 jump threading implementations, so difficult to tell more about this at this point. I will look into this more, but that will be in September after the holiday period.

(*) It is the switch.o3.ll test case from D88307 that I took, which I think is just a reproducer from coremark.

zhaozhengpeng added a subscriber: zhaozhengpeng. Oct 11 2021, 1:31 AM

ychen added a subscriber: ychen. Nov 18 2021, 11:25 AM

I filed a ticket about a crash in this pass that we stumbled upon during fuzz testing:
https://github.com/llvm/llvm-project/issues/64860

Herald added a project: Restricted Project. Aug 21 2023, 5:57 AM

Herald added subscribers: hoy, nlopes.

nlopes added inline comments. Aug 21 2023, 10:48 AM

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
1143 ↗	(On Diff #362117)	please use a poison value as placeholder instead of undef. We are trying to remove undef. Thank you!

Revision Contents

Path                                                                  Size
llvm/include/llvm/CodeGen/Passes.h                                    5 lines
llvm/include/llvm/InitializePasses.h                                  1 line
llvm/include/llvm/LinkAllPasses.h                                     1 line
llvm/lib/CodeGen/CMakeLists.txt                                       1 line
llvm/lib/CodeGen/CodeGen.cpp                                          1 line
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp                      3 lines
llvm/test/CodeGen/AArch64/O3-pipeline.ll                              2 lines
llvm/tools/opt/opt.cpp                                                3 lines
llvm/lib/CodeGen/DFAJumpThreading.cpp (new file)                      1106 lines
llvm/test/CodeGen/AArch64/dfa-constant-propagation.ll (new file)      31 lines
llvm/test/CodeGen/AArch64/dfa-jump-threading-analysis.ll (new file)   107 lines
llvm/test/CodeGen/AArch64/dfa-jump-threading-transform.ll (new file)  174 lines
llvm/test/CodeGen/AArch64/dfa-unfold-select.ll (new file)             151 lines

Diff 342832

llvm/include/llvm/CodeGen/Passes.h

  namespace llvm {
  [...]
  /// If AbortOnFailedISel is true, abort compilation instead of resetting.
  MachineFunctionPass *createResetMachineFunctionPass(bool EmitFallbackDiag,
  bool AbortOnFailedISel);

  /// createCodeGenPreparePass - Transform the code to expose more pattern
  /// matching during instruction selection.
  FunctionPass *createCodeGenPreparePass();

+ // DFAJumpThreadingPass - When a switch statement inside a loop is used to
+ // implement a deterministic finite automata we can jump thread the switch
+ // statement reducing number of conditional jumps.
+ FunctionPass *createDFAJumpThreadingPass(const TargetMachine *TM = nullptr);
+
  /// AtomicExpandID -- Lowers atomic operations in terms of either cmpxchg
  /// load-linked/store-conditional loops.
  extern char &AtomicExpandID;

  /// MachineLoopInfo - This pass is a loop analysis pass.
  extern char &MachineLoopInfoID;

  /// MachineDominators - This pass is a machine dominators analysis pass.
  [...]

llvm/include/llvm/InitializePasses.h

  [...]
  void initializeConstraintEliminationPass(PassRegistry &);
  void initializeControlHeightReductionLegacyPassPass(PassRegistry&);
  void initializeCorrelatedValuePropagationPass(PassRegistry&);
  void initializeCostModelAnalysisPass(PassRegistry&);
  void initializeCrossDSOCFIPass(PassRegistry&);
  void initializeDAEPass(PassRegistry&);
  void initializeDAHPass(PassRegistry&);
  void initializeDCELegacyPassPass(PassRegistry&);
+ void initializeDFAJumpThreadingPass(PassRegistry&);
  void initializeDSELegacyPassPass(PassRegistry&);
  void initializeDataFlowSanitizerLegacyPassPass(PassRegistry &);
  void initializeDeadMachineInstructionElimPass(PassRegistry&);
  void initializeDebugifyMachineModulePass(PassRegistry &);
  void initializeDelinearizationPass(PassRegistry&);
  void initializeDemandedBitsWrapperPassPass(PassRegistry&);
  void initializeDependenceAnalysisPass(PassRegistry&);
  void initializeDependenceAnalysisWrapperPassPass(PassRegistry&);
  [...]

llvm/include/llvm/LinkAllPasses.h

  ForcePassLinking() {
  [...]
  (void) llvm::createSROAPass();
  (void) llvm::createSingleLoopExtractorPass();
  (void) llvm::createStripSymbolsPass();
  (void) llvm::createStripNonDebugSymbolsPass();
  (void) llvm::createStripDeadDebugInfoPass();
  (void) llvm::createStripDeadPrototypesPass();
  (void) llvm::createTailCallEliminationPass();
  (void) llvm::createJumpThreadingPass();
+ (void) llvm::createDFAJumpThreadingPass();
  (void) llvm::createUnifyFunctionExitNodesPass();
  (void) llvm::createInstCountPass();
  (void) llvm::createConstantHoistingPass();
  (void) llvm::createCodeGenPreparePass();
  (void) llvm::createEntryExitInstrumenterPass();
  (void) llvm::createPostInlineEntryExitInstrumenterPass();
  (void) llvm::createEarlyCSEPass();
  (void) llvm::createGVNHoistPass();
  [...]

llvm/lib/CodeGen/CMakeLists.txt

  add_llvm_component_library(LLVMCodeGen
  [...]
  CFIInstrInserter.cpp
  CodeGen.cpp
  CodeGenPassBuilder.cpp
  CodeGenPrepare.cpp
  CommandFlags.cpp
  CriticalAntiDepBreaker.cpp
  DeadMachineInstructionElim.cpp
  DetectDeadLanes.cpp
+ DFAJumpThreading.cpp
  DFAPacketizer.cpp
  DwarfEHPrepare.cpp
  EarlyIfConversion.cpp
  EdgeBundles.cpp
  EHContGuardCatchret.cpp
  ExecutionDomainFix.cpp
  ExpandMemCmp.cpp
  ExpandPostRAPseudos.cpp
  [...]

llvm/lib/CodeGen/CodeGen.cpp

  void llvm::initializeCodeGen(PassRegistry &Registry) {
  [...]
  initializeBranchFolderPassPass(Registry);
  initializeBranchRelaxationPass(Registry);
  initializeCFGuardLongjmpPass(Registry);
  initializeCFIInstrInserterPass(Registry);
  initializeCheckDebugMachineModulePass(Registry);
  initializeCodeGenPreparePass(Registry);
  initializeDeadMachineInstructionElimPass(Registry);
  initializeDebugifyMachineModulePass(Registry);
+ initializeDFAJumpThreadingPass(Registry);
  initializeDetectDeadLanesPass(Registry);
  initializeDwarfEHPrepareLegacyPassPass(Registry);
  initializeEarlyIfConverterPass(Registry);
  initializeEarlyIfPredicatorPass(Registry);
  initializeEarlyMachineLICMPass(Registry);
  initializeEarlyTailDuplicatePass(Registry);
  initializeExpandMemCmpPassPass(Registry);
  initializeExpandPostRAPass(Registry);
  [...]

llvm/lib/CodeGen/DFAJumpThreading.cpp

This file was added.

				//===- DFAJumpThreading.cpp - Threads a switch statement inside a loop ----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// When a switch statement inside a loop is used to implement a deterministic
				// finite automaton, we can jump thread the switch statement reducing number of
				// conditional jumps. Currently this does not happen in LLVM jump threading.
				//
				//===----------------------------------------------------------------------===//
				efriedma (inline comment, Done): Could probably use a brief outline of the overall algorithm here.

				#include "llvm/ADT/DepthFirstIterator.h"
				#include "llvm/ADT/SmallSet.h"
				#include "llvm/Analysis/LoopIterator.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/CFG.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Verifier.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
				#include "llvm/Transforms/Utils/ValueMapper.h"
				#include <list>
				#include <unordered_map>
				#include <unordered_set>

				using namespace llvm;

				#define DEBUG_TYPE "dfa-jump-threading"

				static cl::opt<bool> DisableThis("disable-dfa-jump-thread",
				cl::desc("Disable DFA jump threading."),
				cl::Hidden, cl::init(false));

				static cl::opt<bool>
				IsViewCFG("dfa-jump-view-cfg",
				cl::desc("View the CFG before DFA Jump Threading"), cl::Hidden,
				cl::init(false));

				static cl::opt<int> MaxPathDepth(
				"dfa-max-path-length",
				cl::desc("Max number of blocks searched to find threadable path"),
				cl::Hidden, cl::init(20));

				namespace {

				class SelectInstToUnfold {
				SelectInst *SI;
				PHINode *SIUse;

				public:
				SelectInstToUnfold(SelectInst *SI, PHINode *SIUse) : SI(SI), SIUse(SIUse) {}

				SelectInst *getInst() { return SI; }
				PHINode *getUse() { return SIUse; }

				explicit operator bool() const { return SI && SIUse; }
				};

				void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
				std::vector<SelectInstToUnfold> *NewSIsToUnfold,
				std::vector<BasicBlock *> *NewBBs);

				class DFAJumpThreading : public FunctionPass {
				public:
				static char ID; // Pass identification
				DFAJumpThreading() : FunctionPass(ID) {
				initializeDFAJumpThreadingPass(*PassRegistry::getPassRegistry());
				}

				~DFAJumpThreading() override = default;

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				}

				bool runOnFunction(Function &F) override;

				private:
				void unfoldSelectInstrs(DominatorTree *DT,
				const std::vector<SelectInstToUnfold> &SelectInsts) {
				DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
				std::list<SelectInstToUnfold> Q;
				for (SelectInstToUnfold SIToUnfold : SelectInsts) {
				Q.push_back(SIToUnfold);
				}

				while (!Q.empty()) {
				SelectInstToUnfold SIToUnfold = Q.front();
				Q.pop_front();

				std::vector<SelectInstToUnfold> NewSIsToUnfold;
				std::vector<BasicBlock *> NewBBs;
				unfold(&DTU, SIToUnfold, &NewSIsToUnfold, &NewBBs);

				// Put newly discovered select instructions into the work list.
				for (const SelectInstToUnfold &NewSIToUnfold : NewSIsToUnfold) {
				Q.push_back(NewSIToUnfold);
				}
				}
				}
				};
				efriedma (inline comment, Done): Don't use std::list. I think you can use SmallVector and pop_back(), assuming the iteration order doesn't matter. If you need LIFO order, use std::deque.
				jkreiner (inline comment, Done): Okay, there's a few places std::list was used so I'll replace them all.
				} // end anonymous namespace

				char DFAJumpThreading::ID = 0;
				INITIALIZE_PASS_BEGIN(DFAJumpThreading, "dfa-jump-threading",
				"DFA Jump Threading", false, false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_END(DFAJumpThreading, "dfa-jump-threading",
				"DFA Jump Threading", false, false)

				// Public interface to the DFA Jump Threading pass
				FunctionPass *llvm::createDFAJumpThreadingPass(const TargetMachine *TM) {
				return new DFAJumpThreading();
				}

				namespace {

				/// Unfold the select instruction held in \p SIToUnfold.
				///
				/// Put newly discovered select instructions into \p NewSIsToUnfold. Put newly
				/// created basic blocks into \p NewBBs.
				///
				/// This routine is inspired by CodeGenPrepare::optimizeSelectInst().
				void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
				std::vector<SelectInstToUnfold> *NewSIsToUnfold,
				std::vector<BasicBlock *> *NewBBs) {
				SelectInst *SI = SIToUnfold.getInst();
				PHINode *SIUse = SIToUnfold.getUse();
				BasicBlock *StartBlock = SI->getParent();
				BasicBlock *EndBlock = SIUse->getParent();
				BranchInst *StartBlockTerm =
				dyn_cast<BranchInst>(StartBlock->getTerminator());

				assert(StartBlockTerm && StartBlockTerm->isUnconditional());
				assert(SI->hasOneUse());

				// These are the new basic blocks for the conditional branch.
				// At least one will become an actual new basic block.
				BasicBlock *TrueBlock = nullptr;
				BasicBlock *FalseBlock = nullptr;
				BranchInst *TrueBranch = nullptr;
				BranchInst *FalseBranch = nullptr;

				// Sink select instructions to be able to unfold them later.
				if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getTrueValue())) {
				assert(SIOp->hasOneUse());
				TrueBlock = BasicBlock::Create(SI->getContext(), "si.unfold.true",
				EndBlock->getParent(), EndBlock);
				NewBBs->push_back(TrueBlock);
				TrueBranch = BranchInst::Create(EndBlock, TrueBlock);
				SIOp->moveBefore(TrueBranch);
				NewSIsToUnfold->push_back(SelectInstToUnfold(SIOp, SIUse));
				DTU->applyUpdates({{DominatorTree::Insert, TrueBlock, EndBlock}});
				}
				if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getFalseValue())) {
				assert(SIOp->hasOneUse());
				FalseBlock = BasicBlock::Create(SI->getContext(), "si.unfold.false",
				EndBlock->getParent(), EndBlock);
				NewBBs->push_back(FalseBlock);
				FalseBranch = BranchInst::Create(EndBlock, FalseBlock);
				SIOp->moveBefore(FalseBranch);
				NewSIsToUnfold->push_back(SelectInstToUnfold(SIOp, SIUse));
				DTU->applyUpdates({{DominatorTree::Insert, FalseBlock, EndBlock}});
				}

				// If there was nothing to sink, then arbitrarily choose the 'false' side
				// for a new input value to the PHI.
				if (!TrueBlock && !FalseBlock) {
				FalseBlock = BasicBlock::Create(SI->getContext(), "si.unfold.false",
				EndBlock->getParent(), EndBlock);
				NewBBs->push_back(FalseBlock);
				BranchInst::Create(EndBlock, FalseBlock);
				DTU->applyUpdates({{DominatorTree::Insert, FalseBlock, EndBlock}});
				}
				efriedma (inline comment, Not Done): I don't think this applyUpdates() call does anything? The DomTree doesn't care about unreachable edges.
				jkreiner (inline comment, Done): Wouldn't it be needed since a new block is created? This edge is reached sometimes. I may be misunderstanding the usage of the DomTreeUpdater though.

				// Insert the real conditional branch based on the original condition.
				// If we did not create a new block for one of the 'true' or 'false' paths
				// of the condition, it means that side of the branch goes to the end block
				// directly and the path originates from the start block from the point of
				// view of the new PHI.
				BasicBlock *TT = nullptr;
				BasicBlock *FT = nullptr;
				if (!TrueBlock) {
				// A triangle pointing right.
				TT = EndBlock;
				FT = FalseBlock;
				TrueBlock = StartBlock;

				// Update the phi node of SI.
				for (unsigned Idx = 0; Idx < SIUse->getNumIncomingValues(); ++Idx) {
				if (SIUse->getIncomingBlock(Idx) == StartBlock) {
				SIUse->setIncomingValue(Idx, SI->getTrueValue());
				}
				}
				SIUse->addIncoming(SI->getFalseValue(), FalseBlock);

				// Update any other PHI nodes in EndBlock.
				for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
				++II) {
				if (Phi != SIUse) {
				Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), FalseBlock);
				}
				}
				} else if (!FalseBlock) {
				// A triangle pointing left.
				efriedma (Done): Can you merge together the two triangle codepaths? They appear nearly identical. Probably just need a couple std::swap calls.
				jkreiner (Done): Sure I'll look into merging these two cases.
				TT = TrueBlock;
				FT = EndBlock;
				FalseBlock = StartBlock;

				// Update the phi node of SI.
				for (unsigned Idx = 0; Idx < SIUse->getNumIncomingValues(); ++Idx) {
				if (SIUse->getIncomingBlock(Idx) == StartBlock) {
				SIUse->setIncomingValue(Idx, SI->getFalseValue());
				}
				}
				SIUse->addIncoming(SI->getTrueValue(), TrueBlock);

				// Update any other PHI nodes in EndBlock.
				for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
				++II) {
				if (Phi != SIUse) {
				Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), TrueBlock);
				}
				}
				} else {
				// A diamond.
				TT = TrueBlock;
				FT = FalseBlock;

				// Update the phi node of SI.
				SIUse->removeIncomingValue(StartBlock, /* DeletePHIIfEmpty = */ false);
				SIUse->addIncoming(SI->getTrueValue(), TrueBlock);
				SIUse->addIncoming(SI->getFalseValue(), FalseBlock);

				// Update any other PHI nodes in EndBlock.
				for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
				++II) {
				if (Phi != SIUse) {
				Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), TrueBlock);
				Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), FalseBlock);
				}
				}
				}
				StartBlockTerm->eraseFromParent();
				BranchInst::Create(TT, FT, SI->getCondition(), StartBlock);
				DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
				{DominatorTree::Insert, StartBlock, FT}});

				// The select is now dead.
				SI->eraseFromParent();
				}

				struct ClonedBlock {
				BasicBlock *BB;
				uint64_t State;
				};

				typedef std::list<BasicBlock *> PathType;
				typedef std::list<PathType> PathsType;
				typedef std::set<const BasicBlock *> VisitedBlocks;
				typedef std::vector<ClonedBlock> CloneList;

				// This data structure keeps track of all blocks that have been cloned in the
				// optimization. If two different ThreadingPaths clone the same block for a
				// certain state it should be reused, and it can be looked up in this map.
				typedef std::unordered_map<BasicBlock *, CloneList> DuplicateBlockMap;

				// This map keeps track of all the new definitions for an instruction. This
				// information is needed when restoring SSA form after cloning blocks.
				typedef std::unordered_map<Instruction *, std::vector<Instruction *>> DefMap;

				inline raw_ostream &operator<<(raw_ostream &OS, const PathType &Path) {
				OS << "< ";
				for (const BasicBlock *BB : Path) {
				OS << BB->getName() << " ";
				}
				OS << ">";
				return OS;
				}

				struct ThreadingPath {
				uint64_t getEntryValue() const { return EntryVal; }
				void setEntryValue(const ConstantInt *V) {
				EntryVal = V->getZExtValue();
				IsEntryValSet = true;
				}
				bool isEntryValueSet() const { return IsEntryValSet; }

				uint64_t getExitValue() const { return ExitVal; }
				void setExitValue(const ConstantInt *V) {
				ExitVal = V->getZExtValue();
				IsExitValSet = true;
				}
				bool isExitValueSet() const { return IsExitValSet; }

				const BasicBlock *getDeterminatorBB() const { return DBB; }
				void setDeterminator(const BasicBlock *BB) { DBB = BB; }

				const PathType &getPath() const { return Path; }
				void setPath(const PathType &NewPath) { Path = NewPath; }

				void print(raw_ostream &OS) const {
				OS << Path << " [ " << EntryVal << ", " << ExitVal << ", " << DBB->getName()
				<< " ]";
				}

				private:
				PathType Path;
				uint64_t EntryVal;
				uint64_t ExitVal;
				const BasicBlock *DBB = nullptr;
				bool IsEntryValSet = false;
				bool IsExitValSet = false;
				};

				inline raw_ostream &operator<<(raw_ostream &OS, const ThreadingPath &TPath) {
				TPath.print(OS);
				return OS;
				}

				struct MainSwitch {
				MainSwitch(SwitchInst *SI) : SI(SI) {
				if (isPredictable(SI))
				Instr = SI;
				}

				virtual ~MainSwitch() = default;

				SwitchInst *getInstr() const { return Instr; }
				const std::vector<SelectInstToUnfold> getSelectInsts() { return SelectInsts; }

				/// Returns true if \p BB is a destination of the MainSwitch that is not
				/// default.
				bool isNonDefaultDest(const BasicBlock *BB) const {
				for (auto Case : dyn_cast<SwitchInst>(Instr)->cases()) {
				if (Case.getCaseSuccessor() == BB) {
				return true;
				}
				}
				return false;
				}

				const ConstantInt *getValueFor(const BasicBlock *BB) const {
				for (auto Case : dyn_cast<SwitchInst>(Instr)->cases()) {
				if (Case.getCaseSuccessor() == BB) {
				return Case.getCaseValue();
				}
				}
				return nullptr;
				}

				private:
				/// Do a use-def chain traversal. Make sure the value of the switch variable
				/// is always a known constant. This means that all conditional jumps based on
				/// switch variable can be converted to unconditional jump.
				bool isPredictable(const SwitchInst *SI) {
				std::list<Instruction *> Q;
				SmallSet<Value *, 16> SeenValues;
				SelectInsts.clear();

				Value *FirstDef = SI->getOperand(0);

				if (!FirstDef->getType()->isIntegerTy())
				return false;
				efriedma (Done): Checking `FirstDef->getType()->isIntegerTy()` isn't necessary, by the definition of SwitchInst.

				auto *Inst = dyn_cast<Instruction>(FirstDef);

				efriedma (Done): Do you check somewhere that there's exactly one ConstantInt associated with a given successor? In general, there could be any number.
				jkreiner (Done): Good point, this case isn't checked for so it would currently cause buggy behavior in this function. Now that I think about it, this function can probably be removed since the return value stored as the EntryValue isn't used in the transformation.
				// If this is a function argument or another non-instruction, then give up.
				// We are interested in loop local variables.
				if (!Inst)
				return false;

				// Require the first definition to be a PHINode
				if (!isa<PHINode>(Inst))
				return false;

				LLVM_DEBUG(dbgs() << "\tisPredictable() FirstDef: " << *Inst << "\n");

				Q.push_back(Inst);
				SeenValues.insert(FirstDef);

				while (!Q.empty()) {
				Instruction *Current = Q.front();
				Q.pop_front();

				if (auto *Phi = dyn_cast<PHINode>(Current)) {
				for (Value *Incoming : Phi->incoming_values()) {
				if (!isValidValue(Incoming, SeenValues)) {
				return false;
				}
				addInstToQueue(Incoming, Q, SeenValues);
				}
				LLVM_DEBUG(dbgs() << "\tisPredictable() phi: " << *Phi << "\n");
				} else if (SelectInst *SelI = dyn_cast<SelectInst>(Current)) {
				if (!isValidSelectInst(SelI)) {
				return false;
				}
				if (!isValidValue(SelI->getTrueValue(), SeenValues) ||
				!isValidValue(SelI->getFalseValue(), SeenValues)) {
				return false;
				}
				addInstToQueue(SelI->getTrueValue(), Q, SeenValues);
				addInstToQueue(SelI->getFalseValue(), Q, SeenValues);
				LLVM_DEBUG(dbgs() << "\tisPredictable() select: " << *SelI << "\n");
				if (auto SelIUse = dyn_cast<PHINode>(SelI->user_back())) {
				SelectInsts.push_back(SelectInstToUnfold(SelI, SelIUse));
				}
				} else {
				// If it is neither a phi nor a select, then we give up.
				return false;
				}
				}

				return true;
				}

				bool isValidValue(Value *InpVal, SmallSet<Value *, 16> &SeenValues) {
				if (SeenValues.find(InpVal) != SeenValues.end())
				return true;

				if (isa<ConstantInt>(InpVal))
				return true;

				// If this is a function argument or another non-instruction, then give up.
				Instruction *I = dyn_cast<Instruction>(InpVal);
				if (I == nullptr)
				return false;

				return true;
				}

				void addInstToQueue(Value *Val, std::list<Instruction *> &Q,
				SmallSet<Value *, 16> &SeenValues) {
				if (SeenValues.find(Val) != SeenValues.end())
				return;
				if (Instruction *I = dyn_cast<Instruction>(Val)) {
				Q.push_back(I);
				}
				SeenValues.insert(Val);
				}

				bool isValidSelectInst(SelectInst *SI) {
				efriedma (Done): isa<Instruction>() if you're not going to use the pointer returned by dyn_cast<>.
				// Vector select instructions are not supported.
				if (!SI->getCondition()->getType()->isIntegerTy(1)) {
				efriedma (Done): Don't think you can end up with a vector select here; you've already constrained the result to be an integer.
				jkreiner (Done): Yes, I agree the vector select check was unnecessary here.
				return false;
				}

				if (!SI->hasOneUse()) {
				return false;
				}

				Instruction *SIUse = dyn_cast<Instruction>(SI->user_back());
				// The use of the select inst should be either a phi or another select.
				if (!SIUse && !(isa<PHINode>(SIUse) || isa<SelectInst>(SIUse))) {
				return false;
				}

				BasicBlock *SIBB = SI->getParent();

				// Currently, we can only expand select instructions in basic blocks with
				// one successor.
				BranchInst *SITerm = dyn_cast<BranchInst>(SIBB->getTerminator());
				if (!SITerm || !SITerm->isUnconditional()) {
				return false;
				}

				if (isa<PHINode>(SIUse) && SIBB->getSingleSuccessor() !=
				dyn_cast<Instruction>(SIUse)->getParent()) {
				return false;
				}

				// If select will not be sunk during unfolding, and it is in the same basic
				// block as another state defining select, then cannot unfold both.
				for (SelectInstToUnfold SIToUnfold : SelectInsts) {
				SelectInst *PrevSI = SIToUnfold.getInst();
				if (PrevSI->getTrueValue() != SI && PrevSI->getFalseValue() != SI &&
				PrevSI->getParent() == SI->getParent())
				return false;
				}

				return true;
				}

				SwitchInst *SI;
				SwitchInst *Instr = nullptr;
				std::vector<SelectInstToUnfold> SelectInsts;
				};
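
The use-def walk in isPredictable() above can be illustrated outside of LLVM. The following is only a sketch under stated assumptions: `Kind`, `Node`, and the index-based value table are hypothetical stand-ins for LLVM's `Value` hierarchy, and the extra structural checks from isValidSelectInst() are deliberately left out. It shows the worklist shape: start from the switch condition, accept only phis and selects as state-defining instructions, accept constants as leaves, and give up on anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-in for the kinds of llvm::Value the analysis cares about.
enum class Kind { Constant, Phi, Select, Argument, Other };

struct Node {
  Kind K;
  // Phi: indices of the incoming values. Select: indices of the true and
  // false values. Unused for the other kinds.
  std::vector<int> Operands;
};

// Returns true if every value reachable from the switch condition through
// phis and selects is either a constant or another phi/select.
bool isPredictable(const std::vector<Node> &Values, int SwitchCond) {
  assert(SwitchCond >= 0 &&
         static_cast<std::size_t>(SwitchCond) < Values.size());

  // The first definition is required to be a phi.
  if (Values[SwitchCond].K != Kind::Phi)
    return false;

  std::deque<int> Q{SwitchCond};
  std::set<int> Seen{SwitchCond};

  while (!Q.empty()) {
    int Cur = Q.front();
    Q.pop_front();

    const Node &N = Values[Cur];
    // Anything that is neither a phi nor a select defeats the analysis.
    if (N.K != Kind::Phi && N.K != Kind::Select)
      return false;

    for (int Op : N.Operands) {
      if (Seen.count(Op))
        continue;
      Kind OpK = Values[Op].K;
      // A function argument (or other non-instruction) is not a known state.
      if (OpK == Kind::Argument)
        return false;
      Seen.insert(Op);
      // Constants are leaves; instructions are examined when dequeued.
      if (OpK != Kind::Constant)
        Q.push_back(Op);
    }
  }
  return true;
}
```

In this toy model a phi fed by constants, a select of constants, and a back-edge phi is accepted, while a phi fed by an arithmetic instruction or a function argument is rejected.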

				struct AllSwitchPaths {
				AllSwitchPaths(const MainSwitch *MSwitch)
				: Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()),
				MSwitch(MSwitch) {}

				std::list<ThreadingPath> &getThreadingPaths() { return TPaths; }
				unsigned getNumThreadingPaths() { return TPaths.size(); }
				SwitchInst *getSwitchInst() { return Switch; }
				BasicBlock *getSwitchBlock() { return SwitchBlock; }

				void run() {
				VisitedBlocks Visited;
				PathsType LoopPaths = paths(SwitchBlock, Visited, /* PathDepth = */ 1);
				StateDefMap StateDef = getStateDefMap();

				for (PathType Path : LoopPaths) {
				ThreadingPath TPath;

				const BasicBlock *PrevBB = nullptr;
				for (const BasicBlock *BB : Path) {
				if (MSwitch->isNonDefaultDest(BB) && !TPath.isEntryValueSet()) {
				const ConstantInt *Val = MSwitch->getValueFor(BB);
				TPath.setEntryValue(Val);
				}

				if (TPath.isEntryValueSet() && StateDef.count(BB) != 0) {
				const PHINode *Phi = dyn_cast<PHINode>(StateDef[BB]);
				assert(Phi && "Expected a state-defining instr to be a phi node.");

				const Value *V = Phi->getIncomingValueForBlock(PrevBB);
				if (const ConstantInt *C = dyn_cast<const ConstantInt>(V)) {
				TPath.setExitValue(C);
				TPath.setDeterminator(BB);
				TPath.setPath(Path);
				}
				}

				PrevBB = BB;
				}

				// The next state may be defined in the switch block itself. If an exit
				// value was not found, then check if the switch block is the determinator
				if (!TPath.isExitValueSet() && StateDef.count(SwitchBlock) != 0) {
				const PHINode *Phi = dyn_cast<PHINode>(StateDef[SwitchBlock]);
				assert(Phi && "Expected a state-defining instr to be a phi node.");

				if (Phi->getBasicBlockIndex(PrevBB) != -1) {
				const Value *V = Phi->getIncomingValueForBlock(PrevBB);
				if (const ConstantInt *C = dyn_cast<const ConstantInt>(V)) {
				TPath.setExitValue(C);
				TPath.setDeterminator(SwitchBlock);
				TPath.setPath(Path);
				}
				}
				}

				if (TPath.isExitValueSet())
				TPaths.push_back(TPath);
				}

				for (const ThreadingPath &TPath : TPaths) {
				LLVM_DEBUG(dbgs() << TPath << "\n");
				}
				}

				private:
				// Value: an instruction that defines a switch state;
				// Key: the parent basic block of that instruction.
				typedef std::unordered_map<const BasicBlock *, const PHINode *> StateDefMap;
				efriedma (Done): Prefer llvm::DenseMap

				PathsType paths(BasicBlock *BB, VisitedBlocks &Visited, int PathDepth) const {
				PathsType Res;

				// Stop exploring paths after visiting MaxPathDepth blocks
				if (PathDepth > MaxPathDepth)
				return Res;

				Visited.insert(BB);

				// Some blocks have multiple edges to the same successor, and this set
				// is used to prevent a duplicate path from being generated
				SmallSet<BasicBlock *, 4> Successors;

				for (succ_iterator SI = succ_begin(BB), E = succ_end(BB); SI != E; ++SI) {
				BasicBlock *Succ = *SI;

				if (Successors.find(Succ) != Successors.end())
				continue;
				Successors.insert(Succ);

				// Found a cycle through the SwitchBlock
				if (Succ == SwitchBlock) {
				Res.push_back({BB});
				continue;
				}

				// We have encountered a cycle, do not get caught in it
				if (Visited.find(Succ) != Visited.end())
				continue;

				PathsType SuccPaths = paths(Succ, Visited, PathDepth + 1);
				for (PathType Path : SuccPaths) {
				PathType NewPath(Path);
				NewPath.push_front(BB);
				Res.push_back(NewPath);
				}
				}
				// This block could now be visited again from a different predecessor. Note
				// that this will result in exponential runtime. Subpaths could possibly be
				// cached but it takes a lot of memory to store them.
				Visited.erase(BB);
				return Res;
				}

				/// Walk the use-def chain and collect all the state-defining instructions.
				StateDefMap getStateDefMap() const {
				StateDefMap Res;

				Value *FirstDef = Switch->getOperand(0);

				assert(isa<PHINode>(FirstDef) && "After select unfolding, all state "
				"definitions are expected to be phi "
				"nodes.");

				std::list<PHINode *> Q;
				Q.push_back(dyn_cast<PHINode>(FirstDef));
				SmallSet<Value *, 16> SeenValues;

				while (!Q.empty()) {
				PHINode *CurPhi = Q.front();
				Q.pop_front();

				Res[CurPhi->getParent()] = CurPhi;
				SeenValues.insert(CurPhi);

				for (Value *Incoming : CurPhi->incoming_values()) {
				if (Incoming == FirstDef || isa<ConstantInt>(Incoming) ||
				SeenValues.find(Incoming) != SeenValues.end()) {
				continue;
				}

				assert(isa<PHINode>(Incoming) && "After select unfolding, all state "
				"definitions are expected to be phi "
				"nodes.");

				Q.push_back(dyn_cast<PHINode>(Incoming));
				efriedma (Done): `cast<PHINode>(Incoming)`
				}
				}

				return Res;
				}

				SwitchInst *Switch;
				BasicBlock *SwitchBlock;
				const MainSwitch *MSwitch;
				std::list<ThreadingPath> TPaths;
				};
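
The recursive path enumeration in AllSwitchPaths::paths() can be tried on a toy graph. This is a minimal sketch, not the patch's code: `Graph`, `Path`, and `enumeratePaths` are hypothetical names, blocks are plain integer indices, and the depth limit is passed as a parameter instead of coming from the pass's MaxPathDepth option. It keeps the same ideas: record a path whenever a block branches back to the switch block, skip duplicate edges, avoid other cycles through a visited set, and un-visit the block on the way out so it can appear on paths reached through a different predecessor.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Toy CFG: block i branches to every block listed in G[i].
using Graph = std::vector<std::vector<int>>;
using Path = std::list<int>;

// Collect every simple path that starts at BB and ends in a block that
// branches back to SwitchBlock. Paths longer than MaxDepth are abandoned.
std::vector<Path> enumeratePaths(const Graph &G, int BB, int SwitchBlock,
                                 std::set<int> &Visited, int Depth,
                                 int MaxDepth) {
  std::vector<Path> Res;
  if (Depth > MaxDepth)
    return Res;
  assert(BB >= 0 && static_cast<std::size_t>(BB) < G.size());

  Visited.insert(BB);

  // A block may have several edges to the same successor; follow only one.
  std::set<int> SeenSuccs;
  for (int Succ : G[BB]) {
    if (!SeenSuccs.insert(Succ).second)
      continue;

    // Found a cycle through the switch block: BB closes a path.
    if (Succ == SwitchBlock) {
      Res.push_back({BB});
      continue;
    }

    // Some other cycle; do not get caught in it.
    if (Visited.count(Succ))
      continue;

    for (Path P : enumeratePaths(G, Succ, SwitchBlock, Visited, Depth + 1,
                                 MaxDepth)) {
      P.push_front(BB);
      Res.push_back(P);
    }
  }

  // Allow BB to be visited again from a different predecessor. This is what
  // makes the worst-case running time exponential.
  Visited.erase(BB);
  return Res;
}
```

For a switch block 0 with three cases 1, 2, 3 that all reach a latch 4 branching back to 0, the sketch yields the three paths 0-1-4, 0-2-4, and 0-3-4; lowering the depth limit to 2 yields none, because the latch sits at depth 3.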

				struct TransformFSM {
				TransformFSM(AllSwitchPaths *SwitchPaths, DominatorTree *DT)
				: SwitchPaths(SwitchPaths), DT(DT) {}

				void run() { createAllExitPaths(); }

				private:
				/// Transform each threading path to effectively jump thread the FSM. For
				/// example the CFG below could be transformed as follows, where the cloned
				/// blocks unconditionally branch to the next correct case based on what is
				/// identified in the analysis.
				///          sw.bb                        sw.bb
				///        /   |   \                    /   |   \
				///   case1  case2  case3          case1  case2  case3
				///        \   |   /                 |      |      |
				///       determinator            det.2  det.3  det.1
				///        br sw.bb                /        |        \
				///                          sw.bb.2     sw.bb.3     sw.bb.1
				///                          br case2    br case3    br case1
				void createAllExitPaths() {
				DomTreeUpdater DTU(*DT, DomTreeUpdater::UpdateStrategy::Eager);

				// Move the switch block to the end of the path, since it will be duplicated
				BasicBlock *SwitchBlock = SwitchPaths->getSwitchBlock();
				for (ThreadingPath &TPath : SwitchPaths->getThreadingPaths()) {
				PathType NewPath(TPath.getPath());
				NewPath.push_back(SwitchBlock);
				TPath.setPath(NewPath);
				}

				// Transform the ThreadingPaths and keep track of the cloned values
				DuplicateBlockMap DuplicateMap;
				DefMap NewDefs;

				SmallSet<BasicBlock *, 16> BlocksToClean;
				for (BasicBlock *BB : successors(SwitchBlock)) {
				efriedma (Done): Probably want to ensure we're only calling collectEphemeralValues once per function.
				BlocksToClean.insert(BB);
				}

				for (ThreadingPath &TPath : SwitchPaths->getThreadingPaths()) {
				createExitPath(NewDefs, TPath, DuplicateMap, BlocksToClean, &DTU);
				}

				// After all paths are cloned, now update the last successor of the cloned
				// path so it skips over the switch statement
				for (ThreadingPath &TPath : SwitchPaths->getThreadingPaths()) {
				updateLastSuccessor(TPath, DuplicateMap, &DTU);
				}

				// For each instruction that was cloned and used outside, update its uses
				updateSSA(NewDefs);

				// Clean PHI Nodes for the newly created blocks
				for (BasicBlock *BB : BlocksToClean) {
				cleanPhiNodes(BB);
				}
				}

				/// For a specific ThreadingPath, create an exit path starting from the
				/// determinator block. To remember the correct destination, we have to
				/// duplicate blocks corresponding to each state. Also update the terminating
				/// instruction of the predecessors, and phis in the successor blocks.
				void createExitPath(DefMap &NewDefs, ThreadingPath &Path,
				DuplicateBlockMap &DuplicateMap,
				SmallSet<BasicBlock *, 16> &BlocksToClean,
				DomTreeUpdater *DTU) {
				uint64_t NextState = Path.getExitValue();
				const BasicBlock *Determinator = Path.getDeterminatorBB();
				PathType PathBBs = Path.getPath();

				// Don't select the placeholder block in front
				if (PathBBs.front() == Determinator)
				PathBBs.pop_front();

				auto DetIt = std::find(PathBBs.begin(), PathBBs.end(), Determinator);
				auto Prev = std::prev(DetIt);
				BasicBlock *PrevBB = *Prev;
				for (auto BBIt = DetIt; BBIt != PathBBs.end(); BBIt++) {
				BasicBlock *BB = *BBIt;
				BlocksToClean.insert(BB);

				// We already cloned BB for this NextState, now just update the branch
				// and continue.
				BasicBlock *NextBB = getClonedBB(BB, NextState, DuplicateMap);
				if (NextBB) {
				updatePredecessor(PrevBB, BB, NextBB, DTU);
				jkreiner (Done): Here is the cost calculation we implemented. It could be more accurate by accounting for instructions and blocks that get simplified, but for now it at least prevents code explosion.
				efriedma (Done): Okay, makes sense.
				PrevBB = NextBB;
				continue;
				}

				// Clone the BB and update the successor of Prev to jump to the new block
				BasicBlock *NewBB = cloneBlockAndUpdatePredecessor(
				BB, PrevBB, NextState, DuplicateMap, NewDefs, DTU);
				DuplicateMap[BB].push_back({NewBB, NextState});
				BlocksToClean.insert(NewBB);
				PrevBB = NewBB;
				}
				}

				/// Restore SSA form after cloning blocks. Each cloned block creates new defs
				/// for a variable, and the uses need to be updated to reflect this. The uses
				/// may be replaced with a cloned value, or some derived phi instruction. Note
				/// that all uses of a value defined in the same block were already remapped
				/// when cloning the block.
				void updateSSA(DefMap &NewDefs) {
				SSAUpdaterBulk SSAUpdate;
				SmallVector<Use *, 16> UsesToRename;

				for (auto KV : NewDefs) {
				Instruction *I = KV.first;
				BasicBlock *BB = I->getParent();
				std::vector<Instruction *> Cloned = KV.second;

				// Scan all uses of this instruction to see if it is used outside of its
				// block, and if so, record them in UsesToRename.
				for (Use &U : I->uses()) {
				Instruction *User = cast<Instruction>(U.getUser());
				if (PHINode *UserPN = dyn_cast<PHINode>(User)) {
				if (UserPN->getIncomingBlock(U) == BB)
				continue;
				} else if (User->getParent() == BB) {
				continue;
				}

				UsesToRename.push_back(&U);
				}

				// If there are no uses outside the block, we're done with this
				// instruction.
				if (UsesToRename.empty())
				continue;
				LLVM_DEBUG(dbgs() << "DFA-JT: Renaming non-local uses of: " << *I
				<< "\n");

				// We found a use of I outside of BB. Rename all uses of I that are
				// outside its block to be uses of the appropriate PHI node etc. See
				// ValuesInBlocks with the values we know.
				unsigned VarNum = SSAUpdate.AddVariable(I->getName(), I->getType());
				SSAUpdate.AddAvailableValue(VarNum, BB, I);
				for (Instruction *New : Cloned)
				SSAUpdate.AddAvailableValue(VarNum, New->getParent(), New);

				while (!UsesToRename.empty())
				SSAUpdate.AddUse(VarNum, UsesToRename.pop_back_val());

				LLVM_DEBUG(dbgs() << "\n");
				}
				// SSAUpdater handles phi placement and renaming uses with the appropriate
				// value.
				SSAUpdate.RewriteAllUses(DT);
				}

				/// Clones a basic block, and adds it to the CFG. This function also includes
				/// updating phi nodes in the successors of the BB, and remapping uses that
				/// were defined locally in the cloned BB.
				BasicBlock *cloneBlockAndUpdatePredecessor(BasicBlock *BB, BasicBlock *PrevBB,
				uint64_t NextState,
				DuplicateBlockMap &DuplicateMap,
				DefMap &NewDefs,
				DomTreeUpdater *DTU) {
				ValueToValueMapTy VMap;
				BasicBlock *NewBB = CloneBasicBlock(
				BB, VMap, ".jt" + std::to_string(NextState), BB->getParent());
				NewBB->moveAfter(BB);

				for (Instruction &I : *NewBB) {
				// Do not remap operands of PHINode in case a definition in BB is an
				// incoming value to a phi in the same block. This incoming value will
				// be renamed later while restoring SSA.
				if (isa<PHINode>(&I))
				continue;
				RemapInstruction(&I, VMap,
				RF_IgnoreMissingLocals | RF_NoModuleLevelChanges);
				}

				updateSuccessorPhis(BB, NewBB, NextState, VMap, DuplicateMap);
				updatePredecessor(PrevBB, BB, NewBB, DTU);
				updateDefMap(NewDefs, VMap);

				// Add all successors to the DominatorTree
				SmallPtrSet<BasicBlock *, 4> SuccSet;
				for (auto *SuccBB : successors(NewBB)) {
				if (SuccSet.insert(SuccBB).second)
				DTU->applyUpdates({{DominatorTree::Insert, NewBB, SuccBB}});
				}
				SuccSet.clear();
				return NewBB;
				}

				/// Update the phi nodes in BB's successors. This means creating a new
				/// incoming value from NewBB with the new instruction wherever there is
				/// an incoming value from BB.
				void updateSuccessorPhis(BasicBlock *BB, BasicBlock *ClonedBB,
				uint64_t NextState, ValueToValueMapTy &VMap,
				DuplicateBlockMap &DuplicateMap) {
				std::vector<BasicBlock *> BlocksToUpdate;

				// If BB is the last block in the path, we can simply update the one case
				// successor that will be reached.
				if (BB == SwitchPaths->getSwitchBlock()) {
				SwitchInst *Switch = SwitchPaths->getSwitchInst();
				BasicBlock *NextCase = getNextCaseSuccessor(Switch, NextState);
				BlocksToUpdate.push_back(NextCase);
				BasicBlock *ClonedSucc = getClonedBB(NextCase, NextState, DuplicateMap);
				if (ClonedSucc) {
				BlocksToUpdate.push_back(ClonedSucc);
				}
				}
				// Otherwise update phis in all successors.
				else {
				for (BasicBlock *Succ : successors(BB)) {
				BlocksToUpdate.push_back(Succ);

				// Check if a successor has already been cloned for the particular exit
				// value. In this case if a successor was already cloned, the phi nodes
				// in the cloned block should be updated directly.
				BasicBlock *ClonedSucc = getClonedBB(Succ, NextState, DuplicateMap);
				if (ClonedSucc) {
				BlocksToUpdate.push_back(ClonedSucc);
				}
				}
				}

				// If there is a phi with an incoming value from BB, create a new incoming
				// value for the new predecessor ClonedBB. The value will either be the same
				// value from BB or a cloned value.
				for (BasicBlock *Succ : BlocksToUpdate) {
				for (auto II = Succ->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
				++II) {
				Value *Incoming = Phi->getIncomingValueForBlock(BB);
				if (Incoming) {
				if (isa<Constant>(Incoming)) {
				Phi->addIncoming(Incoming, ClonedBB);
				continue;
				}
				Value *ClonedVal = VMap[Incoming];
				if (ClonedVal) {
				Phi->addIncoming(ClonedVal, ClonedBB);
				} else {
				Phi->addIncoming(Incoming, ClonedBB);
				}
				}
				}
				}
				}

				/// Sets the successor of PrevBB to be NewBB instead of OldBB. Note that all
				/// other successors are kept as well.
				void updatePredecessor(BasicBlock *PrevBB, BasicBlock *OldBB,
				BasicBlock *NewBB, DomTreeUpdater *DTU) {
				// When a path is reused, there is a chance that predecessors were already
				// updated before. Check if the predecessor needs to be updated first.
				if (!isPredecessor(OldBB, PrevBB))
				return;

				Instruction *PrevTerm = PrevBB->getTerminator();
				for (unsigned i = 0; i < PrevTerm->getNumSuccessors(); i++) {
				if (PrevTerm->getSuccessor(i) == OldBB) {
				OldBB->removePredecessor(PrevBB, /* KeepOneInputPHIs = */ true);
				PrevTerm->setSuccessor(i, NewBB);
				}
				}
				DTU->applyUpdates({{DominatorTree::Delete, PrevBB, OldBB},
				{DominatorTree::Insert, PrevBB, NewBB}});
				}

				/// Add new value mappings to the DefMap to keep track of all new definitions
				/// for a particular instruction. These will be used while updating SSA form.
				void updateDefMap(DefMap &NewDefs, ValueToValueMapTy &VMap) {
				for (auto Entry : VMap) {
				Instruction *Inst =
				dyn_cast<Instruction>(const_cast<Value *>(Entry.first));
				if (!Inst || !Entry.second || isa<BranchInst>(Inst) ||
				isa<SwitchInst>(Inst)) {
				continue;
				}

				Instruction *Cloned = dyn_cast<Instruction>(Entry.second);
				if (!Cloned)
				continue;

				if (NewDefs.find(Inst) == NewDefs.end())
				NewDefs[Inst] = {Cloned};
				else
				NewDefs[Inst].push_back(Cloned);
				}
				}

				/// Update the last branch of a particular cloned path to point to the correct
				/// case successor. Note that this is an optional step and would have been
				/// done in later optimizations, but it makes the CFG significantly easier to
				/// work with.
				void updateLastSuccessor(ThreadingPath &TPath,
				DuplicateBlockMap &DuplicateMap,
				DomTreeUpdater *DTU) {
				uint64_t NextState = TPath.getExitValue();
				BasicBlock *BB = TPath.getPath().back();
				BasicBlock *LastBlock = getClonedBB(BB, NextState, DuplicateMap);

				// Note multiple paths can end at the same block so check that it is not
				// updated yet
				if (!isa<SwitchInst>(LastBlock->getTerminator()))
				return;
				SwitchInst *Switch = cast<SwitchInst>(LastBlock->getTerminator());
				BasicBlock *NextCase = getNextCaseSuccessor(Switch, NextState);

				std::vector<DominatorTree::UpdateType> DTUpdates;
				SmallPtrSet<BasicBlock *, 4> SuccSet;
				for (BasicBlock *Succ : successors(LastBlock)) {
				if (Succ != NextCase && SuccSet.insert(Succ).second)
				DTUpdates.push_back({DominatorTree::Delete, LastBlock, Succ});
				}

				Switch->eraseFromParent();
				BranchInst::Create(NextCase, LastBlock);

				DTU->applyUpdates(DTUpdates);
				}

				/// After cloning blocks, some of the phi nodes have extra incoming values
				/// that are no longer used. This function removes them.
				void cleanPhiNodes(BasicBlock *BB) {
				// If BB is no longer reachable, remove any remaining phi nodes
				if (pred_empty(BB)) {
				std::vector<PHINode *> PhiToRemove;
				for (auto II = BB->begin(); PHINode *Phi = dyn_cast<PHINode>(II); ++II) {
				PhiToRemove.push_back(Phi);
				}
				for (PHINode *PN : PhiToRemove) {
				PN->replaceAllUsesWith(UndefValue::get(PN->getType()));
				PN->eraseFromParent();
				}
				return;
				}

				// Remove any incoming values that come from an invalid predecessor
				for (auto II = BB->begin(); PHINode *Phi = dyn_cast<PHINode>(II); ++II) {
				std::vector<BasicBlock *> BlocksToRemove;
				for (BasicBlock *IncomingBB : Phi->blocks()) {
				if (!isPredecessor(BB, IncomingBB))
				BlocksToRemove.push_back(IncomingBB);
				}
				for (BasicBlock *BB : BlocksToRemove) {
				Phi->removeIncomingValue(BB);
				}
				}
				}

				/// Checks if BB was already cloned for a particular next state value. If it
				/// was then it returns this cloned block, and otherwise null.
				BasicBlock *getClonedBB(BasicBlock *BB, uint64_t NextState,
				DuplicateBlockMap &DuplicateMap) {
				CloneList ClonedBBs = DuplicateMap[BB];

				// Find an entry in the CloneList with this NextState. If it exists then
				// return the corresponding BB
				auto It = llvm::find_if(ClonedBBs, [NextState](const ClonedBlock &C) {
				return C.State == NextState;
				});
				return It != ClonedBBs.end() ? (*It).BB : nullptr;
				}

				/// Helper to get the successor corresponding to a particular case value for
				/// a switch statement.
				BasicBlock *getNextCaseSuccessor(SwitchInst *Switch, uint64_t NextState) {
				BasicBlock *NextCase = nullptr;
				for (auto Case : Switch->cases()) {
				if (Case.getCaseValue()->getZExtValue() == NextState) {
				NextCase = Case.getCaseSuccessor();
				break;
				}
				}
				if (!NextCase) {
				NextCase = Switch->getDefaultDest();
				}
				return NextCase;
				}

				/// Returns true if IncomingBB is a predecessor of BB.
				bool isPredecessor(BasicBlock *BB, BasicBlock *IncomingBB) {
				return llvm::find(predecessors(BB), IncomingBB) != pred_end(BB);
				}

				AllSwitchPaths *SwitchPaths;
				DominatorTree *DT;
				std::list<ThreadingPath> TPaths;
				};
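
Two small helpers in TransformFSM carry most of the bookkeeping: getClonedBB(), which reuses a clone that already exists for a given (block, next state) pair, and getNextCaseSuccessor(), which maps a state value to the switch successor it selects. The sketch below mimics both with standard containers only. It is an illustration under assumptions, not the patch's implementation: block names as `std::string` stand in for `BasicBlock *`, `ToySwitch` is a hypothetical substitute for `SwitchInst`, and the lookup uses `find` on a const map rather than `operator[]`, so a miss does not insert an empty entry or copy the clone list.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A clone of some original block, specialised for one next-state value.
struct ClonedBlock {
  std::string Name; // stand-in for BasicBlock *
  std::uint64_t State;
};

using CloneList = std::vector<ClonedBlock>;
// Original block name -> all clones made of it so far.
using DuplicateBlockMap = std::unordered_map<std::string, CloneList>;

// Returns the name of the clone of BB made for NextState, or an empty string
// if BB has not been cloned for that state yet.
std::string getClonedBB(const DuplicateBlockMap &Map, const std::string &BB,
                        std::uint64_t NextState) {
  auto MapIt = Map.find(BB);
  if (MapIt == Map.end())
    return "";
  const CloneList &Clones = MapIt->second;
  auto It = std::find_if(
      Clones.begin(), Clones.end(),
      [NextState](const ClonedBlock &C) { return C.State == NextState; });
  return It != Clones.end() ? It->Name : "";
}

// Hypothetical switch: (case value, successor) pairs plus a default target.
struct ToySwitch {
  std::vector<std::pair<std::uint64_t, std::string>> Cases;
  std::string DefaultDest;
};

// Returns the successor selected by NextState, falling back to the default
// destination when no case matches.
std::string getNextCaseSuccessor(const ToySwitch &SW,
                                 std::uint64_t NextState) {
  for (const auto &Case : SW.Cases)
    if (Case.first == NextState)
      return Case.second;
  return SW.DefaultDest;
}
```

Keying clones by next state is what lets two threading paths that leave through the same block with the same exit value share one copy instead of duplicating it again.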

				bool DFAJumpThreading::runOnFunction(Function &F) {
				if (skipFunction(F) || DisableThis)
				return false;

				LLVM_DEBUG(dbgs() << "\nDFA Jump threading: " << F.getName() << "\n");

				DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();

				std::list<AllSwitchPaths> ThreadableLoops;
				bool MadeChanges = false;

				for (Function::iterator B = F.begin(), BE = F.end(); B != BE; B++) {
				if (auto *SI = dyn_cast<SwitchInst>(B->getTerminator())) {

				LLVM_DEBUG(dbgs() << "\nCheck if SwitchInst in BB " << B->getName()
				<< " is predictable\n");
				MainSwitch Switch(SI);

				if (!Switch.getInstr())
				continue;

				LLVM_DEBUG(dbgs() << "\nSwitchInst in BB " << B->getName() << " is a "
				<< "candidate for jump threading\n");
				LLVM_DEBUG(SI->dump());

				unfoldSelectInstrs(DT, Switch.getSelectInsts());
				if (!Switch.getSelectInsts().empty())
				MadeChanges = true;

				AllSwitchPaths SwitchPaths(&Switch);
				SwitchPaths.run();

				if (SwitchPaths.getNumThreadingPaths() > 0) {
				ThreadableLoops.push_back(SwitchPaths);

				// For the time being limit this optimization to occurring once in a
				// function since it can change the CFG significantly. This is not a
				// strict requirement but it can cause buggy behavior if there is an
				// overlap of blocks in different opportunities. There is a lot of room
				// to experiment with catching more opportunities here.
				break;
				}
				}
				}

				for (AllSwitchPaths SwitchPaths : ThreadableLoops) {
				TransformFSM Transform(&SwitchPaths, DT);
				Transform.run();
				MadeChanges = true;
				}

				if (IsViewCFG) {
				F.viewCFG();
				}

				#ifdef EXPENSIVE_CHECKS
				assert(DT->verify(DominatorTree::VerificationLevel::Full));
				verifyFunction(F, &dbgs());
				#endif

				return MadeChanges;
				}
				} // end anonymous namespace

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp

//===-- AArch64TargetMachine.cpp - Define TargetMachine for AArch64 -------===//		//===-- AArch64TargetMachine.cpp - Define TargetMachine for AArch64 -------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
		efriedma: This is kind of late to be running this pass. That might be useful in some cases: we might be making the code harder to analyze if there's irreducible control flow. On the other hand, we're skipping interesting optimizations if the CFG becomes significantly simpler. My instinct is that the advantages of running earlier outweigh the disadvantages.
		jkreiner: I agree that it may be beneficial to run the pass earlier. I'm not aware yet of which optimizations it could interfere with. Do you have any suggestions of where to run it instead?
		dnsampaio: I would suggest trying just before llvm's original jump-threading. With the implementation proposed in D88307, it actually produced code that llvm's original jump-threading would further optimize, whereas the other way around, the original jump-threading would just not do anything. I'm running over llvm-12, so it may take me some time to test this patch.
		xbolva00: and part of standard opt 2+ pipeline? why aarch64 specific?
		jkreiner: Thank you for the suggestion, I'll try it out before the original jump-threading then. I only had it there since I was testing it on AArch64, but it could be moved to the general opt pipeline.
		jkreiner: This seems to be a better location for the pass. I've measured a performance gain 5% greater than before for Coremark by placing it before the original jump-threading.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64TargetMachine.h"		#include "AArch64TargetMachine.h"
#include "AArch64.h"		#include "AArch64.h"
...
std::unique_ptr<CSEConfigBase> AArch64PassConfig::getCSEConfig() const {
return getStandardCSEConfigForOpt(TM->getOptLevel());		return getStandardCSEConfigForOpt(TM->getOptLevel());
}		}

void AArch64PassConfig::addIRPasses() {		void AArch64PassConfig::addIRPasses() {
// Always expand atomic operations, we don't deal with atomicrmw or cmpxchg		// Always expand atomic operations, we don't deal with atomicrmw or cmpxchg
// ourselves.		// ourselves.
addPass(createAtomicExpandPass());		addPass(createAtomicExpandPass());

		if (TM->getOptLevel() >= CodeGenOpt::Default)
		addPass(createDFAJumpThreadingPass(TM));

// Expand any SVE vector library calls that we can't code generate directly.		// Expand any SVE vector library calls that we can't code generate directly.
if (EnableSVEIntrinsicOpts && TM->getOptLevel() == CodeGenOpt::Aggressive)		if (EnableSVEIntrinsicOpts && TM->getOptLevel() == CodeGenOpt::Aggressive)
addPass(createSVEIntrinsicOptsPass());		addPass(createSVEIntrinsicOptsPass());

// Cmpxchg instructions are often used with a subsequent comparison to		// Cmpxchg instructions are often used with a subsequent comparison to
// determine whether it succeeded. We can exploit existing control-flow in		// determine whether it succeeded. We can exploit existing control-flow in
// ldrex/strex loops to simplify this, but it needs tidying up.		// ldrex/strex loops to simplify this, but it needs tidying up.
if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)		if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)
...

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	...
	; CHECK-NEXT: Type-Based Alias Analysis			; CHECK-NEXT: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: DFA Jump Threading
	; CHECK-NEXT: SVE intrinsics optimizations			; CHECK-NEXT: SVE intrinsics optimizations
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	...

llvm/test/CodeGen/AArch64/dfa-constant-propagation.ll

This file was added.

				; RUN: opt -S -dfa-jump-threading -sccp -simplifycfg %s | FileCheck %s

				efriedma: Please put the tests in llvm/test/Transforms/DFAJumpThreading.
				; CHECK: define i32 @test
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret i32 3

				; This test checks that a constant propagation is applied for a basic loop.
				; Related to bug 44679.
				define i32 @test(i32 %a) {
				entry:
				br label %while.cond

				while.cond:
				%num = phi i32 [ 0, %entry ], [ %add, %case1 ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %case1 ]
				switch i32 %state, label %end [
				i32 1, label %case1
				i32 2, label %case2
				]

				case1:
				%state.next = phi i32 [ 3, %case2 ], [ 2, %while.cond ]
				%add = add nsw i32 %num, %state
				br label %while.cond

				case2:
				br label %case1

				end:
				ret i32 %num
				}
				No newline at end of file
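
To see why the test above expects `ret i32 3`, the loop can be hand-translated into plain C++. This is an illustrative sketch only: `runStateLoop` is a hypothetical name, and the unused `%a` argument is dropped.

```cpp
#include <cassert>

// Hand translation of @test: state starts at 1, case1 adds the current state
// to num, and the next state is 2 when case1 is entered directly from the
// switch or 3 when it is reached through case2.
int runStateLoop() {
  int num = 0;
  int state = 1;
  for (;;) {
    int next;
    switch (state) {
    case 1:
      next = 2; // phi entry [ 2, %while.cond ]
      break;
    case 2:
      next = 3; // phi entry [ 3, %case2 ]
      break;
    default:
      return num; // label %end
    }
    num += state; // %add = add nsw i32 %num, %state
    state = next;
  }
}
```

The state sequence is 1, 2, 3, so `num` accumulates 1 + 2 before the default edge is taken. Once every transition is threaded, the whole loop folds to the constant 3.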

llvm/test/CodeGen/AArch64/dfa-jump-threading-analysis.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt -S -dfa-jump-threading -debug-only=dfa-jump-threading -disable-output %s 2>&1 | FileCheck %s

				; This test checks that the analysis identifies all threadable paths in a
				; simple CFG. A threadable path includes a list of basic blocks, the entry and
				; exit states, and the block that determines the next state.
				; < path of BBs that form a cycle > [ entry, exit, determinator ]
				define i32 @test1(i32 %num) {
				; CHECK: < for.body case1 for.inc > [ 1, 2, for.inc ]
				; CHECK-NEXT: < for.body case2 for.inc > [ 2, 1, for.inc ]
				; CHECK-NEXT: < for.body case2 si.unfold.false for.inc > [ 2, 2, for.inc ]
				entry:
				br label %for.body

				for.body:
				%count = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %for.inc ]
				switch i32 %state, label %for.inc [
				i32 1, label %case1
				i32 2, label %case2
				]

				case1:
				br label %for.inc

				case2:
				%cmp = icmp eq i32 %count, 50
				%sel = select i1 %cmp, i32 1, i32 2
				br label %for.inc

				for.inc:
				%state.next = phi i32 [ %sel, %case2 ], [ 1, %for.body ], [ 2, %case1 ]
				%inc = add nsw i32 %count, 1
				%cmp.exit = icmp slt i32 %inc, %num
				br i1 %cmp.exit, label %for.body, label %for.end

				for.end:
				ret i32 0
				}
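
The debug line format checked above, `< path of BBs > [ entry, exit, determinator ]`, can be modelled with a small record. This is a sketch under assumptions: `ThreadingPathSketch` and `formatPath` are hypothetical names and do not reproduce the pass's own `ThreadingPath` class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one threadable path: the blocks on the cycle, the
// state on entry, the state on exit, and the block that determines the exit
// state.
struct ThreadingPathSketch {
  std::vector<std::string> Blocks;
  int EntryState;
  int ExitState;
  std::string Determinator;
};

// Render a path in the same shape as the CHECK lines of the analysis test.
std::string formatPath(const ThreadingPathSketch &P) {
  std::string Out = "<";
  for (const std::string &B : P.Blocks)
    Out += " " + B;
  Out += " > [ " + std::to_string(P.EntryState) + ", " +
         std::to_string(P.ExitState) + ", " + P.Determinator + " ]";
  return Out;
}
```

Formatting the path `for.body`, `case1`, `for.inc` with entry state 1, exit state 2 and determinator `for.inc` yields the first CHECK line of `@test1`.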

				; This test checks that the analysis finds threadable paths in a more
				; complicated CFG. Here the FSM is represented as a nested loop, with
				; fallthrough cases.
				define i32 @test2(i32 %init) {
				; CHECK: < loop.3 case2 > [ 2, 3, loop.3 ]
				; CHECK-NEXT: < loop.3 case2 loop.1.backedge loop.1 loop.2 > [ 2, 1, loop.1 ]
				; CHECK-NEXT: < loop.3 case2 loop.1.backedge si.unfold.false1 loop.1 loop.2 > [ 2, 4, loop.1.backedge ]
				; CHECK-NEXT: < loop.3 case3 loop.2.backedge loop.2 > [ 3, 0, loop.2.backedge ]
				; CHECK-NEXT: < loop.3 case3 case4 loop.2.backedge loop.2 > [ 3, 3, loop.2.backedge ]
				; CHECK-NEXT: < loop.3 case3 case4 loop.1.backedge loop.1 loop.2 > [ 3, 1, loop.1 ]
				; CHECK-NEXT: < loop.3 case3 case4 loop.1.backedge si.unfold.false1 loop.1 loop.2 > [ 3, 2, loop.1.backedge ]
				; CHECK-NEXT: < loop.3 case4 loop.2.backedge loop.2 > [ 4, 3, loop.2.backedge ]
				; CHECK-NEXT: < loop.3 case4 loop.1.backedge loop.1 loop.2 > [ 4, 1, loop.1 ]
				; CHECK-NEXT: < loop.3 case4 loop.1.backedge si.unfold.false1 loop.1 loop.2 > [ 4, 2, loop.1.backedge ]
				entry:
				%cmp = icmp eq i32 %init, 0
				%sel = select i1 %cmp, i32 0, i32 2
				br label %loop.1

				loop.1:
				%state.1 = phi i32 [ %sel, %entry ], [ %state.1.be2, %loop.1.backedge ]
				br label %loop.2

				loop.2:
				%state.2 = phi i32 [ %state.1, %loop.1 ], [ %state.2.be, %loop.2.backedge ]
				br label %loop.3

				loop.3:
				%state = phi i32 [ %state.2, %loop.2 ], [ 3, %case2 ]
				switch i32 %state, label %infloop.i [
				i32 2, label %case2
				i32 3, label %case3
				i32 4, label %case4
				i32 0, label %case0
				i32 1, label %case1
				]

				case2:
				br i1 %cmp, label %loop.3, label %loop.1.backedge

				case3:
				br i1 %cmp, label %loop.2.backedge, label %case4

				case4:
				br i1 %cmp, label %loop.2.backedge, label %loop.1.backedge

				loop.1.backedge:
				%state.1.be = phi i32 [ 2, %case4 ], [ 4, %case2 ]
				%state.1.be2 = select i1 %cmp, i32 1, i32 %state.1.be
				br label %loop.1

				loop.2.backedge:
				%state.2.be = phi i32 [ 3, %case4 ], [ 0, %case3 ]
				br label %loop.2

				case0:
				br label %exit

				case1:
				br label %exit

				infloop.i:
				br label %infloop.i

				exit:
				ret i32 0
				}

llvm/test/CodeGen/AArch64/dfa-jump-threading-transform.ll

This file was added.

				; RUN: opt -S -dfa-jump-threading %s | FileCheck %s

				; These tests check that the DFA jump threading transformation is applied
				; properly to two CFGs. They check that blocks are cloned, branches are
				; updated, and SSA form is restored.
				define i32 @test1(i32 %num) {
				entry:
				br label %for.body

				for.body:
				%count = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %for.inc ]
				switch i32 %state, label %for.inc [
				i32 1, label %case1
				i32 2, label %case2
				]

				; CHECK: for.body.jt1:
				; CHECK-NEXT: %count.jt1
				; CHECK-NEXT: %state.jt1
				; CHECK-NEXT: br label %case1

				; CHECK: for.body.jt2:
				; CHECK-NEXT: %count.jt2
				; CHECK-NEXT: %state.jt2
				; CHECK-NEXT: br label %case2

				case1:
				br label %for.inc

				; CHECK: case1:
				; CHECK-NEXT: %count1 = phi i32 [ %count.jt1, %for.body.jt1 ], [ %count, %for.body ]

				case2:
				%cmp = icmp eq i32 %count, 50
				%sel = select i1 %cmp, i32 1, i32 2
				br label %for.inc

				; CHECK: case2:
				; CHECK-NEXT: %count2 = phi i32 [ %count.jt2, %for.body.jt2 ], [ %count, %for.body ]
				; CHECK: br i1 %cmp, label %for.inc.jt1, label %si.unfold.false

				; CHECK: si.unfold.false:
				; CHECK-NEXT: br label %for.inc.jt2

				for.inc:
				%state.next = phi i32 [ %sel, %case2 ], [ 1, %for.body ], [ 2, %case1 ]
				%inc = add nsw i32 %count, 1
				%cmp.exit = icmp slt i32 %inc, %num
				br i1 %cmp.exit, label %for.body, label %for.end

				; CHECK: for.inc.jt1:
				; CHECK: br i1 %cmp.exit.jt1, label %for.body.jt1, label %for.end

				; CHECK: for.inc.jt2:
				; CHECK-NEXT: %count3 = phi i32 [ %count2, %si.unfold.false ], [ %count1, %case1 ]
				; CHECK: br i1 %cmp.exit.jt2, label %for.body.jt2, label %for.end

				for.end:
				ret i32 0
				}
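
The effect of threading `@test1` can be pictured at the source level: the switch-in-a-loop form and a goto form with one label per state do the same work, but the goto form never re-dispatches on the state variable. The sketch below is a hand-written approximation, not output of the pass. `Visits`, `runSwitch` and `runThreaded` are hypothetical names, and visit counters are added only so the two versions can be compared.

```cpp
#include <cassert>

// Counts how often each case body runs, so both versions can be compared.
struct Visits {
  int case1 = 0;
  int case2 = 0;
  int other = 0;
};

// Switch-in-a-loop form, mirroring the IR of @test1 before the transform.
Visits runSwitch(int num) {
  Visits v;
  int count = 0;
  int state = 1;
  do {
    int next = 1;
    switch (state) {
    case 1:
      ++v.case1;
      next = 2;
      break;
    case 2:
      ++v.case2;
      next = (count == 50) ? 1 : 2;
      break;
    default: // never taken here: the state is always 1 or 2
      ++v.other;
      next = 1;
      break;
    }
    state = next;
    ++count;
  } while (count < num);
  return v;
}

// Threaded form: each state jumps straight to the code of its successor.
Visits runThreaded(int num) {
  Visits v;
  int count = 0;
state1:
  ++v.case1;
  ++count;
  if (count >= num)
    return v;
  goto state2; // the next state after case1 is always 2
state2:
  ++v.case2;
  {
    bool toOne = (count == 50);
    ++count;
    if (count >= num)
      return v;
    if (toOne)
      goto state1;
    goto state2;
  }
}
```

Both functions visit the cases the same number of times for any `num`; the threaded version simply replaces the dispatch through `state` with direct jumps, which is the shape the `.jt1` and `.jt2` block clones encode in the IR.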


				define i32 @test2(i32 %init) {
				entry:
				%cmp = icmp eq i32 %init, 0
				%sel = select i1 %cmp, i32 0, i32 2
				br label %loop.1

				; CHECK: si.unfold.false:
				; CHECK-NEXT: br label %loop.1
				; CHECK: si.unfold.false1:
				; CHECK-NEXT: br label %loop.1
				; CHECK: si.unfold.false1.jt2:
				; CHECK-NEXT: br label %loop.1.jt2
				; CHECK: si.unfold.false1.jt4:
				; CHECK-NEXT: br label %loop.1.jt4

				loop.1:
				%state.1 = phi i32 [ %sel, %entry ], [ %state.1.be2, %loop.1.backedge ]
				br label %loop.2

				; CHECK: loop.1.jt2:
				; CHECK: br label %loop.2.jt2
				; CHECK: loop.1.jt4:
				; CHECK: br label %loop.2.jt4
				; CHECK: loop.1.jt1:
				; CHECK: br label %loop.2.jt1

				loop.2:
				%state.2 = phi i32 [ %state.1, %loop.1 ], [ %state.2.be, %loop.2.backedge ]
				br label %loop.3

				; CHECK: loop.2.jt2:
				; CHECK: br label %loop.3.jt2
				; CHECK: loop.2.jt3:
				; CHECK: br label %loop.3.jt3
				; CHECK: loop.2.jt0:
				; CHECK: br label %loop.3.jt0
				; CHECK: loop.2.jt4:
				; CHECK: br label %loop.3.jt4
				; CHECK: loop.2.jt1:
				; CHECK: br label %loop.3.jt1

				loop.3:
				%state = phi i32 [ %state.2, %loop.2 ], [ 3, %case2 ]
				switch i32 %state, label %infloop.i [
				i32 2, label %case2
				i32 3, label %case3
				i32 4, label %case4
				i32 0, label %case0
				i32 1, label %case1
				]

				; CHECK: loop.3.jt2:
				; CHECK: br label %case2
				; CHECK: loop.3.jt0:
				; CHECK: br label %case0
				; CHECK: loop.3.jt4:
				; CHECK: br label %case4
				; CHECK: loop.3.jt1:
				; CHECK: br label %case1
				; CHECK: loop.3.jt3:
				; CHECK: br label %case3

				case2:
				br i1 %cmp, label %loop.3, label %loop.1.backedge

				; CHECK: case2:
				; CHECK-NEXT: br i1 %cmp, label %loop.3.jt3, label %loop.1.backedge.jt4

				case3:
				br i1 %cmp, label %loop.2.backedge, label %case4

				; CHECK: case3:
				; CHECK-NEXT: br i1 %cmp, label %loop.2.backedge.jt0, label %case4

				case4:
				br i1 %cmp, label %loop.2.backedge, label %loop.1.backedge

				; CHECK: case4:
				; CHECK-NEXT: br i1 %cmp, label %loop.2.backedge.jt3, label %loop.1.backedge.jt2

				loop.1.backedge:
				%state.1.be = phi i32 [ 2, %case4 ], [ 4, %case2 ]
				%state.1.be2 = select i1 %cmp, i32 1, i32 %state.1.be
				br label %loop.1

				; CHECK: loop.1.backedge.jt2:
				; CHECK: br i1 %cmp, label %loop.1.jt1, label %si.unfold.false1.jt2
				; CHECK: loop.1.backedge.jt4:
				; CHECK: br i1 %cmp, label %loop.1.jt1, label %si.unfold.false1.jt4

				loop.2.backedge:
				%state.2.be = phi i32 [ 3, %case4 ], [ 0, %case3 ]
				br label %loop.2

				; CHECK: loop.2.backedge.jt3:
				; CHECK: br label %loop.2.jt3
				; CHECK: loop.2.backedge.jt0:
				; CHECK: br label %loop.2.jt0

				case0:
				br label %exit

				case1:
				br label %exit

				infloop.i:
				br label %infloop.i

				exit:
				ret i32 0
				}

llvm/test/CodeGen/AArch64/dfa-unfold-select.ll

This file was added.

				; RUN: opt -S -dfa-jump-threading %s | FileCheck %s

				; These tests check if selects are unfolded properly for jump threading
				; opportunities. There are three different patterns to consider:
				; 1) Both operands are constant and the false branch is unfolded by default
				; 2) One operand is constant and the other is another select to be unfolded. In
				; this case a single select is sunk to a new block to unfold.
				; 3) Both operands are a select, and both should be sunk to new blocks.
				define i32 @test1(i32 %num) {
				entry:
				br label %for.body

				for.body:
				%count = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %for.inc ]
				switch i32 %state, label %for.inc [
				i32 1, label %case1
				i32 2, label %case2
				]

				case1:
				br label %for.inc

				case2:
				; CHECK: %cmp = icmp
				; CHECK-NOT: %sel = select i1 %cmp, i32 1, i32 2
				; CHECK-NEXT: br i1 %cmp, label %for.inc
				; CHECK: si.unfold.false
				%cmp = icmp slt i32 %count, 50
				%sel = select i1 %cmp, i32 1, i32 2
				br label %for.inc

				; CHECK: si.unfold.false:
				; CHECK-NEXT: br label %for.inc

				for.inc:
				; CHECK: %state.next = phi
				; CHECK-NOT: [ %sel, %case2 ]
				%state.next = phi i32 [ %sel, %case2 ], [ 1, %for.body ], [ 2, %case1 ]
				%inc = add nsw i32 %count, 1
				%cmp.exit = icmp slt i32 %inc, %num
				br i1 %cmp.exit, label %for.body, label %for.end

				for.end:
				ret i32 0
				}

				define i32 @test2(i32 %num) {
				entry:
				br label %for.body

				for.body:
				%count = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %for.inc ]
				switch i32 %state, label %for.inc [
				i32 1, label %case1
				i32 2, label %case2
				]

				case1:
				; CHECK: %cmp.c1 = icmp
				; CHECK: %cmp2.c1 = icmp
				; CHECK-NOT: select i1
				; CHECK-NEXT: br i1 %cmp2.c1
				%cmp.c1 = icmp slt i32 %count, 50
				%cmp2.c1 = icmp slt i32 %count, 100
				%state1.1 = select i1 %cmp.c1, i32 1, i32 2
				%state1.2 = select i1 %cmp2.c1, i32 %state1.1, i32 3
				br label %for.inc

				case2:
				; CHECK: %cmp.c2 = icmp
				; CHECK: %cmp2.c2 = icmp
				; CHECK-NOT: select i1
				; CHECK-NEXT: br i1 %cmp2.c2
				%cmp.c2 = icmp slt i32 %count, 50
				%cmp2.c2 = icmp sgt i32 %count, 100
				%state2.1 = select i1 %cmp.c2, i32 1, i32 2
				%state2.2 = select i1 %cmp2.c2, i32 3, i32 %state2.1
				br label %for.inc

				; CHECK: si.unfold.true:
				; CHECK-NEXT: br i1 %cmp.c1
				; CHECK: si.unfold.false:
				; CHECK-NEXT: br i1 %cmp.c2
				; CHECK: si.unfold.false1:
				; CHECK-NEXT: br label %for.inc
				; CHECK: si.unfold.false2:
				; CHECK-NEXT: br label %for.inc

				for.inc:
				; CHECK: %state.next = phi
				; CHECK-NOT: [ %state1.2, %case1 ]
				; CHECK-NOT: [ %state2.2, %case2 ]
				%state.next = phi i32 [ %state1.2, %case1 ], [ %state2.2, %case2 ], [ 1, %for.body ]
				%inc = add nsw i32 %count, 1
				%cmp.exit = icmp slt i32 %inc, %num
				br i1 %cmp.exit, label %for.body, label %for.end

				for.end:
				ret i32 0
				}

				define i32 @test3(i32 %num) {
				entry:
				br label %for.body

				for.body:
				%count = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%state = phi i32 [ 1, %entry ], [ %state.next, %for.inc ]
				switch i32 %state, label %for.inc [
				i32 1, label %case1
				i32 2, label %case2
				]

				case1:
				br label %for.inc

				case2:
				; CHECK: %cmp.3 = icmp eq i32 %0, 0
				; CHECK-NOT: select i1
				; CHECK-NEXT: br i1 %cmp.3, label %si.unfold.true, label %si.unfold.false
				%cmp.1 = icmp slt i32 %count, 50
				%cmp.2 = icmp slt i32 %count, 100
				%0 = and i32 %count, 1
				%cmp.3 = icmp eq i32 %0, 0
				%sel.1 = select i1 %cmp.1, i32 1, i32 2
				%sel.2 = select i1 %cmp.2, i32 3, i32 4
				%sel.3 = select i1 %cmp.3, i32 %sel.1, i32 %sel.2
				br label %for.inc

				; CHECK: si.unfold.true:
				; CHECK-NEXT: br i1 %cmp.1
				; CHECK: si.unfold.false:
				; CHECK-NEXT: br i1 %cmp.2
				; CHECK: si.unfold.false1:
				; CHECK-NEXT: br label %for.inc
				; CHECK: si.unfold.false2:
				; CHECK-NEXT: br label %for.inc

				for.inc:
				; CHECK: %state.next = phi
				; CHECK-NOT: %case2
				%state.next = phi i32 [ %sel.3, %case2 ], [ 1, %for.body ], [ 2, %case1 ]
				%inc = add nsw i32 %count, 1
				%cmp.exit = icmp slt i32 %inc, %num
				br i1 %cmp.exit, label %for.body, label %for.end

				for.end:
				ret i32 0
				}
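
Pattern 3 (both operands of the outer select are themselves selects) amounts to replacing a value-level choice with control flow. The sketch below mirrors the conditions of `@test3` in plain C++; `nextStateSelect` and `nextStateBranches` are hypothetical names, and the real pass rewrites IR blocks and phi nodes rather than source code.

```cpp
#include <cassert>

// Select form: all three selects are evaluated as values, as in the
// original case2 block of @test3.
int nextStateSelect(int count) {
  bool cmp1 = count < 50;
  bool cmp2 = count < 100;
  bool cmp3 = (count & 1) == 0;
  int sel1 = cmp1 ? 1 : 2;
  int sel2 = cmp2 ? 3 : 4;
  return cmp3 ? sel1 : sel2;
}

// Unfolded form: each select becomes a branch, so every path to the state
// phi carries a constant next state.
int nextStateBranches(int count) {
  if ((count & 1) == 0) { // corresponds to si.unfold.true
    if (count < 50)
      return 1;
    return 2;
  }
  // corresponds to si.unfold.false
  if (count < 100)
    return 3;
  return 4;
}
```

The two functions agree for every `count`. After unfolding, each incoming edge of `%state.next` holds a constant, which is what makes the paths threadable.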

llvm/tools/opt/opt.cpp

...
std::vector<StringRef> PassNameExact = {
"atomic-expand", "expandvp",		"atomic-expand", "expandvp",
"hardware-loops", "type-promotion",		"hardware-loops", "type-promotion",
"mve-tail-predication", "interleaved-access",		"mve-tail-predication", "interleaved-access",
"global-merge", "pre-isel-intrinsic-lowering",		"global-merge", "pre-isel-intrinsic-lowering",
"expand-reductions", "indirectbr-expand",		"expand-reductions", "indirectbr-expand",
"generic-to-nvvm", "expandmemcmp",		"generic-to-nvvm", "expandmemcmp",
"loop-reduce", "lower-amx-type",		"loop-reduce", "lower-amx-type",
"lower-amx-intrinsics", "polyhedral-info",		"lower-amx-intrinsics", "polyhedral-info",
"replace-with-veclib"};		"replace-with-veclib", "dfa-jump-threading"};
for (const auto &P : PassNamePrefix)		for (const auto &P : PassNamePrefix)
if (Pass.startswith(P))		if (Pass.startswith(P))
return true;		return true;
for (const auto &P : PassNameContain)		for (const auto &P : PassNameContain)
if (Pass.contains(P))		if (Pass.contains(P))
return true;		return true;
return llvm::is_contained(PassNameExact, Pass);		return llvm::is_contained(PassNameExact, Pass);
}		}
...
int main(int argc, char **argv) {
// For codegen passes, only passes that do IR to IR transformation are		// For codegen passes, only passes that do IR to IR transformation are
// supported.		// supported.
initializeExpandMemCmpPassPass(Registry);		initializeExpandMemCmpPassPass(Registry);
initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);		initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);
initializeCodeGenPreparePass(Registry);		initializeCodeGenPreparePass(Registry);
initializeAtomicExpandPass(Registry);		initializeAtomicExpandPass(Registry);
initializeRewriteSymbolsLegacyPassPass(Registry);		initializeRewriteSymbolsLegacyPassPass(Registry);
initializeWinEHPreparePass(Registry);		initializeWinEHPreparePass(Registry);
		initializeDFAJumpThreadingPass(Registry);
initializeDwarfEHPrepareLegacyPassPass(Registry);		initializeDwarfEHPrepareLegacyPassPass(Registry);
initializeSafeStackLegacyPassPass(Registry);		initializeSafeStackLegacyPassPass(Registry);
initializeSjLjEHPreparePass(Registry);		initializeSjLjEHPreparePass(Registry);
initializePreISelIntrinsicLoweringLegacyPassPass(Registry);		initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
initializeGlobalMergePass(Registry);		initializeGlobalMergePass(Registry);
initializeIndirectBrExpandPassPass(Registry);		initializeIndirectBrExpandPassPass(Registry);
initializeInterleavedLoadCombinePass(Registry);		initializeInterleavedLoadCombinePass(Registry);
initializeInterleavedAccessPass(Registry);		initializeInterleavedAccessPass(Registry);
...

Diff 342832
