This is an archive of the discontinued LLVM Phabricator instance.


Differential D15722

[WIP][Polly] SSA Codegen
Needs Review, Public

Authored by jdoerfert on Dec 22 2015, 11:34 AM.


Details

Reviewers

Meinersbur
• zinob
hiraditya

Summary

TODO

Performance-Regressions-Compile-Time : Δ : Previous : Current : σ
S/B/P/l/k/3mm/3mm : 15.28% : 0.8641 : 0.9961 : 0.0048

Performance-Improvements-Compile-Time : Δ : Previous : Current : σ
S/R/C/matrixTranspose : -17.39% : 0.0920 : 0.0760 : 0.0061
M/B/A/C/CrystalMk : -6.69% : 1.0760 : 1.0040 : 0.0141
S/B/P/l/k/cholesky/cholesky : -5.95% : 0.3360 : 0.3160 : 0.0056

Performance-Improvements-Execution-Time : Δ : Previous : Current : σ
M/B/F/m/mason : -5.56% : 0.2160 : 0.2040 : 0.0037
M/B/A/C/CrystalMk : -2.56% : 7.6645 : 7.4685 : 0.0031
M/A/v/viterbi : -2.02% : 2.7722 : 2.7162 : 0.0060


Diff Detail

Event Timeline

jdoerfert updated this revision to Diff 43467. Dec 22 2015, 11:34 AM

jdoerfert retitled this revision from to [WIP][Polly] SSA Codegen.

jdoerfert added reviewers: grosser, Meinersbur.

jdoerfert updated this object.

jdoerfert added a subscriber: Restricted Project.

Allow changes in textual order for scalar dependences + 2 test cases

Actually add test cases and remove left-over debug messages

Fix last lnt compile time errors, running performance numbers on clean build now

The idea behind this code generation is that there are only 3 distinct points where we need to insert PHI nodes in order to merge (possibly) different definitions of a scalar value.

After a conditional.
After a loop when it was guarded by a conditional.
In the loop header.

To be able to place these PHI nodes we keep the scalar mappings that might be needed after a statement. Hence, we copy the mappings from the BBMap to the current ScalarMap. These ScalarMaps are stacked according to the nesting of the AST. Whenever we hit a merge point we check the two incoming ScalarMaps (e.g., from the different conditional branches, or the mappings before and after the loop) to see if a scalar was mapped and, if so, whether it was mapped to different values. In that case we place a PHI. However, in loops we need the PHI even before we finish building the loop. To this end, scalars that are used but not defined in a loop, as well as scalars that were originally loop carried, are mapped to a new loop carried scalar in the optimized code. To make the scalars usable in a statement we use the current ScalarMap to initialize the BBMap and reuse the existing lookup logic to find a suitable mapping for a needed scalar.
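
The hand-off between the two maps described above can be pictured with a minimal sketch. It is not the patch's code: values are plain strings rather than LLVM values, and ValueMap, copyStatement, Defs and Tracked are hypothetical names introduced only for illustration. The sketch seeds a statement-local map from the current ScalarMap, records the statement's new definitions, and copies back only the scalars that later statements may need.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a value is just a name here, not an llvm::Value.
using Value = std::string;
using ValueMap = std::map<std::string, Value>;

// Copy one statement. Defs lists the (scalar, new value) pairs the copied
// statement defines; Tracked names the scalars later statements may need.
// Returns the statement-local map (the role BBMap plays in the summary).
ValueMap copyStatement(ValueMap &ScalarMap,
                       const std::vector<std::pair<std::string, Value>> &Defs,
                       const std::set<std::string> &Tracked) {
  // Seed the local map so the usual lookup finds incoming scalars.
  ValueMap BBMap = ScalarMap;

  for (const auto &Def : Defs) {
    assert(!Def.second.empty() && "a definition needs a value");
    BBMap[Def.first] = Def.second;
  }

  // Remember only the mappings that may be needed after the statement.
  for (const std::string &Scalar : Tracked) {
    auto It = BBMap.find(Scalar);
    if (It != BBMap.end())
      ScalarMap[Scalar] = It->second;
  }
  return BBMap;
}
```

In this sketch a statement-local temporary stays in the returned map only, while a redefinition of a tracked scalar becomes visible to the next statement through ScalarMap.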

lib/CodeGen/BlockGenerators.cpp
159	Debug left-over

Hi Johannes,

I did not get around to going through this in detail, but wanted to provide some first feedback.

First, thanks a lot for looking into ways to make Polly generate good scalar code without further preprocessing!

Here are some first comments on the general approach:

(you already know about this) I am concerned that this approach might be more complex and/or may have more corner cases we need to worry about. Aditya mentioned that in their implementation in Graphite, they fail to generate code in various situations. My understanding is that you are not aware of any such limitations in your code and you already checked several of the examples Aditya provided. It would probably be good for Aditya and Sebastian to have a look at this patch, as they have experience with SSA code generation and might be able to provide you with some test cases.

It is interesting to see that this patch actually removes a lot of code. So maybe I am wrong about point 1)

An alternative approach might be to just call PromoteMemToRegisters on the scalar allocs we introduced. This would allow us to keep the same mental model, but to still generate good code.

Some questions I have before I get into reviewing this code:

a) Does your patch need to make any effort to eliminate dead PHI nodes? Will we have a lot of PHI nodes that might need to be dead-code eliminated after your patch? More precisely, would we now need to run a phase of DCE after Polly instead of Mem2Reg?

b) Are there any remaining issues or inherent limitations you are aware of in comparison to the old approach?

c) I know you are still running performance numbers, which will be interesting to look at. Otherwise, is this patch ready to be reviewed?

Sebastian, Aditya (and Zino as well),

this is Johannes' new SSA based code generation. You might have useful input on this change. This change is especially important as it affects larger parts of the back-end code generation.

Performance Improvements - Compile Time Δ Previous Current σ
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky -4.40% 0.3640 0.3480 0.0047
MultiSource/Benchmarks/Prolangs-C/football/football -1.26% 1.9161 1.8920 0.0089

In D15722#334871, @grosser wrote:

(you already know about this) I am concerned that this approach might be more complex and/or may have more corner cases we need to worry about. Aditya mentioned that in their implementation in Graphite, they fail to generate code in various situations. My understanding is that you are not aware of any such limitations in your code and you already checked several of the examples Aditya provided. It would probably be good for Aditya and Sebastian to have a look at this patch, as they have experience with SSA code generation and might be able to provide you with some test cases.

Test cases would be good, yes. And yes, I am not aware of any general limitations or bugs.

It is interesting to see that this patch actually removes a lot of code. So maybe I am wrong about point 1)

In my opinion it is not (so much) about the amount of code but the complexity. And most of the new code is rather straightforward (e.g., everything in ScopHelper.cpp, merging after conditionals and after the loop, and the code that is produced by Polly).

An alternative approach might be to just call PromoteMemToRegisters on the scalar allocs we introduced. This would allow us to keep the same mental model, but to still generate good code.

At some point you want to run the vectorizer after Polly. Why require post-transformation if good code generation is not harder than what we have at the moment?

Some questions I have before I get into reviewing this code:

a) Does your patch need to make any effort to eliminate dead PHI nodes? Will we have a lot of PHI nodes that might need to be dead-code eliminated after your patch? More precisely, would we now need to run a phase of DCE after Polly instead of Mem2Reg?

It does produce dead PHI nodes, though, judging from my experiments, not many. Additionally, these dead nodes should not harm vectorization or anything else and will be removed eventually. I did not implement an AST SSA generation that does not generate dead PHIs (e.g., http://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf) since I wanted it to be simple. In our special case one could easily remember all placed PHIs and use the number of users + the operands to remove all of them after code generation in an efficient way (without another pass).

b) Are there any remaining issues or inherent limitations you are aware of in comparison to the old approach?

Not that I am aware of.

c) I know you are still running performance numbers, which will be interesting to look at. Otherwise, is this patch ready to be reviewed?

Performance numbers are back. Both on parkas2, almost identical except 2 small compile time improvements.

In general I like the change, though I do not understand the use of undef values in several places in the patch.

include/polly/CodeGen/BlockGenerators.h
308	What is a loop carried value?
313	Could we call these "loop phi" nodes instead of "loop carried phi" nodes? There is possibly confusion in reading the abbreviated LCPHI: the existing convention is to read it Loop Closed Phi, as in LCSSA.
lib/CodeGen/BlockGenerators.cpp
136	This comment is confusing: either remove it, or move it one line down and fix it to say "outside the scop".
1055	Unfinished comment?

sebpop added inline comments. Jan 27 2016, 2:27 PM

lib/CodeGen/BlockGenerators.cpp
425	Why are we forcing copy of instructions that can be synthesized? Aren't these synthesizable instructions discarded in the first place by not being added to the scalar memory accesses of the stmt?
lib/CodeGen/IslNodeBuilder.cpp
549	I don't understand how these undef values are supposed to work. Are you cleaning them up in a later pass? I see in some of the testcases that they are still left in the IR after Polly.

Here is what I think are the two major improvements of this patch over the existing codegens:

the use of the scalar references for the stmt being translated: these are actually the only scalars we care about for cross BB dependences, so they need renaming and phi nodes at CFG junctions,
the use of the structure of the generated code to place the new phi nodes.

The more I think about this algorithm, the more I like it ;-)
Thanks Johannes!

There is also an advantage in that we can verify the generated IR statically, so that wrong code (e.g., forgetting to initialize a value) is noticed much earlier by the IR verifier, before generating a wrong program that only gets noticed at runtime.

sebpop added inline comments. Jan 29 2016, 11:16 AM

lib/Support/ScopHelper.cpp
481	You could also check whether all arguments of the MergePHI node are the same, in which case you do not need the phi node, and just return the first arg.

In D15722#338442, @Meinersbur wrote:

There is also an advantage in that we can verify the generated IR statically, so that wrong code (e.g., forgetting to initialize a value) is noticed much earlier by the IR verifier, before generating a wrong program that only gets noticed at runtime.

I fully agree. There were almost only compile time problems when I wrote this. When I remember the time we fixed the scalar codegen the first time, we saw a lot of runtime problems due to missing or wrong initialization.

include/polly/CodeGen/BlockGenerators.h
308	Any scalar value that is defined and used in different iterations of the same loop. Prior to the transformation this means it is used by a PHI; after the transformation, however, it might also refer to a scalar that is used in a loop textually before it was defined. An example would be the following original code:
  for (i = 0...N) {
    a = A[i];
    A[i+1] = a;
  }
with the following "optimized" version:
  for (i = 0...N+1) {
    if (i > 0) A[i] = a;
    if (i < N + 1) a = A[i];
  }
Here the former non-loop carried scalar a is now loop-carried (thus needs a PHI).
313	Sure.
lib/CodeGen/BlockGenerators.cpp
136	I'll repair this comment.
425	Because we (might) need these values as operands of PHI nodes we haven't constructed yet, and we need them to be computed/placed at the correct location. Later on, when we create the PHI, we no longer know where we should place the operand code, nor even which operand we should create. In the following example, which could be the result after scheduling, we would not know where to place which version of x (x1 or x2) when we create code for the join point after the conditional. It would require a backward search on the AST to find the location where the PHI x was last written by each predecessor in order to get the operand and location. However, forcing them to be copied in the first place solves this quite nicely, as we always know the values we will later need are actually present and mapped.
  if (...) {
    x1 = 2i;
    S: /* lots of code */
  } else {
    x2 = 3i;
    P: /* lots of code */
  }
  x3 = phi(x1, x2)
1055	Yes, I'll fix this.
lib/CodeGen/IslNodeBuilder.cpp
549	Undef values are already "used" in the current code generation but are simply hidden well enough. When a scalar is used before it is initialized we basically get an undef. Currently this means a load from an uninitialized alloca is basically an undef. As we now promote these allocas, loads and stores, we see the undefs in the IR; however, they should only exist where the alloca content was undefined before. An example:
  int x;
  for i = 0...N {
    if (i > 0) A[i-1] = x;
    x = A[i];
  }
Here the initial value of x is undefined and therefore the PHI in the loop header as well as the PHI after the loop will have one undef as operand. In the following example there were no undefs in the IR prior to Polly, but right after code generation there are.
  if (a) {
    S: x = A[i];
    /* split BB */
    P: A[i] = x;
  }
Now we split the conditional for whatever reason and Polly will generate a CFG that kinda looks like this:
  if (a)
    S: x1 = A[i];
  x = phi(undef, x1)
  if (a)
    P: A[i] = x;
We need the undef for the path that does not define x (or x1). Does that make sense?
lib/Support/ScopHelper.cpp
481	That is what this code is doing, at least it was supposed to do that. Note that the set "Values" contains all operands of the MergePHI and if it contains only one Value we know all operands are the same, otherwise we know they are not. Do you see a problematic case?
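
The loop-carried example from the reply to line 308 can be checked with a small, runnable sketch. It assumes the bounds in the comment are inclusive and that A holds at least N + 2 elements; the function names are made up for illustration. In the shifted version a is written in one iteration and read in the next, which is what makes it loop-carried.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: a is defined and used within the same iteration.
std::vector<int> runOriginal(std::vector<int> A, int N) {
  assert(N >= 0 && A.size() >= static_cast<std::size_t>(N) + 2);
  for (int i = 0; i <= N; ++i) {
    int a = A[i];
    A[i + 1] = a;
  }
  return A;
}

// "Optimized" loop: the use of a now textually precedes its definition,
// so the value has to be carried from iteration i - 1 to iteration i.
std::vector<int> runShifted(std::vector<int> A, int N) {
  assert(N >= 0 && A.size() >= static_cast<std::size_t>(N) + 2);
  int a = 0; // placeholder for the undefined initial value; the first
             // read is guarded by i > 0, so it is never observed
  for (int i = 0; i <= N + 1; ++i) {
    if (i > 0)
      A[i] = a; // value defined in the previous iteration
    if (i < N + 1)
      a = A[i]; // value for the next iteration
  }
  return A;
}
```

Both functions return the same array for the same input; only the second one needs a value that survives across the back edge, which in SSA form corresponds to a PHI in the loop header.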

Ping.

Meinersbur mentioned this in D12975: [Polly] De-LICM and De-GVN (WIP). Feb 24 2016, 9:01 AM

Any progress?

Thank you for the awesome patch. AFAIU the algorithm is "follow each generated MK_PHIKind through to the end of the generated CFG while adding PHIs to make them usable", ignoring whether the defs are used or how the PHIs were in the original code. Could you explain this somewhere, preferably not only in the commit message?

Is it correct that this gives us O(n*m) generated PHIs (n=number of MK_PHIKind, m=number of statements)?

I still don't really understand the "ScalarMap". What is the difference to BBMaps? Do we need both? It uses original defs as well as merge PHIs as keys. I see the test cases motivating this, but could this be avoided by e.g. adding the incoming values in a separate pass instead of on-the-fly?

I had 17 LNT test failures with -polly-position=before-vectorizer (none without it). Interestingly I also got 4 unit test failures under Windows but not Linux. Is there maybe some nondeterministic ordering? The failing tests are:

Polly :: Isl/CodeGen/phi_loop_carried_float.ll
Polly :: Isl/CodeGen/phi_loop_carried_float_4.ll
Polly :: Isl/CodeGen/phi_loop_carried_float_escape.ll
Polly :: Isl/CodeGen/phi_scalar_simple_1.ll

If you rebase and the problem is still there, I'd debug this for you under Windows.

include/polly/CodeGen/BlockGenerators.h
125	I generally don't like the idea of a private/protected field being sneakily modified from outside the object (here: IslNodeBuilder)
308	Referring to "(possibly)": What is the function supposed to do if the value is not loop-carried (in what loop)?
313	Idea: Maybe define in some comment what exactly a "loop-carried phi" is? E.g. a PHI in a loop header.
574	ScalarMaps should have its own comment. AFAIK Doxygen treats this as an uncommented field.
include/polly/CodeGen/IslNodeBuilder.h
69	Can you explain "floating"? As Sebastian also noted, on seeing "LCPHI" I was first thinking about the PHIs for LCSSA.
71	Could you explain a bit more about "ScalarMaps"? What is it mapping from/to? How/where is it used? What are valid keys?
82	Why move this definition?
include/polly/Support/ScopHelper.h
170	This would be an excellent place to explain how a "merge PHI" is different from other PHIs.
176	This function returns void
lib/Analysis/ScopDetection.cpp
90	I assume this is a leftover from debugging.
lib/CodeGen/BlockGenerators.cpp
232	This change is surprising to me. Aren't the hoisted loads stored in GlobalMap anymore?
265	What is the reason to not insert the PHI into the generated BB at this point (but to keep them "floating")?
277–281	Could you add comments to explain these cases?
339	block-internal? mappings
442	"textually" doesn't seem the right word; we are not doing text processing here. Should be something about processing order.
1165	Unrelated?
lib/CodeGen/IslNodeBuilder.cpp
61	`Array->getNumberOfDimensions()!=0` seems redundant. MK_PHI always has 0 dimensions.
522	Naive question: Is there some way we could have a nested instance of IslNodeBuilder/function call with fresh values instead of temporarily storing `LCPHIs`, `LoopDepth`, `PreMap` away and restoring it afterwards?
523–524	std::move
549	Could you explain this in the code as a comment?
798	std::move
lib/Support/RegisterPasses.cpp
45	Assuming debugging leftover
lib/Support/ScopHelper.cpp
468	If the incoming value is not found in ScalarMap, I can think of the following reasons: 1) It's defined before the Scop. 2) We forgot to put it in there. 3) It was defined only in one incoming branch. In none of these cases would "undef" make sense. In 3) it cannot be used without an explicit PHI in the original code. What other case is there?
476	Wouldn't it be better to name the PHI after the original value instead of one of the incoming values?
479	Because Value.size() will be equal to BBs.size(), this condition can be tested before creating an unused MergePHI.

In D15722#390590, @Meinersbur wrote:

Thank you for the awesome patch.

I am glad somebody else likes the idea too and I got some more feedback!

AFAIU the algorithm is "follow each generated MK_PHIKind through to the end of the generated CFG while adding PHIs to make them usable", ignoring whether the defs are used or how the PHIs were in the original code. Could you explain this somewhere, preferably not only in the commit message?

It is a bit more general but the idea is correct. Track all generated MK_Value and MK_PHI accesses through the CFG and add new PHIs if different versions for the same base address join in a block with multiple predecessors. [This is a short description I can build on and place somewhere].

Is it correct that this gives us O(n*m) generated PHIs (n=number of MK_PHIKind, m=number of statements)?

Hmm, it is unfortunately not that simple. First, "n" should also include MK_Value but that is not even the problem. The problem is with "m" which is not clearly defined. "m" has to be the number of "join-points" in the generated CFG. This number can grow exponentially with the number of statements/blocks in the generated CFG (which can be arbitrarily higher than the number of statements in the SCoP). Thus, one can achieve O(n*2^m) PHIs, but that also means the optimized version has 2^m different paths through the region which will cause explosions in various other places too. That said, I am not even sure this kind of analysis makes sense because:

We will now only generate a PHI (for one particular SSA-value) if it is "needed" to join different versions of the original "value". While these PHIs can be dead, the total number of PHIs is linear in the number of paths. Additionally, if a path does not introduce a new version of the SSA-value there won't be a PHI once it is joined and if it does contain a new version of an SSA-value we would have placed a store before and mem2reg would have placed the PHI later. The main difference in the complexity comes from dead PHIs that are (as mentioned) limited by the number of paths and could be removed easily after code generation by us.

I still don't really understand the "ScalarMap". What is the difference to BBMaps? Do we need both?

In short: BBMaps are "intra-statement" maps for all local ssa-values but ScalarMaps are (control-)scope-aware "inter-statement" maps for some ssa-values.

Summarized:

ScalarMaps are used to remember mappings between statements
ScalarMaps are (control-)scope-aware, thus each (control-) scope (here loop or conditional) has its own ScalarMap that is initialized with the ScalarMap of the parent (control-) scope.
ScalarMaps do not contain all ssa-value mappings but only those that are interesting, thus have a SAI object.
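
The scope-aware behaviour summarized above can be sketched as a stack of maps. This is only an illustration under simplifying assumptions: values are strings, and ScalarMapStack with its member functions is a hypothetical helper, not a class from the patch. Entering a loop or conditional pushes a copy of the parent's map; leaving it hands the inner map back so the caller can compare it with the parent's at the merge point.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Value = std::string;
using ScalarMap = std::map<std::string, Value>;

// Hypothetical helper: one ScalarMap per open loop or conditional.
class ScalarMapStack {
  std::vector<ScalarMap> Stack;

public:
  ScalarMapStack() : Stack(1) {} // start with the outermost scope

  // A new scope starts out with the mappings of its parent.
  void enterScope() {
    ScalarMap Copy = Stack.back();
    Stack.push_back(std::move(Copy));
  }

  // Hand the inner scope's map to the caller, who merges it at the join.
  ScalarMap leaveScope() {
    assert(Stack.size() > 1 && "cannot leave the outermost scope");
    ScalarMap Inner = std::move(Stack.back());
    Stack.pop_back();
    return Inner;
  }

  void set(const std::string &Scalar, const Value &V) {
    Stack.back()[Scalar] = V;
  }

  // Returns nullptr when the scalar has no mapping in the current scope.
  const Value *lookup(const std::string &Scalar) const {
    auto It = Stack.back().find(Scalar);
    return It == Stack.back().end() ? nullptr : &It->second;
  }
};
```

A redefinition inside the inner scope does not touch the parent's map, so after leaveScope() both versions are available for deciding whether a PHI is needed.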

It uses original defs as well as merge PHIs as keys. I see the test cases motivating this, but could this be avoided by e.g. adding the incoming values in a separate pass instead of on-the-fly?

It basically unifies the old ScalarMap and the old PHIOpMap using this trick. One could have two (control-)scope-aware maps again (or maybe do something else) but I do not think it is worth it.

I had 17 LNT test failures with -polly-position=before-vectorizer (none without it). Interestingly I also got 4 unit test failures under Windows but not Linux. Is there maybe some nondeterministic ordering? The failing tests are:
Polly :: Isl/CodeGen/phi_loop_carried_float.ll
Polly :: Isl/CodeGen/phi_loop_carried_float_4.ll
Polly :: Isl/CodeGen/phi_loop_carried_float_escape.ll
Polly :: Isl/CodeGen/phi_scalar_simple_1.ll
If you rebase and the problem is still there, I'd debug this for you under Windows.

I will rebase it soon and push it here. Then I will also check LNT again.

include/polly/CodeGen/BlockGenerators.h
125	Agreed, but to be fair, we had those reference maps before too. The only alternative that I see right now is a "ConstructionContext" (similar to the DetectionContext in ScopDetection) that is handed to the functions and can be modified. While this might be a good idea I do think we can postpone it a bit.
308	Good question. The "(possibly)" hints at the fact that one does not know whether a value will be loop carried or not, but if it is, one needs the PHI. To this end we can generate the PHI even though we might not need it.
313	Can do.
574	This is true, thanks for the catch.
include/polly/CodeGen/IslNodeBuilder.h
69	I can change the name here too and floating means not placed in a basic block [can be added to the comment].
71	Sure
82	Because the constructor/initializer code looks nicer when we order the members this way. I do not care too much about it, and if you feel we should not move this I can undo it.
include/polly/Support/ScopHelper.h
170	I am not sure what you mean by other PHIs, and therefore not sure what exactly you want to see here, but I agree, I can add some comment here.
176	I know, therefore the comment in the @returns clause.
lib/Analysis/ScopDetection.cpp
90	Indeed.
lib/CodeGen/BlockGenerators.cpp
232	Yes and no. We do not need to look up in the GlobalMap here because the hoisted loads are propagated through the scalar maps to the BBMap. One could as easily not do this and I am not sure which is better.
265	This function and all callers are in the BlockGenerator, but the block where these PHIs are supposed to reside [the loop header of a Polly generated loop] is created in the IslNodeBuilder (or, more precisely, the LoopGenerator). Thus, we either keep them floating or remember loop header blocks somewhere in the BlockGenerator. I chose the first as we can easily place them in the IslNodeBuilder if they are actually needed or delete them otherwise.
277–281	Sure.
339	Thx.
442	AFAIK this is a standard term that we also used in our discussions and emails before. Processing order would not meet the requirements anyway as the actual dynamically executed order is not what this is about. "textually" means the order in which you statically write/see/discover things. In other words the order in which you first see the statements in a linearized/dumped AST if you read it like a page (from top to bottom).
1165	Maybe, I do not recall.
lib/CodeGen/IslNodeBuilder.cpp
61	Probably true, I do again not recall why I did it this way.
522	Possibly, but that is not an easy task and it would not even help much. We would need to copy ScalarMap to the new "instance" anyway. If we think of a good solution we can implement it in the parallel code generation too (we currently copy the maps there too).
523–524	That would "destroy" LCPHIs here, wouldn't it? AFAIK you shall not access something that you passed to std::move. We still need the container but just without its content.
549	Sure, but I will have to rephrase it a bit.
798	[See above for more comments] I would like to get it in like this and we can optimize it later, especially since std::move might trigger hard to debug problems I want to avoid for now.
lib/Support/RegisterPasses.cpp
45	Indeed, again.
lib/Support/ScopHelper.cpp
468	1) and 2) cannot/should never happen or would at least be bugs. 2) is clearly a bug and 1) should always be present. The reason for the undef here is the same as above, thus 3). Starting with:
  if (a) {
    S: x = A[i];
    P: A[i] = x;
  }
the scheduler can generate:
  if (a) {
    S: x = A[i];
  }
  if (a) {
    P: A[i] = x;
  }
which needs a PHI (for x after the conditional containing S) even though the original code did not contain/need any. Additionally, there is only one path to that PHI that defines a value for x but we need an operand for the other path reaching the PHI. As I explained to Sebastian, these undefs were basically present before, as "content" of the alloca slots we generated. And if you look at the code we generate at the moment for the example above and run mem2reg you will see the same undefs again.
476	Maybe, I do not recall if this was a conscious decision or not.
479	It is not necessarily equal to BBs.size() as the mappings in several BBs can be the same. If that happens to be the case for all BBs we can remove the PHI immediately, but we do not know that until we have looked up all mappings.
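
The replies to lines 468 and 479 can be read together as the following sketch of a merge at a join point. It is an approximation with hypothetical names (MergeResult, mergeScalar) and strings in place of LLVM values, not the code in ScopHelper.cpp. One operand is looked up per predecessor, a missing mapping becomes "undef", and a set collapses equal operands, so the PHI turns out to be unnecessary exactly when the set ends up with a single element.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Value = std::string;
using ScalarMap = std::map<std::string, Value>;

struct MergeResult {
  bool NeedsPhi;               // false when all predecessors agree
  Value Result;                // common value, or the name of the new PHI
  std::vector<Value> Incoming; // one operand per predecessor
};

// Merge the mappings of Scalar that reach a block with several predecessors.
MergeResult mergeScalar(const std::string &Scalar,
                        const std::vector<ScalarMap> &PredMaps) {
  assert(!PredMaps.empty() && "a join point needs at least one predecessor");
  std::vector<Value> Incoming;
  std::set<Value> Values; // collapses equal operands

  for (const ScalarMap &Map : PredMaps) {
    auto It = Map.find(Scalar);
    // A path that never defined the scalar contributes an undef operand.
    Value V = It != Map.end() ? It->second : Value("undef");
    Incoming.push_back(V);
    Values.insert(V);
  }

  // Only known after all mappings were looked up: do the operands agree?
  if (Values.size() == 1)
    return {false, *Values.begin(), Incoming};
  return {true, Scalar + ".merge", Incoming};
}
```

The number of distinct operands can be smaller than the number of predecessors, which is why the size check in this sketch happens after the loop rather than before it.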

Is it correct that this gives us O(n*m) generated PHIs (n=number of MK_PHIKind, m=number of statements)?

Hmm, it is unfortunately not that simple. First, "n" should also include MK_Value but that is not even the problem.

Right; didn't think about MK_Value at that moment.

The problem is with "m" which is not clearly defined. "m" has to be the number of "join-points" in the generated CFG. This number can grow exponentially with the number of statements/blocks in the generated CFG (which can be arbitrarily higher than the number of statements in the SCoP).

I was thinking about statements in the isl_ast, but meant the number of nodes in there. For each isl_ast_node (isl_ast_node_for and isl_ast_node_if), can there be at most one "join-point"?

Thus, one can achieve O(n*2^m) PHIs, but that also means the optimized version has 2^m different paths through the region which will cause explosions in various other places too. That said, I am not even sure this kind of analysis makes sense because:

We will now only generate a PHI (for one particular SSA-value) if it is "needed" to join different versions of the original "value". While these PHIs can be dead, the total number of PHIs is linear in the number of paths.

The number of paths is already 2^m? Per join-point we can also have as many PHIs as scalars, no?

Additionally, if a path does not introduce a new version of the SSA-value there won't be a PHI once it is joined and if it does contain a new version of an SSA-value we would have placed a store before and mem2reg would have placed the PHI later. The main difference in the complexity comes from dead PHIs that are (as mentioned) limited by the number of paths and could be removed easily after code generation by us.

Just exploring whether we can have a case where the number of unused PHIs grows excessively such that it would introduce another scaling problem.

In short: BBMaps are "intra-statement" maps for all local ssa-values but ScalarMaps are (control-)scope-aware "inter-statement" maps for some ssa-values.

Summarized:

ScalarMaps are used to remember mappings between statements

ScalarMaps are (control-)scope-aware, thus each (control-) scope (here loop or conditional) has its own ScalarMap that is initialized with the ScalarMap of the parent (control-) scope.

ScalarMaps do not contain all ssa-value mappings but only those that are interesting, thus have a SAI object.
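The summary above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `ScalarMapT` is a `std::map` of names standing in for Polly's `ValueMapT`, PHI nodes are modeled as strings, and `enterScope`, `currentVersion` and `mergeAtJoin` are hypothetical helpers, not functions from the patch. It shows a child scope starting from a copy of its parent's map, and a join that creates a PHI only where the two branches carry different versions of a value.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for Polly's ValueMapT: original value name ->
// name of the value that currently represents it in the generated code.
using ScalarMapT = std::map<std::string, std::string>;

// Entering a loop or conditional: the child scope starts from a copy of
// the parent's mappings, so changes inside do not leak out implicitly.
ScalarMapT enterScope(const ScalarMapT &Parent) { return Parent; }

// Look up the current version of a scalar; it must have been mapped.
const std::string &currentVersion(const ScalarMapT &Map,
                                  const std::string &Name) {
  auto It = Map.find(Name);
  assert(It != Map.end() && "scalar has no mapping in this scope");
  return It->second;
}

// Leaving a conditional: merge both branch maps for the parent scope.
// A PHI (modeled as a string) is created only where the branches disagree.
ScalarMapT mergeAtJoin(const ScalarMapT &Then, const ScalarMapT &Else,
                       unsigned &NumPhis) {
  ScalarMapT Merged;
  for (const auto &Entry : Then) {
    auto It = Else.find(Entry.first);
    if (It == Else.end())
      continue; // not available on both paths, nothing to merge
    if (It->second == Entry.second) {
      Merged[Entry.first] = Entry.second; // same version, no PHI needed
    } else {
      Merged[Entry.first] = "phi(" + Entry.second + ", " + It->second + ")";
      ++NumPhis;
    }
  }
  return Merged;
}
```

If only the "then" branch produces a new version of `x`, the merged map holds a PHI for `x` and the unchanged parent version for every other scalar.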

Thank you for the explanation.

include/polly/Support/ScopHelper.h
176	`@return`(s) is intended to document the function's return value, but this function doesn't return anything. MergeMap should be documented using eg. /// @param MergeMap Receives the merged mappings in @p MergeMap.
lib/CodeGen/BlockGenerators.cpp
265	OK
442	OK, but never seen this personally.
lib/CodeGen/IslNodeBuilder.cpp
523–524	std::swap then? ValueMapT PreLCPHIs; LCPHIs.swap(PreLCPHIs); Although I'd agree this doesn't really make the intent clearer; it's up to you.
lib/Support/ScopHelper.cpp
468	Thank you for the explanation with example
479	Thank you; I missed that Values is a set that 'collapses' equal values.
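The swap suggested for lib/CodeGen/IslNodeBuilder.cpp lines 523–524 moves the current contents into a fresh map and leaves the original map empty in one step. A minimal standalone sketch follows, assuming `std::map` as a stand-in for Polly's `ValueMapT` and a hypothetical helper name; it only illustrates the idiom, not the surrounding code of the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for Polly's ValueMapT (assumption: the real type maps LLVM values).
using ValueMapT = std::map<std::string, std::string>;

// Hypothetical helper: hand the mappings collected so far to the caller
// and leave LCPHIs empty.
ValueMapT takeLoopCarriedPHIs(ValueMapT &LCPHIs) {
  ValueMapT PreLCPHIs;
  LCPHIs.swap(PreLCPHIs); // exchanges contents, no per-element copy
  assert(LCPHIs.empty() && "fresh map must be empty after the swap");
  return PreLCPHIs;
}
```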

I think I could write this patch a little differently and we would never build any dead PHIs. However, that would require backtracking instead which might or might not turn out to be faster.

Backtracking sounds like a function to get the generated value on demand, and following the control flow upstream by calling itself. Sounds like a reasonable idea to avoid generating too many unused PHIs.

But I don't know whether it could even introduce a scaling problem and is therefore worth the increased implementation complexity (or maybe it's even simpler?)

What do you think?
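The backtracking idea described in this comment can be sketched as a recursive, memoized lookup. The code below is a toy model under explicit assumptions: the CFG is acyclic (loops would need placeholder PHIs to break the recursion), values and PHIs are modeled as strings, and `Block`, `Backtracker` and `lookup` are hypothetical names that do not appear in the patch. A PHI is created only when the predecessors of a block deliver different versions of the requested value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG, not Polly's data structures.
struct Block {
  std::vector<int> Preds;                  // indices of predecessor blocks
  std::map<std::string, std::string> Defs; // versions defined in this block
};

struct Backtracker {
  std::vector<Block> Blocks;
  std::map<std::pair<int, std::string>, std::string> Cache;
  unsigned NumPhis = 0;

  // Return the version of Var that is live at the end of block B,
  // walking upstream on demand and creating a PHI only when needed.
  std::string lookup(int B, const std::string &Var) {
    const Block &Blk = Blocks[B];
    auto Def = Blk.Defs.find(Var);
    if (Def != Blk.Defs.end())
      return Def->second; // defined locally, no need to look further
    auto Key = std::make_pair(B, Var);
    auto Hit = Cache.find(Key);
    if (Hit != Cache.end())
      return Hit->second; // already resolved for this block
    assert(!Blk.Preds.empty() && "use of a value that is never defined");
    std::string Result = lookup(Blk.Preds[0], Var);
    std::string Operands = Result;
    bool NeedPhi = false;
    for (std::size_t I = 1; I < Blk.Preds.size(); ++I) {
      std::string In = lookup(Blk.Preds[I], Var);
      NeedPhi |= (In != Result);
      Operands += ", " + In;
    }
    if (NeedPhi) {
      Result = "phi(" + Operands + ")";
      ++NumPhis;
    }
    Cache[Key] = Result;
    return Result;
  }
};
```

In a diamond where only one branch redefines `x`, a lookup of `x` at the join creates one PHI, while a lookup of an untouched `y` returns the entry version without creating any. The price is the upstream walk per request, which is the cost trade-off discussed in the replies.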

Backtracking sounds like a function to get the generated value on
demand, and following the control flow upstream by calling itself.
Sounds like a reasonable idea to avoid generating too many unused
PHIs.

Yes, but there is little evidence so far that we generate "too many"
unused PHIs with this patch for interesting code.

But I don't know whether it could even introduce a scaling problem and is therefore worth the increased implementation complexity (or maybe it's even simpler?)

I think we can do it with reasonable implementation complexity. However,
you have to consider that we currently (without SSA-codegen) also have a
trade-off between "number of PHIs" and "complexity to place them", even
though it is only implicit. The patch as presented is very good in
terms of complexity to place PHIs but consequently not too smart in
placing them. Alternatively, backtracking would place PHIs smartly but
it can cost a lot more to place a PHI in the first place. The overall
complexity of the code is in the worst case always the same [now and with
either SSA-codegen approach] but the complexity to get to the code is
what is different.

To conclude: I haven't made up my mind what we want and I don't think
you can say "what is best" in general.

Cheers,

Johannes

- {F1785844, layout=link}

In D15722#400494, @jdoerfert wrote:

Yes, but there is little evidence so far that we generate "too many"
unused PHIs with this patch for interesting code.

With 'it' in my second paragraph I was referring to D15722 (without backtracking), i.e. was trying to express exactly this. Sorry for my ambiguous writing.

Generally we should be looking for arguments/proof that some code does not introduce scaling issues instead of waiting for evidence that there is one. Otherwise the evidence will come in form of users complaining about excessive compilation time with their sources after a release.

However, in this case I am not too much concerned. Worst case would be to have many Instructions at the beginning of the scop and a PHI node for each of them at each join point. mem2reg might not perform better when we do it with the current implementation. Therefore I suggest to leave it with the current implementation (i.e. D15722).

Hi Johannes,

thank you again for having looked into the SSA codegen. I just spent the day looking through this patch. It contains indeed some very nice ideas and shows that it is probably possible to directly generate SSA without going through memory.

While playing with your patch, I implemented the earlier proposed approach of using LLVM's Mem2Reg utility from inside our code generation (attachment: 0001-IslNodeBuilder-Promote-Mem2Reg-when-finalizing-SCoP.patch, 3 KB). With an almost trivial change it seems to achieve the same goals while additionally guaranteeing a minimal set of PHI nodes and coming with a minimal risk of regressions.

I tried to find an argument why we want to go for this 500 line patch, which still needs polishing, a review of the polished version, and also brings the risk of introducing regressions.

Going through the discussion I found two arguments for this patch:

1) The new patch exposed some bugs at compile time that we missed earlier on
2) The new patch is smaller and easier to understand than what we have today

I agree that 1) may hold for some cases.

Regarding 2) at least in terms of code size there is not yet too much difference. Only looking at include/ lib/ I see 497 insertions(+), 634 deletions(-), which is 137 lines less for the new solution. On the other hand, this patch drops a larger 120 line comment describing in detail the previous approach. In addition, the removal of getNewScalarValue that was proposed in this patch has already been incorporated in trunk. Hence, I expect that a rebased version of this patch that adds additional documentation and possibly even code to remove unnecessary PHI nodes is at least not shorter in code size.

Regarding the complexity of the approach, I personally believe that the current approach is as easy to understand as the new approach. In part that may be because I have written a lot of documentation for the old code and I miss a concise description of the new approach with examples that illustrate the corner cases. Some areas such as non-affine-regions are probably easier to understand with the new code, as finding the right spot for the scalar stores is not easy. However, the new approach possibly becomes more difficult when we want to guarantee that a minimal set of PHI nodes is generated (which we probably want as we have a solution that can do this automatically).

To get a better understanding of this patch, I would really need a clean version of this patch where existing review comments are addressed, the remaining Windows/SIMD bugs resolved, the PHI node minimization implemented and some overview documentation added. In the optimal case we can make a convincing statement why the new approach is linear in compile time and results in a minimal set of PHI nodes as we get it when using Mem2Reg.

Considering this trivial patch that I posted, do you (and the other reviewers) see sufficient benefits in this larger patch now that it would be worth spending several days on rebasing and reviewing the larger patch and risking the introduction of regressions?

I personally would prefer to focus currently more on your latest assumption tracking changes and give you feedback on those. Hence, I would suggest to start off with the attached patch (0001-IslNodeBuilder-Promote-Mem2Reg-when-finalizing-SCoP.patch, 3 KB) and use it to introduce all the interesting test case changes that come with this patch. They should match almost 1:1 and would be non-controversial and clearly beneficial.

From this baseline we can then -- when time allows and the need arises -- pick up this patch. If we only introduce it for stylistic issues, the already changed test cases will ensure that IR changes such as the vectorization issue that has been overlooked stand out nicely. However, as I personally could see additional benefits coming from this patch (e.g. because we want to run cost-models on partially generated IR), it might even make sense to introduce this change when in the future non-memory-scalars become necessary even during code generation.

If you guys strongly believe we should go for the new approach now, I can probably take another day to check it thoroughly for other corner cases that may cause trouble before you start rebasing. (This is not because I don't trust you, but this is a major change on a piece of code which we spent a significant amount of time to get stable, so I believe we need to be very careful to not regress here).

lib/CodeGen/BlockGenerators.cpp
458	This patch nicely pointed out that getNewScalarValue is unnecessary and its uses can be replaced with getNewValue. This simplification was performed on Jan 26 in https://llvm.org/svn/llvm-project/polly/trunk@258799, such that future versions of this patch will not need to perform this simplification any more.
1007	Why is the last function dropped? This is a sanity check that works on the ScopInfo and is unrelated to how we generate scalar code. It verifies that no scalar write statements are in this ScopStmt, which is likely to cause trouble both for the to-memory scalar codegen as well as SSA codegen. Consequently, I think it makes sense to keep this check.
1038	By dropping generateScalarVectorLoads this patch changes behavior. Originally the code splatted all scalar values into a vector-value. After this change, no vector values are generated. This is likely the reason test/Isl/CodeGen/simple_vec_stride_one.ll changes.
test/Isl/CodeGen/phi_loop_carried_float_5.ll
5	This test case is incomplete. Without a comment it is unclear what exactly is tested here. (The test case seems to be interesting, though)
test/Isl/CodeGen/phi_scalar_simple_1.ll
61	These check lines are misleading as the current patch is generating more (unnecessary) PHI instructions here. In the optimal case we would not even generate them. As long as we generate them, we should list all of them with CHECK-NEXT and add a TODO that some of these PHIs are unnecessary and expected to be deleted later.
test/Isl/CodeGen/phi_scalar_simple_2.ll
16	If we are dropping here almost all CHECK lines, this test case becomes useless. Either we should make an argument that it does indeed not test anything interesting and just delete it or we should add CHECK lines similar to the ones added in phi_scalar_simple_1.ll. If we make the argument this test case is redundant, it should be deleted in a separate commit.
test/Isl/CodeGen/scalar-store-from-same-bb.ll
14	Nice!
test/Isl/CodeGen/simple_vec_call.ll
35	This is a trivial change, but mostly unrelated to the proposed patch. I would prefer to introduce REGEXP matches ahead of time to keep the review and patch focused. I did this in r267875 so this change won't show up in a possible future version of this patch.
test/Isl/CodeGen/simple_vec_stride_one.ll
8	Instead of a vector store, we now generate four scalar stores. This is a regression that seems to be introduced accidentally in this patch.
test/Isl/CodeGen/srem-in-other-bb.ll
13	Trivial, but nice!
test/ScopInfo/out-of-scop-use-in-region-entry-phi-node-nonaffine-subregion.ll
1	The file mode change is unrelated and should be a separate commit. This has already been fixed in r266473.

For me the main argument for this patch is that it catches potential problems earlier (Your argument 1).

My secondary argument is that it reduces the overall complexity of the system. If currently and with F1855864 we'd do:
MemoryAccesses -> Load/Stores -> Registers
this patch would just do
MemoryAccesses -> Registers
That is, there is one step less, less concern about unpredictable interactions, and maybe even faster compilation.

I don't think there is a meaningful difference between F1855864 and running a mem2reg/SROA pass afterwards, as we do now, eg. in CodegenCleanup.

For me the size of a patch is no argument unless we are maintaining a legacy code base. While new code may introduce new issues that are to be fixed, this will be eventually compensated in the long term, even if the advantages are small. Number of lines is also not a useful metric for complexity/understandability.

My arguments against this patch would be:

I actually find it harder to understand than the current. MemoryAccesses are modeled as READ/WRITE actions, so it is more straightforward to also just generate LoadInst/StoreInst.
I am less than 100% sure that cases where it generates superlinearly more PHIs than necessary never happen in practical code.

Hence, this comes down to what our priorities are. I'd vote in favor of SSA codegen.

@Tobias @Michael:

First, thanks for your thoughts on this patch! Second, I agree on
basically all the key facts, both positive and negative. Nevertheless,
I would like to go forward with the general idea as I think the positive
aspects outweigh the negative ones.

@Tobias: I can come up with an improved patch that is less complex and
places only needed PHI nodes. Would that be enough to get it in
if we do not show regressions? If not I will not invest any
more time.


http://reviews.llvm.org/D15722


sebpop resigned from this revision.Sep 19 2016, 1:21 PM

sebpop removed a reviewer: sebpop.

I resign from this patch as it is outdated and I am also not yet really convinced that direct SSA generation is the way to go. It is a pretty complicated change which is hard to get right in all corner cases, makes the code more difficult to understand, and does not really improve things either. There is a reason clang does not generate SSA, but relies on -mem2reg ;).
I admit that there may certainly be some benefits in this, but I would suggest to start a discussion on the mailing list before anybody looks into this again.

Revision Contents

Path	Size

include/polly/CodeGen/
BlockGenerators.h	359 lines
IslNodeBuilder.h	31 lines

include/polly/Support/
ScopHelper.h	14 lines

lib/Analysis/
ScopDetection.cpp	2 lines

lib/CodeGen/
BlockGenerators.cpp	527 lines
CodeGeneration.cpp	1 line
IslNodeBuilder.cpp	144 lines

lib/Support/
RegisterPasses.cpp	2 lines
ScopHelper.cpp	51 lines

test/Isl/CodeGen/MemAccess/
update_access_functions.ll	5 lines

test/Isl/CodeGen/OpenMP/
invariant_base_pointer_preloaded_different_bb.ll	4 lines
single_loop_with_param.ll	16 lines

test/Isl/CodeGen/
entry_with_trivial_phi_other_bb.ll	7 lines
invariant_load_escaping.ll	6 lines
invariant_load_scalar_escape_alloca_sharing.ll	21 lines
large-numbers-in-boundary-context.ll	6 lines
non-affine-dominance-generated-entering.ll	12 lines
non-affine-exit-node-dominance.ll	9 lines
non-affine-phi-node-expansion-2.ll	17 lines
non-affine-phi-node-expansion-3.ll	9 lines
non-affine-phi-node-expansion-4.ll	9 lines
non-affine-region-exit-phi-incoming-synthesize.ll	15 lines
non-affine-region-implicit-store.ll	10 lines
non-affine-synthesized-in-branch.ll	15 lines
non_affine_float_compare.ll	4 lines
out-of-scop-phi-node-use.ll	7 lines
phi-defined-before-scop.ll	11 lines
phi-in-non-affine-subregion-entry.ll	48 lines
phi_condition_modeling_1.ll	17 lines
phi_condition_modeling_2.ll	33 lines
phi_conditional_simple_1.ll	17 lines
phi_in_exit_early_lnt_failure_2.ll	14 lines
phi_in_exit_early_lnt_failure_3.ll	2 lines
phi_in_exit_early_lnt_failure_5.ll	2 lines
phi_loop_carried_float.ll	32 lines
phi_loop_carried_float_2.ll	51 lines
phi_loop_carried_float_3.ll	49 lines
phi_loop_carried_float_4.ll	180 lines
phi_loop_carried_float_5.ll	83 lines
phi_loop_carried_float_6.ll	83 lines
phi_loop_carried_float_escape.ll	29 lines
phi_scalar_simple_1.ll	61 lines
phi_scalar_simple_2.ll	43 lines
phi_with_multi_exiting_edges_2.ll	2 lines
phi_with_one_exit_edge.ll	2 lines
pr25241.ll	18 lines
read-only-scalars.ll	12 lines
scalar-dependence-reverse-text-order-two-uses.ll	59 lines
scalar-dependence-reverse-text-order.ll	47 lines
scalar-store-from-same-bb.ll	18 lines
sdrto___%loopA---%exit.jscop	36 lines
sdrtotu___%loopA---%exit.jscop	51 lines
simple_vec_call.ll	10 lines
simple_vec_stride_one.ll	7 lines
srem-in-other-bb.ll	4 lines
uninitialized_scalar_memory.ll	18 lines

test/ScopInfo/
invariant_load_access_classes_different_base_type.ll	5 lines
invariant_load_access_classes_different_base_type_escaping.ll	15 lines
invariant_load_access_classes_different_base_type_same_pointer.ll	7 lines
invariant_load_access_classes_different_base_type_same_pointer_escaping.ll	12 lines
invariant_load_zext_parameter.ll	1 line
out-of-scop-use-in-region-entry-phi-node-nonaffine-subregion.ll	18 lines

Diff 45834

include/polly/CodeGen/BlockGenerators.h

class BlockGenerator {		class BlockGenerator {
public:		public:
typedef llvm::SmallVector<ValueMapT, 8> VectorValueMapT;		typedef llvm::SmallVector<ValueMapT, 8> VectorValueMapT;

/// @brief Map types to resolve scalar dependences.		/// @brief Map types to resolve scalar dependences.
///		///
///@{		///@{

/// @see The ScalarMap and PHIOpMap member.
using ScalarAllocaMapTy = DenseMap<AssertingVH<Value>, AssertingVH<Value>>;

/// @brief Simple vector of instructions to store escape users.		/// @brief Simple vector of instructions to store escape users.
using EscapeUserVectorTy = SmallVector<Instruction *, 4>;		using EscapeUserVectorTy = SmallVector<Instruction *, 4>;

/// @brief Map type to resolve escaping users for scalar instructions.		/// @brief Map type to resolve escaping users for scalar instructions.
///		///
/// @see The EscapeMap member.		/// @see The EscapeMap member.
using EscapeUsersAllocaMapTy =		using EscapeUsersAllocaMapTy = DenseMap<Instruction *, EscapeUserVectorTy>;
DenseMap<Instruction *,
std::pair<AssertingVH<Value>, EscapeUserVectorTy>>;

///@}		///@}

/// @brief Create a generator for basic blocks.		/// @brief Create a generator for basic blocks.
///		///
/// @param Builder The LLVM-IR Builder used to generate the statement. The		/// @param Builder The LLVM-IR Builder used to generate the statement. The
/// code is generated at the location, the Builder points		/// code is generated at the location, the Builder points
/// to.		/// to.
/// @param LI The loop info for the current function		/// @param LI The loop info for the current function
/// @param SE The scalar evolution info for the current function		/// @param SE The scalar evolution info for the current function
/// @param DT The dominator tree of this function.		/// @param DT The dominator tree of this function.
/// @param ScalarMap Map from scalars to their demoted location.		/// @param LCPHIs Map from loop carried PHI nodes to "floating" copies.
/// @param PHIOpMap Map from PHIs to their demoted operand location.		/// @param LoopDepth The loop depth of the block being copied.
/// @param EscapeMap Map from scalars to their escape users and locations.		/// @param EscapeMap Map from scalars to their escape users and locations.
/// @param GlobalMap A mapping from llvm::Values used in the original scop		/// @param GlobalMap A mapping from llvm::Values used in the original scop
/// region to a new set of llvm::Values. Each reference to		/// region to a new set of llvm::Values. Each reference to
/// an original value appearing in this mapping is replaced		/// an original value appearing in this mapping is replaced
/// with the new value it is mapped to.		/// with the new value it is mapped to.
/// @param ExprBuilder An expression builder to generate new access functions.		/// @param ExprBuilder An expression builder to generate new access functions.
BlockGenerator(PollyIRBuilder &Builder, LoopInfo &LI, ScalarEvolution &SE,		BlockGenerator(PollyIRBuilder &Builder, LoopInfo &LI, ScalarEvolution &SE,
DominatorTree &DT, ScalarAllocaMapTy &ScalarMap,		DominatorTree &DT, ValueMapT &LCPHIs, int &LoopDepth,
ScalarAllocaMapTy &PHIOpMap, EscapeUsersAllocaMapTy &EscapeMap,		EscapeUsersAllocaMapTy &EscapeMap, ValueMapT &GlobalMap,
ValueMapT &GlobalMap, IslExprBuilder *ExprBuilder = nullptr);		IslExprBuilder *ExprBuilder = nullptr);

/// @brief Copy the basic block.		/// @brief Copy the basic block.
///		///
/// This copies the entire basic block and updates references to old values		/// This copies the entire basic block and updates references to old values
/// with references to new values, as defined by GlobalMap.		/// with references to new values, as defined by GlobalMap.
///		///
/// @param Stmt The block statement to code generate.		/// @param Stmt The block statement to code generate.
		/// @param ScalarMap The scalar mappings that hold when @p Stmt is entered.
/// @param LTS A map from old loops to new induction variables as		/// @param LTS A map from old loops to new induction variables as
/// SCEVs.		/// SCEVs.
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
void copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,		void copyStmt(ScopStmt &Stmt, ValueMapT &ScalarMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses);		isl_id_to_ast_expr *NewAccesses);

/// @brief Return the scalar alloca for @p ScalarBase
///
/// If no alloca was mapped to @p ScalarBase a new one is created.
///
/// @param ScalarBase The demoted scalar value.
/// @param GlobalMap A mapping from Allocas to other memory locations that
/// can be used to replace the original alloca locations
/// with new memory locations, e.g. when passing values to
/// subfunctions while offloading parallel sections.
///
/// @returns The alloca for @p ScalarBase or a replacement value taken from
/// GlobalMap.
Value *getOrCreateScalarAlloca(Value *ScalarBase);

/// @brief Return the PHI-node alloca for @p ScalarBase
///
/// If no alloca was mapped to @p ScalarBase a new one is created.
///
/// @param ScalarBase The demoted scalar value.
///
/// @returns The alloca for @p ScalarBase or a replacement value taken from
/// GlobalMap.
Value *getOrCreatePHIAlloca(Value *ScalarBase);

/// @brief Return the alloca for @p Access
///
/// If no alloca was mapped for @p Access a new one is created.
///
/// @param Access The memory access for which to generate the alloca
///
/// @returns The alloca for @p Access or a replacement value taken from
/// GlobalMap.
Value *getOrCreateAlloca(MemoryAccess &Access);

/// @brief Return the alloca for @p Array
///
/// If no alloca was mapped for @p Array a new one is created.
///
/// @param Array The array for which to generate the alloca
///
/// @returns The alloca for @p Array or a replacement value taken from
/// GlobalMap.
Value *getOrCreateAlloca(const ScopArrayInfo *Array);

/// @brief Finalize the code generation for the SCoP @p S.		/// @brief Finalize the code generation for the SCoP @p S.
///		///
/// This will initialize and finalize the scalar variables we demoted during		/// @param ScalarMap The scalar mappings that hold after @p S.
/// the code generation.
///		///
/// @see createScalarInitialization(Scop &)		/// @see createScalarFinalization()
/// @see createScalarFinalization(Region &)		/// @see createExitPHINodeMerges ()
void finalizeSCoP(Scop &S);		void finalizeSCoP(Scop &S, ValueMapT &ScalarMap);

/// @brief An empty destructor		/// @brief An empty destructor
virtual ~BlockGenerator(){};		virtual ~BlockGenerator(){};

BlockGenerator(const BlockGenerator &) = default;		BlockGenerator(const BlockGenerator &) = default;

protected:		protected:
PollyIRBuilder &Builder;		PollyIRBuilder &Builder;
LoopInfo &LI;		LoopInfo &LI;
ScalarEvolution &SE;		ScalarEvolution &SE;
IslExprBuilder *ExprBuilder;		IslExprBuilder *ExprBuilder;

/// @brief The dominator tree of this function.		/// @brief The dominator tree of this function.
DominatorTree &DT;		DominatorTree &DT;

/// @brief The entry block of the current function.		/// @brief The entry block of the current function.
BasicBlock *EntryBB;		BasicBlock *EntryBB;

/// @brief Maps to resolve scalar dependences for PHI operands and scalars.		/// @brief Map from loop carried PHI nodes to "floating" copies.
///		ValueMapT &LCPHIs;
/// When translating code that contains scalar dependences as they result from
/// inter-block scalar dependences (including the use of data carrying		/// @brief The loop depth of the currently copied block in the new schedule.
/// PHI nodes), we do not directly regenerate in-register SSA code, but		int &LoopDepth;
		Meinersbur (inline comment): I generally don't like the idea of a private/protected field being sneakily modified from outside the object (here: IslNodeBuilder).
		jdoerfert (author, inline comment): Agreed, but to be fair, we had those reference maps before too. The only alternative that I see right now is a "ConstructionContext" (similar to the DetectionContext in ScopDetection) that is handed to the functions and can be modified. While this might be a good idea, I do think we can postpone it a bit.
/// instead allocate some stack memory through which these scalar values are
/// passed. Only a later pass of -mem2reg will then (re)introduce in-register
/// computations.
///
/// To keep track of the memory location(s) used to store the data computed by
/// a given SSA instruction, we use the maps 'ScalarMap' and 'PHIOpMap'. Each
/// maps a given scalar value to a chunk of stack allocated memory.
///
/// 'ScalarMap' is used for normal scalar dependences that go from a scalar
/// definition to its use. Such dependences are lowered by directly writing
/// the value an instruction computes into the corresponding chunk of memory
/// and reading it back from this chunk of memory right before every use of
/// this original scalar value. The memory locations in 'ScalarMap' end with
/// '.s2a'.
///
/// 'PHIOpMap' is used to model PHI nodes. For each PHI node we introduce,
/// besides the memory in 'ScalarMap', a second chunk of memory into which we
/// write at the end of each basic block preceding the PHI instruction the
/// value passed through this basic block. At the place where the PHI node is
/// executed, we replace the PHI node with a load from the corresponding
/// memory location in the 'PHIOpMap' table. The memory locations in
/// 'PHIOpMap' end with '.phiops'.
///
/// The ScopArrayInfo objects of accesses that belong to a PHI node may have
/// identical base pointers, even though they refer to two different memory
/// locations, the normal '.s2a' locations and the special '.phiops'
/// locations. For historic reasons we keep such accesses in two maps
/// 'ScalarMap' and 'PHIOpMap', indexed by the BasePointer. An alternative
/// implementation could use a single map that uses the ScopArrayInfo object
/// as index.
///
/// Example:
///
///                              Input C Code
///                              ============
///
///                 S1:      x1 = ...
///                          for (i=0...N) {
///                 S2:           x2 = phi(x1, add)
///                 S3:           add = x2 + 42;
///                          }
///                 S4:      print(x1)
///                          print(x2)
///                          print(add)
///
///
///        Unmodified IR                         IR After expansion
///        =============                         ==================
///
/// S1:   x1 = ...                     S1:    x1 = ...
///                                           x1.s2a = s1
///                                           x2.phiops = s1
///        |                                    |
///        |   <--<--<--<--<                    |   <--<--<--<--<
///        | /              \                   | /              \     .
///        V V               \                  V V               \    .
/// S2:   x2 = phi (x1, add)  |        S2:    x2 = x2.phiops       |
///                           |               x2.s2a = x2          |
///                           |                                    |
/// S3:   add = x2 + 42       |        S3:    add = x2 + 42        |
///                           |               add.s2a = add        |
///                           |               x2.phiops = add      |
///        | \               /                  | \               /
///        |  \             /                   |  \             /
///        |   >-->-->-->-->                    |   >-->-->-->-->
///        V                                    V
///
///                                    S4:    x1 = x1.s2a
/// S4:   ... = x1                            ... = x1
///                                           x2 = x2.s2a
///       ... = x2                            ... = x2
///                                           add = add.s2a
///       ... = add                           ... = add
///
///      ScalarMap = { x1 -> x1.s2a, x2 -> x2.s2a, add -> add.s2a }
///      PHIOpMap =  { x2 -> x2.phiops }
///
///
///  ??? Why does a PHI-node require two memory chunks ???
///
/// One may wonder why a PHI node requires two memory chunks and not just
/// all data is stored in a single location. The following example tries
/// to store all data in .s2a and drops the .phiops location:
///
///      S1:    x1 = ...
///             x1.s2a = s1
///             x2.s2a = s1             // use .s2a instead of .phiops
///               |
///               |   <--<--<--<--<
///               | /              \     .
///               V V               \    .
///      S2:    x2 = x2.s2a          |  // value is same as above, but read
///                                  |  // from .s2a
///                                  |
///             x2.s2a = x2          |  // store into .s2a as normal
///                                  |
///      S3:    add = x2 + 42        |
///             add.s2a = add        |
///             x2.s2a = add         |  // use s2a instead of .phiops
///               | \               /   // !!! This is wrong, as x2.s2a now
///               |   >-->-->-->-->     // contains add instead of x2.
///               V
///
///      S4:    x1 = x1.s2a
///             ... = x1
///             x2 = x2.s2a             // !!! We now read 'add' instead of
///             ... = x2                // 'x2'
///             add = add.s2a
///             ... = add
///
/// As visible in the example, the SSA value of the PHI node may still be
/// needed _after_ the basic block, which could conceptually branch to the
/// PHI node, has been run and has overwritten the PHI's old value. Hence, a
/// single memory location is not enough to code-generate a PHI node.
///
///{
///
/// @brief Memory locations used for the special PHI node modeling.
ScalarAllocaMapTy &PHIOpMap;

/// @brief Memory locations used to model scalar dependences.
ScalarAllocaMapTy &ScalarMap;
///}

/// @brief Map from instructions to their escape users as well as the alloca.		/// @brief Map from instructions to their escape users as well as the alloca.
EscapeUsersAllocaMapTy &EscapeMap;		EscapeUsersAllocaMapTy &EscapeMap;

/// @brief A map from llvm::Values referenced in the old code to a new set of		/// @brief A map from llvm::Values referenced in the old code to a new set of
/// llvm::Values, which is used to replace these old values during		/// llvm::Values, which is used to replace these old values during
/// code generation.		/// code generation.
ValueMapT &GlobalMap;		ValueMapT &GlobalMap;

		/// @brief Copy interesting mappings from @p BBmap to @p ScalarMap.
		///
		/// @param Stmt The statement to code generate.
		/// @param BB The basic block to code generate.
		/// @param ScalarMap Will be filled with mappings that hold after @p BB.
		/// @param BBMap A mapping from old values to their new values
		/// (for values recalculated within this basic block).
		/// @param LTS A map from old loops to new induction variables as SCEVs.
		///
		/// Not all copied instructions need to be merged (using PHI nodes) at join
		/// points but only those that might be used later on. To this end this
		/// function copies the mapping from @p BBMap to the @p ScalarMap for which a
		/// scalar memory access in @p Stmt exists.
		void generateScalarMappings(ScopStmt &Stmt, BasicBlock *BB,
		ValueMapT &ScalarMap, ValueMapT &BBMap,
		LoopToScevMapT &LTS);

/// @brief Split @p BB to create a new one we can use to clone @p BB in.		/// @brief Split @p BB to create a new one we can use to clone @p BB in.
BasicBlock *splitBB(BasicBlock *BB);		BasicBlock *splitBB(BasicBlock *BB);

/// @brief Copy the given basic block.		/// @brief Copy the given basic block.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
/// @param BB The basic block to code generate.		/// @param BB The basic block to code generate.
/// @param BBMap A mapping from old values to their new values in this		/// @param BBMap A mapping from old values to their new values in this
Show All 19 Lines	protected:
/// SCEVs.		/// SCEVs.
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
void copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *BBCopy,		void copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *BBCopy,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses);		isl_id_to_ast_expr *NewAccesses);

/// @brief Return the alloca for @p ScalarBase in @p Map.
///
/// If no alloca was mapped to @p ScalarBase in @p Map a new one is created
/// and named after @p ScalarBase with the suffix @p NameExt.
///
/// @param ScalarBase The demoted scalar value.
/// @param Map The map we should look for a mapped alloca value.
/// @param NameExt The suffix we add to the name of a new created alloca.
///
/// @returns The alloca for @p ScalarBase.
Value *getOrCreateAlloca(Value *ScalarBase, ScalarAllocaMapTy &Map,
const char *NameExt);

/// @brief Generate reload of scalars demoted to memory and needed by @p Stmt.
///
/// @param Stmt The statement we generate code for.
/// @param BBMap A mapping from old values to their new values in this block.
void generateScalarLoads(ScopStmt &Stmt, ValueMapT &BBMap);

/// @brief Generate the scalar stores for the given statement.
///
/// After the statement @p Stmt was copied all inner-SCoP scalar dependences
/// starting in @p Stmt (hence all scalar write accesses in @p Stmt) need to
/// be demoted to memory.
///
/// @param Stmt The statement we generate code for.
/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values
/// (for values recalculated in the new ScoP, but not
/// within this basic block)
/// @param BBMap A mapping from old values to their new values in this block.
virtual void generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
ValueMapT &BBMap);

/// @brief Handle users of @p Inst outside the SCoP.		/// @brief Handle users of @p Inst outside the SCoP.
///		///
/// @param R The current SCoP region.		/// @param R The current SCoP region.
/// @param Inst The current instruction we check.		/// @param Inst The current instruction we check.
/// @param Address If given it is used as the escape address for @p Inst.		/// @param Address If given it is used as the escape address for @p Inst.
void handleOutsideUsers(const Region &R, Instruction *Inst,		void handleOutsideUsers(const Region &R, Instruction *Inst,
Value *Address = nullptr);		Value *Address = nullptr);

/// @brief Find scalar statements that have outside users.		/// @brief Find scalar statements that have outside users.
///		///
/// We register these scalar values to later update subsequent scalar uses of		/// We register these scalar values to later update subsequent scalar uses of
/// these values to either use the newly computed value from within the scop		/// these values to either use the newly computed value from within the scop
/// (if the scop was executed) or the unchanged original code (if the run-time		/// (if the scop was executed) or the unchanged original code (if the run-time
/// check failed).		/// check failed).
///		///
/// @param S The scop for which to find the outside users.		/// @param S The scop for which to find the outside users.
void findOutsideUsers(Scop &S);		void findOutsideUsers(Scop &S);

/// @brief Initialize the memory of demoted scalars.
///
/// @param S The scop for which to generate the scalar initializers.
void createScalarInitialization(Scop &S);

/// @brief Create exit PHI node merges for PHI nodes with more than two edges		/// @brief Create exit PHI node merges for PHI nodes with more than two edges
/// from inside the scop.		/// from inside the scop.
///		///
/// For scops which have a PHI node in the exit block that has more than two		/// For scops which have a PHI node in the exit block that has more than two
/// incoming edges from inside the scop region, we require some special		/// incoming edges from inside the scop region, we require some special
/// handling to understand which of the possible values will be passed to the		/// handling to understand which of the possible values will be passed to the
/// PHI node from inside the optimized version of the scop. To do so ScopInfo		/// PHI node from inside the optimized version of the scop. To do so ScopInfo
/// models the possible incoming values as write accesses of the ScopStmts.		/// models the possible incoming values as write accesses of the ScopStmts.
///		///
/// This function creates corresponding code to reload the computed outgoing
/// value from the stack slot it has been stored into and to pass it on to the
/// PHI node in the original exit block.
///
/// @param S The scop for which to generate the exiting PHI nodes.		/// @param S The scop for which to generate the exiting PHI nodes.
void createExitPHINodeMerges(Scop &S);		/// @param ScalarMap The scalar mappings that hold after @p S.
		void createExitPHINodeMerges(Scop &S, ValueMapT &ScalarMap);

/// @brief Promote the values of demoted scalars after the SCoP.		/// @brief Merge scalars escaping the SCoP with their original counterpart.
///		///
/// If a scalar value was used outside the SCoP we need to promote the value		/// @param ScalarMap The scalar mappings that hold after @p S.
/// stored in the memory cell allocated for that scalar and combine it with		void createScalarFinalization(Region &R, ValueMapT &ScalarMap);
/// the original value in the non-optimized SCoP.
void createScalarFinalization(Region &R);

/// @brief Try to synthesize a new value		/// @brief Try to synthesize a new value
///		///
/// Given an old value, we try to synthesize it in a new context from its		/// Given an old value, we try to synthesize it in a new context from its
/// original SCEV expression. We start from the original SCEV expression,		/// original SCEV expression. We start from the original SCEV expression,
/// then replace outdated parameter and loop references, and finally		/// then replace outdated parameter and loop references, and finally
/// expand it to code that computes this updated expression.		/// expand it to code that computes this updated expression.
///		///
Show All 34 Lines	protected:
/// @param L The loop that surrounded the instruction that referenced		/// @param L The loop that surrounded the instruction that referenced
/// this value in the original code. This loop is used to		/// this value in the original code. This loop is used to
/// evaluate the scalar evolution at the right scope.		/// evaluate the scalar evolution at the right scope.
///		///
/// @returns o The old value, if it is still valid.		/// @returns o The old value, if it is still valid.
/// o The new value, if available.		/// o The new value, if available.
/// o NULL, if no value is found.		/// o NULL, if no value is found.
Value *getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,		Value *getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const;		LoopToScevMapT &LTS, Loop *L);

void copyInstScalar(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,		void copyInstScalar(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,
LoopToScevMapT &LTS);		LoopToScevMapT &LTS);

/// @brief Get the innermost loop that surrounds an instruction.		/// @brief Get the innermost loop that surrounds an instruction.
///		///
/// @param Inst The instruction for which we get the loop.		/// @param Inst The instruction for which we get the loop.
/// @return The innermost loop that surrounds the instruction.		/// @return The innermost loop that surrounds the instruction.
Show All 17 Lines	protected:

/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
void generateScalarStore(ScopStmt &Stmt, StoreInst *store, ValueMapT &BBMap,		void generateScalarStore(ScopStmt &Stmt, StoreInst *store, ValueMapT &BBMap,
LoopToScevMapT &LTS,		LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses);		isl_id_to_ast_expr *NewAccesses);

		/// @brief Create a loop carried PHI for the value @p V.
		///
		/// @param V A (possibly) loop carried value.
		sebpop: What is a loop carried value?
		jdoerfert (author): Any scalar value that is defined and used in different iterations of the same loop. Prior to the transformation this means it is used by a PHI; after the transformation, however, it might also refer to a scalar that is used in a loop textually before it was defined. An example would be the following original code:
		  for (i = 0...N)
		    a = A[i];
		    A[i+1] = a;
		with the following "optimized" version:
		  for (i = 0...N+1)
		    if (i > 0) A[i] = a;
		    if (i < N + 1) a = A[i];
		Here the formerly non-loop-carried scalar a is now loop-carried (and thus needs a PHI).
		Meinersbur: Referring to "(possibly)": What is the function supposed to do if the value is not loop-carried (in what loop)?
		jdoerfert (author): Good question. The "(possibly)" hints at the fact that one does not know whether a value will be loop carried, but if it is, one needs the PHI. To this end we can generate the PHI even though we might not need it.
		/// @param BBMap A mapping from old values to their new values
		/// (for values recalculated within this basic block).
		///
		/// @returns The new loop carried PHI for @p V.
		Value *createLoopCarriedPHI(Value *V, ValueMapT &BBMap);
		sebpop: Could we call these "loop phi" nodes instead of "loop carried phi" nodes? There is possible confusion in reading the abbreviated LCPHI: the existing convention is to read it as Loop Closed Phi, as in LCSSA.
		jdoerfert (author): Sure.
		Meinersbur: Idea: Maybe define in some comment what exactly a "loop-carried phi" is? E.g. a PHI in a loop header.
		jdoerfert (author): Can do.
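
The example from the inline discussion above can be written out as a small C++ sketch to check that both loop versions compute the same result. The function names `runOriginal` and `runShifted` and the inclusive loop bounds are assumptions made for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: 'a' is defined and used within the same iteration.
// A must hold at least N + 2 elements.
std::vector<int> runOriginal(std::vector<int> A, int N) {
  assert(N >= 0 && A.size() >= static_cast<std::size_t>(N) + 2);
  for (int i = 0; i <= N; ++i) {
    int a = A[i];
    A[i + 1] = a;
  }
  return A;
}

// Shifted ("optimized") loop: the use of 'a' now comes textually before its
// definition, so the value is carried from iteration i - 1 to iteration i.
// In SSA form this is what would require a PHI in the loop header.
std::vector<int> runShifted(std::vector<int> A, int N) {
  assert(N >= 0 && A.size() >= static_cast<std::size_t>(N) + 2);
  int a = 0; // never read before the first definition in iteration 0
  for (int i = 0; i <= N + 1; ++i) {
    if (i > 0)
      A[i] = a; // uses the value defined in iteration i - 1
    if (i < N + 1)
      a = A[i]; // defines the value used in iteration i + 1
  }
  return A;
}
```

Calling both functions on the same input, for instance `{1, 2, 3, 4, 5}` with `N = 3`, should yield identical arrays, which is the sense in which the shifted loop is a valid rescheduling of the original.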

/// @brief Copy a single PHI instruction.		/// @brief Copy a single PHI instruction.
///		///
/// The implementation in the BlockGenerator is trivial, however it allows		/// @param Stmt The statement to code generate.
/// subclasses to handle PHIs different.		/// @param PHI The PHI that should be copied.
		/// @param BBMap A mapping from old values to their new values
		/// @param LTS A map from old loops to new induction variables as SCEVs.
///		///
/// @returns The nullptr as the BlockGenerator does not copy PHIs.		/// @returns A nullptr as the BlockGenerator does not copy PHIs (in-place).
virtual Value *copyPHIInstruction(ScopStmt &, PHINode *, ValueMapT &,		virtual Value *copyPHIInstruction(ScopStmt &Stmt, PHINode *PHI,
LoopToScevMapT &) {		ValueMapT &BBMap, LoopToScevMapT &LTS);
return nullptr;
}

/// @brief Copy a single Instruction.		/// @brief Copy a single Instruction.
///		///
/// This copies a single Instruction and updates references to old values		/// This copies a single Instruction and updates references to old values
/// with references to new values, as defined by GlobalMap and BBMap.		/// with references to new values, as defined by GlobalMap and BBMap.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
/// @param Inst The instruction to copy.		/// @param Inst The instruction to copy.
/// @param BBMap A mapping from old values to their new values		/// @param BBMap A mapping from old values to their new values
/// (for values recalculated within this basic block).		/// (for values recalculated within this basic block).
/// @param GlobalMap A mapping from old values to their new values		/// @param GlobalMap A mapping from old values to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block).		/// within this basic block).
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block).		/// within this basic block).
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
		/// @param ForceCopy Flag to indicate a copy is supposed to be made (if
		/// possible). If this flag is not set synthezisable
		/// instructions will not be copied as they are generated
		/// on demand.
void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,		void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,
LoopToScevMapT &LTS, isl_id_to_ast_expr *NewAccesses);		LoopToScevMapT &LTS, isl_id_to_ast_expr *NewAccesses,
		bool ForceCopy = false);

/// @brief Helper to get the newest version of @p ScalarValue.		/// @brief Helper to determine if @p Inst can be synthezised in @p Stmt.
///		///
/// @param ScalarValue The original value needed.		/// @param ScalarValue The original value needed.
/// @param R The current SCoP region.		/// @param R The current SCoP region.
/// @param Stmt The ScopStmt in which we look up this value.		/// @param Stmt The ScopStmt in which we look up this value.
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block)		/// within this basic block)
Show All 31 Lines	public:
/// within this basic block), one for each lane.		/// within this basic block), one for each lane.
/// @param Schedule A map from the statement to a schedule where the		/// @param Schedule A map from the statement to a schedule where the
/// innermost dimension is the dimension of the innermost		/// innermost dimension is the dimension of the innermost
/// loop containing the statemenet.		/// loop containing the statemenet.
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
static void generate(BlockGenerator &BlockGen, ScopStmt &Stmt,		static void generate(BlockGenerator &BlockGen, ScopStmt &Stmt,
std::vector<LoopToScevMapT> &VLTS,		ValueMapT &ScalarMap, std::vector<LoopToScevMapT> &VLTS,
__isl_keep isl_map *Schedule,		__isl_keep isl_map *Schedule,
__isl_keep isl_id_to_ast_expr *NewAccesses) {		__isl_keep isl_id_to_ast_expr *NewAccesses) {
VectorBlockGenerator Generator(BlockGen, VLTS, Schedule);		VectorBlockGenerator Generator(BlockGen, VLTS, Schedule);
Generator.copyStmt(Stmt, NewAccesses);		Generator.copyStmt(Stmt, ScalarMap, NewAccesses);
}		}

private:		private:
// This is a vector of loop->scev maps. The first map is used for the first		// This is a vector of loop->scev maps. The first map is used for the first
// vector lane, ...		// vector lane, ...
// Each map, contains information about Instructions in the old ScoP, which		// Each map, contains information about Instructions in the old ScoP, which
// are recalculated in the new SCoP. When copying the basic block, we replace		// are recalculated in the new SCoP. When copying the basic block, we replace
// all referenes to the old instructions with their recalculated values.		// all referenes to the old instructions with their recalculated values.
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	void copyInstScalarized(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &VectorMap, VectorValueMapT &ScalarMaps,		ValueMapT &VectorMap, VectorValueMapT &ScalarMaps,
__isl_keep isl_id_to_ast_expr *NewAccesses);		__isl_keep isl_id_to_ast_expr *NewAccesses);

bool extractScalarValues(const Instruction *Inst, ValueMapT &VectorMap,		bool extractScalarValues(const Instruction *Inst, ValueMapT &VectorMap,
VectorValueMapT &ScalarMaps);		VectorValueMapT &ScalarMaps);

bool hasVectorOperands(const Instruction *Inst, ValueMapT &VectorMap);		bool hasVectorOperands(const Instruction *Inst, ValueMapT &VectorMap);

/// @brief Generate vector loads for scalars.
///
/// @param Stmt The scop statement for which to generate the loads.
/// @param VectorBlockMap A map that will be updated to relate the original
/// values with the newly generated vector loads.
void generateScalarVectorLoads(ScopStmt &Stmt, ValueMapT &VectorBlockMap);

/// @brief Verify absence of scalar stores.
///
/// @param Stmt The scop statement to check for scalar stores.
void verifyNoScalarStores(ScopStmt &Stmt);

/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &VectorMap,		void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &VectorMap,
VectorValueMapT &ScalarMaps,		VectorValueMapT &ScalarMaps,
__isl_keep isl_id_to_ast_expr *NewAccesses);		__isl_keep isl_id_to_ast_expr *NewAccesses);

/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
void copyStmt(ScopStmt &Stmt, __isl_keep isl_id_to_ast_expr *NewAccesses);		void copyStmt(ScopStmt &Stmt, ValueMapT &ScalarMap,
		__isl_keep isl_id_to_ast_expr *NewAccesses);
};		};

/// @brief Generator for new versions of polyhedral region statements.		/// @brief Generator for new versions of polyhedral region statements.
class RegionGenerator : public BlockGenerator {		class RegionGenerator : public BlockGenerator {
public:		public:
/// @brief Create a generator for regions.		/// @brief Create a generator for regions.
///		///
/// @param BlockGen A generator for basic blocks.		/// @param BlockGen A generator for basic blocks.
RegionGenerator(BlockGenerator &BlockGen) : BlockGenerator(BlockGen) {}		RegionGenerator(BlockGenerator &BlockGen) : BlockGenerator(BlockGen) {}

virtual ~RegionGenerator(){};		virtual ~RegionGenerator(){};

/// @brief Copy the region statement @p Stmt.		/// @brief Copy the region statement @p Stmt.
///		///
/// This copies the entire region represented by @p Stmt and updates		/// This copies the entire region represented by @p Stmt and updates
/// references to old values with references to new values, as defined by		/// references to old values with references to new values, as defined by
/// GlobalMap.		/// GlobalMap.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
		/// @param ScalarMap The scalar mappings that hold when @p Stmt is entered.
/// @param LTS A map from old loops to new induction variables as SCEVs.		/// @param LTS A map from old loops to new induction variables as SCEVs.
void copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,		void copyStmt(ScopStmt &Stmt, ValueMapT &ScalarMap, LoopToScevMapT &LTS,
__isl_keep isl_id_to_ast_expr *IdToAstExp);		__isl_keep isl_id_to_ast_expr *IdToAstExp);

private:		private:
/// @brief A map from old to new blocks in the region.		/// @brief A map from old to new blocks in the region.
DenseMap<BasicBlock *, BasicBlock *> BlockMap;		DenseMap<BasicBlock *, BasicBlock *> BlockMap;

/// @brief The "BBMaps" for the whole region (one for each block).		/// @brief The "BBMaps" for the whole region (one for each block).
DenseMap<BasicBlock *, ValueMapT> RegionMaps;		DenseMap<BasicBlock *, ValueMapT> RegionMaps;
		DenseMap<BasicBlock *, ValueMapT> ScalarMaps;
		Meinersbur: ScalarMaps should have its comment. AFAIK Doxygen treats this as uncommented field.
		jdoerfert (author): This is true, thanks for the catch.

/// @brief Mapping to remember PHI nodes that still need incoming values.		/// @brief Mapping to remember PHI nodes that still need incoming values.
using PHINodePairTy = std::pair<const PHINode *, PHINode *>;		using PHINodePairTy = std::pair<const PHINode *, PHINode *>;
DenseMap<BasicBlock *, SmallVector<PHINodePairTy, 4>> IncompletePHINodeMap;		DenseMap<BasicBlock *, SmallVector<PHINodePairTy, 4>> IncompletePHINodeMap;

		/// @brief Create merge PHI nodes in @p BB for predecessors inside @p R.
		///
		/// @returns The PHI mappings are stored in @p MergeScalarMap.
		void createMergePHIs(BasicBlock *BB, Region &R, ValueMapT &MergeScalarMap);

/// @brief Repair the dominance tree after we created a copy block for @p BB.		/// @brief Repair the dominance tree after we created a copy block for @p BB.
///		///
/// @returns The immediate dominator in the DT for @p BBCopy if in the region.		/// @returns The immediate dominator in the DT for @p BBCopy if in the region.
BasicBlock *repairDominance(BasicBlock *BB, BasicBlock *BBCopy);		BasicBlock *repairDominance(BasicBlock *BB, BasicBlock *BBCopy);

/// @brief Add the new operand from the copy of @p IncomingBB to @p PHICopy.		/// @brief Add the new operand from the copy of @p IncomingBB to @p PHICopy.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
/// @param PHI The original PHI we copy.		/// @param PHI The original PHI we copy.
/// @param PHICopy The copy of @p PHI.		/// @param PHICopy The copy of @p PHI.
/// @param IncomingBB An incoming block of @p PHI.		/// @param IncomingBB An incoming block of @p PHI.
		/// @param BBMap A mapping from old values to their new values
/// @param LTS A map from old loops to new induction variables as		/// @param LTS A map from old loops to new induction variables as
/// SCEVs.		/// SCEVs.
void addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI, PHINode *PHICopy,		void addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI, PHINode *PHICopy,
BasicBlock *IncomingBB, LoopToScevMapT &LTS);		BasicBlock *IncomingBB, ValueMapT &BBMap,
		LoopToScevMapT &LTS);
/// @brief Generate the scalar stores for the given statement.
///
/// After the statement @p Stmt was copied all inner-SCoP scalar dependences
/// starting in @p Stmt (hence all scalar write accesses in @p Stmt) need to
/// be demoted to memory.
///
/// @param Stmt The statement we generate code for.
/// @param LTS A mapping from loops virtual canonical induction variable to
/// their new values (for values recalculated in the new ScoP,
/// but not within this basic block)
/// @param BBMap A mapping from old values to their new values in this block.
virtual void generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
ValueMapT &BBMAp) override;

/// @brief Copy a single PHI instruction.		/// @brief Copy a single PHI instruction.
///		///
/// This copies a single PHI instruction and updates references to old values		/// This copies a single PHI instruction and updates references to old values
/// with references to new values, as defined by GlobalMap and BBMap.		/// with references to new values, as defined by GlobalMap and BBMap.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
/// @param PHI The PHI instruction to copy.		/// @param PHI The PHI instruction to copy.
Show All 11 Lines

include/polly/CodeGen/IslNodeBuilder.h

	Show All 26 Lines
	struct isl_ast_build;			struct isl_ast_build;
	struct isl_union_map;			struct isl_union_map;

	class IslNodeBuilder {			class IslNodeBuilder {
	public:			public:
	IslNodeBuilder(PollyIRBuilder &Builder, ScopAnnotator &Annotator, Pass *P,			IslNodeBuilder(PollyIRBuilder &Builder, ScopAnnotator &Annotator, Pass *P,
	const DataLayout &DL, LoopInfo &LI, ScalarEvolution &SE,			const DataLayout &DL, LoopInfo &LI, ScalarEvolution &SE,
	DominatorTree &DT, Scop &S)			DominatorTree &DT, Scop &S)
	: S(S), Builder(Builder), Annotator(Annotator),			: S(S), Builder(Builder), Annotator(Annotator), LoopDepth(0),
	ExprBuilder(S, Builder, IDToValue, ValueMap, DL, SE, DT, LI),			ExprBuilder(S, Builder, IDToValue, ValueMap, DL, SE, DT, LI),
	BlockGen(Builder, LI, SE, DT, ScalarMap, PHIOpMap, EscapeMap, ValueMap,			BlockGen(Builder, LI, SE, DT, LCPHIs, LoopDepth, EscapeMap, ValueMap,
	&ExprBuilder),			&ExprBuilder),
	RegionGen(BlockGen), P(P), DL(DL), LI(LI), SE(SE), DT(DT) {}			RegionGen(BlockGen), P(P), DL(DL), LI(LI), SE(SE), DT(DT) {}

	virtual ~IslNodeBuilder() {}			virtual ~IslNodeBuilder() {}

	void addParameters(__isl_take isl_set *Context);			void addParameters(__isl_take isl_set *Context);
	void create(__isl_take isl_ast_node *Node);			void create(__isl_take isl_ast_node *Node);

	/// @brief Preload all memory loads that are invariant.			/// @brief Preload all memory loads that are invariant.
	bool preloadInvariantLoads();			bool preloadInvariantLoads();

				/// @brief Map PHIs in the SCoP entry block to their initial values.
				void createPHIInitialization();

	/// @brief Finalize code generation for the SCoP @p S.			/// @brief Finalize code generation for the SCoP @p S.
	///			///
	/// @see BlockGenerator::finalizeSCoP(Scop &S)			/// @see BlockGenerator::finalizeSCoP(Scop &S)
	void finalizeSCoP(Scop &S) { BlockGen.finalizeSCoP(S); }			void finalizeSCoP(Scop &S) { BlockGen.finalizeSCoP(S, ScalarMap); }

	IslExprBuilder &getExprBuilder() { return ExprBuilder; }			IslExprBuilder &getExprBuilder() { return ExprBuilder; }

	/// @brief Get the associated block generator.
	///
	/// @return A referecne to the associated block generator.
	BlockGenerator &getBlockGenerator() { return BlockGen; }

	protected:			protected:
	Scop &S;			Scop &S;
	PollyIRBuilder &Builder;			PollyIRBuilder &Builder;
	ScopAnnotator &Annotator;			ScopAnnotator &Annotator;

	IslExprBuilder ExprBuilder;			/// @brief Maps used to resolve inter-block scalar uses.

	/// @brief Maps used by the block and region generator to demote scalars.
	///			///
	///@{			///@{

	/// @brief See BlockGenerator::ScalarMap.			/// @brief Map from loop carried PHI nodes to "floating" copies.
	BlockGenerator::ScalarAllocaMapTy ScalarMap;			ValueMapT LCPHIs;
				Meinersbur: Can you explain "floating"? As Sebastian also noted, on seeing "LCPHI" I was first thinking about the PHIs for LCSSA.
				jdoerfert (author): I can change the name here too. "Floating" means not placed in a basic block [can be added to the comment].

	/// @brief See BlockGenerator::PhiOpMap.			/// @brief Currently valid scalar mappings.
				Meinersbur: Could you explain a bit more about "ScalarMaps"? What is it mapping from/to? How/where is it used? What are valid keys?
				jdoerfert (author): Sure
	BlockGenerator::ScalarAllocaMapTy PHIOpMap;			ValueMapT ScalarMap;

	/// @brief See BlockGenerator::EscapeMap.			/// @brief See BlockGenerator::EscapeMap.
	BlockGenerator::EscapeUsersAllocaMapTy EscapeMap;			BlockGenerator::EscapeUsersAllocaMapTy EscapeMap;

	///@}			///@}

				/// @brief The current loop depth.
				int LoopDepth;

				IslExprBuilder ExprBuilder;
				Meinersbur: Why move this definition?
				jdoerfert (author): Because the constructor/initializer code looks nicer when we order the members this way. I do not care too much about it, and if you feel we should not move this I can undo it.

	/// @brief The generator used to copy a basic block.			/// @brief The generator used to copy a basic block.
	BlockGenerator BlockGen;			BlockGenerator BlockGen;

	/// @brief The generator used to copy a non-affine region.			/// @brief The generator used to copy a non-affine region.
	RegionGenerator RegionGen;			RegionGenerator RegionGen;

	Pass *const P;			Pass *const P;
	const DataLayout &DL;			const DataLayout &DL;
	▲ Show 20 Lines • Show All 238 Lines • Show Last 20 Lines

include/polly/Support/ScopHelper.h

	//===------ Support/ScopHelper.h -- Some Helper Functions for Scop. -------===//			//===------ Support/ScopHelper.h -- Some Helper Functions for Scop. -------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Small functions that help with LLVM-IR.			// Small functions that help with LLVM-IR.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef POLLY_SUPPORT_IRHELPER_H			#ifndef POLLY_SUPPORT_IRHELPER_H
	#define POLLY_SUPPORT_IRHELPER_H			#define POLLY_SUPPORT_IRHELPER_H

				#include "polly/CodeGen/IRBuilder.h"

	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
	#include "llvm/ADT/SetVector.h"			#include "llvm/ADT/SetVector.h"
	#include "llvm/IR/ValueHandle.h"			#include "llvm/IR/ValueHandle.h"

	namespace llvm {			namespace llvm {
	class Type;			class Type;
	class Instruction;			class Instruction;
	class LoadInst;			class LoadInst;
	[...]
	/// @param LI The LoopInfo analysis.			/// @param LI The LoopInfo analysis.
	/// @param SE The scalar evolution database.			/// @param SE The scalar evolution database.
	/// @param R The region out of which SSA names are parameters.			/// @param R The region out of which SSA names are parameters.
	/// @return If the instruction I can be regenerated from its			/// @return If the instruction I can be regenerated from its
	/// scalar evolution representation, return true,			/// scalar evolution representation, return true,
	/// otherwise return false.			/// otherwise return false.
	bool canSynthesize(const llvm::Value *V, const llvm::LoopInfo *LI,			bool canSynthesize(const llvm::Value *V, const llvm::LoopInfo *LI,
	llvm::ScalarEvolution *SE, const llvm::Region *R);			llvm::ScalarEvolution *SE, const llvm::Region *R);

				/// @brief Use @p Builder to create merge PHIs.
				Meinersbur: This would be an excellent place to explain how a "merge PHI" is different from other PHIs.
				jdoerfert (Author): I am not sure what you mean by other PHIs, and therefore not what exactly you want to see here, but I agree, I can add some comment here.
				///
				/// @param Builder The IRBuilder to create the PHIs.
				/// @param BBs The incoming blocks (ordered!).
				/// @param Maps The scalar mappings in the incoming blocks (ordered!).
				///
				/// @returns The merged mappings in @p MergeMap.
				Meinersbur: This function returns void.
				jdoerfert (Author): I know, therefore the comment in the @returns clause.
				Meinersbur: `@return`(s) is intended to document the function's return value, but this function doesn't return anything. MergeMap should be documented using e.g. `/// @param MergeMap Receives the merged mappings in @p MergeMap.`
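The documentation convention suggested above can be sketched on a toy function. This is only an illustration under stated assumptions: `ToyValueMap` and `mergeMappings` are hypothetical stand-ins, not Polly's `ValueMapT` or `createMergePHIs`, and the "later map wins" merge rule exists only so the example runs (the function under review would presumably create PHI nodes for conflicting values instead).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a value map; the real ValueMapT maps llvm::Value
// pointers, not strings to ints.
using ToyValueMap = std::map<std::string, int>;

/// @brief Merge the mappings of several incoming blocks into one map.
///
/// @param Maps     The mappings of the incoming blocks (ordered!).
/// @param MergeMap Receives the merged mappings. On conflicting keys the
///                 mapping of the later block wins (an assumption of this toy).
///
/// Note that there is no @returns clause: the function returns void and the
/// result is described on the output parameter instead.
void mergeMappings(const std::vector<const ToyValueMap *> &Maps,
                   ToyValueMap &MergeMap) {
  for (const ToyValueMap *Map : Maps)
    for (const auto &KeyValue : *Map)
      MergeMap[KeyValue.first] = KeyValue.second;
}
```

A caller would pass an empty map as `MergeMap` and read the merged mappings from it afterwards, which is why documenting it under `@param` matches how the function is actually used.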
				void createMergePHIs(PollyIRBuilder &Builder,
				const llvm::ArrayRef<llvm::BasicBlock *> &BBs,
				const llvm::ArrayRef<ValueMapT *> &Maps,
				ValueMapT &MergeMap);
	}			}
	#endif			#endif

lib/Analysis/ScopDetection.cpp

Show First 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	cl::desc("The minimal number of per-loop instructions before a single loop "
"region is considered profitable"),		"region is considered profitable"),
cl::Hidden, cl::ValueRequired, cl::init(100000000), cl::cat(PollyCategory));		cl::Hidden, cl::ValueRequired, cl::init(100000000), cl::cat(PollyCategory));

bool polly::PollyProcessUnprofitable;		bool polly::PollyProcessUnprofitable;
static cl::opt<bool, true> XPollyProcessUnprofitable(		static cl::opt<bool, true> XPollyProcessUnprofitable(
"polly-process-unprofitable",		"polly-process-unprofitable",
cl::desc(		cl::desc(
"Process scops that are unlikely to benefit from Polly optimizations."),		"Process scops that are unlikely to benefit from Polly optimizations."),
cl::location(PollyProcessUnprofitable), cl::init(false), cl::ZeroOrMore,		cl::location(PollyProcessUnprofitable), cl::init(true), cl::ZeroOrMore,
		Meinersbur: I assume this is a leftover from debugging.
		jdoerfert (Author): Indeed.
cl::cat(PollyCategory));		cl::cat(PollyCategory));

static cl::opt<std::string> OnlyFunction(		static cl::opt<std::string> OnlyFunction(
"polly-only-func",		"polly-only-func",
cl::desc("Only run on functions that contain a certain string"),		cl::desc("Only run on functions that contain a certain string"),
cl::value_desc("string"), cl::ValueRequired, cl::init(""),		cl::value_desc("string"), cl::ValueRequired, cl::init(""),
cl::cat(PollyCategory));		cl::cat(PollyCategory));

[...]

lib/CodeGen/BlockGenerators.cpp

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines

static cl::opt<bool> DebugPrinting(		static cl::opt<bool> DebugPrinting(
"polly-codegen-add-debug-printing",		"polly-codegen-add-debug-printing",
cl::desc("Add printf calls that show the values loaded/stored."),		cl::desc("Add printf calls that show the values loaded/stored."),
cl::Hidden, cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));		cl::Hidden, cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

BlockGenerator::BlockGenerator(PollyIRBuilder &B, LoopInfo &LI,		BlockGenerator::BlockGenerator(PollyIRBuilder &B, LoopInfo &LI,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
ScalarAllocaMapTy &ScalarMap,		ValueMapT &LCPHIs, int &LoopDepth,
ScalarAllocaMapTy &PHIOpMap,
EscapeUsersAllocaMapTy &EscapeMap,		EscapeUsersAllocaMapTy &EscapeMap,
ValueMapT &GlobalMap,		ValueMapT &GlobalMap,
IslExprBuilder *ExprBuilder)		IslExprBuilder *ExprBuilder)
: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),		: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),
EntryBB(nullptr), PHIOpMap(PHIOpMap), ScalarMap(ScalarMap),		EntryBB(nullptr), LCPHIs(LCPHIs), LoopDepth(LoopDepth),
EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}		EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}

Value *BlockGenerator::trySynthesizeNewValue(ScopStmt &Stmt, Value *Old,		Value *BlockGenerator::trySynthesizeNewValue(ScopStmt &Stmt, Value *Old,
ValueMapT &BBMap,		ValueMapT &BBMap,
LoopToScevMapT &LTS,		LoopToScevMapT &LTS,
Loop *L) const {		Loop *L) const {
if (!SE.isSCEVable(Old->getType()))		if (!SE.isSCEVable(Old->getType()))
return nullptr;		return nullptr;

const SCEV *Scev = SE.getSCEVAtScope(Old, L);		const SCEV *Scev = SE.getSCEVAtScope(Old, L);
if (!Scev)		if (!Scev)
return nullptr;		return nullptr;

		auto &R = Stmt.getParent()->getRegion();
		SetVector<Value *> Values;
		findValues(Scev, Values);
		for (auto *V : Values)
		if (auto *I = dyn_cast<Instruction>(V))
		if (R.contains(I))
		return nullptr;

if (isa<SCEVCouldNotCompute>(Scev))		if (isa<SCEVCouldNotCompute>(Scev))
return nullptr;		return nullptr;

const SCEV *NewScev = apply(Scev, LTS, SE);		const SCEV *NewScev = apply(Scev, LTS, SE);
ValueMapT VTV;		ValueMapT VTV;
VTV.insert(BBMap.begin(), BBMap.end());		VTV.insert(BBMap.begin(), BBMap.end());
VTV.insert(GlobalMap.begin(), GlobalMap.end());		VTV.insert(GlobalMap.begin(), GlobalMap.end());

Scop &S = *Stmt.getParent();		Scop &S = *Stmt.getParent();
const DataLayout &DL =		const DataLayout &DL =
S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();		S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();
auto IP = Builder.GetInsertPoint();		auto IP = Builder.GetInsertPoint();

assert(IP != Builder.GetInsertBlock()->end() &&		assert(IP != Builder.GetInsertBlock()->end() &&
"Only instructions can be insert points for SCEVExpander");		"Only instructions can be insert points for SCEVExpander");
Value *Expanded =		Value *Expanded =
expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), &*IP, &VTV);		expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), &*IP, &VTV);

BBMap[Old] = Expanded;		BBMap[Old] = Expanded;
return Expanded;		return Expanded;
}		}

Value *BlockGenerator::getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,		Value *BlockGenerator::getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const {		LoopToScevMapT &LTS, Loop *L) {
// Constants that do not reference any named value can always remain		// Constants that do not reference any named value can always remain
// unchanged. Handle them early to avoid expensive map lookups. We do not take		// unchanged. Handle them early to avoid expensive map lookups. We do not take
// the fast-path for external constants which are referenced through globals		// the fast-path for external constants which are referenced through globals
// as these may need to be rewritten when distributing code accross different		// as these may need to be rewritten when distributing code accross different
// LLVM modules.		// LLVM modules.
if (isa<Constant>(Old) && !isa<GlobalValue>(Old))		if (isa<Constant>(Old) && !isa<GlobalValue>(Old))
return Old;		return Old;

Show All 16 Lines	Value BlockGenerator::getNewValue(ScopStmt &Stmt, Value Old, ValueMapT &BBMap,

if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))		if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))
return New;		return New;

// A scop-constant value defined by a global or a function parameter.		// A scop-constant value defined by a global or a function parameter.
if (isa<GlobalValue>(Old) || isa<Argument>(Old))		if (isa<GlobalValue>(Old) || isa<Argument>(Old))
return Old;		return Old;

// A scop-constant value defined by an instruction executed outside the scop.		// An instruction defined inside the scop.
		sebpop: This comment is confusing: either remove it, or move it one line down and fix it to say "outside the scop".
		jdoerfert (Author): I'll repair this comment.
if (const Instruction *Inst = dyn_cast<Instruction>(Old))		if (const Instruction *Inst = dyn_cast<Instruction>(Old)) {
if (!Stmt.getParent()->getRegion().contains(Inst->getParent()))		if (!Stmt.getParent()->getRegion().contains(Inst->getParent()))
return Old;		return Old;

		// If the value is a scalar read that has not yet been mapped to a new value
		// we either (1) generate dead code that is not constraint by dependences or
		// (2) we reversed the textual order of a scalar dependence in a loop. In
		// the first case we can safely use undef, in the second we introduce a loop
		// carried PHI node that will be later rewired correctly.
		for (auto *MA : Stmt) {
		if (MA->getAccessValue() != Old)
		continue;

		if (LoopDepth == 0)
		return UndefValue::get(Old->getType());

		return createLoopCarriedPHI(Old, BBMap);
		}
		}

// The scalar dependence is neither available nor SCEVCodegenable.		// The scalar dependence is neither available nor SCEVCodegenable.
llvm_unreachable("Unexpected scalar dependence in region!");		llvm_unreachable("Unexpected scalar dependence in region!");
return nullptr;		return nullptr;
		jdoerfert (Author): Debug left-over.
}		}

void BlockGenerator::copyInstScalar(ScopStmt &Stmt, Instruction *Inst,		void BlockGenerator::copyInstScalar(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS) {		ValueMapT &BBMap, LoopToScevMapT &LTS) {
// We do not generate debug intrinsics as we did not investigate how to		// We do not generate debug intrinsics as we did not investigate how to
// copy them correctly. At the current state, they just crash the code		// copy them correctly. At the current state, they just crash the code
// generation as the meta-data operands are not correctly copied.		// generation as the meta-data operands are not correctly copied.
if (isa<DbgInfoIntrinsic>(Inst))		if (isa<DbgInfoIntrinsic>(Inst))
[...]

Loop *BlockGenerator::getLoopForInst(const llvm::Instruction *Inst) {		Loop *BlockGenerator::getLoopForInst(const llvm::Instruction *Inst) {
return LI.getLoopFor(Inst->getParent());		return LI.getLoopFor(Inst->getParent());
}		}

Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, LoadInst *Load,		Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, LoadInst *Load,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
if (Value *PreloadLoad = GlobalMap.lookup(Load))		if (Value *PreloadLoad = BBMap.lookup(Load))
		Meinersbur: This change is surprising to me. Aren't the hoisted loads stored in GlobalMap anymore?
		jdoerfert (Author): Yes and no. We do not need to look them up in the GlobalMap here because the hoisted loads are propagated through the scalar maps to the BBMap. One could just as easily not do this, and I am not sure which is better.
return PreloadLoad;		return PreloadLoad;

auto *Pointer = Load->getPointerOperand();		auto *Pointer = Load->getPointerOperand();
Value *NewPointer =		Value *NewPointer =
generateLocationAccessed(Stmt, Load, Pointer, BBMap, LTS, NewAccesses);		generateLocationAccessed(Stmt, Load, Pointer, BBMap, LTS, NewAccesses);
Value *ScalarLoad = Builder.CreateAlignedLoad(		Value *ScalarLoad = Builder.CreateAlignedLoad(
NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");		NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");

Show All 15 Lines	void BlockGenerator::generateScalarStore(ScopStmt &Stmt, StoreInst *Store,

if (DebugPrinting)		if (DebugPrinting)
RuntimeDebugBuilder::createCPUPrinter(Builder, "Store to ", NewPointer,		RuntimeDebugBuilder::createCPUPrinter(Builder, "Store to ", NewPointer,
": ", ValueOperand, "\n");		": ", ValueOperand, "\n");

Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());		Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());
}		}

		Value *BlockGenerator::createLoopCarriedPHI(Value *V, ValueMapT &BBMap) {
		auto *PHI = PHINode::Create(V->getType(), 2, V->getName() + ".polly.lc");
		Meinersbur: What is the reason to not insert the PHI into the generated BB at this point (but to keep them "floating")?
		jdoerfert (Author): This function and all callers are in the BlockGenerator, but the block where these PHIs are supposed to reside [the loop header of a Polly-generated loop] is created in the IslNodeBuilder (or, more precisely, the LoopGenerator). Thus, we either keep them floating or remember loop header blocks somewhere in the BlockGenerator. I chose the first, as we can easily place them in the IslNodeBuilder if they are actually needed or delete them otherwise.
		Meinersbur: OK
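The "floating PHI" design described in this thread can be modelled without LLVM. The following is a minimal sketch, not the patch's implementation: `ToyPHI`, `ToyBlock`, `createFloatingPHI` and `placeOrDrop` are hypothetical names, and ownership via `std::unique_ptr` stands in for LLVM's notion of an instruction that has no parent block yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins: a PHI is just a name, a block is a list of owned PHIs.
struct ToyPHI { std::string Name; };
struct ToyBlock { std::vector<std::unique_ptr<ToyPHI>> PHIs; };

// Floating PHIs: created, remembered per value, but not yet in any block.
using FloatingPHIs = std::map<std::string, std::unique_ptr<ToyPHI>>;

// Mirrors the role of createLoopCarriedPHI: create the PHI without a parent.
ToyPHI *createFloatingPHI(const std::string &V, FloatingPHIs &LCPHIs) {
  assert(LCPHIs.count(V) == 0 && "at most one loop-carried PHI per value");
  auto PHI = std::make_unique<ToyPHI>(ToyPHI{V + ".polly.lc"});
  ToyPHI *Raw = PHI.get();
  LCPHIs[V] = std::move(PHI);
  return Raw;
}

// Later, once the loop header exists: place the PHIs that turned out to be
// needed and delete the rest. Returns the number of PHIs placed.
std::size_t placeOrDrop(FloatingPHIs &LCPHIs,
                        const std::vector<std::string> &Needed,
                        ToyBlock &Header) {
  std::size_t Placed = 0;
  for (const std::string &V : Needed) {
    auto It = LCPHIs.find(V);
    if (It == LCPHIs.end())
      continue;
    Header.PHIs.push_back(std::move(It->second));
    LCPHIs.erase(It);
    ++Placed;
  }
  LCPHIs.clear(); // Unneeded floating PHIs are destroyed here.
  return Placed;
}
```

The point of the sketch is the split of responsibilities: the component that creates the PHI never needs to know the header block, and the component that owns the header decides about placement or deletion in one place.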

		assert(LCPHIs.count(V) == 0);
		LCPHIs[V] = PHI;
		BBMap[V] = PHI;

		return PHI;
		}

		Value *BlockGenerator::copyPHIInstruction(ScopStmt &Stmt, PHINode *PHI,
		ValueMapT &BBMap,
		LoopToScevMapT &LTS) {
		if (!LI.isLoopHeader(PHI->getParent()))
		return nullptr;

		if (LoopDepth == 0 || canSyntheziseInStmt(Stmt, PHI))
		return nullptr;
		Meinersbur: Could you add comments to explain these cases?
		jdoerfert (Author): Sure.

		createLoopCarriedPHI(PHI, BBMap);

		return nullptr;
		}

bool BlockGenerator::canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst) {		bool BlockGenerator::canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst) {
Loop *L = getLoopForInst(Inst);		Loop *L = getLoopForInst(Inst);
return (Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&		return (Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&
canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion());		canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion());
}		}

void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,		void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses,
		bool ForceCopy) {
// Terminator instructions control the control flow. They are explicitly		// Terminator instructions control the control flow. They are explicitly
// expressed in the clast and do not need to be copied.		// expressed in the clast and do not need to be copied.
if (Inst->isTerminator())		if (Inst->isTerminator())
return;		return;

// Synthesizable statements will be generated on-demand.		// Synthesizable statements will be generated on-demand.
if (canSyntheziseInStmt(Stmt, Inst))		if (!ForceCopy && canSyntheziseInStmt(Stmt, Inst))
return;		return;

if (auto *Load = dyn_cast<LoadInst>(Inst)) {		if (auto *Load = dyn_cast<LoadInst>(Inst)) {
Value *NewLoad = generateScalarLoad(Stmt, Load, BBMap, LTS, NewAccesses);		Value *NewLoad = generateScalarLoad(Stmt, Load, BBMap, LTS, NewAccesses);
// Compute NewLoad before its insertion in BBMap to make the insertion		// Compute NewLoad before its insertion in BBMap to make the insertion
// deterministic.		// deterministic.
BBMap[Load] = NewLoad;		BBMap[Load] = NewLoad;
return;		return;
Show All 12 Lines	void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,
// Skip some special intrinsics for which we do not adjust the semantics to		// Skip some special intrinsics for which we do not adjust the semantics to
// the new schedule. All others are handled like every other instruction.		// the new schedule. All others are handled like every other instruction.
if (isIgnoredIntrinsic(Inst))		if (isIgnoredIntrinsic(Inst))
return;		return;

copyInstScalar(Stmt, Inst, BBMap, LTS);		copyInstScalar(Stmt, Inst, BBMap, LTS);
}		}

void BlockGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,		void BlockGenerator::copyStmt(ScopStmt &Stmt, ValueMapT &ScalarMap,
		LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
assert(Stmt.isBlockStmt() &&		assert(Stmt.isBlockStmt() &&
"Only block statements can be copied by the block generator");		"Only block statements can be copied by the block generator");

ValueMapT BBMap;		// Initialize the block intern mappings with all mapping that hold when this
		Meinersbur: block-intern_al_? mapping_s_
		jdoerfert (Author): Thx.
		// block is entered.
		ValueMapT BBMap = ScalarMap;

BasicBlock *BB = Stmt.getBasicBlock();		BasicBlock *BB = Stmt.getBasicBlock();
copyBB(Stmt, BB, BBMap, LTS, NewAccesses);		copyBB(Stmt, BB, BBMap, LTS, NewAccesses);

		// Copy scalar mappings from the BBMap back to the ScalarMap if they
		// might be needed after this block.
		generateScalarMappings(Stmt, BB, ScalarMap, BBMap, LTS);
}		}

BasicBlock *BlockGenerator::splitBB(BasicBlock *BB) {		BasicBlock *BlockGenerator::splitBB(BasicBlock *BB) {
BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),		BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
&*Builder.GetInsertPoint(), &DT, &LI);		&*Builder.GetInsertPoint(), &DT, &LI);
CopyBB->setName("polly.stmt." + BB->getName());		CopyBB->setName("polly.stmt." + BB->getName());
return CopyBB;		return CopyBB;
}		}

BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,		BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
BasicBlock *CopyBB = splitBB(BB);		BasicBlock *CopyBB = splitBB(BB);
Builder.SetInsertPoint(&CopyBB->front());		Builder.SetInsertPoint(&CopyBB->front());
generateScalarLoads(Stmt, BBMap);

copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);		copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);

// After a basic block was copied store all scalars that escape this block in
// their alloca.
generateScalarStores(Stmt, LTS, BBMap);
return CopyBB;		return CopyBB;
}		}

void BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *CopyBB,		void BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *CopyBB,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
EntryBB = &CopyBB->getParent()->getEntryBlock();		EntryBB = &CopyBB->getParent()->getEntryBlock();

for (Instruction &Inst : *BB)		for (Instruction &Inst : *BB)
copyInstruction(Stmt, &Inst, BBMap, LTS, NewAccesses);		copyInstruction(Stmt, &Inst, BBMap, LTS, NewAccesses);
}		}

Value *BlockGenerator::getOrCreateAlloca(Value *ScalarBase,
ScalarAllocaMapTy &Map,
const char *NameExt) {
// If no alloca was found create one and insert it in the entry block.
if (!Map.count(ScalarBase)) {
auto *Ty = ScalarBase->getType();
auto NewAddr = new AllocaInst(Ty, ScalarBase->getName() + NameExt);
EntryBB = &Builder.GetInsertBlock()->getParent()->getEntryBlock();
NewAddr->insertBefore(&*EntryBB->getFirstInsertionPt());
Map[ScalarBase] = NewAddr;
}

auto Addr = Map[ScalarBase];

if (GlobalMap.count(Addr))
return GlobalMap[Addr];

return Addr;
}

Value *BlockGenerator::getOrCreateAlloca(MemoryAccess &Access) {
if (Access.isPHIKind())
return getOrCreatePHIAlloca(Access.getBaseAddr());
else
return getOrCreateScalarAlloca(Access.getBaseAddr());
}

Value *BlockGenerator::getOrCreateAlloca(const ScopArrayInfo *Array) {
if (Array->isPHIKind())
return getOrCreatePHIAlloca(Array->getBasePtr());
else
return getOrCreateScalarAlloca(Array->getBasePtr());
}

Value *BlockGenerator::getOrCreateScalarAlloca(Value *ScalarBase) {
return getOrCreateAlloca(ScalarBase, ScalarMap, ".s2a");
}

Value *BlockGenerator::getOrCreatePHIAlloca(Value *ScalarBase) {
return getOrCreateAlloca(ScalarBase, PHIOpMap, ".phiops");
}

void BlockGenerator::handleOutsideUsers(const Region &R, Instruction *Inst,		void BlockGenerator::handleOutsideUsers(const Region &R, Instruction *Inst,
Value *Address) {		Value *Address) {
// If there are escape users we get the alloca for this instruction and put it		// If there are escape users we get the alloca for this instruction and put it
// in the EscapeMap for later finalization. Lastly, if the instruction was		// in the EscapeMap for later finalization. Lastly, if the instruction was
// copied multiple times we already did this and can exit.		// copied multiple times we already did this and can exit.
if (EscapeMap.count(Inst))		if (EscapeMap.count(Inst))
return;		return;

Show All 10 Lines	for (User *U : Inst->users()) {

EscapeUsers.push_back(UI);		EscapeUsers.push_back(UI);
}		}

// Exit if no escape uses were found.		// Exit if no escape uses were found.
if (EscapeUsers.empty())		if (EscapeUsers.empty())
return;		return;

// Get or create an escape alloca for this instruction.
auto *ScalarAddr = Address ? Address : getOrCreateScalarAlloca(Inst);

// Remember that this instruction has escape uses and the escape alloca.		// Remember that this instruction has escape uses and the escape alloca.
EscapeMap[Inst] = std::make_pair(ScalarAddr, std::move(EscapeUsers));		EscapeMap[Inst] = std::move(EscapeUsers);
}		}

void BlockGenerator::generateScalarLoads(ScopStmt &Stmt, ValueMapT &BBMap) {		void BlockGenerator::generateScalarMappings(ScopStmt &Stmt, BasicBlock *BB,
for (MemoryAccess *MA : Stmt) {		ValueMapT &ScalarMap,
if (MA->isArrayKind() || MA->isWrite())		ValueMapT &BBMap,
continue;		LoopToScevMapT &LTS) {

auto Address = getOrCreateAlloca(MA);
BBMap[MA->getBaseAddr()] =
Builder.CreateLoad(Address, Address->getName() + ".reload");
}
}

Value *BlockGenerator::getNewScalarValue(Value *ScalarValue, const Region &R,
ScopStmt &Stmt, LoopToScevMapT &LTS,
ValueMapT &BBMap) {
// If the value we want to store is an instruction we might have demoted it
// in order to make it accessible here. In such a case a reload is
// necessary. If it is no instruction it will always be a value that
// dominates the current point and we can just use it. In total there are 4
// options:
// (1) The value is no instruction ==> use the value.
// (2) The value is an instruction that was split out of the region prior to
// code generation ==> use the instruction as it dominates the region.
// (3) The value is an instruction:
// (a) The value was defined in the current block, thus a copy is in
// the BBMap ==> use the mapped value.
// (b) The value was defined in a previous block, thus we demoted it
// earlier ==> use the reloaded value.
Instruction *ScalarValueInst = dyn_cast<Instruction>(ScalarValue);
if (!ScalarValueInst)
return ScalarValue;

if (!R.contains(ScalarValueInst)) {
if (Value *ScalarValueCopy = GlobalMap.lookup(ScalarValueInst))
return /* Case (3a) */ ScalarValueCopy;
else
return /* Case 2 */ ScalarValue;
}

if (Value *ScalarValueCopy = BBMap.lookup(ScalarValueInst))
return /* Case (3a) */ ScalarValueCopy;

if ((Stmt.isBlockStmt() &&
Stmt.getBasicBlock() == ScalarValueInst->getParent()) ||
(Stmt.isRegionStmt() && Stmt.getRegion()->contains(ScalarValueInst))) {
auto SynthesizedValue = trySynthesizeNewValue(
Stmt, ScalarValueInst, BBMap, LTS, getLoopForInst(ScalarValueInst));

if (SynthesizedValue)
return SynthesizedValue;
}

// Case (3b)
Value *Address = getOrCreateScalarAlloca(ScalarValueInst);
ScalarValue = Builder.CreateLoad(Address, Address->getName() + ".reload");

return ScalarValue;
}
grosser: This patch nicely pointed out that getNewScalarValue is unnecessary and its uses can be replaced with getNewValue. This simplification was performed on Jan 26 in https://llvm.org/svn/llvm-project/polly/trunk@258799, such that future versions of this patch will not need to perform this simplification any more.

void BlockGenerator::generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
ValueMapT &BBMap) {
const Region &R = Stmt.getParent()->getRegion();

assert(Stmt.isBlockStmt() && "Region statements need to use the "		Scop &S = *Stmt.getParent();
"generateScalarStores() function in the "		for (auto *MA : Stmt) {
"RegionGenerator");

for (MemoryAccess *MA : Stmt) {
if (MA->isArrayKind() || MA->isRead())		if (MA->isArrayKind() || MA->isRead())
continue;		continue;

Value *Val = MA->getAccessValue();		// For each scalar defined in this statement that has a MemoryAccess
auto Address = getOrCreateAlloca(MA);		// we make sure the access value was actually copied.
		auto *AccessValue = MA->getAccessValue();
Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);		if (auto *AccessValueInst = dyn_cast<Instruction>(AccessValue)) {
Builder.CreateStore(Val, Address);		if (AccessValueInst->getParent() == BB &&
}		canSyntheziseInStmt(Stmt, AccessValueInst))
		copyInstruction(Stmt, AccessValueInst, BBMap, LTS, nullptr,
		sebpop: Why are we forcing copy of instructions that can be synthesized? Aren't these synthesizable instructions discarded in the first place by not being added to the scalar memory accesses of the stmt?
		jdoerfert (Author): Because we (might) need these values as operands of PHI nodes we haven't constructed yet, and we need them to be computed/placed at the correct location. Later on, when we create the PHI, we no longer know where we should place the operand code, nor even which operand we should create. In the following example, which could be the result after scheduling, we would not know where to place which version of x (x1 or x2) when we create code for the join point after the conditional. It would require a backward search on the AST to find the location where the PHI x was written last by each predecessor in order to get the operand and location. However, forcing them to be copied in the first place solves this quite nicely, as we always know the values we will later need are actually present and mapped.
		    if (...) {
		      x1 = 2*i;
		      S: /* lots of code */
		    } else {
		      x2 = 3*i;
		      P: /* lots of code */
		    }
		    x3 = phi(x1, x2)
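The argument in this reply can be made concrete with a toy model. The sketch below is an illustration only, assuming string-based "code" and maps; `copyOperandInBranch` and `createJoinPHI` are hypothetical helpers, not functions from the patch. It shows that if each branch eagerly copies the operand and records it in its own map, the join point can build the PHI by plain lookups, with no backward search over the AST.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy map from an original PHI name to the operand copy valid in one branch.
using ToyMap = std::map<std::string, std::string>;

// While generating a branch: emit the (synthesizable) operand right here and
// remember which copy feeds the PHI from this branch.
void copyOperandInBranch(const std::string &PHIName, const std::string &Operand,
                         const std::string &Expr,
                         std::vector<std::string> &Code, ToyMap &BranchMap) {
  Code.push_back(Operand + " = " + Expr);
  BranchMap[PHIName] = Operand;
}

// At the join point: the operands are already present and mapped, so the PHI
// is assembled from one lookup per incoming branch (in branch order).
std::string createJoinPHI(const std::string &PHIName, const std::string &Result,
                          const std::vector<const ToyMap *> &BranchMaps) {
  std::string Text = Result + " = phi(";
  for (std::size_t I = 0; I < BranchMaps.size(); ++I) {
    if (I != 0)
      Text += ", ";
    Text += BranchMaps[I]->at(PHIName); // Throws if a branch forgot its copy.
  }
  return Text + ")";
}
```

With the example from the reply, the "then" branch would record `x1` and the "else" branch `x2`, and the join would produce `x3 = phi(x1, x2)`. Skipping the forced copy would leave the branch maps empty, which in this toy surfaces as a failed lookup.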
		/* Force */ true);
}		}

void BlockGenerator::createScalarInitialization(Scop &S) {		// If the scalar was not defined in this block (only for region statements)
Region &R = S.getRegion();		// we can skip it for now and only proceed when the block containing the
BasicBlock *ExitBB = R.getExit();		// access instruction is copied.
		if (BB != MA->getAccessInstruction()->getParent())
// The split block __just before__ the region and optimized region.
BasicBlock *SplitBB = R.getEnteringBlock();
BranchInst *SplitBBTerm = cast<BranchInst>(SplitBB->getTerminator());
assert(SplitBBTerm->getNumSuccessors() == 2 && "Bad region entering block!");

// Get the start block of the __optimized__ region.
BasicBlock *StartBB = SplitBBTerm->getSuccessor(0);
if (StartBB == R.getEntry())
StartBB = SplitBBTerm->getSuccessor(1);

Builder.SetInsertPoint(StartBB->getTerminator());

for (auto &Pair : S.arrays()) {
auto &Array = Pair.second;
if (Array->getNumberOfDimensions() != 0)
continue;
if (Array->isPHIKind()) {
// For PHI nodes, the only values we need to store are the ones that
// reach the PHI node from outside the region. In general there should
// only be one such incoming edge and this edge should enter through
// 'SplitBB'.
auto PHI = cast<PHINode>(Array->getBasePtr());

for (auto BI = PHI->block_begin(), BE = PHI->block_end(); BI != BE; BI++)
if (!R.contains(*BI) && *BI != SplitBB)
llvm_unreachable("Incoming edges from outside the scop should always "
"come from SplitBB");

int Idx = PHI->getBasicBlockIndex(SplitBB);
if (Idx < 0)
continue;		continue;

Value *ScalarValue = PHI->getIncomingValue(Idx);		auto *BaseAddr = MA->getBaseAddr();
		auto *NewVal = getNewValue(Stmt, AccessValue, BBMap, LTS,
		getLoopForInst(MA->getAccessInstruction()));

Builder.CreateStore(ScalarValue, getOrCreatePHIAlloca(PHI));		// If the base address has a loop carried PHI it is either a real PHI
continue;		// (= MA is PHIKind) or a scalar that was used before we saw a definition.
		// In the second case we map both, the PHICopy as well as the BaseAddr to
		// the NewValue to allow uses textually before as well as after the scalar
		Meinersbur: "textually" doesn't seem the right word; we are not doing text processing here. Should be something about processing order.
		jdoerfert (Author): AFAIK this is a standard term that we also used in our discussions and emails before. Processing order would not meet the requirements anyway, as the actual dynamically executed order is not what this is about. "textually" means the order in which you statically write/see/discover things. In other words, the order in which you first see the statements in a linearized/dumped AST if you read it like a page (from top to bottom).
		Meinersbur: OK, but never seen this personally.
		// definition.
		if (Value *PHICopy = LCPHIs.lookup(BaseAddr)) {
		if (MA->isPHIKind() && LI.isLoopHeader(BB))
		BaseAddr = PHICopy;
		else if (NewVal != PHICopy)
		ScalarMap[PHICopy] = NewVal;
}		}

auto *Inst = dyn_cast<Instruction>(Array->getBasePtr());		// For exit PHIs that have been split due to single exit edge creation
		// we map the PHI with multiple exits to the new value, not the new one.
if (Inst && R.contains(Inst))		if (MA->isExitPHIKind() && !S.hasSingleExitEdge()) {
continue;		auto *PHI = cast<PHINode>(BaseAddr);
		BaseAddr = PHI->getIncomingValueForBlock(S.getRegion().getExit());
// PHI nodes that are not marked as such in their SAI object are either exit		}
// PHI nodes we model as common scalars but without initialization, or
// incoming phi nodes that need to be initialized. Check if the first is the
// case for Inst and do not create and initialize memory if so.
if (auto *PHI = dyn_cast_or_null<PHINode>(Inst))
if (!S.hasSingleExitEdge() && PHI->getBasicBlockIndex(ExitBB) >= 0)
continue;

Builder.CreateStore(Array->getBasePtr(),		assert(BaseAddr);
getOrCreateScalarAlloca(Array->getBasePtr()));		ScalarMap[BaseAddr] = NewVal;
}		}
}		}

void BlockGenerator::createScalarFinalization(Region &R) {		void BlockGenerator::createScalarFinalization(Region &R, ValueMapT &ScalarMap) {
// The exit block of the __unoptimized__ region.		// The exit block of the __unoptimized__ region.
BasicBlock *ExitBB = R.getExitingBlock();		BasicBlock *ExitBB = R.getExitingBlock();
// The merge block __just after__ the region and the optimized region.		// The merge block __just after__ the region and the optimized region.
BasicBlock *MergeBB = R.getExit();		BasicBlock *MergeBB = R.getExit();

// The exit block of the __optimized__ region.		// The exit block of the __optimized__ region.
BasicBlock *OptExitBB = *(pred_begin(MergeBB));		BasicBlock *OptExitBB = *(pred_begin(MergeBB));
if (OptExitBB == ExitBB)		if (OptExitBB == ExitBB)
OptExitBB = *(++pred_begin(MergeBB));		OptExitBB = *(++pred_begin(MergeBB));

Builder.SetInsertPoint(OptExitBB->getTerminator());		Builder.SetInsertPoint(OptExitBB->getTerminator());
for (const auto &EscapeMapping : EscapeMap) {		for (const auto &EscapeMapping : EscapeMap) {
// Extract the escaping instruction and the escaping users as well as the		// Extract the escaping instruction and the escaping users as well as the
// alloca the instruction was demoted to.		// alloca the instruction was demoted to.
Instruction *EscapeInst = EscapeMapping.getFirst();		Instruction *EscapeInst = EscapeMapping.getFirst();
const auto &EscapeMappingValue = EscapeMapping.getSecond();		const EscapeUserVectorTy &EscapeUsers = EscapeMapping.getSecond();
const EscapeUserVectorTy &EscapeUsers = EscapeMappingValue.second;
Value *ScalarAddr = EscapeMappingValue.first;		Value *EscapeInstMapped = ScalarMap[EscapeInst];
		assert(EscapeInstMapped);
// Reload the demoted instruction in the optimized version of the SCoP.
Value *EscapeInstReload =
Builder.CreateLoad(ScalarAddr, EscapeInst->getName() + ".final_reload");
EscapeInstReload =
Builder.CreateBitOrPointerCast(EscapeInstReload, EscapeInst->getType());

// Create the merge PHI that merges the optimized and unoptimized version.		// Create the merge PHI that merges the optimized and unoptimized version.
PHINode *MergePHI = PHINode::Create(EscapeInst->getType(), 2,		PHINode *MergePHI = PHINode::Create(EscapeInst->getType(), 2,
EscapeInst->getName() + ".merge");		EscapeInst->getName() + ".merge");
MergePHI->insertBefore(&*MergeBB->getFirstInsertionPt());		MergePHI->insertBefore(&*MergeBB->getFirstInsertionPt());

// Add the respective values to the merge PHI.		// Add the respective values to the merge PHI.
MergePHI->addIncoming(EscapeInstReload, OptExitBB);		MergePHI->addIncoming(EscapeInstMapped, OptExitBB);
MergePHI->addIncoming(EscapeInst, ExitBB);		MergePHI->addIncoming(EscapeInst, ExitBB);

// The information of scalar evolution about the escaping instruction needs		// The information of scalar evolution about the escaping instruction needs
// to be revoked so the new merged instruction will be used.		// to be revoked so the new merged instruction will be used.
if (SE.isSCEVable(EscapeInst->getType()))		if (SE.isSCEVable(EscapeInst->getType()))
SE.forgetValue(EscapeInst);		SE.forgetValue(EscapeInst);

// Replace all uses of the demoted instruction with the merge PHI.		// Replace all uses of the demoted instruction with the merge PHI.
[23 lines not shown]	for (auto &Pair : S.arrays()) {
// relevant outside users.		// relevant outside users.
if (!R.contains(Inst))		if (!R.contains(Inst))
continue;		continue;

handleOutsideUsers(R, Inst, nullptr);		handleOutsideUsers(R, Inst, nullptr);
}		}
}		}

void BlockGenerator::createExitPHINodeMerges(Scop &S) {		void BlockGenerator::createExitPHINodeMerges(Scop &S, ValueMapT &ScalarMap) {
if (S.hasSingleExitEdge())		if (S.hasSingleExitEdge())
return;		return;

Region &R = S.getRegion();		Region &R = S.getRegion();

auto *ExitBB = R.getExitingBlock();		auto *ExitBB = R.getExitingBlock();
auto *MergeBB = R.getExit();		auto *MergeBB = R.getExit();
auto *AfterMergeBB = MergeBB->getSingleSuccessor();		auto *AfterMergeBB = MergeBB->getSingleSuccessor();
[10 lines not shown]	for (auto &Pair : S.arrays()) {
PHINode *PHI = dyn_cast<PHINode>(Val);		PHINode *PHI = dyn_cast<PHINode>(Val);
if (!PHI)		if (!PHI)
continue;		continue;

if (PHI->getParent() != AfterMergeBB)		if (PHI->getParent() != AfterMergeBB)
continue;		continue;

std::string Name = PHI->getName();		std::string Name = PHI->getName();
Value *ScalarAddr = getOrCreateScalarAlloca(PHI);
Value *Reload = Builder.CreateLoad(ScalarAddr, Name + ".ph.final_reload");
Reload = Builder.CreateBitOrPointerCast(Reload, PHI->getType());
Value *OriginalValue = PHI->getIncomingValueForBlock(MergeBB);		Value *OriginalValue = PHI->getIncomingValueForBlock(MergeBB);
		Value *CopiedValue = ScalarMap.lookup(OriginalValue);
		if (!CopiedValue) {
		assert(!isa<Instruction>(OriginalValue) ||
		!R.contains(cast<Instruction>(OriginalValue)));
		CopiedValue = OriginalValue;
		}
		assert(CopiedValue);

auto *MergePHI = PHINode::Create(PHI->getType(), 2, Name + ".ph.merge");		auto *MergePHI = PHINode::Create(PHI->getType(), 2, Name + ".ph.merge");
MergePHI->insertBefore(&*MergeBB->getFirstInsertionPt());		MergePHI->insertBefore(&*MergeBB->getFirstInsertionPt());
MergePHI->addIncoming(Reload, OptExitBB);		MergePHI->addIncoming(CopiedValue, OptExitBB);
MergePHI->addIncoming(OriginalValue, ExitBB);		MergePHI->addIncoming(OriginalValue, ExitBB);
int Idx = PHI->getBasicBlockIndex(MergeBB);		int Idx = PHI->getBasicBlockIndex(MergeBB);
PHI->setIncomingValue(Idx, MergePHI);		PHI->setIncomingValue(Idx, MergePHI);
}		}
}		}

void BlockGenerator::finalizeSCoP(Scop &S) {		void BlockGenerator::finalizeSCoP(Scop &S, ValueMapT &ScalarMap) {
findOutsideUsers(S);		findOutsideUsers(S);
createScalarInitialization(S);		createExitPHINodeMerges(S, ScalarMap);
createExitPHINodeMerges(S);		createScalarFinalization(S.getRegion(), ScalarMap);
createScalarFinalization(S.getRegion());
}		}

VectorBlockGenerator::VectorBlockGenerator(BlockGenerator &BlockGen,		VectorBlockGenerator::VectorBlockGenerator(BlockGenerator &BlockGen,
std::vector<LoopToScevMapT> &VLTS,		std::vector<LoopToScevMapT> &VLTS,
isl_map *Schedule)		isl_map *Schedule)
: BlockGenerator(BlockGen), VLTS(VLTS), Schedule(Schedule) {		: BlockGenerator(BlockGen), VLTS(VLTS), Schedule(Schedule) {
assert(Schedule && "No statement domain provided");		assert(Schedule && "No statement domain provided");
}		}
[103 lines not shown]	Value *VectorBlockGenerator::generateUnknownStrideLoad(
}		}

return Vector;		return Vector;
}		}

void VectorBlockGenerator::generateLoad(		void VectorBlockGenerator::generateLoad(
ScopStmt &Stmt, LoadInst *Load, ValueMapT &VectorMap,		ScopStmt &Stmt, LoadInst *Load, ValueMapT &VectorMap,
VectorValueMapT &ScalarMaps, __isl_keep isl_id_to_ast_expr *NewAccesses) {		VectorValueMapT &ScalarMaps, __isl_keep isl_id_to_ast_expr *NewAccesses) {
if (Value *PreloadLoad = GlobalMap.lookup(Load)) {		if (Value *PreloadLoad = ScalarMaps[0].lookup(Load)) {
VectorMap[Load] = Builder.CreateVectorSplat(getVectorWidth(), PreloadLoad,		VectorMap[Load] = Builder.CreateVectorSplat(getVectorWidth(), PreloadLoad,
Load->getName() + "_p");		Load->getName() + "_p");
return;		return;
}		}

if (!VectorType::isValidElementType(Load->getType())) {		if (!VectorType::isValidElementType(Load->getType())) {
for (int i = 0; i < getVectorWidth(); i++)		for (int i = 0; i < getVectorWidth(); i++)
ScalarMaps[i][Load] =		ScalarMaps[i][Load] =
[186 lines not shown]	if (hasVectorOperands(Inst, VectorMap)) {

// Fallthrough: We generate scalar instructions, if we don't know how to		// Fallthrough: We generate scalar instructions, if we don't know how to
// generate vector code.		// generate vector code.
}		}

copyInstScalarized(Stmt, Inst, VectorMap, ScalarMaps, NewAccesses);		copyInstScalarized(Stmt, Inst, VectorMap, ScalarMaps, NewAccesses);
}		}

void VectorBlockGenerator::generateScalarVectorLoads(
ScopStmt &Stmt, ValueMapT &VectorBlockMap) {
for (MemoryAccess *MA : Stmt) {
if (MA->isArrayKind() || MA->isWrite())
continue;

auto Address = getOrCreateAlloca(MA);
Type *VectorPtrType = getVectorPtrTy(Address, 1);
Value *VectorPtr = Builder.CreateBitCast(Address, VectorPtrType,
Address->getName() + "_p_vec_p");
auto *Val = Builder.CreateLoad(VectorPtr, Address->getName() + ".reload");
Constant *SplatVector = Constant::getNullValue(
VectorType::get(Builder.getInt32Ty(), getVectorWidth()));

Value *VectorVal = Builder.CreateShuffleVector(
Val, Val, SplatVector, Address->getName() + "_p_splat");
VectorBlockMap[MA->getBaseAddr()] = VectorVal;
VectorVal->dump();
}
}

void VectorBlockGenerator::verifyNoScalarStores(ScopStmt &Stmt) {
for (MemoryAccess *MA : Stmt) {
if (MA->isArrayKind() || MA->isRead())
continue;

llvm_unreachable("Scalar stores not expected in vector loop");
}
}
grosser: Why is the last function dropped? This is a sanity check that works on the ScopInfo and is unrelated to how we generate scalar code. It verifies that no scalar write statements are in this ScopStmt, which is likely to cause trouble both for the to-memory scalar codegen as well as SSA codegen. Consequently, I think it makes sense to keep this check.
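As an illustration of the kind of check discussed in the comment above, here is a minimal, self-contained sketch. It assumes hypothetical stand-in types (`Access`, `Stmt`, `AccessKind`) rather than Polly's real `MemoryAccess` and `ScopStmt` classes, and it returns a flag instead of calling `llvm_unreachable`, so it only shows the shape of the check and not the code under review.

```cpp
#include <vector>

// Hypothetical stand-ins for polly::MemoryAccess and polly::ScopStmt.
enum class AccessKind { Array, Scalar };

struct Access {
  AccessKind Kind; // array access or scalar (non-array) access
  bool IsWrite;    // true for writes, false for reads
};

using Stmt = std::vector<Access>;

// Returns true if the statement contains no scalar write accesses.
// The check only inspects the access list, so it does not depend on
// whether scalars are later generated through memory or as SSA values.
bool hasNoScalarStores(const Stmt &S) {
  for (const Access &A : S)
    if (A.Kind == AccessKind::Scalar && A.IsWrite)
      return false;
  return true;
}
```

Because the sketch depends only on the modeled accesses, the same predicate could in principle be asserted before either code generation strategy runs.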

void VectorBlockGenerator::copyStmt(		void VectorBlockGenerator::copyStmt(
ScopStmt &Stmt, __isl_keep isl_id_to_ast_expr *NewAccesses) {		ScopStmt &Stmt, ValueMapT &ScalarMap,
		__isl_keep isl_id_to_ast_expr *NewAccesses) {
assert(Stmt.isBlockStmt() && "TODO: Only block statements can be copied by "		assert(Stmt.isBlockStmt() && "TODO: Only block statements can be copied by "
"the vector block generator");		"the vector block generator");

BasicBlock *BB = Stmt.getBasicBlock();		BasicBlock *BB = Stmt.getBasicBlock();
BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),		BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
&*Builder.GetInsertPoint(), &DT, &LI);		&*Builder.GetInsertPoint(), &DT, &LI);
CopyBB->setName("polly.stmt." + BB->getName());		CopyBB->setName("polly.stmt." + BB->getName());
Builder.SetInsertPoint(&CopyBB->front());		Builder.SetInsertPoint(&CopyBB->front());

// Create two maps that store the mapping from the original instructions of		// Create two maps that store the mapping from the original instructions of
// the old basic block to their copies in the new basic block. Those maps		// the old basic block to their copies in the new basic block. Those maps
// are basic block local.		// are basic block local.
//		//
// As vector code generation is supported there is one map for scalar values		// As vector code generation is supported there is one map for scalar values
// and one for vector values.		// and one for vector values.
//		//
// In case we just do scalar code generation, the vectorMap is not used and		// In case we just do scalar code generation, the vectorMap is not used and
// the scalarMap has just one dimension, which contains the mapping.		// the scalarMap has just one dimension, which contains the mapping.
//		//
// In case vector code generation is done, an instruction may either appear		// In case vector code generation is done, an instruction may either appear
// in the vector map once (as it is calculating >vectorwidth< values at a		// in the vector map once (as it is calculating >vectorwidth< values at a
// time. Or (if the values are calculated using scalar operations), it		// time. Or (if the values are calculated using scalar operations), it
// appears once in every dimension of the scalarMap.		// appears once in every dimension of the scalarMap.
VectorValueMapT ScalarBlockMap(getVectorWidth());		VectorValueMapT ScalarBlockMap;
		ScalarBlockMap.append(getVectorWidth(), ScalarMap);
ValueMapT VectorBlockMap;		ValueMapT VectorBlockMap;

generateScalarVectorLoads(Stmt, VectorBlockMap);

grosser: By dropping generateScalarVectorLoads this patch changes behavior. Originally the code splatted all scalar values into a vector-value. After this change, no vector values are generated. This is likely the reason test/Isl/CodeGen/simple_vec_stride_one.ll changes.
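For readers unfamiliar with the term, "splatting" a scalar means broadcasting one value into every lane of a vector. The following is a rough sketch of that idea only, using `std::array` as a stand-in for an LLVM vector value; the helper name `splat` is hypothetical and is not part of Polly or LLVM.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: broadcast one scalar into all lanes of a
// fixed-width vector, modeled here with std::array.
template <typename T, std::size_t Width>
std::array<T, Width> splat(T Scalar) {
  std::array<T, Width> Lanes;
  Lanes.fill(Scalar); // every lane receives the same scalar value
  return Lanes;
}
```

In the removed code the same effect is obtained with a shuffle whose mask is all zeros, so every result lane selects element 0 of the loaded value.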
for (Instruction &Inst : *BB)		for (Instruction &Inst : *BB)
copyInstruction(Stmt, &Inst, VectorBlockMap, ScalarBlockMap, NewAccesses);		copyInstruction(Stmt, &Inst, VectorBlockMap, ScalarBlockMap, NewAccesses);

verifyNoScalarStores(Stmt);
}		}

BasicBlock *RegionGenerator::repairDominance(BasicBlock *BB,		BasicBlock *RegionGenerator::repairDominance(BasicBlock *BB,
BasicBlock *BBCopy) {		BasicBlock *BBCopy) {

BasicBlock *BBIDom = DT.getNode(BB)->getIDom()->getBlock();		BasicBlock *BBIDom = DT.getNode(BB)->getIDom()->getBlock();
BasicBlock *BBCopyIDom = BlockMap.lookup(BBIDom);		BasicBlock *BBCopyIDom = BlockMap.lookup(BBIDom);

if (BBCopyIDom)		if (BBCopyIDom)
DT.changeImmediateDominator(BBCopy, BBCopyIDom);		DT.changeImmediateDominator(BBCopy, BBCopyIDom);

return BBCopyIDom;		return BBCopyIDom;
}		}

void RegionGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,		void RegionGenerator::createMergePHIs(BasicBlock *BB, Region &R,
		ValueMapT &MergeScalarMap) {
		SmallVector<BasicBlock *, 8> PredBBs;
		SmallVector<ValueMapT *, 8> PredScalarMaps;

		for (BasicBlock *PredBB : predecessors(BB)) {
		if (!R.contains(PredBB))
		continue;

		auto *PredBBCopy = BlockMap[PredBB];
		assert(PredBBCopy && ScalarMaps.count(PredBBCopy));

		PredBBs.push_back(PredBBCopy);
		PredScalarMaps.push_back(&ScalarMaps[PredBBCopy]);
		}

		Builder.SetInsertPoint(&*BlockMap[BB]->getFirstInsertionPt());

		polly::createMergePHIs(Builder, PredBBs, PredScalarMaps, MergeScalarMap);
		}

		void RegionGenerator::copyStmt(ScopStmt &Stmt, ValueMapT &OuterScalarMap,
		LoopToScevMapT &LTS,
isl_id_to_ast_expr *IdToAstExp) {		isl_id_to_ast_expr *IdToAstExp) {
assert(Stmt.isRegionStmt() &&		assert(Stmt.isRegionStmt() &&
"Only region statements can be copied by the region generator");		"Only region statements can be copied by the region generator");

Scop *S = Stmt.getParent();		Scop *S = Stmt.getParent();

// Forget all old mappings.		// Forget all old mappings.
BlockMap.clear();		BlockMap.clear();
RegionMaps.clear();		RegionMaps.clear();
		ScalarMaps.clear();
IncompletePHINodeMap.clear();		IncompletePHINodeMap.clear();

// Collection of all values related to this subregion.		// Collection of all values related to this subregion.
ValueMapT ValueMap;		ValueMapT ValueMap;

// The region represented by the statement.		// The region represented by the statement.
Region *R = Stmt.getRegion();		Region *R = Stmt.getRegion();

// Create a dedicated entry for the region where we can reload all demoted		// Create a dedicated entry for the region where we can reload all demoted
// inputs.		// inputs.
BasicBlock *EntryBB = R->getEntry();		BasicBlock *EntryBB = R->getEntry();
BasicBlock *EntryBBCopy = SplitBlock(Builder.GetInsertBlock(),		BasicBlock *EntryBBCopy = SplitBlock(Builder.GetInsertBlock(),
&*Builder.GetInsertPoint(), &DT, &LI);		&*Builder.GetInsertPoint(), &DT, &LI);
EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");		EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");
Builder.SetInsertPoint(&EntryBBCopy->front());		Builder.SetInsertPoint(&EntryBBCopy->front());

ValueMapT &EntryBBMap = RegionMaps[EntryBBCopy];		ValueMapT &EntryBBMap = RegionMaps[EntryBBCopy];
generateScalarLoads(Stmt, EntryBBMap);		ValueMapT &EntryScalarMap = ScalarMaps[EntryBBCopy];
		EntryScalarMap.insert(OuterScalarMap.begin(), OuterScalarMap.end());
		EntryBBMap.insert(OuterScalarMap.begin(), OuterScalarMap.end());

for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)		for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)
if (!R->contains(*PI))		if (!R->contains(*PI))
BlockMap[*PI] = EntryBBCopy;		BlockMap[*PI] = EntryBBCopy;

// Determine the original exit block of this subregion. If the exit block		// Determine the original exit block of this subregion. If the exit block
// is also the scop's exit, it has been changed to polly.merge_new_and_old.		// is also the scop's exit, it has been changed to polly.merge_new_and_old.
// We move one block back to find the original block. This only happens if the		// We move one block back to find the original block. This only happens if the
[21 lines not shown]	while (!Blocks.empty()) {

// In order to remap PHI nodes we store also basic block mappings.		// In order to remap PHI nodes we store also basic block mappings.
BlockMap[BB] = BBCopy;		BlockMap[BB] = BBCopy;

// Get the mapping for this block and initialize it with either the scalar		// Get the mapping for this block and initialize it with either the scalar
// loads from the generated entering block (which dominates all blocks of		// loads from the generated entering block (which dominates all blocks of
// this subregion) or the maps of the immediate dominator, if part of the		// this subregion) or the maps of the immediate dominator, if part of the
// subregion. The latter necessarily includes the former.		// subregion. The latter necessarily includes the former.
ValueMapT *InitBBMap;		ValueMapT *InitBBMap, *InitScalarMap;
if (BBCopyIDom) {		if (BBCopyIDom) {
assert(RegionMaps.count(BBCopyIDom));		assert(RegionMaps.count(BBCopyIDom));
InitBBMap = &RegionMaps[BBCopyIDom];		InitBBMap = &RegionMaps[BBCopyIDom];
} else		InitScalarMap = &ScalarMaps[BBCopyIDom];
		} else {
InitBBMap = &EntryBBMap;		InitBBMap = &EntryBBMap;
		InitScalarMap = &EntryScalarMap;
		}
auto Inserted = RegionMaps.insert(std::make_pair(BBCopy, *InitBBMap));		auto Inserted = RegionMaps.insert(std::make_pair(BBCopy, *InitBBMap));
ValueMapT &RegionMap = Inserted.first->second;		ValueMapT &RegionMap = Inserted.first->second;
		ValueMapT &ScalarMap =
		ScalarMaps.insert(std::make_pair(BBCopy, *InitScalarMap)).first->second;

// Copy the block with the BlockGenerator.		// Copy the block with the BlockGenerator.
Builder.SetInsertPoint(&BBCopy->front());		Builder.SetInsertPoint(&BBCopy->front());
copyBB(Stmt, BB, BBCopy, RegionMap, LTS, IdToAstExp);		copyBB(Stmt, BB, BBCopy, RegionMap, LTS, IdToAstExp);

		// TODO
		sebpop: Unfinished comment?
		jdoerfert (author): Yes, I'll fix this.
		generateScalarMappings(Stmt, BB, ScalarMap, RegionMap, LTS);

// In order to remap PHI nodes we store also basic block mappings.		// In order to remap PHI nodes we store also basic block mappings.
BlockMap[BB] = BBCopy;		BlockMap[BB] = BBCopy;

// Add values to incomplete PHI nodes waiting for this block to be copied.		// Add values to incomplete PHI nodes waiting for this block to be copied.
for (const PHINodePairTy &PHINodePair : IncompletePHINodeMap[BB])		for (const PHINodePairTy &PHINodePair : IncompletePHINodeMap[BB])
addOperandToPHI(Stmt, PHINodePair.first, PHINodePair.second, BB, LTS);		addOperandToPHI(Stmt, PHINodePair.first, PHINodePair.second, BB,
		RegionMap, LTS);
IncompletePHINodeMap[BB].clear();		IncompletePHINodeMap[BB].clear();

// And continue with new successors inside the region.		// And continue with new successors inside the region.
for (auto SI = succ_begin(BB), SE = succ_end(BB); SI != SE; SI++)		for (auto SI = succ_begin(BB), SE = succ_end(BB); SI != SE; SI++)
if (R->contains(*SI) && SeenBlocks.insert(*SI).second)		if (R->contains(*SI) && SeenBlocks.insert(*SI).second)
Blocks.push_back(*SI);		Blocks.push_back(*SI);

// Remember value in case it is visible after this subregion.		// Remember value in case it is visible after this subregion.
[62 lines not shown]	for (BasicBlock *BB : SeenBlocks) {

for (auto *PredBBCopy : make_range(pred_begin(BBCopy), pred_end(BBCopy)))		for (auto *PredBBCopy : make_range(pred_begin(BBCopy), pred_end(BBCopy)))
if (LoopPHI->getBasicBlockIndex(PredBBCopy) < 0)		if (LoopPHI->getBasicBlockIndex(PredBBCopy) < 0)
LoopPHI->addIncoming(NullVal, PredBBCopy);		LoopPHI->addIncoming(NullVal, PredBBCopy);

LTS[L] = SE.getUnknown(LoopPHI);		LTS[L] = SE.getUnknown(LoopPHI);
}		}

// Continue generating code in the exit block.		// Merge values in the non-affine region exit node (except it was split when
Builder.SetInsertPoint(&*ExitBBCopy->getFirstInsertionPt());		// the SCoP CFG structure was changed).
		if (!S->hasSingleExitEdge() &&
		S->getRegion().getExitingBlock() == R->getExitingBlock())
		createMergePHIs(R->getExitingBlock(), *R, OuterScalarMap);
		else
		createMergePHIs(R->getExit(), *R, OuterScalarMap);

		Builder.SetInsertPoint(ExitBBCopy->getTerminator());

// Write values visible to other statements.
generateScalarStores(Stmt, LTS, ValueMap);
BlockMap.clear();		BlockMap.clear();
RegionMaps.clear();		RegionMaps.clear();
IncompletePHINodeMap.clear();		IncompletePHINodeMap.clear();
}		}

void RegionGenerator::generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
ValueMapT &BBMap) {
const Region &R = Stmt.getParent()->getRegion();

assert(Stmt.getRegion() &&
"Block statements need to use the generateScalarStores() "
"function in the BlockGenerator");

for (MemoryAccess *MA : Stmt) {
if (MA->isArrayKind() || MA->isRead())
continue;

Instruction *ScalarInst = MA->getAccessInstruction();
Value *Val = MA->getAccessValue();

// In case we add the store into an exiting block, we need to restore the
// position for stores in the exit node.
BasicBlock *SavedInsertBB = Builder.GetInsertBlock();
auto SavedInsertionPoint = Builder.GetInsertPoint();
ValueMapT *LocalBBMap = &BBMap;

// Scalar writes induced by PHIs must be written in the incoming blocks.
if (MA->isPHIKind() || MA->isExitPHIKind()) {
BasicBlock *ExitingBB = ScalarInst->getParent();
BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];
Builder.SetInsertPoint(ExitingBBCopy->getTerminator());

// For the incoming blocks, use the block's BBMap instead of the one for
// the entire region.
LocalBBMap = &RegionMaps[ExitingBBCopy];
}

auto Address = getOrCreateAlloca(*MA);

Val = getNewScalarValue(Val, R, Stmt, LTS, *LocalBBMap);
Builder.CreateStore(Val, Address);

// Restore the insertion point if necessary.
if (MA->isPHIKind() || MA->isExitPHIKind())
Builder.SetInsertPoint(SavedInsertBB, SavedInsertionPoint);
}
}

void RegionGenerator::addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI,		void RegionGenerator::addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI,
PHINode *PHICopy, BasicBlock *IncomingBB,		PHINode *PHICopy, BasicBlock *IncomingBB,
LoopToScevMapT &LTS) {		ValueMapT &BBMap, LoopToScevMapT &LTS) {
Region *StmtR = Stmt.getRegion();		Region *StmtR = Stmt.getRegion();

// If the incoming block was not yet copied mark this PHI as incomplete.		// If the incoming block was not yet copied mark this PHI as incomplete.
// Once the block will be copied the incoming value will be added.		// Once the block will be copied the incoming value will be added.
BasicBlock *BBCopy = BlockMap[IncomingBB];		BasicBlock *BBCopy = BlockMap.lookup(IncomingBB);
		Meinersbur: Unrelated?
		jdoerfert (author): Maybe, I do not recall.
if (!BBCopy) {		if (!BBCopy) {
assert(StmtR->contains(IncomingBB) &&		assert(StmtR->contains(IncomingBB) &&
"Bad incoming block for PHI in non-affine region");		"Bad incoming block for PHI in non-affine region");
IncompletePHINodeMap[IncomingBB].push_back(std::make_pair(PHI, PHICopy));		IncompletePHINodeMap[IncomingBB].push_back(std::make_pair(PHI, PHICopy));
return;		return;
}		}

Value *OpCopy = nullptr;		if (PHICopy->getBasicBlockIndex(BBCopy) >= 0)
if (StmtR->contains(IncomingBB)) {		return;
assert(RegionMaps.count(BBCopy) &&
"Incoming PHI block did not have a BBMap");		ValueMapT &BBCopyMap =
ValueMapT &BBCopyMap = RegionMaps[BBCopy];		StmtR->contains(IncomingBB) ? RegionMaps[BBCopy] : BBMap;

Value *Op = PHI->getIncomingValueForBlock(IncomingBB);		Value *Op = PHI->getIncomingValueForBlock(IncomingBB);

BasicBlock *OldBlock = Builder.GetInsertBlock();		BasicBlock *OldBlock = Builder.GetInsertBlock();
auto OldIP = Builder.GetInsertPoint();		auto OldIP = Builder.GetInsertPoint();
Builder.SetInsertPoint(BBCopy->getTerminator());		Builder.SetInsertPoint(BBCopy->getTerminator());
OpCopy = getNewValue(Stmt, Op, BBCopyMap, LTS, getLoopForInst(PHI));		auto *OpCopy = getNewValue(Stmt, Op, BBCopyMap, LTS, getLoopForInst(PHI));
Builder.SetInsertPoint(OldBlock, OldIP);		Builder.SetInsertPoint(OldBlock, OldIP);
} else {

if (PHICopy->getBasicBlockIndex(BBCopy) >= 0)
return;

Value *PHIOpAddr = getOrCreatePHIAlloca(const_cast<PHINode *>(PHI));
OpCopy = new LoadInst(PHIOpAddr, PHIOpAddr->getName() + ".reload",
BlockMap[IncomingBB]->getTerminator());
}

assert(OpCopy && "Incoming PHI value was not copied properly");		assert(OpCopy && "Incoming PHI value was not copied properly");
assert(BBCopy && "Incoming PHI block was not copied properly");		assert(BBCopy && "Incoming PHI block was not copied properly");
PHICopy->addIncoming(OpCopy, BBCopy);		PHICopy->addIncoming(OpCopy, BBCopy);

		assert(ScalarMaps.count(BBCopy));
		if (ScalarMaps[BBCopy].count(const_cast<PHINode *>(PHI))) {
		assert(ScalarMaps.count(PHICopy->getParent()));
		ScalarMaps[PHICopy->getParent()][const_cast<PHINode *>(PHI)] = PHICopy;
		}
}		}

Value *RegionGenerator::copyPHIInstruction(ScopStmt &Stmt, PHINode *PHI,		Value *RegionGenerator::copyPHIInstruction(ScopStmt &Stmt, PHINode *PHI,
ValueMapT &BBMap,		ValueMapT &BBMap,
LoopToScevMapT &LTS) {		LoopToScevMapT &LTS) {
		// Check if the PHI node is in the entry of the non-affine region. If so, two
		// cases need to be distinguished:
		// 1) It is a loop-carried PHI of a loop that is part of the non-affine
		// region. If so we treat it as any other PHI in the non-affine region.
		// 2) It is not loop-carried or the loop is outside the non-affine region.
		// In this case we revert to the BlockGenerator::copyPHIInstruction.
		assert(Stmt.isRegionStmt());
		Region *R = Stmt.getRegion();
		if (R->getEntry() == PHI->getParent()) {
		Loop *L = LI.getLoopFor(PHI->getParent());
		if (!L || L->getHeader() != PHI->getParent() || !R->contains(L))
		return BlockGenerator::copyPHIInstruction(Stmt, PHI, BBMap, LTS);
		}

unsigned NumIncoming = PHI->getNumIncomingValues();		unsigned NumIncoming = PHI->getNumIncomingValues();
PHINode *PHICopy =		PHINode *PHICopy =
Builder.CreatePHI(PHI->getType(), NumIncoming, "polly." + PHI->getName());		Builder.CreatePHI(PHI->getType(), NumIncoming, "polly." + PHI->getName());
PHICopy->moveBefore(PHICopy->getParent()->getFirstNonPHI());		PHICopy->moveBefore(PHICopy->getParent()->getFirstNonPHI());
BBMap[PHI] = PHICopy;		BBMap[PHI] = PHICopy;

for (unsigned u = 0; u < NumIncoming; u++)		for (unsigned u = 0; u < NumIncoming; u++)
addOperandToPHI(Stmt, PHI, PHICopy, PHI->getIncomingBlock(u), LTS);		addOperandToPHI(Stmt, PHI, PHICopy, PHI->getIncomingBlock(u), BBMap, LTS);
return PHICopy;		return PHICopy;
}		}

lib/CodeGen/CodeGeneration.cpp

[156 lines not shown]	if (!NodeBuilder.preloadInvariantLoads()) {
auto *StartBBTerm = StartBlock->getTerminator();		auto *StartBBTerm = StartBlock->getTerminator();
Builder.SetInsertPoint(StartBBTerm);		Builder.SetInsertPoint(StartBBTerm);
Builder.CreateUnreachable();		Builder.CreateUnreachable();
StartBBTerm->eraseFromParent();		StartBBTerm->eraseFromParent();
isl_ast_node_free(AstRoot);		isl_ast_node_free(AstRoot);

} else {		} else {

		NodeBuilder.createPHIInitialization();
NodeBuilder.addParameters(S.getContext());		NodeBuilder.addParameters(S.getContext());

Value *RTC = buildRTC(Builder, NodeBuilder.getExprBuilder());		Value *RTC = buildRTC(Builder, NodeBuilder.getExprBuilder());
Builder.GetInsertBlock()->getTerminator()->setOperand(0, RTC);		Builder.GetInsertBlock()->getTerminator()->setOperand(0, RTC);
Builder.SetInsertPoint(&StartBlock->front());		Builder.SetInsertPoint(&StartBlock->front());

NodeBuilder.create(AstRoot);		NodeBuilder.create(AstRoot);

[60 lines not shown]

lib/CodeGen/IslNodeBuilder.cpp

[42 lines not shown]
#include "isl/map.h"		#include "isl/map.h"
#include "isl/set.h"		#include "isl/set.h"
#include "isl/union_map.h"		#include "isl/union_map.h"
#include "isl/union_set.h"		#include "isl/union_set.h"

using namespace polly;		using namespace polly;
using namespace llvm;		using namespace llvm;

		void IslNodeBuilder::createPHIInitialization() {

		Region &R = S.getRegion();

		// The split block __just before__ the region and optimized region.
		BasicBlock *SplitBB = R.getEnteringBlock();
		BasicBlock *EntryBB = R.getEntry();

		for (auto &Pair : S.arrays()) {
		auto &Array = Pair.second;
		if (Array->getNumberOfDimensions() != 0 || !Array->isPHIKind())
		Meinersbur: `Array->getNumberOfDimensions()!=0` seems redundant. MK_PHI always has 0 dimensions.
		jdoerfert (author): Probably true, I do again not recall why I did it this way.
		continue;

		// For PHI nodes, the only values we need to store are the ones that
		// reach the PHI node from outside the region. In general there should
		// only be one such incoming edge and this edge should enter through
		// 'SplitBB'.
		auto *PHI = cast<PHINode>(Array->getBasePtr());
		if (PHI->getParent() != EntryBB)
		continue;

		for (auto BI = PHI->block_begin(), BE = PHI->block_end(); BI != BE; BI++)
		if (!R.contains(*BI) && *BI != SplitBB)
		llvm_unreachable("Incoming edges from outside the scop should always "
		"come from SplitBB");

		int Idx = PHI->getBasicBlockIndex(SplitBB);
		if (Idx < 0)
		continue;

		Value *ScalarValue = PHI->getIncomingValue(Idx);
		ScalarMap[PHI] = ScalarValue;
		}
		}

__isl_give isl_ast_expr *		__isl_give isl_ast_expr *
IslNodeBuilder::getUpperBound(__isl_keep isl_ast_node *For,		IslNodeBuilder::getUpperBound(__isl_keep isl_ast_node *For,
ICmpInst::Predicate &Predicate) {		ICmpInst::Predicate &Predicate) {
isl_id *UBID, *IteratorID;		isl_id *UBID, *IteratorID;
isl_ast_expr *Cond, *Iterator, *UB, *Arg0;		isl_ast_expr *Cond, *Iterator, *UB, *Arg0;
isl_ast_op_type Type;		isl_ast_op_type Type;

Cond = isl_ast_node_for_get_cond(For);		Cond = isl_ast_node_for_get_cond(For);
[114 lines not shown]

struct SubtreeReferences {		struct SubtreeReferences {
LoopInfo &LI;		LoopInfo &LI;
ScalarEvolution &SE;		ScalarEvolution &SE;
Region &R;		Region &R;
ValueMapT &GlobalMap;		ValueMapT &GlobalMap;
SetVector<Value *> &Values;		SetVector<Value *> &Values;
SetVector<const SCEV *> &SCEVs;		SetVector<const SCEV *> &SCEVs;
BlockGenerator &BlockGen;		ValueMapT &ScalarMap;
};		};

/// @brief Extract the values and SCEVs needed to generate code for a block.		/// @brief Extract the values and SCEVs needed to generate code for a block.
static int findReferencesInBlock(struct SubtreeReferences &References,		static int findReferencesInBlock(struct SubtreeReferences &References,
const ScopStmt *Stmt, const BasicBlock *BB) {		const ScopStmt *Stmt, const BasicBlock *BB) {
for (const Instruction &Inst : *BB)		for (const Instruction &Inst : *BB)
for (Value *SrcVal : Inst.operands())		for (Value *SrcVal : Inst.operands())
if (canSynthesize(SrcVal, &References.LI, &References.SE,		if (canSynthesize(SrcVal, &References.LI, &References.SE,
Show All 34 Lines	if (Access->isArrayKind()) {
if (Instruction *OpInst = dyn_cast<Instruction>(BasePtr))		if (Instruction *OpInst = dyn_cast<Instruction>(BasePtr))
if (Stmt->getParent()->getRegion().contains(OpInst))		if (Stmt->getParent()->getRegion().contains(OpInst))
continue;		continue;

References.Values.insert(BasePtr);		References.Values.insert(BasePtr);
continue;		continue;
}		}

References.Values.insert(References.BlockGen.getOrCreateAlloca(*Access));		if (!Access->isRead())
		continue;

		// Copy the newest version of the access value if it is a SCoP intern
		// instruction and a new version exists. Also copy the access value if it
		// is SCoP extern.
		auto *AccessValue = Access->getAccessValue();
		if (auto *AccessInst = dyn_cast<Instruction>(AccessValue))
		if (References.R.contains(AccessInst)) {
		if (Value *NewAccessInst = References.ScalarMap.lookup(AccessInst))
		References.Values.insert(NewAccessInst);
		continue;
		}

		References.Values.insert(AccessValue);
}		}

return isl_stat_ok;		return isl_stat_ok;
}		}

/// Extract the out-of-scop values and SCEVs referenced from a set describing		/// Extract the out-of-scop values and SCEVs referenced from a set describing
/// a ScopStmt.		/// a ScopStmt.
///		///
Show All 38 Lines	IslNodeBuilder::getScheduleForAstNode(__isl_keep isl_ast_node *For) {
return IslAstInfo::getSchedule(For);		return IslAstInfo::getSchedule(For);
}		}

void IslNodeBuilder::getReferencesInSubtree(__isl_keep isl_ast_node *For,		void IslNodeBuilder::getReferencesInSubtree(__isl_keep isl_ast_node *For,
SetVector<Value *> &Values,		SetVector<Value *> &Values,
SetVector<const Loop *> &Loops) {		SetVector<const Loop *> &Loops) {

SetVector<const SCEV *> SCEVs;		SetVector<const SCEV *> SCEVs;
struct SubtreeReferences References = {		struct SubtreeReferences References = {LI, SE, S.getRegion(), ValueMap,
LI, SE, S.getRegion(), ValueMap, Values, SCEVs, getBlockGenerator()};		Values, SCEVs, ScalarMap};

for (const auto &I : IDToValue)		for (const auto &I : IDToValue)
Values.insert(I.second);		Values.insert(I.second);

for (const auto &I : OutsideLoopIterations)		for (const auto &I : OutsideLoopIterations)
Values.insert(cast<SCEVUnknown>(I.second)->getValue());		Values.insert(cast<SCEVUnknown>(I.second)->getValue());

isl_union_set *Schedule = isl_union_map_domain(getScheduleForAstNode(For));		isl_union_set *Schedule = isl_union_map_domain(getScheduleForAstNode(For));
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	void IslNodeBuilder::createUserVector(__isl_take isl_ast_node *User,
std::vector<LoopToScevMapT> VLTS(IVS.size());		std::vector<LoopToScevMapT> VLTS(IVS.size());

isl_union_set *Domain = isl_union_set_from_set(Stmt->getDomain());		isl_union_set *Domain = isl_union_set_from_set(Stmt->getDomain());
Schedule = isl_union_map_intersect_domain(Schedule, Domain);		Schedule = isl_union_map_intersect_domain(Schedule, Domain);
isl_map *S = isl_map_from_union_map(Schedule);		isl_map *S = isl_map_from_union_map(Schedule);

auto *NewAccesses = createNewAccesses(Stmt, User);		auto *NewAccesses = createNewAccesses(Stmt, User);
createSubstitutionsVector(Expr, Stmt, VLTS, IVS, IteratorID);		createSubstitutionsVector(Expr, Stmt, VLTS, IVS, IteratorID);
VectorBlockGenerator::generate(BlockGen, *Stmt, VLTS, S, NewAccesses);		VectorBlockGenerator::generate(BlockGen, *Stmt, ScalarMap, VLTS, S,
		NewAccesses);
isl_id_to_ast_expr_free(NewAccesses);		isl_id_to_ast_expr_free(NewAccesses);
isl_map_free(S);		isl_map_free(S);
isl_id_free(Id);		isl_id_free(Id);
isl_ast_node_free(User);		isl_ast_node_free(User);
}		}

void IslNodeBuilder::createMark(__isl_take isl_ast_node *Node) {		void IslNodeBuilder::createMark(__isl_take isl_ast_node *Node) {
auto Child = isl_ast_node_mark_get_node(Node);		auto Child = isl_ast_node_mark_get_node(Node);
▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	void IslNodeBuilder::createForSequential(__isl_take isl_ast_node *For) {
// If we can show that LB <Predicate> UB holds at least once, we can		// If we can show that LB <Predicate> UB holds at least once, we can
// omit the GuardBB in front of the loop.		// omit the GuardBB in front of the loop.
bool UseGuardBB =		bool UseGuardBB =
!SE.isKnownPredicate(Predicate, SE.getSCEV(ValueLB), SE.getSCEV(ValueUB));		!SE.isKnownPredicate(Predicate, SE.getSCEV(ValueLB), SE.getSCEV(ValueUB));
IV = createLoop(ValueLB, ValueUB, ValueInc, Builder, P, LI, DT, ExitBlock,		IV = createLoop(ValueLB, ValueUB, ValueInc, Builder, P, LI, DT, ExitBlock,
Predicate, &Annotator, Parallel, UseGuardBB);		Predicate, &Annotator, Parallel, UseGuardBB);
IDToValue[IteratorID] = IV;		IDToValue[IteratorID] = IV;

		auto PreMap = ScalarMap;
		Meinersbur: Naive question: Is there some way we could have a nested instance of IslNodeBuilder (or a nested function call) with fresh values, instead of temporarily storing `LCPHIs`, `LoopDepth`, and `PreMap` away and restoring them afterwards?
		jdoerfert (Author): Possibly, but that is not an easy task and it would not even help much. We would need to copy ScalarMap to the new "instance" anyway. If we think of a good solution, we can implement it in the parallel code generation too (we currently copy the maps there as well).
		auto PreLCPHIs = LCPHIs;
		LCPHIs.clear();
		Meinersbur: std::move
		jdoerfert (Author): That would "destroy" LCPHIs here, wouldn't it? AFAIK you should not access something that you passed to std::move. We still need the container, just without its content.
		Meinersbur: std::swap then? `ValueMapT PreLCPHIs; LCPHIs.swap(PreLCPHIs);` Although I'd agree this doesn't really make the intent clearer; it's up to you.

		LoopDepth++;

create(Body);		create(Body);

Annotator.popLoop(Parallel);		Annotator.popLoop(Parallel);

IDToValue.erase(IDToValue.find(IteratorID));		IDToValue.erase(IDToValue.find(IteratorID));

		auto *IVPHI = dyn_cast<PHINode>(IV);
		auto *HeaderBB = IVPHI ? IVPHI->getParent() : nullptr;
		if (HeaderBB) {
		Builder.SetInsertPoint(HeaderBB->getFirstNonPHI());

		auto *PreLoopBB = IVPHI->getIncomingBlock(0);
		auto *BackedgeBB = IVPHI->getIncomingBlock(1);
		for (const auto &PHIMapping : LCPHIs) {

		Value *V = PHIMapping.first;
		Value *PHICopyVal = PHIMapping.second;
		PHINode *PHICopy = cast<PHINode>(PHICopyVal);

		Value *PreVal = PreMap.lookup(V);
		if (!PreVal)
		PreVal = UndefValue::get(V->getType());
		sebpop: I don't understand how these undef values are supposed to work. Are you cleaning them up in a later pass? I see in some of the testcases that they are still left in the IR after Polly.
		jdoerfert (Author): Undef values are already "used" in the current code generation, just hidden well enough. When a scalar is used before it is initialized we basically get an undef. Currently this means a load from an uninitialized alloca is basically an undef. As we have now promoted these allocas, loads and stores, we see the undefs in the IR; however, they should only exist where the alloca content was undefined before. An example:

		    int x;
		    for i = 0...N {
		      if (i > 0)
		        A[i-1] = x;
		      x = A[i];
		    }

		Here the initial value of x is undefined, and therefore the PHI in the loop header as well as the PHI after the loop will have one undef as operand. In the following example there were no undefs in the IR prior to Polly, but right after code generation there are:

		    if (a) {
		    S:  x = A[i];
		        /* split BB */
		    P:  A[i] = x;
		    }

		Now we split the conditional for whatever reason and Polly will generate a CFG that kinda looks like this:

		    if (a)
		    S:  x1 = A[i];
		    x = phi(undef, x1)
		    if (a)
		    P:  A[i] = x;

		We need the undef for the path that does not define x (or x1). Does that make sense?
		Meinersbur: Could you explain this in the code as a comment?
		jdoerfert (Author): Sure, but I will have to rephrase it a bit.

		Value *BodyVal = ScalarMap.lookup(PHICopy);
		if (!BodyVal)
		BodyVal = UndefValue::get(V->getType());

		PHICopy->insertBefore(HeaderBB->getFirstNonPHI());
		PHICopy->addIncoming(PreVal, PreLoopBB);
		PHICopy->addIncoming(BodyVal, BackedgeBB);
		}
		}

Builder.SetInsertPoint(&ExitBlock->front());		Builder.SetInsertPoint(&ExitBlock->front());

		if (UseGuardBB) {
		auto PredIt = pred_begin(ExitBlock);
		auto LoopExitingBB = PredIt++;
		auto PreLoopBB = PredIt++;
		assert(PredIt == pred_end(ExitBlock));
		createMergePHIs(Builder, {PreLoopBB, LoopExitingBB}, {&PreMap, &ScalarMap},
		PreMap);
		ScalarMap = PreMap;
		}

		LoopDepth--;
		LCPHIs = PreLCPHIs;

isl_ast_node_free(For);		isl_ast_node_free(For);
isl_ast_expr_free(Iterator);		isl_ast_expr_free(Iterator);
isl_id_free(IteratorID);		isl_id_free(IteratorID);
}		}
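
The inline discussion on `LCPHIs` above weighs three ways to set a map's content aside while keeping the live container usable. The following is a minimal sketch of those options, not code from the patch: `ToyMap` and the three helper functions are hypothetical stand-ins (the real `ValueMapT` maps LLVM values), and `std::exchange` assumes a C++14 compiler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for Polly's ValueMapT; the real type maps LLVM values.
using ToyMap = std::map<std::string, int>;

// Variant 1: copy the content, then clear the live map (the shape the patch
// uses for PreLCPHIs / LCPHIs).
ToyMap saveByCopy(ToyMap &Live) {
  ToyMap Saved = Live;
  Live.clear();
  return Saved;
}

// Variant 2: swap with an empty map (the reviewer's suggestion). No element
// is copied, and Live is left empty but fully usable.
ToyMap saveBySwap(ToyMap &Live) {
  ToyMap Saved;
  Live.swap(Saved);
  return Saved;
}

// Variant 3: std::exchange moves the old content out and assigns a fresh,
// empty map, so Live is never observed in a moved-from state.
ToyMap saveByExchange(ToyMap &Live) {
  return std::exchange(Live, ToyMap());
}

// Restoring afterwards is a plain move-assignment in all three variants.
void restore(ToyMap &Live, ToyMap Saved) { Live = std::move(Saved); }
```

All three variants leave the live map empty and return the previous content; they differ only in whether the elements are copied. Which one reads most clearly is a matter of taste, as the reviewers themselves note.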

/// @brief Remove the BBs contained in a (sub)function from the dominator tree.		/// @brief Remove the BBs contained in a (sub)function from the dominator tree.
///		///
/// This function removes the basic blocks that are part of a subfunction from		/// This function removes the basic blocks that are part of a subfunction from
▲ Show 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	void IslNodeBuilder::createIf(__isl_take isl_ast_node *If) {
Value *Predicate = ExprBuilder.create(Cond);		Value *Predicate = ExprBuilder.create(Cond);
Builder.CreateCondBr(Predicate, ThenBB, ElseBB);		Builder.CreateCondBr(Predicate, ThenBB, ElseBB);
Builder.SetInsertPoint(ThenBB);		Builder.SetInsertPoint(ThenBB);
Builder.CreateBr(MergeBB);		Builder.CreateBr(MergeBB);
Builder.SetInsertPoint(ElseBB);		Builder.SetInsertPoint(ElseBB);
Builder.CreateBr(MergeBB);		Builder.CreateBr(MergeBB);
Builder.SetInsertPoint(&ThenBB->front());		Builder.SetInsertPoint(&ThenBB->front());

		auto PreMap = ScalarMap;

create(isl_ast_node_if_get_then(If));		create(isl_ast_node_if_get_then(If));

		auto *ThenExitBB = Builder.GetInsertBlock();
		auto ThenMap = ScalarMap;

Builder.SetInsertPoint(&ElseBB->front());		Builder.SetInsertPoint(&ElseBB->front());

		ScalarMap = PreMap;

if (isl_ast_node_if_has_else(If))		if (isl_ast_node_if_has_else(If))
create(isl_ast_node_if_get_else(If));		create(isl_ast_node_if_get_else(If));

		auto ElseMap = ScalarMap;
		Meinersbur: std::move
		jdoerfert (Author): [See above for more comments] I would like to get it in like this and we can optimize it later, especially since std::move might trigger hard-to-debug problems that I want to avoid for now.
		auto *ElseExitBB = Builder.GetInsertBlock();

Builder.SetInsertPoint(&MergeBB->front());		Builder.SetInsertPoint(&MergeBB->front());

		ScalarMap = PreMap;
		createMergePHIs(Builder, {ThenExitBB, ElseExitBB}, {&ThenMap, &ElseMap},
		ScalarMap);

isl_ast_node_free(If);		isl_ast_node_free(If);
}		}

__isl_give isl_id_to_ast_expr *		__isl_give isl_id_to_ast_expr *
IslNodeBuilder::createNewAccesses(ScopStmt *Stmt,		IslNodeBuilder::createNewAccesses(ScopStmt *Stmt,
__isl_keep isl_ast_node *Node) {		__isl_keep isl_ast_node *Node) {
isl_id_to_ast_expr *NewAccesses =		isl_id_to_ast_expr *NewAccesses =
isl_id_to_ast_expr_alloc(Stmt->getParent()->getIslCtx(), 0);		isl_id_to_ast_expr_alloc(Stmt->getParent()->getIslCtx(), 0);
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	void IslNodeBuilder::createUser(__isl_take isl_ast_node *User) {

LTS.insert(OutsideLoopIterations.begin(), OutsideLoopIterations.end());		LTS.insert(OutsideLoopIterations.begin(), OutsideLoopIterations.end());

Stmt = (ScopStmt *)isl_id_get_user(Id);		Stmt = (ScopStmt *)isl_id_get_user(Id);
auto *NewAccesses = createNewAccesses(Stmt, User);		auto *NewAccesses = createNewAccesses(Stmt, User);
createSubstitutions(Expr, Stmt, LTS);		createSubstitutions(Expr, Stmt, LTS);

if (Stmt->isBlockStmt())		if (Stmt->isBlockStmt())
BlockGen.copyStmt(*Stmt, LTS, NewAccesses);		BlockGen.copyStmt(*Stmt, ScalarMap, LTS, NewAccesses);
else		else
RegionGen.copyStmt(*Stmt, LTS, NewAccesses);		RegionGen.copyStmt(*Stmt, ScalarMap, LTS, NewAccesses);

isl_id_to_ast_expr_free(NewAccesses);		isl_id_to_ast_expr_free(NewAccesses);
isl_ast_node_free(User);		isl_ast_node_free(User);
isl_id_free(Id);		isl_id_free(Id);
}		}

void IslNodeBuilder::createBlock(__isl_take isl_ast_node *Block) {		void IslNodeBuilder::createBlock(__isl_take isl_ast_node *Block) {
isl_ast_node_list *List = isl_ast_node_block_get_children(Block);		isl_ast_node_list *List = isl_ast_node_block_get_children(Block);
▲ Show 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	bool IslNodeBuilder::preloadInvariantEquivClass(
if (!PreloadVal)		if (!PreloadVal)
return false;		return false;

assert(PreloadVal->getType() == AccInst->getType());		assert(PreloadVal->getType() == AccInst->getType());
for (const MemoryAccess *MA : MAs) {		for (const MemoryAccess *MA : MAs) {
Instruction *MAAccInst = MA->getAccessInstruction();		Instruction *MAAccInst = MA->getAccessInstruction();
// TODO: The bitcast here is wrong. In case of floating and non-floating		// TODO: The bitcast here is wrong. In case of floating and non-floating
// point values we need to reload the value or convert it.		// point values we need to reload the value or convert it.
ValueMap[MAAccInst] =		Value *CastedVal =
Builder.CreateBitOrPointerCast(PreloadVal, MAAccInst->getType());		Builder.CreateBitOrPointerCast(PreloadVal, MAAccInst->getType());
		ValueMap[MAAccInst] = CastedVal;
		ScalarMap[MAAccInst] = CastedVal;
}		}

if (SE.isSCEVable(AccInstTy)) {		if (SE.isSCEVable(AccInstTy)) {
isl_id *ParamId = S.getIdForParam(SE.getSCEV(AccInst));		isl_id *ParamId = S.getIdForParam(SE.getSCEV(AccInst));
if (ParamId)		if (ParamId)
IDToValue[ParamId] = PreloadVal;		IDToValue[ParamId] = PreloadVal;
isl_id_free(ParamId);		isl_id_free(ParamId);
}		}

BasicBlock *EntryBB = &Builder.GetInsertBlock()->getParent()->getEntryBlock();		ScalarMap[AccInst] = PreloadVal;
auto *Alloca = new AllocaInst(AccInstTy, AccInst->getName() + ".preload.s2a");
Alloca->insertBefore(&*EntryBB->getFirstInsertionPt());
Builder.CreateStore(PreloadVal, Alloca);

for (auto *DerivedSAI : SAI->getDerivedSAIs()) {		for (auto *DerivedSAI : SAI->getDerivedSAIs()) {
Value *BasePtr = DerivedSAI->getBasePtr();		Value *BasePtr = DerivedSAI->getBasePtr();

for (const MemoryAccess *MA : MAs) {		for (const MemoryAccess *MA : MAs) {
// As the derived SAI information is quite coarse, any load from the		// As the derived SAI information is quite coarse, any load from the
// current SAI could be the base pointer of the derived SAI, however we		// current SAI could be the base pointer of the derived SAI, however we
// should only change the base pointer of the derived SAI if we actually		// should only change the base pointer of the derived SAI if we actually
// preloaded it.		// preloaded it.
if (BasePtr == MA->getBaseAddr()) {		if (BasePtr == MA->getBaseAddr()) {
// TODO: The bitcast here is wrong. In case of floating and non-floating		// TODO: The bitcast here is wrong. In case of floating and non-floating
// point values we need to reload the value or convert it.		// point values we need to reload the value or convert it.
BasePtr =		BasePtr =
Builder.CreateBitOrPointerCast(PreloadVal, BasePtr->getType());		Builder.CreateBitOrPointerCast(PreloadVal, BasePtr->getType());
DerivedSAI->setBasePtr(BasePtr);		DerivedSAI->setBasePtr(BasePtr);
}		}

// For scalar derived SAIs we remap the alloca used for the derived value.
if (BasePtr == MA->getAccessInstruction()) {
if (DerivedSAI->isPHIKind())
PHIOpMap[BasePtr] = Alloca;
else
ScalarMap[BasePtr] = Alloca;
}
}		}
}		}

const Region &R = S.getRegion();		const Region &R = S.getRegion();
for (const MemoryAccess *MA : MAs) {		for (const MemoryAccess *MA : MAs) {

Instruction *MAAccInst = MA->getAccessInstruction();		Instruction *MAAccInst = MA->getAccessInstruction();
// Use the escape system to get the correct value to users outside the SCoP.		// Use the escape system to get the correct value to users outside the SCoP.
BlockGenerator::EscapeUserVectorTy EscapeUsers;		BlockGenerator::EscapeUserVectorTy EscapeUsers;
for (auto *U : MAAccInst->users())		for (auto *U : MAAccInst->users())
if (Instruction *UI = dyn_cast<Instruction>(U))		if (Instruction *UI = dyn_cast<Instruction>(U))
if (!R.contains(UI))		if (!R.contains(UI))
EscapeUsers.push_back(UI);		EscapeUsers.push_back(UI);

if (EscapeUsers.empty())		if (EscapeUsers.empty())
continue;		continue;

EscapeMap[MA->getAccessInstruction()] =		EscapeMap[MA->getAccessInstruction()] = std::move(EscapeUsers);
std::make_pair(Alloca, std::move(EscapeUsers));
}		}

return true;		return true;
}		}

bool IslNodeBuilder::preloadInvariantLoads() {		bool IslNodeBuilder::preloadInvariantLoads() {

const auto &InvariantEquivClasses = S.getInvariantAccesses();		const auto &InvariantEquivClasses = S.getInvariantAccesses();
▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

lib/Support/RegisterPasses.cpp

	Show All 36 Lines
	using namespace llvm;			using namespace llvm;
	using namespace polly;			using namespace polly;

	cl::OptionCategory PollyCategory("Polly Options",			cl::OptionCategory PollyCategory("Polly Options",
	"Configure the polly loop optimizer");			"Configure the polly loop optimizer");

	static cl::opt<bool>			static cl::opt<bool>
	PollyEnabled("polly", cl::desc("Enable the polly optimizer (only at -O3)"),			PollyEnabled("polly", cl::desc("Enable the polly optimizer (only at -O3)"),
	cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));			cl::init(true), cl::ZeroOrMore, cl::cat(PollyCategory));
				Meinersbur: Assuming debugging leftover.
				jdoerfert (Author): Indeed, again.

	static cl::opt<bool> PollyDetectOnly(			static cl::opt<bool> PollyDetectOnly(
	"polly-only-scop-detection",			"polly-only-scop-detection",
	cl::desc("Only run scop detection, but no other optimizations"),			cl::desc("Only run scop detection, but no other optimizations"),
	cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));			cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

	enum PassPositionChoice {			enum PassPositionChoice {
	POSITION_EARLY,			POSITION_EARLY,
	▲ Show 20 Lines • Show All 280 Lines • Show Last 20 Lines

lib/Support/ScopHelper.cpp

Show First 20 Lines • Show All 447 Lines • ▼ Show 20 Lines	bool polly::canSynthesize(const Value *V, const llvm::LoopInfo *LI,

if (const SCEV *Scev = SE->getSCEV(const_cast<Value *>(V)))		if (const SCEV *Scev = SE->getSCEV(const_cast<Value *>(V)))
if (!isa<SCEVCouldNotCompute>(Scev))		if (!isa<SCEVCouldNotCompute>(Scev))
if (!hasScalarDepsInsideRegion(Scev, R))		if (!hasScalarDepsInsideRegion(Scev, R))
return true;		return true;

return false;		return false;
}		}

		static Value *mergeValues(PollyIRBuilder &Builder, Value *Val,
		const ArrayRef<BasicBlock *> &BBs,
		const ArrayRef<ValueMapT *> &Maps) {
		assert(BBs.size() > 0 && BBs.size() == Maps.size());
		SmallPtrSet<Value *, 4> Values;

		auto *MergePHI = Builder.CreatePHI(Val->getType(), BBs.size());
		for (unsigned u = 0, e = BBs.size(); u < e; u++) {
		Value *IncomingVal = Maps[u]->lookup(Val);

		if (!IncomingVal)
		IncomingVal = UndefValue::get(Val->getType());
		Meinersbur: If the incoming value is not found in ScalarMap, I can think of three reasons: 1) It's defined before the Scop. 2) We forgot to put it in there. 3) It was defined only in one incoming branch. In none of these cases would "undef" make sense. In 3) it cannot be used without an explicit PHI in the original code. What other case is there?
		jdoerfert (Author): 1) and 2) cannot/should never happen or would at least be bugs. 2) is clearly a bug and 1) should always be present. The reason for the undef here is the same as above, thus 3). Starting with: `if (a) { S: x = A[i]; P: A[i] = x; }` the scheduler can generate: `if (a) { S: x = A[i]; } if (a) { P: A[i] = x; }` which needs a PHI (for x after the conditional containing S) even though the original code did not contain or need any. Additionally, there is only one path to that PHI that defines a value for x, but we need an operand for the other path reaching the PHI. As I explained to Sebastian, these undefs were basically present before, as "content" of the alloca slots we generated. If you look at the code we generate at the moment for the example above and run mem2reg, you will see the same undefs again.
		Meinersbur: Thank you for the explanation with example.

		MergePHI->addIncoming(IncomingVal, BBs[u]);
		Values.insert(IncomingVal);

		if (!IncomingVal->hasName() || MergePHI->hasName())
		continue;

		MergePHI->setName(IncomingVal->getName() + ".merge");
		Meinersbur: Wouldn't it be better to name the PHI after the original value instead of one of the incoming values?
		jdoerfert (Author): Maybe, I do not recall if this was a conscious decision or not.
		}

		if (Values.size() == 1) {
		Meinersbur: Because Values.size() will be equal to BBs.size(), this condition can be tested before creating an unused MergePHI.
		jdoerfert (Author): It is not necessarily equal to BBs.size(), as the mappings in several BBs can be the same. If that happens to be the case for all BBs we can remove the PHI immediately, but we do not know that until we have looked up all mappings.
		Meinersbur: Thank you; I missed that Values is a set that 'collapses' equal values.
		MergePHI->eraseFromParent();
		return *Values.begin();
		sebpop: You could also check whether all arguments of the MergePHI node are the same, in which case you do not need the phi node, and just return the first arg.
		jdoerfert (Author): That is what this code is doing, or at least what it was supposed to do. Note that the set "Values" contains all operands of the MergePHI; if it contains only one Value we know all operands are the same, otherwise we know they are not. Do you see a problematic case?
		}

		return MergePHI;
		}

		void polly::createMergePHIs(PollyIRBuilder &Builder,
		const ArrayRef<BasicBlock *> &BBs,
		const ArrayRef<ValueMapT *> &Maps,
		ValueMapT &MergeMap) {
		assert(BBs.size() == Maps.size());

		SmallPtrSet<Value *, 16> MergedValues;
		for (const auto &Map : Maps) {
		for (const auto &Item : *Map) {
		Value *OriginalValue = Item.first;
		if (!isa<Instruction>(OriginalValue))
		continue;
		if (!MergedValues.insert(OriginalValue).second)
		continue;

		auto *MergedVal = mergeValues(Builder, OriginalValue, BBs, Maps);
		MergeMap[OriginalValue] = MergedVal;
		}
		}
		}
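
The merging logic in `mergeValues` and `createMergePHIs` above, including the undef operand debated in the inline comments, can be modeled without LLVM. This is only a sketch under loud assumptions: `ToyValue`, `ToyMap`, `mergeToyValues` and `mergeToyMaps` are hypothetical names, values are plain strings, the string `"undef"` stands in for `UndefValue`, and a `"phi(...)"` string stands in for a created PHI node. Unlike the patch, the sketch starts from an empty result map and does not filter for instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a value is a string, and each predecessor block
// contributes one map from original values to their newest version.
using ToyValue = std::string;
using ToyMap = std::map<ToyValue, ToyValue>;

// Merge the versions of Original that reach a join point. A predecessor
// without a mapping contributes "undef". If every predecessor agrees, no PHI
// is needed and the common value is returned directly.
ToyValue mergeToyValues(const ToyValue &Original,
                        const std::vector<const ToyMap *> &Maps) {
  assert(!Maps.empty());
  std::set<ToyValue> Distinct; // collapses equal incoming values
  std::string Phi = "phi(";
  for (std::size_t I = 0; I < Maps.size(); ++I) {
    auto It = Maps[I]->find(Original);
    ToyValue Incoming = It == Maps[I]->end() ? "undef" : It->second;
    Distinct.insert(Incoming);
    Phi += (I ? ", " : "") + Incoming;
  }
  Phi += ")";
  if (Distinct.size() == 1)
    return *Distinct.begin();
  return Phi;
}

// Merge every value that is mapped in at least one predecessor map.
ToyMap mergeToyMaps(const std::vector<const ToyMap *> &Maps) {
  ToyMap Merged;
  for (const ToyMap *Map : Maps)
    for (const auto &Item : *Map)
      if (!Merged.count(Item.first))
        Merged[Item.first] = mergeToyValues(Item.first, Maps);
  return Merged;
}
```

With a "then" map `{x: x1, y: y0}` and an "else" map `{y: y0}`, the sketch merges `x` to `phi(x1, undef)` and `y` to plain `y0`. This matches the split-conditional case from the discussion: only one path defines `x`, yet the join point still needs an operand for the other path.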

test/Isl/CodeGen/MemAccess/update_access_functions.ll

	; RUN: opt %loadPolly -polly-import-jscop -polly-import-jscop-dir=%S \			; RUN: opt %loadPolly -polly-import-jscop -polly-import-jscop-dir=%S \
	; RUN: -polly-import-jscop-postfix=transformed -polly-codegen \			; RUN: -polly-import-jscop-postfix=transformed -polly-codegen \
	; RUN: < %s -S | FileCheck %s			; RUN: < %s -S | FileCheck %s

				; CHECK: %val_p_scalar_.merge = phi double [ undef, %polly.loop_if2 ], [ %val_p_scalar_, %polly.stmt.loop2
				;
	; CHECK: polly.stmt.loop2:			; CHECK: polly.stmt.loop2:
	; CHECK-NEXT: %polly.access.A = getelementptr double, double* %A, i64 42			; CHECK-NEXT: %polly.access.A = getelementptr double, double* %A, i64 42
	; CHECK-NEXT: %val_p_scalar_ = load double, double* %polly.access.A			; CHECK-NEXT: %val_p_scalar_ = load double, double* %polly.access.A

	; CHECK: polly.stmt.loop3:			; CHECK: polly.stmt.loop3:
	; CHECK-NEXT: %val.s2a.reload = load double, double* %val.s2a
	; CHECK-NEXT: %polly.access.A20 = getelementptr double, double* %A, i64 42			; CHECK-NEXT: %polly.access.A20 = getelementptr double, double* %A, i64 42
	; CHECK-NEXT: store double %val.s2a.reload, double* %polly.access.A20			; CHECK-NEXT: store double %val_p_scalar_.merge, double* %polly.access.A20

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @update_access_functions(i64 %arg, double* %A) {			define void @update_access_functions(i64 %arg, double* %A) {
	bb3:			bb3:
	br label %loop1			br label %loop1

	loop1:			loop1:
	Show All 26 Lines

test/Isl/CodeGen/OpenMP/invariant_base_pointer_preloaded_different_bb.ll

	; RUN: opt %loadPolly -polly-codegen -polly-parallel \			; RUN: opt %loadPolly -polly-codegen -polly-parallel \
	; RUN: -polly-parallel-force -S < %s | FileCheck %s			; RUN: -polly-parallel-force -S < %s | FileCheck %s
	;			;
	; Test to verify that we hand down the preloaded A[0] to the OpenMP subfunction.			; Test to verify that we hand down the preloaded A[0] to the OpenMP subfunction.
	;			;
	; void f(float *A) {			; void f(float *A) {
	; for (int i = 1; i < 1000; i++)			; for (int i = 1; i < 1000; i++)
	; A[i] += /* split bb */ A[0];			; A[i] += /* split bb */ A[0];
	; }			; }
	; A[0] tmp (unused) A			; A[0] A
	; CHECK: %polly.par.userContext = alloca { float, float, float }			; CHECK: %polly.par.userContext = alloca { float, float* }
	;			;
	; CHECK: %polly.subfn.storeaddr.polly.access.A.load = getelementptr inbounds			; CHECK: %polly.subfn.storeaddr.polly.access.A.load = getelementptr inbounds
	; CHECK: store float %polly.access.A.load, float* %polly.subfn.storeaddr.polly.access.A.load			; CHECK: store float %polly.access.A.load, float* %polly.subfn.storeaddr.polly.access.A.load
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(float* nocapture %A) {			define void @f(float* nocapture %A) {
	entry:			entry:
	Show All 19 Lines

test/Isl/CodeGen/OpenMP/single_loop_with_param.ll

	; RUN: opt %loadPolly -polly-parallel \			; RUN: opt %loadPolly -polly-parallel -polly-parallel-force -polly-codegen -S \
	; RUN: -polly-parallel-force -polly-codegen -S -verify-dom-info < %s \			; RUN: -verify-dom-info < %s | FileCheck %s
	; RUN: | FileCheck %s -check-prefix=IR

	; #define N 1024			; #define N 1024
	; float A[N];			; float A[N];
	;			;
	; void single_parallel_loop(float alpha) {			; void single_parallel_loop(float alpha) {
	; for (long i = 0; i < N; i++)			; for (long i = 0; i < N; i++)
	; A[i] = alpha;			; A[i] = alpha;
	; }			; }

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

	; Ensure the scalars are initialized before the OpenMP code is launched.			; CHECK-LABEL: polly.parallel.for:
				; CHECK-NEXT: %0 = bitcast { float }* %polly.par.userContext to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start(i64 4, i8* %0)
				; CHECK-NEXT: %polly.subfn.storeaddr.alpha = getelementptr inbounds { float }, { float }* %polly.par.userContext, i32 0, i32 0
				; CHECK-NEXT: store float %alpha, float* %polly.subfn.storeaddr.alpha

	; IR-LABEL: polly.start:			; CHECK: GOMP_parallel_loop_runtime_start
	; IR-NEXT: store float %alpha, float* %alpha.s2a

	; IR: GOMP_parallel_loop_runtime_start

	@A = common global [1024 x float] zeroinitializer, align 16			@A = common global [1024 x float] zeroinitializer, align 16

	define void @single_parallel_loop(float %alpha) nounwind {			define void @single_parallel_loop(float %alpha) nounwind {
	entry:			entry:
	br label %for.i			br label %for.i

	for.i:			for.i:
	Show All 17 Lines

test/Isl/CodeGen/entry_with_trivial_phi_other_bb.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; The entry of this scop's simple region (entry.split => for.end) has an trivial			; The entry of this scop's simple region (entry.split => for.end) has an trivial
	; PHI node that is used in a different of the scop region. LCSSA may create such			; PHI node that is used in a different of the scop region. LCSSA may create such
	; PHI nodes. This is a breakdown of this case in the function 'mp_unexp_sub' of			; PHI nodes. This is a breakdown of this case in the function 'mp_unexp_sub' of
	; pifft from LLVM's test-suite.			; pifft from LLVM's test-suite.
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @test(i64 %n, float* noalias nonnull %A, float %a) {			define void @test(i64 %n, float* noalias nonnull %A, float %a) {
	entry:			entry:
	br label %entry.split			br label %entry.split

	; CHECK-LABEL: polly.start:
	; CHECK: store float %a, float* %b.phiops

	entry.split:			entry.split:
	%b = phi float [ %a, %entry ]			%b = phi float [ %a, %entry ]
	%cmp2 = icmp slt i64 %n, 5			%cmp2 = icmp slt i64 %n, 5
	br i1 %cmp2, label %for.cond, label %for.end			br i1 %cmp2, label %for.cond, label %for.end

	for.cond: ; preds = %for.inc, %entry			for.cond: ; preds = %for.inc, %entry
	%i.0 = phi i64 [ 0, %entry.split ], [ %add, %for.inc ]			%i.0 = phi i64 [ 0, %entry.split ], [ %add, %for.inc ]
	%cmp = icmp slt i64 %i.0, %n			%cmp = icmp slt i64 %i.0, %n
	br i1 %cmp, label %for.body, label %for.end			br i1 %cmp, label %for.body, label %for.end

	for.body: ; preds = %for.cond			for.body: ; preds = %for.cond
	%arrayidx = getelementptr inbounds float, float* %A, i64 %i.0			%arrayidx = getelementptr inbounds float, float* %A, i64 %i.0
	store float %b, float* %arrayidx, align 4			store float %b, float* %arrayidx, align 4
	br label %for.inc			br label %for.inc

				; CHECK-LABEL: polly.stmt.for.body:
				; CHECK-NEXT: %scevgep = getelementptr float, float* %A, i64 %polly.indvar
				; CHECK-NEXT: store float %a, float* %scevgep,

	for.inc: ; preds = %for.body			for.inc: ; preds = %for.body
	%add = add nuw nsw i64 %i.0, 1			%add = add nuw nsw i64 %i.0, 1
	br label %for.cond			br label %for.cond

	for.end: ; preds = %for.cond			for.end: ; preds = %for.cond
	ret void			ret void
	}			}

test/Isl/CodeGen/invariant_load_escaping.ll

	; } while (i++ < 100);			; } while (i++ < 100);
	;			;
	; return x;			; return x;
	; }			; }
	;			;
	; CHECK: polly.preload.begin:			; CHECK: polly.preload.begin:
	; CHECK: %polly.access.B = getelementptr i32, i32* %B, i64 0			; CHECK: %polly.access.B = getelementptr i32, i32* %B, i64 0
	; CHECK: %polly.access.B.load = load i32, i32* %polly.access.B			; CHECK: %polly.access.B.load = load i32, i32* %polly.access.B
	; CHECK: store i32 %polly.access.B.load, i32* %tmp.preload.s2a
	;			;
	; CHECK: polly.merge_new_and_old:			; CHECK: polly.merge_new_and_old:
	; CHECK: %tmp.merge = phi i32 [ %tmp.final_reload, %polly.exiting ], [ %tmp, %do.cond ]			; CHECK: %tmp.merge = phi i32 [ %polly.access.B.load, %polly.exiting ], [ %tmp, %do.cond ]
	; CHECK: br label %do.end			; CHECK: br label %do.end
	;			;
	; CHECK: do.end:			; CHECK: do.end:
	; CHECK: ret i32 %tmp.merge			; CHECK: ret i32 %tmp.merge
	;			;
	; CHECK: polly.loop_exit:
	; CHECK: %tmp.final_reload = load i32, i32* %tmp.preload.s2a
	;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i32 @f(i32* %A, i32* %B) {			define i32 @f(i32* %A, i32* %B) {
	entry:			entry:
	br label %do.body			br label %do.body

	do.body: ; preds = %do.cond, %entry			do.body: ; preds = %do.cond, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %do.cond ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %do.cond ], [ 0, %entry ]

test/Isl/CodeGen/invariant_load_scalar_escape_alloca_sharing.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; Verify the preloaded %0 is stored and communicated in the same alloca.
	;
	; CHECK-NOT: alloca
	; CHECK: %dec3.s2a = alloca i32
	; CHECK-NOT: alloca
	; CHECK: %dec3.in.phiops = alloca i32
	; CHECK-NOT: alloca
	; CHECK: %.preload.s2a = alloca i32
	; CHECK-NOT: alloca			; CHECK-NOT: alloca
	;			;
	; CHECK: %ncol.load = load i32, i32* @ncol			; CHECK-LABEL: polly.preload.begin:
	; CHECK-NEXT: store i32 %ncol.load, i32* %.preload.s2a			; CHECK-NEXT: %ncol.load = load i32, i32* @ncol
				;
				; CHECK-LABEL: polly.merge:
				; CHECK-NEXT: %ncol.load.merge = phi i32 [ %ncol.load, %polly.stmt.while.body.lr.ph ], [ undef, %polly.else ]
	;			;
	; CHECK: polly.stmt.while.body.lr.ph:			; CHECK-LABEL: polly.loop_header:
	; CHECK-NEXT: %.preload.s2a.reload = load i32, i32* %.preload.s2a			; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.stmt.while.cond.backedge ]
	; CHECK-NEXT: store i32 %.preload.s2a.reload, i32* %dec3.in.phiops			; CHECK-NEXT: %dec3.in.polly.lc = phi i32 [ %ncol.load.merge, %polly.loop_preheader ], [ %p_dec3, %polly.stmt.while.cond.backedge ]
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	@ncol = external global i32, align 4			@ncol = external global i32, align 4

	define void @melt_data(i32* %data1, i32* %data2) {			define void @melt_data(i32* %data1, i32* %data2) {
	entry:			entry:
	br label %entry.split			br label %entry.split

test/Isl/CodeGen/large-numbers-in-boundary-context.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; The boundary context contains a constant that does not fit in 64 bits. Hence,			; The boundary context contains a constant that does not fit in 64 bits. Hence,
	; we will check that we use an appropriately typed constant, here with 65 bits.			; we will check that we use an appropriately typed constant, here with 65 bits.
	; An alternative would be to bail out early but that would not be as easy.			; An alternative would be to bail out early but that would not be as easy.
	;			;
	; CHECK: %13 = icmp sge i65 %12, -9223372036854775809			; CHECK: 9223372036854775806
				; CHECK: %[[r0:[0-9]*]] = sext i32 %tmp to i64
				; CHECK: %[[r1:[0-9]*]] = add nsw i64 %indvar, %[[r0]]
				; CHECK: %[[r2:[0-9]*]] = sext i64 %[[r1]] to i65
				; CHECK: %[[r3:[0-9]*]] = icmp sge i65 %[[r2]], -9223372036854775809
	;			;
	; CHECK: polly.start			; CHECK: polly.start
	;			;
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@global = external global i32, align 4			@global = external global i32, align 4
	@global1 = external global i32, align 4			@global1 = external global i32, align 4

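The header comment of this test says the boundary-context constant does not fit in 64 bits and is therefore checked as an `i65`. A minimal Python sketch of that arithmetic follows; the helper name `min_signed_bits` is made up for this illustration and is not part of Polly or LLVM.

```python
def min_signed_bits(value: int) -> int:
    """Return the smallest two's-complement width that can represent value."""
    bits = 1
    # A signed integer of width `bits` covers [-2^(bits-1), 2^(bits-1) - 1].
    while not (-(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


# Constant taken from the CHECK lines of this test.
BOUND = -9223372036854775809
INT64_MIN = -(1 << 63)

# The constant is exactly one below the smallest i64 value ...
is_one_below_int64_min = (BOUND == INT64_MIN - 1)
# ... so it needs one bit more than i64 provides.
bound_width = min_signed_bits(BOUND)
int64_min_width = min_signed_bits(INT64_MIN)

print(is_one_below_int64_min, bound_width, int64_min_width)
```

This only illustrates why a wider type is needed for the comparison; it says nothing about how Polly picks the type internally.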

test/Isl/CodeGen/non-affine-dominance-generated-entering.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -analyze < %s
	;			;
	; llvm.org/PR25439			; llvm.org/PR25439
	; Scalar reloads in the generated entering block were not recognized as			; Scalar reloads in the generated entering block were not recognized as
	; dominating the subregion blocks when there were multiple entering nodes. This			; dominating the subregion blocks when there were multiple entering nodes. This
	; resulted in values defined in there (here: %cond used in subregionB_entry) not			; resulted in values defined in there (here: %cond used in subregionB_entry) not
	; being copied. We check whether it is reusing the reloaded scalar.			; being copied. We check whether it is reusing the reloaded scalar.
	;			;
	; CHECK-LABEL: polly.stmt.subregionB_entry.exit:			; FIXME: SSA-Codegen does not need to place any PHI nodes here, thus we just check
	; CHECK: store i1 %polly.cond, i1* %cond.s2a			; that the code is valid.
	;			;
	; CHECK-LABEL: polly.stmt.subregionB_entry.entry:
	; CHECK: %cond.s2a.reload = load i1, i1* %cond.s2a
	;
	; CHECK-LABEL: polly.stmt.subregionB_entry:
	; CHECK: br i1 %cond.s2a.reload

	define void @func(i32* %A) {			define void @func(i32* %A) {
	entry:			entry:
	br label %subregionA_entry			br label %subregionA_entry

	subregionA_entry:			subregionA_entry:
	%cond = phi i1 [ false, %entry ], [ true, %subregionB_exit ]			%cond = phi i1 [ false, %entry ], [ true, %subregionB_exit ]
	br i1 %cond, label %subregionA_if, label %subregionA_else			br i1 %cond, label %subregionA_if, label %subregionA_else


test/Isl/CodeGen/non-affine-exit-node-dominance.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; llvm.org/PR25439			; llvm.org/PR25439
	; The dominance of the generated non-affine subregion block was based on the			; The dominance of the generated non-affine subregion block was based on the
	; scop's merge block, therefore resulted in an invalid DominanceTree.			; scop's merge block, therefore resulted in an invalid DominanceTree.
	; It resulted in some values as assumed to be unusable in the actual generated			; It resulted in some values as assumed to be unusable in the actual generated
	; exit block. Here we check whether the value %escaping is taken from the			; exit block.
	; generated block.			;
				; CHECK-LABEL: polly.merge_new_and_old:
				; CHECK-NEXT: %escaping.merge = phi i32 [ %p_escaping, %polly.exiting ], [ %escaping, %subregion_exit.region_exiting ]
	;			;
	; CHECK-LABEL: polly.stmt.subregion_entry:			; CHECK-LABEL: polly.stmt.subregion_entry:
	; CHECK: %p_escaping = select i1 undef, i32 undef, i32 undef			; CHECK: %p_escaping = select i1 undef, i32 undef, i32 undef
	;
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:
	; CHECK: store i32 %p_escaping, i32* %escaping.s2a

	define i32 @func() {			define i32 @func() {
	entry:			entry:
	br label %subregion_entry			br label %subregion_entry

	subregion_entry:			subregion_entry:
	%escaping = select i1 undef, i32 undef, i32 undef			%escaping = select i1 undef, i32 undef, i32 undef
	%cond = or i1 undef, undef			%cond = or i1 undef, undef
	br i1 %cond, label %subregion_exit, label %subregion_if			br i1 %cond, label %subregion_exit, label %subregion_if

	subregion_if:			subregion_if:
	br label %subregion_exit			br label %subregion_exit

	subregion_exit:			subregion_exit:
	ret i32 %escaping			ret i32 %escaping
	}			}

test/Isl/CodeGen/non-affine-phi-node-expansion-2.ll

	; RUN: opt %loadPolly -polly-codegen \			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	; RUN: -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"


	; CHECK: polly.stmt.bb3: ; preds = %polly.stmt.bb3.entry			; CHECK: polly.stmt.bb3: ; preds = %polly.stmt.bb3.entry
	; CHECK: %tmp6_p_scalar_ = load double, double* %arg1{{[0-9]*}}, !alias.scope !0, !noalias !2			; CHECK: %tmp6_p_scalar_ = load double, double* %arg1{{[0-9]*}}, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp7 = fadd double 1.000000e+00, %tmp6_p_scalar_			; CHECK: %p_tmp7 = fadd double 1.000000e+00, %tmp6_p_scalar_
	; CHECK: %p_tmp8 = fcmp olt double 1.400000e+01, %p_tmp7			; CHECK: %p_tmp8 = fcmp olt double 1.400000e+01, %p_tmp7
	; CHECK: br i1 %p_tmp8, label %polly.stmt.bb9, label %polly.stmt.bb10			; CHECK: br i1 %p_tmp8, label %polly.stmt.bb9, label %polly.stmt.bb10

	; CHECK: polly.stmt.bb9: ; preds = %polly.stmt.bb3			; CHECK: polly.stmt.bb11.exit:
	; CHECK: store double 1.000000e+00, double* %tmp12.phiops			; CHECK: %0 = phi double [ 2.000000e+00, %polly.stmt.bb10 ], [ 1.000000e+00, %polly.stmt.bb9 ]
	; CHECK: br label %polly.stmt.bb11.exit			; CHECK: br label %polly.stmt.bb11
				;
	; CHECK: polly.stmt.bb10: ; preds = %polly.stmt.bb3			; CHECK: polly.stmt.bb11:
	; CHECK: store double 2.000000e+00, double* %tmp12.phiops			; CHECK: store double %0, double* %arg11
	; CHECK: br label %polly.stmt.bb11.exit			; CHECK: br label %polly.exiting


	define void @hoge(i32 %arg, [1024 x double]* %arg1) {			define void @hoge(i32 %arg, [1024 x double]* %arg1) {
	bb:			bb:
	br label %bb2			br label %bb2

	bb2: ; preds = %bb			bb2: ; preds = %bb
	br label %bb3			br label %bb3

test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll

loop:
%val1 = fadd float 1.0, 2.0		%val1 = fadd float 1.0, 2.0
%val2 = fadd float 1.0, 2.0		%val2 = fadd float 1.0, 2.0
br i1 %cond0, label %branch1, label %backedge		br i1 %cond0, label %branch1, label %backedge

; CHECK-LABEL: polly.stmt.loop:		; CHECK-LABEL: polly.stmt.loop:
; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: store float %p_val0, float* %merge.phiops
; CHECK-NEXT: br i1		; CHECK-NEXT: br i1

branch1:		branch1:
br i1 %cond1, label %branch2, label %backedge		br i1 %cond1, label %branch2, label %backedge

; CHECK-LABEL: polly.stmt.branch1:		; CHECK-LABEL: polly.stmt.branch1:
; CHECK-NEXT: store float %p_val1, float* %merge.phiops
; CHECK-NEXT: br i1		; CHECK-NEXT: br i1

branch2:		branch2:
br label %backedge		br label %backedge

; CHECK-LABEL: polly.stmt.branch2:		; CHECK-LABEL: polly.stmt.branch2:
; CHECK-NEXT: store float %p_val2, float* %merge.phiops
; CHECK-NEXT: br label		; CHECK-NEXT: br label

		; CHECK-LABEL: polly.stmt.backedge.exit:
		; CHECK-NEXT: %p_val2.merge = phi float [ %p_val2, %polly.stmt.branch2 ], [ %p_val1, %polly.stmt.branch1 ], [ %p_val0, %polly.stmt.loop ]

backedge:		backedge:
%merge = phi float [%val0, %loop], [%val1, %branch1], [%val2, %branch2]		%merge = phi float [%val0, %loop], [%val1, %branch1], [%val2, %branch2]
%indvar.next = add i64 %indvar, 1		%indvar.next = add i64 %indvar, 1
store float %merge, float* %A		store float %merge, float* %A
%cmp = icmp sle i64 %indvar.next, 100		%cmp = icmp sle i64 %indvar.next, 100
br i1 %cmp, label %loop, label %exit		br i1 %cmp, label %loop, label %exit

		; CHECK-LABEL: polly.stmt.backedge:
		; CHECK-NEXT: store float %p_val2.merge, float* %A, !alias.scope !0, !noalias !2

exit:		exit:
ret void		ret void
}		}

test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll

	; RUN: opt %loadPolly -polly-codegen \			; RUN: opt %loadPolly -polly-codegen \
	; RUN: -S < %s | FileCheck %s			; RUN: -S < %s | FileCheck %s

	define void @foo(float* %A, i1 %cond0, i1 %cond1) {			define void @foo(float* %A, i1 %cond0, i1 %cond1) {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%indvar = phi i64 [0, %entry], [%indvar.next, %backedge]			%indvar = phi i64 [0, %entry], [%indvar.next, %backedge]
	%val0 = fadd float 1.0, 2.0			%val0 = fadd float 1.0, 2.0
	%val1 = fadd float 1.0, 2.0			%val1 = fadd float 1.0, 2.0
	br i1 %cond0, label %branch1, label %backedge			br i1 %cond0, label %branch1, label %backedge

	; CHECK-LABEL: polly.stmt.loop:			; CHECK-LABEL: polly.stmt.loop:
	; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00			; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00
	; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00			; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00
	; CHECK-NEXT: store float %p_val0, float* %merge.phiops
	; CHECK-NEXT: br i1			; CHECK-NEXT: br i1

	; The interesting instruction here is %val2, which does not dominate the exit of			; The interesting instruction here is %val2, which does not dominate the exit of
	; the non-affine region. Care needs to be taken when code-generating this write.			; the non-affine region. Care needs to be taken when code-generating this write.
	; Specifically, at some point we modeled this scalar write, which we tried to			; Specifically, at some point we modeled this scalar write, which we tried to
	; code generate in the exit block of the non-affine region.			; code generate in the exit block of the non-affine region.
	branch1:			branch1:
	%val2 = fadd float 1.0, 2.0			%val2 = fadd float 1.0, 2.0
	br i1 %cond1, label %branch2, label %backedge			br i1 %cond1, label %branch2, label %backedge

	; CHECK-LABEL: polly.stmt.branch1:			; CHECK-LABEL: polly.stmt.branch1:
	; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00			; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
	; CHECK-NEXT: store float %p_val1, float* %merge.phiops
	; CHECK-NEXT: br i1			; CHECK-NEXT: br i1

	branch2:			branch2:
	br label %backedge			br label %backedge

	; CHECK-LABEL: polly.stmt.branch2:			; CHECK-LABEL: polly.stmt.branch2:
	; CHECK-NEXT: store float %p_val2, float* %merge.phiops
	; CHECK-NEXT: br label			; CHECK-NEXT: br label

				; CHECK-LABEL: polly.stmt.backedge.exit:
				; CHECK-NEXT: %p_val2.merge = phi float [ %p_val2, %polly.stmt.branch2 ], [ %p_val1, %polly.stmt.branch1 ], [ %p_val0, %polly.stmt.loop ]

	backedge:			backedge:
	%merge = phi float [%val0, %loop], [%val1, %branch1], [%val2, %branch2]			%merge = phi float [%val0, %loop], [%val1, %branch1], [%val2, %branch2]
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	store float %merge, float* %A			store float %merge, float* %A
	%cmp = icmp sle i64 %indvar.next, 100			%cmp = icmp sle i64 %indvar.next, 100
	br i1 %cmp, label %loop, label %exit			br i1 %cmp, label %loop, label %exit

				; CHECK-LABEL: polly.stmt.backedge:
				; CHECK-NEXT: store float %p_val2.merge, float* %A, !alias.scope !0, !noalias !2

	exit:			exit:
	ret void			ret void
	}			}
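The comment in this test notes that `%val2`, defined in `branch1`, does not dominate the exit of the non-affine region. As a rough sketch of that dominance argument, the following Python code computes dominator sets for the forward edges of the CFG shown above (`loop`, `branch1`, `branch2`, `backedge`). The loop back edge is left out on purpose, and the `dominators` helper is written only for this illustration; it is not Polly's or LLVM's dominator-tree implementation.

```python
def dominators(cfg, entry):
    """Iteratively compute the dominator set of every node in cfg."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Forward edges of the region in @foo, read off the branch instructions above.
CFG = {
    "loop": ["branch1", "backedge"],
    "branch1": ["branch2", "backedge"],
    "branch2": ["backedge"],
    "backedge": [],
}

DOM = dominators(CFG, "loop")
# branch1 (where %val2 is defined) is not among the dominators of backedge,
# because backedge can also be reached directly from loop.
print(sorted(DOM["backedge"]))
```

Under this model only `loop` dominates `backedge`, which is consistent with the test's remark that a value defined in `branch1` cannot simply be written in the region's exit block.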

test/Isl/CodeGen/non-affine-region-exit-phi-incoming-synthesize.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; This caused the code generation to generate invalid code as the same BBMap was			; This caused the code generation to generate invalid code as the same BBMap was
	; used for the whole non-affine region. When %add is synthesized for the			; used for the whole non-affine region. When %add is synthesized for the
	; incoming value of subregion_if first, the code for it was generated into			; incoming value of subregion_if first, the code for it was generated into
	; subregion_if, but reused for the incoming value of subregion_exit, although it			; subregion_if, but reused for the incoming value of subregion_exit, although it
	; is not dominated by subregion_if.			; is not dominated by subregion_if.
	;			;
	; CHECK-LABEL: polly.stmt.subregion_entry:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %[[R0:[0-9]*]] = add i32 %n, -2			; CHECK: %retval.ph.merge = phi i32 [ %p_add1, %polly.exiting ], [ %add, %subregion_exit.region_exiting ]
	; CHECK: store i32 %[[R0]], i32* %retval.s2a
	;			;
	; CHECK-LABEL: polly.stmt.subregion_if:			; CHECK-LABEL: subregion_exit:
	; CHECK: %[[R1:[0-9]*]] = add i32 %n, -2			; CHECK-LABEL: %retval = phi i32 [ %retval.ph.merge, %polly.merge_new_and_old ]
	; CHECK: store i32 %[[R1]], i32* %retval.s2a			; CHECK-LABEL: ret i32 %retval
	;			;
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:			; CHECK-LABEL: polly.stmt.subregion_entry:
	; CHECK: load i32, i32* %retval.s2a			; CHECK: %p_add1 = add nsw i32 %n, -2

	define i32 @func(i32 %n){			define i32 @func(i32 %n){
	entry:			entry:
	br label %subregion_entry			br label %subregion_entry

	subregion_entry:			subregion_entry:
	%add = add nsw i32 %n, -2			%add = add nsw i32 %n, -2
	%cmp = fcmp ogt float undef, undef			%cmp = fcmp ogt float undef, undef

test/Isl/CodeGen/non-affine-region-implicit-store.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; llvm.org/PR25438			; llvm.org/PR25438
	; After loop versioning, a dominance check of a non-affine subregion's exit node			; After loop versioning, a dominance check of a non-affine subregion's exit node
	; causes the dominance check to always fail any block in the scop. The			; causes the dominance check to always fail any block in the scop. The
	; subregion's exit block has become polly_merge_new_and_old, which also receives			; subregion's exit block has become polly_merge_new_and_old, which also receives
	; the control flow of the generated code. This would cause that any value for			; the control flow of the generated code. This would cause that any value for
	; implicit stores is assumed to be not from the scop.			; implicit stores is assumed to be not from the scop.
	;			;
	; This checks that the stored value is indeed from the generated code.			; CHECK-LABEL: do.body:
				; CHECK-NEXT: %a = phi i32 [ %a.ph, %polly.split_new_and_old ]
	;			;
	; CHECK-LABEL: polly.stmt.do.body.entry:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: a.phiops.reload = load i32, i32* %a.phiops			; CHECK-NEXT: %a.merge = phi i32 [ %a.ph, %polly.exiting ], [ %a, %end_a.region_exiting ]
	;			;
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:
	; CHECK: store i32 %polly.a, i32* %a.s2a

	define void @func() {			define void @func() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	br label %do.body			br label %do.body

	do.body:			do.body:

test/Isl/CodeGen/non-affine-synthesized-in-branch.ll

	; RUN: opt %loadPolly -polly-process-unprofitable -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-process-unprofitable -polly-codegen -S < %s | FileCheck %s
	;			;
	; llvm.org/PR25412			; llvm.org/PR25412
	; %synthgep caused %gep to be synthesized in subregion_if which was reused for			; %synthgep caused %gep to be synthesized in subregion_if which was reused for
	; %retval in subregion_exit, even though it is not dominating subregion_exit.			; %retval in subregion_exit, even though it is not dominating subregion_exit.
	;			;
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %scevgep[[R1:[0-9]*]] = getelementptr %struct.hoge, %struct.hoge* %arg, i64 0, i32 2			; CHECK-NEXT: %gep.merge = phi double* [ %p_gep, %polly.exiting ], [ %gep, %subregion_exit.region_exiting ]
	; CHECK: store double* %scevgep[[R1]], double** %gep.s2a			;
	; CHECK: br label			; CHECK-LABEL: subregion_exit:
				; CHECK-NEXT: %retval = load double, double* %gep.merge
				; CHECK-NEXT: ret double %retval
				;
				; CHECK-LABEL: polly.stmt.subregion_entry:
				; CHECK-NEXT: %p_cond = fcmp
				; CHECK-NEXT: %p_gep = getelementptr inbounds %struct.hoge, %struct.hoge* %arg, i64 0, i32 2


	%struct.hoge = type { double, double, double }			%struct.hoge = type { double, double, double }

	define double @func(%struct.hoge* %arg) {			define double @func(%struct.hoge* %arg) {
	entry:			entry:
	br label %subregion_entry			br label %subregion_entry

	subregion_entry:			subregion_entry:

test/Isl/CodeGen/non_affine_float_compare.ll

	; CHECK: polly.stmt.bb8:			; CHECK: polly.stmt.bb8:
	; CHECK: %scevgep[[R3:[0-9]*]] = getelementptr float, float* %A, i64 %polly.indvar			; CHECK: %scevgep[[R3:[0-9]*]] = getelementptr float, float* %A, i64 %polly.indvar
	; CHECK: %tmp10_p_scalar_ = load float, float* %scevgep[[R3]], align 4, !alias.scope !0, !noalias !2			; CHECK: %tmp10_p_scalar_ = load float, float* %scevgep[[R3]], align 4, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp11 = fadd float %tmp10_p_scalar_, 1.000000e+00			; CHECK: %p_tmp11 = fadd float %tmp10_p_scalar_, 1.000000e+00
	; CHECK: store float %p_tmp11, float* %scevgep[[R3]], align 4, !alias.scope !0, !noalias !2			; CHECK: store float %p_tmp11, float* %scevgep[[R3]], align 4, !alias.scope !0, !noalias !2
	; CHECK: br label %polly.stmt.bb12.[[R]]			; CHECK: br label %polly.stmt.bb12.[[R]]

	; CHECK: polly.stmt.bb12.[[R]]:			; CHECK: polly.stmt.bb12.[[R]]:
				; CHECK: %polly.indvar_next = add nsw i64 %polly.indvar, 1
				; CHECK: %polly.loop_cond = icmp sle i64 %polly.indvar, 1022
	; CHECK: br label %polly.stmt.bb12			; CHECK: br label %polly.stmt.bb12

	; CHECK: polly.stmt.bb12:			; CHECK: polly.stmt.bb12:
	; CHECK: %scevgep[[R4:[0-9]*]] = getelementptr float, float* %A, i64 %polly.indvar			; CHECK: %scevgep[[R4:[0-9]*]] = getelementptr float, float* %A, i64 %polly.indvar
	; CHECK: %tmp10b_p_scalar_ = load float, float* %scevgep[[R4]], align 4, !alias.scope !0, !noalias !2			; CHECK: %tmp10b_p_scalar_ = load float, float* %scevgep[[R4]], align 4, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp11b = fadd float %tmp10b_p_scalar_, 1.000000e+00			; CHECK: %p_tmp11b = fadd float %tmp10b_p_scalar_, 1.000000e+00
	; CHECK: store float %p_tmp11b, float* %scevgep[[R4]], align 4, !alias.scope !0, !noalias !2			; CHECK: store float %p_tmp11b, float* %scevgep[[R4]], align 4, !alias.scope !0, !noalias !2
	; CHECK: %polly.indvar_next = add nsw i64 %polly.indvar, 1
	; CHECK: %polly.loop_cond = icmp sle i64 %polly.indvar, 1022
	; CHECK: br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit			; CHECK: br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(float* %A) {			define void @f(float* %A) {
	bb:			bb:
	br label %bb1			br label %bb1


test/Isl/CodeGen/out-of-scop-phi-node-use.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %_s.sroa.343.0.ph5161118.ph.merge = phi i32 [ %_s.sroa.343.0.ph5161118.ph.final_reload, %polly.exiting ], [ %_s.sroa.343.0.ph516.lcssa2357, %for.cond.981.region_exiting ]			; CHECK-NEXT: %_s.sroa.343.0.ph5161118.ph.merge = phi i32 [ undef, %polly.exiting ], [ %_s.sroa.343.0.ph516.lcssa2357, %for.cond.981.region_exiting ]

	; CHECK-LABEL: for.cond.981:			; CHECK-LABEL: for.cond.981:
	; CHECK-NEXT: %_s.sroa.343.0.ph5161118 = phi i32 [ undef, %for.cond ], [ %_s.sroa.343.0.ph5161118.ph.merge, %polly.merge_new_and_old ]			; CHECK-NEXT: %_s.sroa.343.0.ph5161118 = phi i32 [ undef, %for.cond ], [ %_s.sroa.343.0.ph5161118.ph.merge, %polly.merge_new_and_old ]

	; CHECK-LABEL: polly.exiting:
	; CHECK-NEXT: %_s.sroa.343.0.ph5161118.ph.final_reload = load i32, i32* %_s.sroa.343.0.ph5161118.s2a

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @lzmaDecode() #0 {			define void @lzmaDecode() #0 {
	entry:			entry:
	br label %for.cond.outer.outer.outer			br label %for.cond.outer.outer.outer

	for.cond: ; preds = %for.cond.outer.outer.outer			for.cond: ; preds = %for.cond.outer.outer.outer
	switch i32 undef, label %cleanup.1072 [			switch i32 undef, label %cleanup.1072 [
	i32 23, label %for.cond.981			i32 23, label %for.cond.981

test/Isl/CodeGen/phi-defined-before-scop.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %tmp7.ph.merge = phi %struct.wibble* [ %tmp7.ph.final_reload, %polly.exiting ], [ %tmp7.ph, %bb6.region_exiting ]			; CHECK-NEXT: %tmp7.ph.merge = phi %struct.wibble* [ %tmp2.merge, %polly.exiting ], [ %tmp7.ph, %bb6.region_exiting ]

	; CHECK-LABEL: polly.stmt.bb3:			; CHECK-LABEL: polly.stmt.bb6.region_exiting:
	; CHECK-NEXT: %tmp2.s2a.reload = load %struct.wibble*, %struct.wibble** %tmp2.s2a			; CHECK-NEXT: %tmp2.merge = phi %struct.wibble* [ %tmp2, %polly.stmt.bb3 ], [ undef, %polly.stmt.bb5 ]
	; CHECK-NEXT: store %struct.wibble* %tmp2, %struct.wibble** %tmp7.s2a

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	%struct.blam = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }			%struct.blam = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
	%struct.wibble = type { i32, %struct.wibble*, %struct.wibble* }			%struct.wibble = type { i32, %struct.wibble*, %struct.wibble* }

	@global = external global %struct.blam*, align 8			@global = external global %struct.blam*, align 8

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @wobble() #0 {			define void @wobble(i1 %b) #0 {
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb6, %bb			bb1: ; preds = %bb6, %bb
	%tmp2 = phi %struct.wibble* [ %tmp7, %bb6 ], [ undef, %bb ]			%tmp2 = phi %struct.wibble* [ %tmp7, %bb6 ], [ undef, %bb ]
	%tmp = load %struct.blam*, %struct.blam** @global, align 8, !tbaa !1			%tmp = load %struct.blam*, %struct.blam** @global, align 8, !tbaa !1
	br label %bb3			br label %bb3

	bb3: ; preds = %bb1			bb3: ; preds = %bb1
	%tmp4 = getelementptr inbounds %struct.blam, %struct.blam* %tmp, i64 0, i32 1			%tmp4 = getelementptr inbounds %struct.blam, %struct.blam* %tmp, i64 0, i32 1
	br i1 false, label %bb6, label %bb5			br i1 %b, label %bb6, label %bb5

	bb5: ; preds = %bb3			bb5: ; preds = %bb3
	br label %bb6			br label %bb6

	bb6: ; preds = %bb5, %bb3			bb6: ; preds = %bb5, %bb3
	%tmp7 = phi %struct.wibble* [ %tmp2, %bb3 ], [ undef, %bb5 ]			%tmp7 = phi %struct.wibble* [ %tmp2, %bb3 ], [ undef, %bb5 ]
	br i1 undef, label %bb8, label %bb1			br i1 undef, label %bb8, label %bb1


test/Isl/CodeGen/phi-in-non-affine-subregion-entry.ll

This file was added.

				; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
				;
				; CHECK: if.end41.region_exiting:
				; CHECK: %ic.sroa.3.0.ph = phi i64 [ 0, %if.end22 ], [ undef, %if.then33 ]
				; CHECK: br label %polly.merge_new_and_old
				;
				; CHECK: polly.merge_new_and_old:
				; CHECK: %ic.sroa.3.0.ph.merge = phi i64 [ %0, %polly.exiting ], [ %ic.sroa.3.0.ph, %if.end41.region_exiting ]
				; CHECK: br label %if.end41
				;
				; CHECK: if.end41:
				; CHECK: %ic.sroa.3.0 = phi i64 [ %ic.sroa.3.0.ph.merge, %polly.merge_new_and_old ]
				; CHECK: ret void
				;
				; CHECK: polly.stmt.if.end41.region_exiting.exit:
				; CHECK: %0 = phi i64 [ 0, %polly.stmt.if.end22 ], [ undef, %polly.stmt.if.then33 ]

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; Function Attrs: nounwind uwtable
				define void @intcoord1() #0 {
				entry:
				br i1 undef, label %if.then, label %if.end

				if.then: ; preds = %entry
				unreachable

				if.end: ; preds = %entry
				br i1 false, label %if.then14, label %if.end22

				if.then14: ; preds = %if.end
				%conv17 = fptosi double undef to i32
				%phitmp19 = zext i32 %conv17 to i64
				%phitmp20 = shl nuw i64 %phitmp19, 32
				br label %if.end22

				if.end22: ; preds = %if.then14, %if.end
				%ic.sroa.2.0 = phi i64 [ %phitmp20, %if.then14 ], [ 0, %if.end ]
				%or.cond2 = and i1 undef, undef
				br i1 %or.cond2, label %if.then33, label %if.end41

				if.then33: ; preds = %if.end22
				br label %if.end41

				if.end41: ; preds = %if.then33, %if.end22
				%ic.sroa.3.0 = phi i64 [ undef, %if.then33 ], [ 0, %if.end22 ]
				ret void
				}

test/Isl/CodeGen/phi_condition_modeling_1.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; void f(int *A, int c, int N) {			; void f(int *A, int c, int N) {
	; int tmp;			; int tmp;
	; for (int i = 0; i < N; i++) {			; for (int i = 0; i < N; i++) {
	; if (i > c)			; if (i > c)
	; tmp = 3;			; tmp = 3;
	; else			; else
	; tmp = 5;			; tmp = 5;
	; A[i] = tmp;			; A[i] = tmp;
	; }			; }
	; }			; }
	;			;
	; CHECK-LABEL: bb:			; CHECK: polly.merge:
	; CHECK: %tmp.0.phiops = alloca i32			; CHECK: %4 = phi i32 [ 5, %polly.stmt.bb7 ], [ 3, %polly.stmt.bb6 ]
	; CHECK-LABEL: polly.stmt.bb8:			; CHECK: br label %polly.stmt.bb8
	; CHECK: %tmp.0.phiops.reload = load i32, i32* %tmp.0.phiops			;
	; CHECK: store i32 %tmp.0.phiops.reload, i32*			; CHECK: polly.stmt.bb8:
	; CHECK-LABEL: polly.stmt.bb7:			; CHECK: %scevgep = getelementptr i32, i32* %A, i64 %polly.indvar
	; CHECK: store i32 5, i32* %tmp.0.phiops			; CHECK: store i32 %4, i32* %scevgep,
	; CHECK-LABEL: polly.stmt.bb6:
	; CHECK: store i32 3, i32* %tmp.0.phiops

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i32 %c, i32 %N) {			define void @f(i32* %A, i32 %c, i32 %N) {
	bb:			bb:
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	%tmp1 = sext i32 %c to i64			%tmp1 = sext i32 %c to i64
	br label %bb2			br label %bb2

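The CHECK lines of this test change from a store/reload pattern through the stack slot `%tmp.0.phiops` to a single `phi` in `polly.merge`. As a rough illustration, the toy Python model below runs the C loop from the header comment in both styles and shows that they fill `A` identically. The function names `f_memory_slot` and `f_phi` are invented for this sketch, and the mapping of `bb6`/`bb7` to the two branches is read off the CHECK lines rather than the elided IR.

```python
def f_memory_slot(n, c):
    """Old checks: each branch stores into a slot, the join block reloads it."""
    A = [0] * n
    slot = {"tmp.0.phiops": None}  # stands in for the alloca
    for i in range(n):
        if i > c:
            slot["tmp.0.phiops"] = 3  # store i32 3 (polly.stmt.bb6)
        else:
            slot["tmp.0.phiops"] = 5  # store i32 5 (polly.stmt.bb7)
        A[i] = slot["tmp.0.phiops"]   # reload and store in polly.stmt.bb8
    return A


def f_phi(n, c):
    """New checks: the join block selects the value by predecessor block."""
    A = [0] * n
    incoming = {"polly.stmt.bb7": 5, "polly.stmt.bb6": 3}  # phi operands
    for i in range(n):
        pred = "polly.stmt.bb6" if i > c else "polly.stmt.bb7"
        merged = incoming[pred]       # models the phi in polly.merge
        A[i] = merged                 # store in polly.stmt.bb8
    return A


result_slot = f_memory_slot(6, 2)
result_phi = f_phi(6, 2)
print(result_slot == result_phi)
```

The model only captures the data flow of `tmp`; it makes no claim about how the block generator decides where to place PHI nodes.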

test/Isl/CodeGen/phi_condition_modeling_2.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; void f(int *A, int c, int N) {			; void f(int *A, int c, int N) {
	; int tmp;			; int tmp;
	; for (int i = 0; i < N; i++) {			; for (int i = 0; i < N; i++) {
	; if (i > c)			; if (i > c)
	; tmp = 3;			; tmp = 3;
	; else			; else
	; tmp = 5;			; tmp = 5;
	; A[i] = tmp;			; A[i] = tmp;
	; }			; }
	; }			; }
	;			;
	; CHECK-LABEL: bb:
	; CHECK-DAG: %tmp.0.s2a = alloca i32			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-DAG: %tmp.0.phiops = alloca i32			; CHECK-NEXT: br label %bb11
	; CHECK-LABEL: polly.stmt.bb8:			;
	; CHECK: %tmp.0.phiops.reload = load i32, i32* %tmp.0.phiops			; CHECK-LABEL: bb11:
	; CHECK: store i32 %tmp.0.phiops.reload, i32* %tmp.0.s2a			; CHECK-NEXT: ret void
				;
				; CHECK-LABEL: polly.loop_exit:
				; TODO %1 is not needed
				; CHECK-NEXT: %1 = phi i32 [ undef, %polly.loop_if ], [ %4, %polly.stmt.bb8b ]
				; CHECK-NEXT: br label %polly.exiting
				;
				; CHECK-LABEL: polly.exiting:
				; CHECK-NEXT: br label %polly.merge_new_and_old
				;
				; CHECK-LABEL: polly.merge:
				; CHECK-NEXT: %4 = phi i32 [ 5, %polly.stmt.bb7 ], [ 3, %polly.stmt.bb6 ]
				; CHECK-NEXT: br label %polly.stmt.bb8
				;
	; CHECK-LABEL: polly.stmt.bb8b:			; CHECK-LABEL: polly.stmt.bb8b:
	; CHECK: %tmp.0.s2a.reload = load i32, i32* %tmp.0.s2a			; CHECK-NEXT: %scevgep = getelementptr i32, i32* %A, i64 %polly.indvar
	; CHECK: store i32 %tmp.0.s2a.reload,			; CHECK-NEXT: store i32 %4, i32* %scevgep,
	; CHECK-LABEL: polly.stmt.bb7:
	; CHECK: store i32 5, i32* %tmp.0.phiops
	; CHECK-LABEL: polly.stmt.bb6:
	; CHECK: store i32 3, i32* %tmp.0.phiops

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i32 %c, i32 %N) {			define void @f(i32* %A, i32 %c, i32 %N) {
	bb:			bb:
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	%tmp1 = sext i32 %c to i64			%tmp1 = sext i32 %c to i64
	br label %bb2			br label %bb2

test/Isl/CodeGen/phi_conditional_simple_1.ll

	; AST: for (int c0 = 0; c0 <= 1023; c0 += 1) {			; AST: for (int c0 = 0; c0 <= 1023; c0 += 1) {
	; AST: if (c <= -1 || c >= 1) {			; AST: if (c <= -1 || c >= 1) {
	; AST: Stmt_if_then(c0);			; AST: Stmt_if_then(c0);
	; AST: } else			; AST: } else
	; AST: Stmt_if_else(c0);			; AST: Stmt_if_else(c0);
	; AST: Stmt_if_end(c0);			; AST: Stmt_if_end(c0);
	; AST: }			; AST: }
	;			;
	; CHECK-LABEL: entry:			; CHECK-LABEL: polly.merge:
	; CHECK-NEXT: %phi.phiops = alloca i32			; CHECK-NEXT: %[[r:[a-zA-Z0-9_.]*]] = phi i32 [ 1, %polly.stmt.if.then ], [ 2, %polly.stmt.if.else ]
				; CHECK-NEXT: br label %polly.stmt.if.end

	; CHECK-LABEL: polly.stmt.if.end:			; CHECK-LABEL: polly.stmt.if.end:
	; CHECK-NEXT: %phi.phiops.reload = load i32, i32* %phi.phiops			; CHECK-NEXT: %scevgep = getelementptr i32, i32* %A, i64 %polly.indvar
	; CHECK-NEXT: %scevgep			; CHECK-NEXT: store i32 %[[r]], i32* %scevgep
	; CHECK-NEXT: store i32 %phi.phiops.reload, i32*
	; CHECK-LABEL: polly.stmt.if.then:
	; CHECK-NEXT: store i32 1, i32* %phi.phiops
	; CHECK-NEXT: br label %polly.merge{{[.]?}}
	; CHECK-LABEL: polly.stmt.if.else:
	; CHECK-NEXT: store i32 2, i32* %phi.phiops
	; CHECK-NEXT: br label %polly.merge{{[.]?}}
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @jd(i32* %A, i32 %c) {			define void @jd(i32* %A, i32 %c) {
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:

test/Isl/CodeGen/phi_in_exit_early_lnt_failure_2.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; This caused an lnt crash at some point, just verify it will run through and			; This caused an lnt crash at some point, just verify it will run through and
	; produce the PHI node in the exit we are looking for.			; produce the PHI node in the exit we are looking for.
	;			;
	; CHECK: %eps1.addr.0.s2a = alloca double			; CHECK-NOT: alloca double
	; CHECK-NOT: %eps1.addr.0.ph.s2a = alloca double
	;			;
	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %eps1.addr.0.ph.merge = phi double [ %eps1.addr.0.ph.final_reload, %polly.exiting ], [ %eps1.addr.0.ph, %if.end.47.region_exiting ]			; CHECK: %eps1.addr.0.ph.merge = phi double [ %eps1.merge, %polly.exiting ], [ %eps1.addr.0.ph, %if.end.47.region_exiting ]
	;
	; CHECK-LABEL: polly.start:
	; CHECK-NEXT: store double %eps1, double* %eps1.s2a
	;
	; CHECK-LABEL: polly.exiting:
	; CHECK-NEXT: %eps1.addr.0.ph.final_reload = load double, double* %eps1.addr.0.s2a
	;			;
				; CHECK-LABEL: polly.stmt.if.end.47.region_exiting.exit:
				; CHECK-NEXT: %eps1.merge = phi double [ %eps1, %polly.stmt.for.end ], [ %_p_scalar_, %polly.stmt.if.then.46 ]

	define void @dbisect(double* %c, double* %b, double %eps1, double* %eps2) {			define void @dbisect(double* %c, double* %b, double %eps1, double* %eps2) {
	entry:			entry:
	br label %entry.split			br label %entry.split

	entry.split: ; preds = %entry			entry.split: ; preds = %entry
	store double 0.000000e+00, double* %b, align 8			store double 0.000000e+00, double* %b, align 8
	br i1 false, label %for.inc, label %for.end			br i1 false, label %for.inc, label %for.end


test/Isl/CodeGen/phi_in_exit_early_lnt_failure_3.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; This caused an lnt crash at some point, just verify it will run through and			; This caused an lnt crash at some point, just verify it will run through and
	; produce the PHI node in the exit we are looking for.			; produce the PHI node in the exit we are looking for.
	;			;
	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %n2ptr.2.ph.merge = phi i8* [ %n2ptr.2.ph.final_reload, %polly.exiting ], [ %n2ptr.2.ph, %if.end.45.region_exiting ]			; CHECK-NEXT: %n2ptr.2.ph.merge = phi i8* [ %uglygep.merge, %polly.exiting ], [ %n2ptr.2.ph, %if.end.45.region_exiting ]
	;			;
	; CHECK-LABEL: if.end.45:			; CHECK-LABEL: if.end.45:
	; CHECK-NEXT: %n2ptr.2 = phi i8* [ %add.ptr25, %entry ], [ %add.ptr25, %while.cond.preheader ], [ %n2ptr.2.ph.merge, %polly.merge_new_and_old ]			; CHECK-NEXT: %n2ptr.2 = phi i8* [ %add.ptr25, %entry ], [ %add.ptr25, %while.cond.preheader ], [ %n2ptr.2.ph.merge, %polly.merge_new_and_old ]

	%struct.bc_struct.0.2.4.6.8.15.24.27.29.32.38.46.48.92.93.94.95.97.99.100.102.105.107.111.118.119.121 = type { i32, i32, i32, i32, [1024 x i8] }			%struct.bc_struct.0.2.4.6.8.15.24.27.29.32.38.46.48.92.93.94.95.97.99.100.102.105.107.111.118.119.121 = type { i32, i32, i32, i32, [1024 x i8] }

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	declare %struct.bc_struct.0.2.4.6.8.15.24.27.29.32.38.46.48.92.93.94.95.97.99.100.102.105.107.111.118.119.121* @new_num() #0			declare %struct.bc_struct.0.2.4.6.8.15.24.27.29.32.38.46.48.92.93.94.95.97.99.100.102.105.107.111.118.119.121* @new_num() #0

test/Isl/CodeGen/phi_in_exit_early_lnt_failure_5.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; This caused an lnt crash at some point, just verify it will run through and			; This caused an lnt crash at some point, just verify it will run through and
	; produce the PHI node in the exit we are looking for.			; produce the PHI node in the exit we are looking for.
	;			;
	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %eps1.addr.0.ph.merge = phi double [ %eps1.addr.0.ph.final_reload, %polly.exiting ], [ %eps1.addr.0.ph, %if.end.47.region_exiting ]			; CHECK-NEXT: %eps1.addr.0.ph.merge = phi double [ %eps1.merge, %polly.exiting ], [ %eps1.addr.0.ph, %if.end.47.region_exiting ]
	; CHECK-NEXT: br label %if.end.47			; CHECK-NEXT: br label %if.end.47
	;			;
	; CHECK-LABEL: if.end.47:			; CHECK-LABEL: if.end.47:
	; CHECK-NEXT: %eps1.addr.0 = phi double [ %eps1.addr.0.ph.merge, %polly.merge_new_and_old ]			; CHECK-NEXT: %eps1.addr.0 = phi double [ %eps1.addr.0.ph.merge, %polly.merge_new_and_old ]
	;			;
	define void @dbisect(double* %c, double* %b, double %eps1, double* %eps2) {			define void @dbisect(double* %c, double* %b, double %eps1, double* %eps2) {
	entry:			entry:
	br label %entry.split			br label %entry.split

test/Isl/CodeGen/phi_loop_carried_float.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; float f(float *A, int N) {			; float f(float *A, int N) {
	; float tmp = 0;			; float tmp = 0;
	; for (int i = 0; i < N; i++)			; for (int i = 0; i < N; i++)
	; tmp += A[i];			; tmp += A[i];
	; }			; }
	;			;
	; CHECK: bb:			; CHECK: bb:
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float			; CHECK-NOT: alloca
	; CHECK-DAG: %tmp.0.s2a = alloca float
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float
	; CHECK-DAG: %tmp.0.phiops = alloca float
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float

	; CHECK-LABEL: exit:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: ret			; CHECK-NEXT: br label

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK-NEXT: sext			; CHECK-NEXT: sext
	; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops			; CHECK-NEXT: br label

				; CHECK-LABEL: polly.loop_exit:
				; CHECK-DAG: %tmp.0.polly.lc.merge = phi float [ 0.000000e+00, %polly.loop_if ], [ %tmp.0.polly.lc, %polly.merge ]
				; CHECK-DAG: %p_tmp7.merge.merge = phi float [ undef, %polly.loop_if ], [ %p_tmp7.merge, %polly.merge ]

	; CHECK-LABEL: polly.exiting:			; CHECK-LABEL: polly.exiting:
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.loop_header:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]]] = load float, float %tmp.0.phiops			; CHECK-NEXT: %polly.indvar = phi i64
	; CHECK: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a			; CHECK-NEXT: %tmp.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_tmp7.merge, %polly.merge ]

	; CHECK-LABEL: polly.stmt.bb4:			; CHECK-LABEL: polly.merge:
	; CHECK: %tmp.0.s2a.reload[[R3:[0-9]]] = load float, float %tmp.0.s2a			; CHECK-NEXT: %p_tmp7.merge = phi float [ %p_tmp7, %polly.stmt.bb4 ], [ undef, %polly.else ]
	; CHECK: %tmp[[R5:[0-9]]]_p_scalar_ = load float, float %scevgep, align 4, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_
	; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]]] = load float, float %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(float* %A, i32 %N) {			define void @f(float* %A, i32 %N) {
	bb:			bb:
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	br label %bb1			br label %bb1


test/Isl/CodeGen/phi_loop_carried_float_2.ll

This file was added.

				; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
				;
				; float f(float *A, int N) {
				; float tmp = 0;
				; int i = 0;
				; do {
				; tmp += A[i];
				; } while (i++ < N);
				; }
				;
				; CHECK-NOT: alloca

				; CHECK-LABEL: polly.merge_new_and_old:
				; CHECK-DAG: %tmp7.merge = phi float [ %p_tmp[[r:[0-9]*]].merge, %polly.exiting ], [ %tmp7, %bb8 ]
				; CHECK-DAG: %tmp.0.merge = phi float [ %tmp.0.polly.lc.merge, %polly.exiting ], [ %tmp.0, %bb8 ]
				; CHECK-NEXT: br label %exit
				;
				; CHECK-LABEL: exit:
				; CHECK-NEXT: %add_exit = fadd float %tmp.0.merge, %tmp7.merge
				; CHECK-NEXT: ret float %add_exit
				;
				; CHECK-LABEL: polly.loop_header:
				; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.stmt.bb1 ]
				; CHECK-NEXT: %tmp.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_tmp7, %polly.stmt.bb1 ]
				;
				; CHECK: %p_tmp7 = fadd float %tmp.0.polly.lc, %tmp6_p_scalar_

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define float @f(float* %A, i32 %N) {
				bb:
				%tmp = sext i32 %N to i64
				br label %bb1

				bb1: ; preds = %bb4, %bb
				%indvars.iv = phi i64 [ %indvars.iv.next, %bb1 ], [ 0, %bb ]
				%tmp.0 = phi float [ 0.000000e+00, %bb ], [ %tmp7, %bb1 ]
				%tmp5 = getelementptr inbounds float, float* %A, i64 %indvars.iv
				%tmp6 = load float, float* %tmp5, align 4
				%tmp7 = fadd float %tmp.0, %tmp6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%tmp2 = icmp slt i64 %indvars.iv, %tmp
				br i1 %tmp2, label %bb1, label %bb8

				bb8: ; preds = %bb1
				br label %exit

				exit:
				%add_exit = fadd float %tmp.0, %tmp7
				ret float %add_exit
				}

test/Isl/CodeGen/phi_loop_carried_float_3.ll

This file was added.

				; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
				;
				; float f(float *A) {
				; float tmp = 0;
				; int i = 0;
				; do {
				; tmp += A[i];
				; } while (i++ < 100);
				; }
				;
				; CHECK-NOT: alloca
				;
				; CHECK-LABEL: polly.loop_header:
				; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.stmt.bb2 ]
				; CHECK-NEXT: %tmp.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_tmp7, %polly.stmt.bb2 ]
				;
				; CHECK: %p_tmp7 = fadd float %tmp.0.polly.lc, %tmp6_p_scalar_

				; CHECK-LABEL: polly.stmt.bb2:
				; CHECK-NEXT: %p_tmp7copy = fadd float %tmp.0.polly.lc, %p_tmp7


				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define float @f(float* %A) {
				bb:
				br label %bb1

				bb1: ; preds = %bb4, %bb
				%indvars.iv = phi i64 [ %indvars.iv.next, %bb2 ], [ 0, %bb ]
				%tmp.0 = phi float [ 0.000000e+00, %bb ], [ %tmp7, %bb2 ]
				%tmp5 = getelementptr inbounds float, float* %A, i64 %indvars.iv
				%tmp6 = load float, float* %tmp5, align 4
				%tmp7 = fadd float %tmp.0, %tmp6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%tmp2 = icmp slt i64 %indvars.iv, 100
				br label %bb2

				bb2:
				%tmp7copy = fadd float %tmp.0, %tmp7
				br i1 %tmp2, label %bb1, label %bb8

				bb8: ; preds = %bb1
				br label %exit

				exit:
				%add_exit = fadd float %tmp.0, %tmp7
				ret float %add_exit
				}

test/Isl/CodeGen/phi_loop_carried_float_4.ll

This file was added.

				; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
				;
				; float f() {
				; int i = 0;
				; float lc_a, lc_b, lc_c;
				; float a, b, c;
				; a = b = c = 0;
				;
				; do {
				; lc_a = a;
				; lc_b = b;
				; lc_c = c;
				; if (i > 50)
				; a += 1;
				; else
				; b -= 2;
				; c += a + b;
				; } while (i++ < 100);
				; return a + b + c + lc_a + lc_b + lc_c;
				; }
				;
				; CHECK: do.body:
				; CHECK-DAG: %a.0 = phi float [ %a.1, %do.cond ], [ 0.000000e+00, %polly.split_new_and_old ]
				; CHECK-DAG: %b.0 = phi float [ %b.1, %do.cond ], [ 0.000000e+00, %polly.split_new_and_old ]
				; CHECK-DAG: %c.0 = phi float [ %add2, %do.cond ], [ 0.000000e+00, %polly.split_new_and_old ]
				; CHECK-DAG: %i.0 = phi i32 [ %inc, %do.cond ], [ 0, %polly.split_new_and_old ]
				; CHECK: %cmp = icmp sgt i32 %i.0, 50
				; CHECK: br i1 %cmp, label %if.then, label %if.else
				;
				; CHECK: if.then:
				; CHECK: %add = fadd float %a.0, 1.000000e+00
				; CHECK: br label %if.end
				;
				; CHECK: if.else:
				; CHECK: %sub = fadd float %b.0, -2.000000e+00
				; CHECK: br label %if.end
				;
				; CHECK: if.end:
				; CHECK-DAG: %a.1 = phi float [ %add, %if.then ], [ %a.0, %if.else ]
				; CHECK-DAG: %b.1 = phi float [ %b.0, %if.then ], [ %sub, %if.else ]
				; CHECK: %add1 = fadd float %a.1, %b.1
				; CHECK: %add2 = fadd float %c.0, %add1
				; CHECK: br label %do.cond
				;
				; CHECK: do.cond:
				; CHECK: %inc = add nuw nsw i32 %i.0, 1
				; CHECK: %exitcond = icmp ne i32 %inc, 101
				; CHECK: br i1 %exitcond, label %do.body, label %polly.merge_new_and_old
				;
				; CHECK: polly.merge_new_and_old:
				; CHECK-DAG: %add2.merge = phi float [ %p_add2, %polly.exiting ], [ %add2, %do.cond ]
				; CHECK-DAG: %c.0.merge = phi float [ %c.0.polly.lc, %polly.exiting ], [ %c.0, %do.cond ]
				; CHECK-DAG: %a.0.merge = phi float [ %a.0.polly.lc, %polly.exiting ], [ %a.0, %do.cond ]
				; CHECK-DAG: %b.1.merge = phi float [ %p_sub.merge, %polly.exiting ], [ %b.1, %do.cond ]
				; CHECK-DAG: %b.0.merge = phi float [ %b.0.polly.lc, %polly.exiting ], [ %b.0, %do.cond ]
				; CHECK-DAG: %a.1.merge = phi float [ %a.0.polly.lc.merge, %polly.exiting ], [ %a.1, %do.cond ]
				; CHECK: br label %do.end
				;
				; CHECK: do.end:
				; CHECK-DAG: %add2.lcssa = phi float [ %add2.merge, %polly.merge_new_and_old ]
				; CHECK-DAG: %b.1.lcssa = phi float [ %b.1.merge, %polly.merge_new_and_old ]
				; CHECK-DAG: %a.1.lcssa = phi float [ %a.1.merge, %polly.merge_new_and_old ]
				; CHECK-DAG: %c.0.lcssa = phi float [ %c.0.merge, %polly.merge_new_and_old ]
				; CHECK-DAG: %b.0.lcssa = phi float [ %b.0.merge, %polly.merge_new_and_old ]
				; CHECK-DAG: %a.0.lcssa = phi float [ %a.0.merge, %polly.merge_new_and_old ]
				; CHECK: %add4 = fadd float %a.1.lcssa, %b.1.lcssa
				; CHECK: %add5 = fadd float %add4, %add2.lcssa
				; CHECK: %add6 = fadd float %add5, %a.0.lcssa
				; CHECK: %add7 = fadd float %add6, %b.0.lcssa
				; CHECK: %add8 = fadd float %add7, %c.0.lcssa
				; CHECK: ret float %add8
				;
				; CHECK: polly.start:
				; CHECK: br label %polly.loop_preheader
				;
				; CHECK: polly.loop_exit:
				; CHECK: br label %polly.exiting
				;
				; CHECK: polly.exiting:
				; CHECK: br label %polly.merge_new_and_old
				;
				; CHECK: polly.loop_header:
				; CHECK-DAG: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.stmt.do.cond ]
				; CHECK-DAG: %c.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_add2, %polly.stmt.do.cond ]
				; CHECK-DAG: %a.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %a.0.polly.lc.merge, %polly.stmt.do.cond ]
				; CHECK-DAG: %b.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_sub.merge, %polly.stmt.do.cond ]
				; CHECK: br label %polly.stmt.do.body
				;
				; CHECK: polly.stmt.do.body:
				; CHECK: %0 = trunc i64 %polly.indvar to i32
				; CHECK: %p_cmp = icmp sgt i32 %0, 50
				; CHECK: br label %polly.cond
				;
				; CHECK: polly.cond:
				; CHECK: %1 = icmp sle i64 %polly.indvar, 50
				; CHECK: br i1 %1, label %polly.then, label %polly.else
				;
				; CHECK: polly.merge:
				; CHECK-DAG: %p_sub.merge = phi float [ %p_sub, %polly.stmt.if.else ], [ %b.0.polly.lc, %polly.stmt.if.then ]
				; CHECK-DAG: %a.0.polly.lc.merge = phi float [ %a.0.polly.lc, %polly.stmt.if.else ], [ %p_add, %polly.stmt.if.then ]
				; CHECK: br label %polly.stmt.if.end
				;
				; CHECK: polly.stmt.if.end:
				; CHECK: %p_add1 = fadd float %a.0.polly.lc.merge, %p_sub.merge
				; CHECK: %p_add2 = fadd float %c.0.polly.lc, %p_add1
				; CHECK: br label %polly.stmt.do.cond
				;
				; CHECK: polly.stmt.do.cond:
				; CHECK: %2 = trunc i64 %polly.indvar to i32
				; CHECK: %3 = add i32 %2, 1
				; CHECK: %p_exitcond = icmp ne i32 %3, 101
				; CHECK: %polly.indvar_next = add nsw i64 %polly.indvar, 1
				; CHECK: %polly.loop_cond = icmp sle i64 %polly.indvar, 99
				; CHECK: br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit
				;
				; CHECK: polly.loop_preheader:
				; CHECK: br label %polly.loop_header
				;
				; CHECK: polly.then:
				; CHECK: br label %polly.stmt.if.else
				;
				; CHECK: polly.stmt.if.else:
				; CHECK: %p_sub = fadd float %b.0.polly.lc, -2.000000e+00
				; CHECK: br label %polly.merge
				;
				; CHECK: polly.else:
				; CHECK: br label %polly.stmt.if.then
				;
				; CHECK: polly.stmt.if.then:
				; CHECK: %p_add = fadd float %a.0.polly.lc, 1.000000e+00
				; CHECK: br label %polly.merge
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define float @f() {
				entry:
				br label %do.body

				do.body:
				%a.0 = phi float [ 0.000000e+00, %entry ], [ %a.1, %do.cond ]
				%b.0 = phi float [ 0.000000e+00, %entry ], [ %b.1, %do.cond ]
				%c.0 = phi float [ 0.000000e+00, %entry ], [ %add2, %do.cond ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %do.cond ]
				%cmp = icmp sgt i32 %i.0, 50
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%add = fadd float %a.0, 1.000000e+00
				br label %if.end

				if.else:
				%sub = fadd float %b.0, -2.000000e+00
				br label %if.end

				if.end:
				%a.1 = phi float [ %add, %if.then ], [ %a.0, %if.else ]
				%b.1 = phi float [ %b.0, %if.then ], [ %sub, %if.else ]
				%add1 = fadd float %a.1, %b.1
				%add2 = fadd float %c.0, %add1
				br label %do.cond

				do.cond:
				%inc = add nuw nsw i32 %i.0, 1
				%exitcond = icmp ne i32 %inc, 101
				br i1 %exitcond, label %do.body, label %do.end

				do.end:
				%add2.lcssa = phi float [ %add2, %do.cond ]
				%b.1.lcssa = phi float [ %b.1, %do.cond ]
				%a.1.lcssa = phi float [ %a.1, %do.cond ]
				%c.0.lcssa = phi float [ %c.0, %do.cond ]
				%b.0.lcssa = phi float [ %b.0, %do.cond ]
				%a.0.lcssa = phi float [ %a.0, %do.cond ]
				%add4 = fadd float %a.1.lcssa, %b.1.lcssa
				%add5 = fadd float %add4, %add2.lcssa
				%add6 = fadd float %add5, %a.0.lcssa
				%add7 = fadd float %add6, %b.0.lcssa
				%add8 = fadd float %add7, %c.0.lcssa
				ret float %add8
				}

test/Isl/CodeGen/phi_loop_carried_float_5.ll

This file was added.

				; RUN: opt %loadPolly -analyze < %s | FileCheck %s
				;
				; FIXME: Edit the run line and add checks!
				;
				; XFAIL: *
				[Inline comment] grosser: This test case is incomplete. Without a comment it is unclear what exactly is tested here. (The test case seems to be interesting, though)
				;
				; float f() {
				; int i = 0;
				; float lc_a;
				; float a = 0;
				;
				; do {
				; lc_a = a;
				;
				; if (i++ > 100)
				; break;
				;
				; if (i > 50) {
				; a += 1;
				; continue;
				; }
				;
				; if (i > 25) {
				; a -= 2;
				; continue;
				; }
				;
				; } while (1);
				;
				; return a + lc_a;
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define float @f() {
				entry:
				br label %do.body

				do.body: ; preds = %do.cond, %entry
				%a.0 = phi float [ 0.000000e+00, %entry ], [ %a.1, %do.cond ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %do.cond ]
				%inc = add nuw nsw i32 %i.0, 1
				%cmp = icmp sgt i32 %i.0, 100
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %do.body
				%a.0.lcssa = phi float [ %a.0, %do.body ]
				br label %do.end

				if.end: ; preds = %do.body
				%cmp1 = icmp sgt i32 %i.0, 49
				br i1 %cmp1, label %if.then2, label %if.end3

				if.then2: ; preds = %if.end
				%add = fadd float %a.0, 1.000000e+00
				br label %do.cond

				if.end3: ; preds = %if.end
				%cmp4 = icmp sgt i32 %i.0, 24
				br i1 %cmp4, label %if.then5, label %if.end6

				if.then5: ; preds = %if.end3
				%sub = fadd float %a.0, -2.000000e+00
				br label %do.cond

				if.end6: ; preds = %if.end3
				br label %do.cond

				do.cond: ; preds = %if.end6, %if.then5, %if.then2
				%a.1 = phi float [ %add, %if.then2 ], [ %sub, %if.then5 ], [ %a.0, %if.end6 ]
				br i1 true, label %do.body, label %do.end.loopexit

				do.end.loopexit: ; preds = %do.cond
				%a.1.lcssa = phi float [ %a.1, %do.cond ]
				%a.0.lcssa1 = phi float [ %a.0, %do.cond ]
				br label %do.end

				do.end: ; preds = %do.end.loopexit, %if.then
				%a.02 = phi float [ %a.0.lcssa, %if.then ], [ %a.0.lcssa1, %do.end.loopexit ]
				%a.2 = phi float [ %a.0.lcssa, %if.then ], [ %a.1.lcssa, %do.end.loopexit ]
				%add7 = fadd float %a.2, %a.02
				ret float %add7
				}

test/Isl/CodeGen/phi_loop_carried_float_6.ll

This file was added.

				; RUN: opt %loadPolly -analyze < %s | FileCheck %s
				;
				; FIXME: Edit the run line and add checks!
				;
				; XFAIL: *
				;
				; float f() {
				; int i = 0;
				; float lc_a;
				; float a = 0;
				;
				; do {
				; lc_a = a;
				;
				; if (i++ > 100)
				; break;
				;
				; if (i > 50) {
				; a += 1;
				; continue;
				; }
				;
				; if (i > 25) {
				; a -= 2;
				; continue;
				; }
				;
				; } while (1);
				;
				; return a + lc_a;
				; }
				;

				; CHECK-LABEL: polly.merge_new_and_old:
				; CHECK-NEXT: %a.0.merge = phi float [ %a.0.polly.lc, %polly.exiting ], [ %a.0, %do.body ]
				;
				; CHECK: %a.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %a.0.polly.lc.merge, %polly.merge ]
				;
				; CHECK: %a.0.polly.lc.merge = phi float [ %a.0.polly.lc, %polly.stmt.if.end6 ], [ %p_sub.merge, %polly.merge2 ]
				;
				; CHECK: %p_sub.merge = phi float [ %p_sub, %polly.stmt.if.then5 ], [ %p_add.merge, %polly.merge6 ]
				;
				; CHECK: %p_sub = fadd float %a.0.polly.lc, -2.000000e+00
				;
				; CHECK: %p_add.merge = phi float [ %p_add, %polly.stmt.if.then2 ], [ undef, %polly.else8 ]
				;
				; CHECK: %p_add = fadd float %a.0.polly.lc, 1.000000e+00
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define float @f() {
				entry:
				br label %do.body

				do.body: ; preds = %do.cond, %entry
				%a.0 = phi float [ 0.000000e+00, %entry ], [ %add, %if.then2 ], [ %sub, %if.then5 ], [ %a.0, %if.end6 ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %if.then2 ], [ %inc, %if.then5 ], [ %inc, %if.end6 ]
				%inc = add nuw nsw i32 %i.0, 1
				%cmp = icmp sgt i32 %i.0, 100
				br i1 %cmp, label %do.end, label %if.end

				if.end: ; preds = %do.body
				%cmp1 = icmp sgt i32 %i.0, 49
				br i1 %cmp1, label %if.then2, label %if.end3

				if.then2: ; preds = %if.end
				%add = fadd float %a.0, 1.000000e+00
				br label %do.body

				if.end3: ; preds = %if.end
				%cmp4 = icmp sgt i32 %i.0, 24
				br i1 %cmp4, label %if.then5, label %if.end6

				if.then5: ; preds = %if.end3
				%sub = fadd float %a.0, -2.000000e+00
				br label %do.body

				if.end6: ; preds = %if.end3
				br label %do.body

				do.end: ; preds = %do.end.loopexit, %if.then
				ret float %a.0
				}

test/Isl/CodeGen/phi_loop_carried_float_escape.ll

	; RUN: opt %loadPolly -S \			; RUN: opt %loadPolly -S \
	; RUN: -polly-analyze-read-only-scalars=false -polly-codegen < %s | FileCheck %s			; RUN: -polly-analyze-read-only-scalars=false -polly-codegen < %s | FileCheck %s

	; RUN: opt %loadPolly -S \			; RUN: opt %loadPolly -S \
	; RUN: -polly-analyze-read-only-scalars=true -polly-codegen < %s | FileCheck %s			; RUN: -polly-analyze-read-only-scalars=true -polly-codegen < %s | FileCheck %s
	;			;
	; float f(float *A, int N) {			; float f(float *A, int N) {
	; float tmp = 0;			; float tmp = 0;
	; for (int i = 0; i < N; i++)			; for (int i = 0; i < N; i++)
	; tmp += A[i];			; tmp += A[i];
	; return tmp;			; return tmp;
	; }			; }

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %tmp.0.merge = phi float [ %tmp.0.final_reload, %polly.exiting ], [ %tmp.0, %bb8 ]			; CHECK-NEXT: %tmp.0.merge = phi float [ %tmp.0.polly.lc.merge, %polly.exiting ], [ %tmp.0, %bb8 ]
	; CHECK-NEXT: br label %exit			; CHECK-NEXT: br label %exit

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.loop_exit:
	; CHECK-NEXT: sext			; CHECK-DAG: %tmp.0.polly.lc.merge = phi float [ 0.000000e+00, %polly.loop_if ], [ %tmp.0.polly.lc, %polly.merge ]
	; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops			; CHECK-DAG: %p_tmp7.merge.merge = phi float [ undef, %polly.loop_if ], [ %p_tmp7.merge, %polly.merge ]

	; CHECK-LABEL: polly.exiting:			; CHECK-LABEL: polly.exiting:
	; CHECK-NEXT: %tmp.0.final_reload = load float, float* %tmp.0.s2a
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.loop_header:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]]] = load float, float %tmp.0.phiops			; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.merge ]
	; CHECK-: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a			; CHECK-NEXT: %tmp.0.polly.lc = phi float [ 0.000000e+00, %polly.loop_preheader ], [ %p_tmp7.merge, %polly.merge ]

				; CHECK-LABEL: polly.merge:
				; CHECK-NEXT: %p_tmp7.merge = phi float [ %p_tmp7, %polly.stmt.bb4 ], [ undef, %polly.else ]

	; CHECK-LABEL: polly.stmt.bb4:			; CHECK-LABEL: polly.stmt.bb4:
	; CHECK: %tmp.0.s2a.reload[[R3:[0-9]]] = load float, float %tmp.0.s2a			; CHECK-NEXT: %scevgep = getelementptr float, float* %A, i64 %polly.indvar
	; CHECK: %tmp[[R5:[0-9]]]_p_scalar_ = load float, float %scevgep, align 4, !alias.scope !0, !noalias !2			; CHECK-NEXT: %tmp6_p_scalar_ = load float, float* %scevgep
	; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_			; CHECK-NEXT: %p_tmp7 = fadd float %tmp.0.polly.lc, %tmp6_p_scalar_
	; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]]] = load float, float %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define float @f(float* %A, i32 %N) {			define float @f(float* %A, i32 %N) {
	bb:			bb:
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	br label %bb1			br label %bb1


test/Isl/CodeGen/phi_scalar_simple_1.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; int jd(int *restrict A, int x, int N) {			; int jd(int *restrict A, int x, int N) {
	; for (int i = 1; i < N; i++)			; for (int i = 1; i < N; i++)
	; for (int j = 3; j < N; j++)			; for (int j = 3; j < N; j++)
	; x += A[i];			; x += A[i];
	; return x;			; return x;
	; }			; }
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i32 @jd(i32* noalias %A, i32 %x, i32 %N) {			define i32 @jd(i32* noalias %A, i32 %x, i32 %N) {
	entry:			entry:
	; CHECK-LABEL: entry:			; CHECK-LABEL: entry:
	; CHECK-DAG: %x.addr.1.lcssa.s2a = alloca i32			; CHECK-NOT: alloca
	; CHECK-DAG: %x.addr.1.lcssa.phiops = alloca i32
	; CHECK-DAG: %x.addr.1.s2a = alloca i32
	; CHECK-DAG: %x.addr.1.phiops = alloca i32
	; CHECK-DAG: %x.addr.0.s2a = alloca i32
	; CHECK-DAG: %x.addr.0.phiops = alloca i32
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	br label %for.cond			br label %for.cond

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.exiting ], [ %x.addr.0, %for.cond ]			; CHECK: %x.addr.0.merge = phi i32 [ %x.merge.merge, %polly.exiting ], [ %x.addr.0, %for.cond ]
	; CHECK: ret i32 %x.addr.0.merge			; CHECK: ret i32 %x.addr.0.merge

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK: store i32 %x, i32* %x.addr.0.phiops			; CHECK-NEXT: br label %polly.cond

	; CHECK-LABEL: polly.merge:			; CHECK-LABEL: polly.merge:
	; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a			; CHECK: %x.merge.merge = phi i32 [ %x.merge, %polly.loop_exit ], [ %x, %polly.stmt.for.cond{{2[0-9]}} ]

				; CHECK-LABEL: polly.loop_exit:
				; CHECK: %x.merge = phi i32 [ %x, %polly.loop_if ], [ %x.addr.0.polly.lc.merge.merge[[p:[0-9]*]].merge, %polly.merge2 ]

				; CHECK-LABEL: polly.loop_header:
				; CHECK: %x.addr.0.polly.lc = phi i32 [ %x, %polly.loop_preheader ], [ %x.addr.0.polly.lc.merge.merge[[p]].merge, %polly.merge2 ]

	for.cond: ; preds = %for.inc4, %entry			for.cond: ; preds = %for.inc4, %entry
	; CHECK-LABEL: polly.stmt.for.cond{{[0-9]*}}:			; CHECK-LABEL: polly.stmt.for.cond{{[0-9]*}}:
	; CHECK: %x.addr.0.phiops.reload[[R1:[0-9]*]] = load i32, i32* %x.addr.0.phiops			; CHECK-NOT: load
	; CHECK: store i32 %x.addr.0.phiops.reload[[R1]], i32* %x.addr.0.s2a			; CHECK-NOT: store
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc4 ], [ 1, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc4 ], [ 1, %entry ]
	%x.addr.0 = phi i32 [ %x, %entry ], [ %x.addr.1.lcssa, %for.inc4 ]			%x.addr.0 = phi i32 [ %x, %entry ], [ %x.addr.1.lcssa, %for.inc4 ]
	%cmp = icmp slt i64 %indvars.iv, %tmp			%cmp = icmp slt i64 %indvars.iv, %tmp
	br i1 %cmp, label %for.body, label %for.end6			br i1 %cmp, label %for.body, label %for.end6

				; CHECK-LABEL: polly.merge2:
				; CHECK: %x.addr.0.polly.lc.merge.merge[[p]].merge = phi i32 [ %x.addr.0.polly.lc.merge.merge[[k:[0-9]*]], %polly.merge[[t:2[0-9]]] ], [ %x.addr.0.polly.lc, %polly.else4 ]

	for.body: ; preds = %for.cond			for.body: ; preds = %for.cond
	; CHECK-LABEL: polly.stmt.for.body:			; CHECK-LABEL: polly.stmt.for.body:
	; CHECK: %x.addr.0.s2a.reload[[R2:[0-9]*]] = load i32, i32* %x.addr.0.s2a			; CHECK-NOT: load
	; CHECK: store i32 %x.addr.0.s2a.reload[[R2]], i32* %x.addr.1.phiops			; CHECK-NOT: store
	br label %for.cond1			br label %for.cond1

				; CHECK-LABEL: polly.loop_exit8:
				; CHECK: %x.addr.0.polly.lc.merge = phi i32 [ %x.addr.0.polly.lc, %polly.loop_if5 ], [ %p_add.merge, %polly.merge15 ]

				; CHECK: polly.merge[[t]]:
				; CHECK: %x.addr.0.polly.lc.merge.merge[[k]] = phi i32 [ %x.addr.0.polly.lc.merge, %polly.stmt.for.inc4 ], [ %x.addr.0.polly.lc, %polly.else{{2[0-9]}} ]

				; CHECK-LABEL: polly.loop_header6:
				; CHECK: %x.addr.1.polly.lc = phi i32 [ %x.addr.0.polly.lc, %polly.loop_preheader7 ], [ %p_add.merge, %polly.merge15 ]

				grosser: These check lines are misleading as the current patch is generating more (unnecessary) PHI instructions here. In the optimal case we would not even generate them. As long as we generate them, we should list all of them with CHECK-NEXT and add a TODO that some of these PHIs are unnecessary and expected to be deleted later.
	for.cond1: ; preds = %for.inc, %for.body			for.cond1: ; preds = %for.inc, %for.body
	; CHECK-LABEL: polly.stmt.for.cond1:			; CHECK-LABEL: polly.stmt.for.cond1:
	; CHECK: %x.addr.1.phiops.reload = load i32, i32* %x.addr.1.phiops			; CHECK-NOT: load
	; CHECK: store i32 %x.addr.1.phiops.reload, i32* %x.addr.1.s2a[[R6:[0-9]*]]			; CHECK-NOT: store
	; CHECK: store i32 %x.addr.1.phiops.reload, i32* %x.addr.1.lcssa.phiops
	%x.addr.1 = phi i32 [ %x.addr.0, %for.body ], [ %add, %for.inc ]			%x.addr.1 = phi i32 [ %x.addr.0, %for.body ], [ %add, %for.inc ]
	%j.0 = phi i32 [ 3, %for.body ], [ %inc, %for.inc ]			%j.0 = phi i32 [ 3, %for.body ], [ %inc, %for.inc ]
	%exitcond = icmp ne i32 %j.0, %N			%exitcond = icmp ne i32 %j.0, %N
	br i1 %exitcond, label %for.body3, label %for.end			br i1 %exitcond, label %for.body3, label %for.end

				; CHECK-LABEL: polly.merge15:
				; CHECK-LABEL: %p_add.merge = phi i32 [ %p_add, %polly.stmt.for.inc ], [ %x.addr.1.polly.lc, %polly.else17 ]

	for.inc: ; preds = %for.body3			for.inc: ; preds = %for.body3
	; CHECK-LABEL: polly.stmt.for.inc:			; CHECK-LABEL: polly.stmt.for.inc:
	; CHECK: %x.addr.1.s2a.reload[[R3:[0-9]*]] = load i32, i32* %x.addr.1.s2a			; CHECK: %p_add = add nsw i32 %x.addr.1.polly.lc, %tmp1_p_scalar_
	; CHECK: %p_add = add nsw i32 %x.addr.1.s2a.reload[[R3]], %tmp1_p_scalar_			; CHECK-NOT: load
	; CHECK: store i32 %p_add, i32* %x.addr.1.phiops			; CHECK-NOT: store
	%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
	%tmp1 = load i32, i32* %arrayidx, align 4			%tmp1 = load i32, i32* %arrayidx, align 4
	%add = add nsw i32 %x.addr.1, %tmp1			%add = add nsw i32 %x.addr.1, %tmp1
	%inc = add nsw i32 %j.0, 1			%inc = add nsw i32 %j.0, 1
	br label %for.cond1			br label %for.cond1

	for.end: ; preds = %for.cond1			for.end: ; preds = %for.cond1
	; CHECK-LABEL: polly.stmt.for.end:			; CHECK-LABEL: polly.stmt.for.end:
	; CHECK-NEXT: %x.addr.1.lcssa.phiops.reload = load i32, i32* %x.addr.1.lcssa.phiops			; CHECK-NOT: load
	; CHECK-NEXT: store i32 %x.addr.1.lcssa.phiops.reload, i32* %x.addr.1.lcssa.s2a[[R4:[0-9]*]]			; CHECK-NOT: store
	%x.addr.1.lcssa = phi i32 [ %x.addr.1, %for.cond1 ]			%x.addr.1.lcssa = phi i32 [ %x.addr.1, %for.cond1 ]
	br label %for.inc4			br label %for.inc4

	for.inc4: ; preds = %for.end			for.inc4: ; preds = %for.end
	; CHECK-LABEL: polly.stmt.for.inc4:			; CHECK-LABEL: polly.stmt.for.inc4:
	; CHECK: %x.addr.1.lcssa.s2a.reload[[R5:[0-9]*]] = load i32, i32* %x.addr.1.lcssa.s2a[[R4]]			; CHECK-NOT: load
	; CHECK: store i32 %x.addr.1.lcssa.s2a.reload[[R5]], i32* %x.addr.0.phiops			; CHECK-NOT: store
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	br label %for.cond			br label %for.cond

	for.body3: ; preds = %for.cond1			for.body3: ; preds = %for.cond1
	br label %for.inc			br label %for.inc

	for.end6: ; preds = %for.cond			for.end6: ; preds = %for.cond
	ret i32 %x.addr.0			ret i32 %x.addr.0
	}			}

test/Isl/CodeGen/phi_scalar_simple_2.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; int jd(int *restrict A, int x, int N, int c) {			; int jd(int *restrict A, int x, int N, int c) {
	; for (int i = 0; i < N; i++)			; for (int i = 0; i < N; i++)
	; for (int j = 0; j < N; j++)			; for (int j = 0; j < N; j++)
	; if (i < c)			; if (i < c)
	; x += A[i];			; x += A[i];
	; return x;			; return x;
	; }			; }
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i32 @jd(i32* noalias %A, i32 %x, i32 %N, i32 %c) {			define i32 @jd(i32* noalias %A, i32 %x, i32 %N, i32 %c) {
	entry:			entry:
	; CHECK-LABEL: entry:			; CHECK-LABEL: entry:
	; CHECK-DAG: %x.addr.2.s2a = alloca i32			; CHECK-NOT: alloca
				grosser: If we are dropping here almost all CHECK lines, this test case becomes useless. Either we should make an argument that it does indeed not test anything interesting and just delete it or we should add CHECK lines similar to the ones added in phi_scalar_simple_1.ll. If we make the argument this test case is redundant, it should be deleted in a separate commit.
	; CHECK-DAG: %x.addr.2.phiops = alloca i32
	; CHECK-DAG: %x.addr.1.s2a = alloca i32
	; CHECK-DAG: %x.addr.1.phiops = alloca i32
	; CHECK-DAG: %x.addr.0.s2a = alloca i32
	; CHECK-DAG: %x.addr.0.phiops = alloca i32
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	%tmp1 = sext i32 %c to i64			%tmp1 = sext i32 %c to i64
	br label %for.cond			br label %for.cond

	; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.exiting ], [ %x.addr.0, %for.cond ]
	; CHECK: ret i32 %x.addr.0.merge

	; CHECK-LABEL: polly.start:
	; CHECK-NEXT: sext
	; CHECK-NEXT: store i32 %x, i32* %x.addr.0.phiops

	; CHECK-LABEL: polly.merge21:
	; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a

	for.cond: ; preds = %for.inc5, %entry			for.cond: ; preds = %for.inc5, %entry
	; CHECK-LABEL: polly.stmt.for.cond{{[0-9]*}}:
	; CHECK: %x.addr.0.phiops.reload[[R1:[0-9]*]] = load i32, i32* %x.addr.0.phiops
	; CHECK: store i32 %x.addr.0.phiops.reload[[R1]], i32* %x.addr.0.s2a
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc5 ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc5 ], [ 0, %entry ]
	%x.addr.0 = phi i32 [ %x, %entry ], [ %x.addr.1, %for.inc5 ]			%x.addr.0 = phi i32 [ %x, %entry ], [ %x.addr.1, %for.inc5 ]
	%cmp = icmp slt i64 %indvars.iv, %tmp			%cmp = icmp slt i64 %indvars.iv, %tmp
	br i1 %cmp, label %for.body, label %for.end7			br i1 %cmp, label %for.body, label %for.end7

	for.body: ; preds = %for.cond			for.body: ; preds = %for.cond
	; CHECK-LABEL: polly.stmt.for.body:
	; CHECK: %x.addr.0.s2a.reload[[R2:[0-9]*]] = load i32, i32* %x.addr.0.s2a
	; CHECK: store i32 %x.addr.0.s2a.reload[[R2]], i32* %x.addr.1.phiops
	br label %for.cond1			br label %for.cond1

	for.inc5: ; preds = %for.end			for.inc5: ; preds = %for.end
	; CHECK-LABEL: polly.stmt.for.inc5:
	; CHECK: %x.addr.1.s2a.reload[[R5:[0-9]*]] = load i32, i32* %x.addr.1.s2a
	; CHECK: store i32 %x.addr.1.s2a.reload[[R5]], i32* %x.addr.0.phiops
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	br label %for.cond			br label %for.cond

	for.cond1: ; preds = %for.inc, %for.body			for.cond1: ; preds = %for.inc, %for.body
	; CHECK-LABEL: polly.stmt.for.cond1:
	; CHECK: %x.addr.1.phiops.reload = load i32, i32* %x.addr.1.phiops
	; CHECK: store i32 %x.addr.1.phiops.reload, i32* %x.addr.1.s2a
	%x.addr.1 = phi i32 [ %x.addr.0, %for.body ], [ %x.addr.2, %for.inc ]			%x.addr.1 = phi i32 [ %x.addr.0, %for.body ], [ %x.addr.2, %for.inc ]
	%j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]			%j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]
	%exitcond = icmp ne i32 %j.0, %N			%exitcond = icmp ne i32 %j.0, %N
	br i1 %exitcond, label %for.body3, label %for.end			br i1 %exitcond, label %for.body3, label %for.end

	for.body3: ; preds = %for.cond1			for.body3: ; preds = %for.cond1
	; CHECK-LABEL: polly.stmt.for.body3:
	; CHECK: %x.addr.1.s2a.reload = load i32, i32* %x.addr.1.s2a
	; CHECK: store i32 %x.addr.1.s2a.reload, i32* %x.addr.2.phiops
	%cmp4 = icmp slt i64 %indvars.iv, %tmp1			%cmp4 = icmp slt i64 %indvars.iv, %tmp1
	br i1 %cmp4, label %if.then, label %if.end			br i1 %cmp4, label %if.then, label %if.end

	if.end: ; preds = %if.then, %for.body3			if.end: ; preds = %if.then, %for.body3
	; CHECK-LABEL: polly.stmt.if.end:
	; CHECK: %x.addr.2.phiops.reload = load i32, i32* %x.addr.2.phiops
	; CHECK: store i32 %x.addr.2.phiops.reload, i32* %x.addr.2.s2a
	%x.addr.2 = phi i32 [ %add, %if.then ], [ %x.addr.1, %for.body3 ]			%x.addr.2 = phi i32 [ %add, %if.then ], [ %x.addr.1, %for.body3 ]
	br label %for.inc			br label %for.inc

	for.inc: ; preds = %if.end			for.inc: ; preds = %if.end
	; CHECK-LABEL: polly.stmt.for.inc:
	; CHECK: %x.addr.2.s2a.reload[[R3:[0-9]*]] = load i32, i32* %x.addr.2.s2a
	; CHECK: store i32 %x.addr.2.s2a.reload[[R3]], i32* %x.addr.1.phiops
	%inc = add nsw i32 %j.0, 1			%inc = add nsw i32 %j.0, 1
	br label %for.cond1			br label %for.cond1

	if.then: ; preds = %for.body3			if.then: ; preds = %for.body3
	; CHECK-LABEL: polly.stmt.if.then:
	; CHECK: %x.addr.1.s2a.reload[[R5:[0-9]*]] = load i32, i32* %x.addr.1.s2a
	; CHECK: %p_add = add nsw i32 %x.addr.1.s2a.reload[[R5]], %tmp2_p_scalar_
	; CHECK: store i32 %p_add, i32* %x.addr.2.phiops
	%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
	%tmp2 = load i32, i32* %arrayidx, align 4			%tmp2 = load i32, i32* %arrayidx, align 4
	%add = add nsw i32 %x.addr.1, %tmp2			%add = add nsw i32 %x.addr.1, %tmp2
	br label %if.end			br label %if.end

	for.end: ; preds = %for.cond1			for.end: ; preds = %for.cond1
	br label %for.inc5			br label %for.inc5

	for.end7: ; preds = %for.cond			for.end7: ; preds = %for.cond
	ret i32 %x.addr.0			ret i32 %x.addr.0
	}			}

test/Isl/CodeGen/phi_with_multi_exiting_edges_2.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	; CHECK: polly.merge_new_and_old:			; CHECK: polly.merge_new_and_old:
	; CHECK: %result.ph.merge = phi float [ %result.ph.final_reload, %polly.exiting ], [ %result.ph, %next.region_exiting ]			; CHECK: %result.ph.merge = phi float [ %p_sumB.merge, %polly.exiting ], [ %result.ph, %next.region_exiting ]
	; CHECK: br label %next			; CHECK: br label %next
	;			;
	; CHECK: next:			; CHECK: next:
	; CHECK: %result = phi float [ %result.ph.merge, %polly.merge_new_and_old ]			; CHECK: %result = phi float [ %result.ph.merge, %polly.merge_new_and_old ]
	; CHECK: ret float %result			; CHECK: ret float %result

	define float @foo(float* %A, i64 %param) {			define float @foo(float* %A, i64 %param) {
	entry:			entry:
	Show All 29 Lines

test/Isl/CodeGen/phi_with_one_exit_edge.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	;			;
	;			;
	; CHECK: polly.merge_new_and_old:			; CHECK: polly.merge_new_and_old:
	; CHECK: %sumA.merge = phi float [ %sumA.final_reload, %polly.exiting ], [ %sumA, %loopA ]			; CHECK: %sumA.merge = phi float [ %p_sumA, %polly.exiting ], [ %sumA, %loopA ]
	; CHECK: br label %next			; CHECK: br label %next
	;			;
	; CHECK: next:			; CHECK: next:
	; CHECK: %result = phi float [ %sumA.merge, %polly.merge_new_and_old ]			; CHECK: %result = phi float [ %sumA.merge, %polly.merge_new_and_old ]
	; CHECK: ret float %result			; CHECK: ret float %result
	;			;
	define float @foo(float* %A, i64 %param) {			define float @foo(float* %A, i64 %param) {
	entry:			entry:
	Show All 19 Lines

test/Isl/CodeGen/pr25241.ll

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s

	; PR25241 (https://llvm.org/bugs/show_bug.cgi?id=25241)			; PR25241 (https://llvm.org/bugs/show_bug.cgi?id=25241)
	; Ensure that synthesized values of a PHI node argument are generated in the			; Ensure that synthesized values of a PHI node argument are generated in the
	; incoming block, not in the PHI's block.			; incoming block, not in the PHI's block.

	; CHECK-LABEL: polly.stmt.if.then.862:			; CHECK: polly.stmt.if.then.862: ; preds = %polly.stmt.if.then.813
	; CHECK: %[[R1:[0-9]+]] = add i32 %tmp, 1			; CHECK: %[[r:[0-9a-zA-Z_.]*]] = add nsw i32 %tmp, 1
	; CHECK: store i32 %0, i32* %curr.3.s2a			; CHECK: br label %polly.stmt.while.body.740.region_exiting
	; CHECK: br label			;
				; CHECK: polly.stmt.if.else.864: ; preds = %polly.stmt.if.then.813
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:			; CHECK: br label %polly.stmt.while.body.740.region_exiting
	; CHECK: %curr.3.ph.final_reload = load i32, i32* %curr.3.s2a			;
	; CHECK: br label			; CHECK: polly.stmt.while.body.740.region_exiting: ; preds = %polly.stmt.if.else.864, %polly.stmt.if.then.862
				; CHECK: %polly.curr.3.ph = phi i32 [ undef, %polly.stmt.if.else.864 ], [ %[[r]], %polly.stmt.if.then.862 ]
				; CHECK: br label %polly.stmt.polly.merge_new_and_old.exit


	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @BZ2_decompress() #0 {			define void @BZ2_decompress() #0 {
	entry:			entry:
	Show All 40 Lines

test/Isl/CodeGen/read-only-scalars.ll

	; RUN: opt %loadPolly -polly-analyze-read-only-scalars=false -polly-codegen \			; RUN: opt %loadPolly -polly-analyze-read-only-scalars=false -polly-codegen \
	; RUN: \			; RUN: \
	; RUN: -S < %s | FileCheck %s			; RUN: -S < %s | FileCheck %s
	; RUN: opt %loadPolly -polly-analyze-read-only-scalars=true -polly-codegen \			; RUN: opt %loadPolly -polly-analyze-read-only-scalars=true -polly-codegen \
	; RUN: \			; RUN: \
	; RUN: -S < %s | FileCheck %s -check-prefix=SCALAR			; RUN: -S < %s | FileCheck %s -check-prefix=SCALAR

	; CHECK-NOT: alloca			; CHECK-NOT: alloca
				; SCALAR-NOT: alloca
	; SCALAR-LABEL: entry:
	; SCALAR-NEXT: %scalar.s2a = alloca float

	; SCALAR-LABEL: polly.start:
	; SCALAR-NEXT: store float %scalar, float* %scalar.s2a

	; SCALAR-LABEL: polly.stmt.stmt1:
	; SCALAR-NEXT: %scalar.s2a.reload = load float, float* %scalar.s2a
	; SCALAR-NEXT: %val_p_scalar_ = load float, float* %A,
	; SCALAR-NEXT: %p_sum = fadd float %val_p_scalar_, %scalar.s2a.reload

	define void @foo(float* noalias %A, float %scalar) {			define void @foo(float* noalias %A, float %scalar) {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%indvar = phi i64 [0, %entry], [%indvar.next, %loop.backedge]			%indvar = phi i64 [0, %entry], [%indvar.next, %loop.backedge]
	br label %stmt1			br label %stmt1
	Show All 15 Lines

test/Isl/CodeGen/scalar-dependence-reverse-text-order-two-uses.ll

This file was added.

				;RUN: opt %loadPolly -polly-import-jscop-dir=%S -polly-import-jscop -polly-ast -analyze < %s | FileCheck %s --check-prefix=AST
				;RUN: opt %loadPolly -polly-import-jscop-dir=%S -polly-import-jscop -polly-codegen -S < %s | FileCheck %s
				;
				; AST: for (int c0 = 0; c0 <= 101; c0 += 1) {
				; AST: if (c0 >= 1)
				; AST: Stmt_loopB(c0 - 1);
				; AST: if (c0 <= 100) {
				; AST: Stmt_loopA(c0);
				; AST: Stmt_loopC(c0);
				; AST: }
				; AST: }
				;
				; CHECK: polly.loop_header:
				; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.merge[[M:[0-9]*]] ]
				; CHECK-NEXT: %val.polly.lc = phi float [ undef, %polly.loop_preheader ], [ %val_p_scalar_.merge[[N:[0-9]*]], %polly.merge[[M]] ]
				;
				; CHECK: polly.merge[[M]]:
				; CHECK-DAG: %val_p_scalar_.merge = phi float [ %val_p_scalar_, %polly.stmt.loopC ], [ undef, %polly.else[[P:[0-9]*]] ]
				; CHECK-DAG: %val_p_scalar_.merge[[N]] = phi float [ %val_p_scalar_, %polly.stmt.loopC ], [ undef, %polly.else[[P]] ]
				;
				; CHECK: polly.stmt.loopB:
				; CHECK-NEXT: %scevgep = getelementptr float, float* %B, i64
				; CHECK-NEXT: store float %val.polly.lc, float* %scevgep
				;
				; CHECK: polly.stmt.loopA:
				; CHECK-NEXT: %scevgep[[K:[0-9]*]] = getelementptr float, float* %A, i64 %polly.indvar
				; CHECK-NEXT: %val_p_scalar_ = load float, float* %scevgep[[K]]
				; CHECK-NEXT: br label %polly.stmt.loopC
				;
				; CHECK: polly.stmt.loopC:
				; CHECK-NEXT: %scevgep[[L:[0-9]*]] = getelementptr float, float* %C, i64 %polly.indvar
				; CHECK-NEXT: store float %val_p_scalar_, float* %scevgep[[L]]
				; CHECK-NEXT: br label %polly.merge[[M]]
				;
				define void @sdrtotu(float* %A, float* %B, float* %C) {
				entry:
				br label %loopA

				loopA:
				%indvar = phi i64 [0, %entry], [%indvar.next, %loopB]
				%ptrA = getelementptr float, float* %A, i64 %indvar
				%val = load float, float* %ptrA
				br label %loopC

				loopC:
				%ptrC = getelementptr float, float* %C, i64 %indvar
				store float %val, float* %ptrC
				br label %loopB

				loopB:
				%indvar.next = add i64 %indvar, 1
				%ptrB = getelementptr float, float* %B, i64 %indvar
				store float %val, float* %ptrB
				%cmp = icmp sge i64 %indvar, 100
				br i1 %cmp, label %exit, label %loopA

				exit:
				ret void
				}

test/Isl/CodeGen/scalar-dependence-reverse-text-order.ll

This file was added.

				;RUN: opt %loadPolly -polly-import-jscop-dir=%S -polly-import-jscop -polly-ast -analyze < %s | FileCheck %s --check-prefix=AST
				;RUN: opt %loadPolly -polly-import-jscop-dir=%S -polly-import-jscop -polly-codegen -S < %s | FileCheck %s
				;
				; AST: for (int c0 = 0; c0 <= 101; c0 += 1) {
				; AST: if (c0 >= 1)
				; AST: Stmt_loopB(c0 - 1);
				; AST: if (c0 <= 100)
				; AST: Stmt_loopA(c0);
				; AST: }
				;
				; CHECK: polly.loop_header:
				; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.merge4 ]
				; CHECK-NEXT: %val.polly.lc = phi float [ undef, %polly.loop_preheader ], [ %val_p_scalar_.merge[[N:[0-9]*]], %polly.merge4 ]
				;
				; CHECK: polly.merge4:
				; CHECK-DAG: %val_p_scalar_.merge = phi float [ %val_p_scalar_, %polly.stmt.loopA ], [ undef, %polly.else6 ]
				; CHECK-DAG: %val_p_scalar_.merge[[N]] = phi float [ %val_p_scalar_, %polly.stmt.loopA ], [ undef, %polly.else6 ]
				;
				; CHECK: polly.stmt.loopB:
				; CHECK-NEXT: %scevgep = getelementptr float, float* %B, i64
				; CHECK-NEXT: store float %val.polly.lc, float* %scevgep
				;
				; CHECK: polly.stmt.loopA:
				; CHECK-NEXT: %scevgep7 = getelementptr float, float* %A, i64 %polly.indvar
				; CHECK-NEXT: %val_p_scalar_ = load float, float* %scevgep7
				; CHECK-NEXT: br label %polly.merge4
				;
				define void @sdrto(float* %A, float* %B) {
				entry:
				br label %loopA

				loopA:
				%indvar = phi i64 [0, %entry], [%indvar.next, %loopB]
				%ptrA = getelementptr float, float* %A, i64 %indvar
				%val = load float, float* %ptrA
				br label %loopB

				loopB:
				%indvar.next = add i64 %indvar, 1
				%ptrB = getelementptr float, float* %B, i64 %indvar
				store float %val, float* %ptrB
				%cmp = icmp sge i64 %indvar, 100
				br i1 %cmp, label %exit, label %loopA

				exit:
				ret void
				}

test/Isl/CodeGen/scalar-store-from-same-bb.ll

	; RUN: opt %loadPolly \			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
	; RUN: -polly-codegen -S < %s | FileCheck %s

	; This test ensures that the expression N + 1 that is stored in the phi-node			; CHECK: polly.merge_new_and_old:
	; alloca, is directly computed and not incorrectly transfered through memory.			; CHECK: %res.merge = phi i64 [ %p_sum.merge, %polly.exiting ], [ %res, %merge ]
				; CHECK: br label %exit
				;
				; CHECK: exit:
				; CHECK: ret i64 %res.merge
				;
				; CHECK: polly.merge:
				; CHECK: %p_sum.merge = phi i64 [ %p_sum, %polly.loop_exit ], [ 0, %polly.else ]

	; CHECK: store i64 %2, i64* %res.phiops			; CHECK: polly.stmt.loop:
	; CHECK: %2 = add i64 %N, 1			; CHECK: %p_sum = add i64 %N, 1
				grosser: Nice!

	define i64 @foo(float* %A, i64 %N) {			define i64 @foo(float* %A, i64 %N) {
	entry:			entry:
	br label %next			br label %next

	next:			next:
	%cond = icmp eq i64 %N, 0			%cond = icmp eq i64 %N, 0
	br i1 %cond, label %loop, label %merge			br i1 %cond, label %loop, label %merge
	Show All 16 Lines

test/Isl/CodeGen/sdrto___%loopA---%exit.jscop

This file was added.

				{
				"context" : "{ : }",
				"name" : "loopA => exit",
				"statements" : [
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_loopA[i0] -> MemRef_A[i0] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_loopA[i0] -> MemRef_val[] }"
				}
				],
				"domain" : "{ Stmt_loopA[i0] : i0 <= 100 and i0 >= 0 }",
				"name" : "Stmt_loopA",
				"schedule" : "{ Stmt_loopA[i0] -> [i0, 1] }"
				},
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_loopB[i0] -> MemRef_val[] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_loopB[i0] -> MemRef_B[i0] }"
				}
				],
				"domain" : "{ Stmt_loopB[i0] : i0 <= 100 and i0 >= 0 }",
				"name" : "Stmt_loopB",
				"schedule" : "{ Stmt_loopB[i0] -> [i0+1, 0] }"
				}
				]
				}
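
The schedule in this .jscop file can be cross-checked against the AST lines of scalar-dependence-reverse-text-order.ll. The following is a minimal Python sketch, not part of the patch; the helper names `execution_order` and `ast_order` are hypothetical. It sorts all statement instances lexicographically by their schedule vectors and compares the result with the loop nest that the AST check lines expect.

```python
# Schedule functions copied from the .jscop file; domain is 0 <= i0 <= 100.
schedule = {
    "Stmt_loopA": lambda i0: (i0, 1),      # { Stmt_loopA[i0] -> [i0, 1] }
    "Stmt_loopB": lambda i0: (i0 + 1, 0),  # { Stmt_loopB[i0] -> [i0+1, 0] }
}


def execution_order(schedule, lo=0, hi=100):
    """Return (statement, i0) pairs sorted lexicographically by schedule time."""
    instances = [
        (fn(i0), name, i0)
        for name, fn in schedule.items()
        for i0 in range(lo, hi + 1)
    ]
    instances.sort(key=lambda entry: entry[0])
    return [(name, i0) for _, name, i0 in instances]


def ast_order():
    """Order implied by the AST check lines of the test."""
    out = []
    for c0 in range(0, 102):          # for (int c0 = 0; c0 <= 101; c0 += 1)
        if c0 >= 1:
            out.append(("Stmt_loopB", c0 - 1))
        if c0 <= 100:
            out.append(("Stmt_loopA", c0))
    return out


order = execution_order(schedule)
assert order == ast_order()
print(order[:4])
```

Under these assumptions, `Stmt_loopB(i)` always runs one outer iteration after `Stmt_loopA(i)`, which is why the generated code has to carry `%val` across the loop back edge.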

test/Isl/CodeGen/sdrtotu___%loopA---%exit.jscop

This file was added.

				{
				"context" : "{ : }",
				"name" : "loopA => exit",
				"statements" : [
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_loopA[i0] -> MemRef_A[i0] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_loopA[i0] -> MemRef_val[] }"
				}
				],
				"domain" : "{ Stmt_loopA[i0] : i0 <= 100 and i0 >= 0 }",
				"name" : "Stmt_loopA",
				"schedule" : "{ Stmt_loopA[i0] -> [i0, 1] }"
				},
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_loopC[i0] -> MemRef_val[] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_loopC[i0] -> MemRef_C[i0] }"
				}
				],
				"domain" : "{ Stmt_loopC[i0] : i0 <= 100 and i0 >= 0 }",
				"name" : "Stmt_loopC",
				"schedule" : "{ Stmt_loopC[i0] -> [i0, 2] }"
				},
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_loopB[i0] -> MemRef_val[] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_loopB[i0] -> MemRef_B[i0] }"
				}
				],
				"domain" : "{ Stmt_loopB[i0] : i0 <= 100 and i0 >= 0 }",
				"name" : "Stmt_loopB",
				"schedule" : "{ Stmt_loopB[i0] -> [i0+1, 0] }"
				}
				]
				}
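
As a sanity check of the three-statement schedule in this .jscop file, the hypothetical Python sketch below (not part of the patch; `run_scheduled` is an illustrative name) executes the statement instances in schedule order, with a single scalar cell standing in for `MemRef_val`. It confirms that both `Stmt_loopC(i)` and `Stmt_loopB(i)` still observe the value written by `Stmt_loopA(i)`, even though `Stmt_loopB(i)` runs one outer iteration later.

```python
def run_scheduled(A):
    """Execute loopA/loopC/loopB instances in the order given by the schedule."""
    n = len(A)
    B = [None] * n
    C = [None] * n
    val = None  # models MemRef_val; undefined before the first write

    events = []
    for i0 in range(n):
        events.append(((i0, 1), "loopA", i0))      # Stmt_loopA[i0] -> [i0, 1]
        events.append(((i0, 2), "loopC", i0))      # Stmt_loopC[i0] -> [i0, 2]
        events.append(((i0 + 1, 0), "loopB", i0))  # Stmt_loopB[i0] -> [i0+1, 0]

    for _, stmt, i0 in sorted(events):
        if stmt == "loopA":
            val = A[i0]        # read MemRef_A[i0], write MemRef_val
        elif stmt == "loopC":
            C[i0] = val        # read MemRef_val, write MemRef_C[i0]
        else:
            B[i0] = val        # read MemRef_val, write MemRef_B[i0]
    return B, C


A = [float(i) for i in range(101)]  # domain 0 <= i0 <= 100
B, C = run_scheduled(A)
assert B == A and C == A
```

The initial `None` plays the role of the `undef` incoming value in the `%val.polly.lc` PHI: it is never read, because the first `Stmt_loopB` instance is scheduled after the first `Stmt_loopA` instance.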

test/Isl/CodeGen/simple_vec_call.ll

	Show All 22 Lines
	return:			return:
	ret void			ret void
	}			}

	; CHECK: [[RES1:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW:#[0-9]+]]			; CHECK: [[RES1:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW:#[0-9]+]]
	; CHECK: [[RES2:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]			; CHECK: [[RES2:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]
	; CHECK: [[RES3:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]			; CHECK: [[RES3:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]
	; CHECK: [[RES4:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]			; CHECK: [[RES4:%[a-zA-Z0-9_]+]] = tail call float @foo(float %.load) [[NUW]]
	; CHECK: %4 = insertelement <4 x float> undef, float [[RES1]], i32 0			; CHECK: %0 = insertelement <4 x float> undef, float [[RES1]], i32 0
	; CHECK: %5 = insertelement <4 x float> %4, float [[RES2]], i32 1			; CHECK: %1 = insertelement <4 x float> %0, float [[RES2]], i32 1
	; CHECK: %6 = insertelement <4 x float> %5, float [[RES3]], i32 2			; CHECK: %2 = insertelement <4 x float> %1, float [[RES3]], i32 2
	; CHECK: %7 = insertelement <4 x float> %6, float [[RES4]], i32 3			; CHECK: %3 = insertelement <4 x float> %2, float [[RES4]], i32 3
	; CHECK: store <4 x float> %7			; CHECK: store <4 x float> %3
				grosser: This is a trivial change, but mostly unrelated to the proposed patch. I would prefer to introduce REGEXP matches ahead of time to keep the review and patch focused. I did this in r267875 so this change won't show up in a possible future version of this patch.
	; CHECK: attributes [[NUW]] = { nounwind }			; CHECK: attributes [[NUW]] = { nounwind }

test/Isl/CodeGen/simple_vec_stride_one.ll

	; RUN: opt %loadPolly -polly-codegen -polly-vectorizer=polly \			; RUN: opt %loadPolly -polly-codegen -polly-vectorizer=polly \
	; RUN: < %s -S | FileCheck %s			; RUN: < %s -S | FileCheck %s

	; CHECK: store <4 x double> %val.s2a_p_splat, <4 x double>* %vector_ptr			; CHECK: polly.stmt.loop3:
				; CHECK: store double %val_p_scalar_, double* %scevgep8, align 8, !alias.scope !0, !noalias !2
				; CHECK: store double %val_p_scalar_, double* %scevgep9, align 8, !alias.scope !0, !noalias !2
				; CHECK: store double %val_p_scalar_, double* %scevgep10, align 8, !alias.scope !0, !noalias !2
				; CHECK: store double %val_p_scalar_, double* %scevgep11, align 8, !alias.scope !0, !noalias !2
				grosser: Instead of a vector store, we now generate four scalar stores. This is a regression that seems to be introduced accidentally in this patch.

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @update_access_functions(i64 %arg, double* %A, double* %B) {			define void @update_access_functions(i64 %arg, double* %A, double* %B) {
	bb3:			bb3:
	br label %loop1			br label %loop1

	loop1:			loop1:
	%indvar = phi i64 [ %indvar.next, %loop1 ], [ 0, %bb3 ]			%indvar = phi i64 [ %indvar.next, %loop1 ], [ 0, %bb3 ]
	Show All 25 Lines

test/Isl/CodeGen/srem-in-other-bb.ll

	; RUN: opt %loadPolly -polly-codegen -S \			; RUN: opt %loadPolly -polly-codegen -S \
	; RUN: < %s | FileCheck %s			; RUN: < %s | FileCheck %s
	;			;
	; void pos(float *A, long n) {			; void pos(float *A, long n) {
	; for (long i = 0; i < 100; i++)			; for (long i = 0; i < 100; i++)
	; A[n % 42] += 1;			; A[n % 42] += 1;
	; }			; }
	;			;
	; CHECK: polly.stmt.bb2:			; CHECK: polly.stmt.bb2:
	; CHECK-NEXT: %p_tmp = srem i64 %n, 42			; CHECK-NEXT: %p_tmp = srem i64 %n, 42
	; CHECK-NEXT: store i64 %p_tmp, i64* %tmp.s2a
	;			;
	; CHECK: polly.stmt.bb3:			; CHECK: polly.stmt.bb3:
	; CHECK: %tmp.s2a.reload = load i64, i64* %tmp.s2a			; CHECK: %p_tmp3 = getelementptr inbounds float, float* %A, i64 %p_tmp
				grosser: Trivial, but nice!
	; CHECK: %p_tmp3 = getelementptr inbounds float, float* %A, i64 %tmp.s2a.reload

	define void @pos(float* %A, i64 %n) {			define void @pos(float* %A, i64 %n) {
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb6, %bb			bb1: ; preds = %bb6, %bb
	%i.0 = phi i64 [ 0, %bb ], [ %tmp7, %bb6 ]			%i.0 = phi i64 [ 0, %bb ], [ %tmp7, %bb6 ]
	%exitcond = icmp ne i64 %i.0, 100			%exitcond = icmp ne i64 %i.0, 100
	Show All 20 Lines

test/Isl/CodeGen/uninitialized_scalar_memory.ll

	; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-codegen < %s | FileCheck %s
	;			;
	; Verify we initialize the scalar locations reserved for the incoming phi			; CHECK-NOT: alloca
	; values.			;
				; CHECK: polly.merge_new_and_old:
				; CHECK-DAG: %ebig.1.merge = phi float [ %tmp4_p_scalar_.merge, %polly.exiting ], [ %ebig.1, %for.inc ]
				; CHECK-DAG: %indvars.iv.next.merge = phi i64 [ %p_indvars.iv.next, %polly.exiting ], [ %indvars.iv.next, %for.inc ]
				; CHECK-DAG: %iebig.1.merge = phi i32 [ %p_conv8.merge, %polly.exiting ], [ %iebig.1, %for.inc ]
				; CHECK: br label %for.cond
				;
				; CHECK: polly.stmt.if.end.9.exit:
				; CHECK-DAG: %tmp4_p_scalar_.merge = phi float [ %tmp4_p_scalar_, %polly.stmt.if.then.2 ], [ %ebig.0, %polly.stmt.if.end ]
				; CHECK-DAG: %p_conv8.merge = phi i32 [ %p_conv8, %polly.stmt.if.then.2 ], [ %iebig.0, %polly.stmt.if.end ]
				; CHECK: br label %polly.stmt.if.end.9
	;			;
	; CHECK: polly.start:
	; CHECK-NEXT: store float %ebig.0, float* %ebig.0.s2a
	; CHECK-NEXT: store i32 %iebig.0, i32* %iebig.0.s2a
	; CHECK-NEXT: br label %polly.stmt.if.end.entry
	;			;
	; int g(void);			; int g(void);
	; float M;			; float M;
	; int max(float *restrict xbig, int eres, int bres, float *restrict indx) {			; int max(float *restrict xbig, int eres, int bres, float *restrict indx) {
	; int i, iebig;			; int i, iebig;
	; float ebig;			; float ebig;
	; for (i = 0; i < 4 + eres; i++) {			; for (i = 0; i < 4 + eres; i++) {
	; if (g())			; if (g())
	▲ Show 20 Lines • Show All 71 Lines • Show Last 20 Lines

test/ScopInfo/invariant_load_access_classes_different_base_type.ll

	Show All 14 Lines
	; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: { Stmt_for_body[i0] -> MemRef_S[0] };			; CHECK: { Stmt_for_body[i0] -> MemRef_S[0] };
	; CHECK: Execution Context: { : }			; CHECK: Execution Context: { : }
	; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: { Stmt_for_body[i0] -> MemRef_S[1] };			; CHECK: { Stmt_for_body[i0] -> MemRef_S[1] };
	; CHECK: Execution Context: { : }			; CHECK: Execution Context: { : }
	; CHECK: }			; CHECK: }
	;			;
	; CODEGEN: %S.b.preload.s2a = alloca float			; CODEGEN-NOT: alloca
	; CODEGEN: %S.a.preload.s2a = alloca i32
	;			;
	; CODEGEN: %.load = load i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0)			; CODEGEN: %.load = load i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0)
	; CODEGEN: store i32 %.load, i32* %S.a.preload.s2a
	; CODEGEN: %.load12 = load float, float* bitcast (i32* getelementptr (i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0), i64 1) to float*)			; CODEGEN: %.load12 = load float, float* bitcast (i32* getelementptr (i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0), i64 1) to float*)
	; CODEGEN: store float %.load12, float* %S.b.preload.s2a
	;			;
	; CODEGEN: polly.stmt.for.body:			; CODEGEN: polly.stmt.for.body:
	; CODEGEN: %p_conv = sitofp i32 %.load to float			; CODEGEN: %p_conv = sitofp i32 %.load to float
	; CODEGEN: %p_add = fadd float %p_conv, %.load12			; CODEGEN: %p_add = fadd float %p_conv, %.load12
	; CODEGEN: %p_conv1 = fptosi float %p_add to i32			; CODEGEN: %p_conv1 = fptosi float %p_add to i32

	%struct.anon = type { i32, float }			%struct.anon = type { i32, float }


test/ScopInfo/invariant_load_access_classes_different_base_type_escaping.ll

	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: { Stmt_do_body[i0] : 0 <= i0 <= 1000 };			; CHECK-NEXT: { Stmt_do_body[i0] : 0 <= i0 <= 1000 };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: { Stmt_do_body[i0] -> [i0] };			; CHECK-NEXT: { Stmt_do_body[i0] -> [i0] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: { Stmt_do_body[i0] -> MemRef_A[i0] };			; CHECK-NEXT: { Stmt_do_body[i0] -> MemRef_A[i0] };
	; CHECK-NEXT: }			; CHECK-NEXT: }
	;			;
	; CODEGEN: entry:
	; CODEGEN: %S.b.preload.s2a = alloca float
	; CODEGEN: %S.a.preload.s2a = alloca i32
	;			;
	; CODEGEN: polly.preload.begin:			; CODEGEN: polly.preload.begin:
	; CODEGEN: %.load = load i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0)			; CODEGEN: %.load = load i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0)
	; CODEGEN: store i32 %.load, i32* %S.a.preload.s2a
	; CODEGEN: %.load12 = load float, float* bitcast (i32* getelementptr (i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0), i64 1) to float*)			; CODEGEN: %.load12 = load float, float* bitcast (i32* getelementptr (i32, i32* getelementptr inbounds (%struct.anon, %struct.anon* @S, i32 0, i32 0), i64 1) to float*)
	; CODEGEN: store float %.load12, float* %S.b.preload.s2a
	;			;
	; CODEGEN: polly.merge_new_and_old:			; CODEGEN: polly.merge_new_and_old:
	; CODEGEN-DAG: %S.b.merge = phi float [ %S.b.final_reload, %polly.exiting ], [ %S.b, %do.cond ]			; CODEGEN-DAG: %S.b.merge = phi float [ %.load12, %polly.exiting ], [ %S.b, %do.cond ]
	; CODEGEN-DAG: %S.a.merge = phi i32 [ %S.a.final_reload, %polly.exiting ], [ %S.a, %do.cond ]			; CODEGEN-DAG: %S.a.merge = phi i32 [ %.load, %polly.exiting ], [ %S.a, %do.cond ]
	;			;
	; CODEGEN: do.end:			; CODEGEN: do.end:
	; CODEGEN: %conv3 = sitofp i32 %S.a.merge to float			; CODEGEN: %conv3 = sitofp i32 %S.a.merge to float
	; CODEGEN: %add4 = fadd float %conv3, %S.b.merge			; CODEGEN: %add4 = fadd float %conv3, %S.b.merge
	; CODEGEN: ret float %add4			; CODEGEN: ret float %add4
	;			;
	; CODEGEN: polly.loop_exit:			; CODEGEN: polly.loop_exit:
	; CODEGEN-DAG: %S.b.final_reload = load float, float* %S.b.preload.s2a			; CODEGEN-NEXT: br label
	; CODEGEN-DAG: %S.a.final_reload = load i32, i32* %S.a.preload.s2a

	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	%struct.anon = type { i32, float }			%struct.anon = type { i32, float }

	@S = common global %struct.anon zeroinitializer, align 4			@S = common global %struct.anon zeroinitializer, align 4

	define float @f(i32* %A) {			define float @f(i32* %A) {

test/ScopInfo/invariant_load_access_classes_different_base_type_same_pointer.ll

	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: { Stmt_for_body[i0] : 0 <= i0 <= 999 };			; CHECK-NEXT: { Stmt_for_body[i0] : 0 <= i0 <= 999 };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: { Stmt_for_body[i0] -> [i0] };			; CHECK-NEXT: { Stmt_for_body[i0] -> [i0] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: { Stmt_for_body[i0] -> MemRef_A[i0] };			; CHECK-NEXT: { Stmt_for_body[i0] -> MemRef_A[i0] };
	; CHECK-NEXT: }			; CHECK-NEXT: }
	;			;
	; CODEGEN: entry:			; CODEGEN: entry:
	; CODEGEN: %U.f.preload.s2a = alloca float			; CODEGEN-NOT: alloca
	; CODEGEN: br label %polly.split_new_and_old			; CODEGEN: br label %polly.split_new_and_old
	;			;
	; CODEGEN: polly.preload.begin:			; CODEGEN: polly.preload.begin:
	; CODEGEN: %U.load1 = load float, float* bitcast (i32* @U to float*)			; CODEGEN: %U.load1 = load float, float* bitcast (i32* @U to float*)
	; TODO FIXME There should not be a bitcast but either a real conversion or			; TODO FIXME There should not be a bitcast but either a real conversion or
	; another load as one type is FP the other is not.			; another load as one type is FP the other is not.
	; CODEGEN: %0 = bitcast float %U.load1 to i32			; CODEGEN: %0 = bitcast float %U.load1 to i32
	; CODEGEN: store float %U.load1, float* %U.f.preload.s2a
	;			;
	; CODEGEN: polly.merge_new_and_old:			; CODEGEN: polly.merge_new_and_old:
	; CODEGEN-NOT: merge = phi			; CODEGEN-NOT: merge = phi
	;			;
	; CODEGEN: polly.loop_exit:			; CODEGEN: polly.loop_exit:
	; CODEGEN-NOT: final_reload			; CODEGEN-NOT: final_reload
	;			;
	; CODEGEN: polly.stmt.for.body:			; CODEGEN: polly.stmt.for.body:
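
The TODO in the test above contrasts a bitcast with "a real conversion". As a rough illustration only (plain Python, not Polly or LLVM code; both helper names are hypothetical), the sketch below shows why the two are not interchangeable: a bitcast reinterprets the same 32 bits under another type, while a conversion such as `fptosi` changes the bits so that the numeric value is preserved as far as possible.

```python
import struct


def bitcast_float_to_i32(value: float) -> int:
    """Reinterpret the 32 bits of an IEEE-754 single as a signed 32-bit integer.

    Hypothetical helper that mimics the effect of LLVM's `bitcast float ... to i32`.
    """
    return struct.unpack("<i", struct.pack("<f", value))[0]


def fptosi_float_to_i32(value: float) -> int:
    """Convert numerically, truncating toward zero.

    Hypothetical helper that mimics LLVM's `fptosi` for values that fit in i32.
    """
    return int(value)


if __name__ == "__main__":
    x = 1.0
    # Same bits, different type: 1.0f is encoded as 0x3f800000.
    print(hex(bitcast_float_to_i32(x)))
    # Same numeric value, different bits.
    print(fptosi_float_to_i32(x))
```

Which of the two behaviours the generated code should have for `@U` is exactly the question the TODO leaves open; the sketch only shows that the results differ.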

test/ScopInfo/invariant_load_access_classes_different_base_type_same_pointer_escaping.ll

	; CHECK-NEXT: { Stmt_do_body[i0] : 0 <= i0 <= 100 };			; CHECK-NEXT: { Stmt_do_body[i0] : 0 <= i0 <= 100 };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: { Stmt_do_body[i0] -> [i0] };			; CHECK-NEXT: { Stmt_do_body[i0] -> [i0] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: { Stmt_do_body[i0] -> MemRef_A[i0] };			; CHECK-NEXT: { Stmt_do_body[i0] -> MemRef_A[i0] };
	; CHECK-NEXT: }			; CHECK-NEXT: }
	;			;
	; CODEGEN: entry:			; CODEGEN: entry:
	; CODEGEN: %U.f.preload.s2a = alloca float
	; CODEGEN: br label %polly.split_new_and_old			; CODEGEN: br label %polly.split_new_and_old
	;			;
	; CODEGEN: polly.preload.begin:			; CODEGEN: polly.preload.begin:
	; CODEGEN: %U.load1 = load float, float* bitcast (i32* @U to float*)			; CODEGEN: %U.load1 = load float, float* bitcast (i32* @U to float*)
	; CODEGEN: %0 = bitcast float %U.load1 to i32			; CODEGEN: %0 = bitcast float %U.load1 to i32
	; CODEGEN: store float %U.load1, float* %U.f.preload.s2a
	;			;
	; CODEGEN: polly.merge_new_and_old:			; CODEGEN: polly.merge_new_and_old:
	; CODEGEN-DAG: %U.f.merge = phi float [ %U.f.final_reload, %polly.exiting ], [ %U.f, %do.cond ]			; CODEGEN-DAG: %U.f.merge = phi float [ %U.load1, %polly.exiting ], [ %U.f, %do.cond ]
	; CODEGEN-DAG: %U.i.merge = phi i32 [ %5, %polly.exiting ], [ %U.i, %do.cond ]			; CODEGEN-DAG: %U.i.merge = phi i32 [ %0, %polly.exiting ], [ %U.i, %do.cond ]
	;			;
	; CODEGEN: polly.loop_exit:			; CODEGEN: polly.loop_exit:
	; CODEGEN-DAG: %U.f.final_reload = load float, float* %U.f.preload.s2a			; CODEGEN-NEXT: br label
	; CODEGEN-DAG: %U.i.final_reload = load float, float* %U.f.preload.s2a
	; CODEGEN-DAG: %5 = bitcast float %U.i.final_reload to i32
	;			;
	; CODEGEN: polly.stmt.do.body:			; CODEGEN: polly.stmt.do.body:
	; CODEGEN: %p_conv = fptosi float %U.load1 to i32			; CODEGEN: %p_conv = fptosi float %U.load1 to i32
	; CODEGEN: %p_add = add nsw i32 %0, %p_conv			; CODEGEN: %p_add = add nsw i32 %0, %p_conv
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	@U = common global i32 0, align 4			@U = common global i32 0, align 4

test/ScopInfo/invariant_load_zext_parameter.ll

	; will generate valid code and replace it by the preloaded value, e.g., to evaluate			; will generate valid code and replace it by the preloaded value, e.g., to evaluate
	; the execution context of the invariant access to I1.			; the execution context of the invariant access to I1.
	;			;
	; CHECK: p0: (zext i32 %loadI0 to i64)			; CHECK: p0: (zext i32 %loadI0 to i64)
	;			;
	; CODEGEN: polly.preload.begin:			; CODEGEN: polly.preload.begin:
	; CODEGEN-NEXT: %polly.access.I0 = getelementptr i32, i32* %I0, i64 0			; CODEGEN-NEXT: %polly.access.I0 = getelementptr i32, i32* %I0, i64 0
	; CODEGEN-NEXT: %polly.access.I0.load = load i32, i32* %polly.access.I0			; CODEGEN-NEXT: %polly.access.I0.load = load i32, i32* %polly.access.I0
	; CODEGEN-NEXT: store i32 %polly.access.I0.load, i32* %loadI0.preload.s2a
	; CODEGEN-NEXT: %0 = zext i32 %polly.access.I0.load to i64			; CODEGEN-NEXT: %0 = zext i32 %polly.access.I0.load to i64
	; CODEGEN-NEXT: %1 = icmp eq i64 %0, 0			; CODEGEN-NEXT: %1 = icmp eq i64 %0, 0
	; CODEGEN-NEXT: br label %polly.preload.cond			; CODEGEN-NEXT: br label %polly.preload.cond
	;			;
	; CODEGEN: polly.preload.cond:			; CODEGEN: polly.preload.cond:
	; CODEGEN-NEXT: br i1 %1, label %polly.preload.exec, label %polly.preload.merge			; CODEGEN-NEXT: br i1 %1, label %polly.preload.exec, label %polly.preload.merge
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

test/ScopInfo/out-of-scop-use-in-region-entry-phi-node-nonaffine-subregion.ll

Property	Old Value	New Value
File Mode	100755	100644

	; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
				grosser (inline comment): The file mode change is unrelated and should be a separate commit. This has already been fixed in r266473.
	;			;
	; Check whether %newval is identified as escaping value, even though it is used			; Check whether %newval is identified as escaping value, even though it is used
	; in a phi that is in the region. Non-affine subregion case.			; in a phi that is in the region. Non-affine subregion case.
	;			;
	; CHECK-LABEL: subregion_entry.region_entering:			; CHECK-LABEL: subregion_entry.region_entering:
	; CHECK: %loop_carried.ph = phi float [ %newval.merge, %backedge ], [ undef, %entry ]			; CHECK: %loop_carried.ph = phi float [ %newval.merge, %backedge ], [ undef, %entry ]
	;			;
	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %newval.merge = phi float [ %newval.final_reload, %polly.exiting ], [ %newval, %subregion_exit.region_exiting ]			; CHECK-DAG: %newval.merge = phi float [ %p_newval, %polly.exiting ], [ %newval, %subregion_exit.region_exiting ]
				; CHECK-DAG: %indvar.merge = phi i32 [ %[[r:[0-9a-zA-Z._]*]], %polly.exiting ], [ %indvar, %subregion_exit.region_exiting ]
				; CHECK-NEXT: br label %subregion_exit
	;			;
	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK: store float %loop_carried.ph, float* %loop_carried.phiops			; CHECK-NEXT: br label %polly.stmt.subregion_entry.entry
	;
	; CHECK-LABEL: polly.stmt.subregion_entry.entry:
	; CHECK: %loop_carried.phiops.reload = load float, float* %loop_carried.phiops
	;			;
	; CHECK-LABEL: polly.stmt.subregion_entry:			; CHECK-LABEL: polly.stmt.subregion_entry:
	; CHECK: %polly.loop_carried = phi float [ %loop_carried.phiops.reload2, %polly.stmt.subregion_entry.entry ]			; CHECK-NEXT: %p_newval = fadd float %loop_carried.ph, 1.000000e+00
	; CHECK: %p_newval = fadd float %polly.loop_carried, 1.000000e+00			; CHECK-NEXT: %p_cmp
				; CHECK-NEXT: %[[p:[0-9a-zA-Z._]*]] = trunc i64 %[[q:[0-9a-zA-Z._]*]] to i32
				; CHECK-NEXT: %[[r]] = add i32 %[[p]], 1
	;			;
	; CHECK-LABEL: polly.stmt.polly.merge_new_and_old.exit:
	; CHECK: %newval.final_reload = load float, float* %newval.s2a

	define void @func() {			define void @func() {
	entry:			entry:
	br label %subregion_entry			br label %subregion_entry

	subregion_entry:			subregion_entry:
	%loop_carried = phi float [ undef, %entry ], [ %newval, %backedge ]			%loop_carried = phi float [ undef, %entry ], [ %newval, %backedge ]
	%indvar = phi i32 [ 1, %entry ], [ %indvar_next, %backedge ]			%indvar = phi i32 [ 1, %entry ], [ %indvar_next, %backedge ]
	%newval = fadd float %loop_carried, 1.0			%newval = fadd float %loop_carried, 1.0


[WIP][Polly] SSA Codegen
Needs Review, Public

Diff 45834
