This is an archive of the discontinued LLVM Phabricator instance.

Differential D13487

[Polly] Load/Store scalar accesses before/after the statement itself
ClosedPublic

Authored by Meinersbur on Oct 6 2015, 4:40 PM.

Details

Reviewers

grosser
jdoerfert

Commits

rG225f0d1ee2d3: Load/Store scalar accesses before/after the statement itself
rPLO250625: Load/Store scalar accesses before/after the statement itself
rL250625: Load/Store scalar accesses before/after the statement itself

Summary

Instead of generating implicit loads within basic blocks, put them before the instructions of the statement itself, including non-affine subregions. The region's entry node dominates all blocks in the region, so the loaded value will be available there.

Implicit writes in block-stmts were already stored back at the end of the block. Now, also generate the stores of non-affine subregions when leaving the statement, i.e. in the exiting block.

This change is required for array-mapped implicits ("De-LICM") to ensure that there are no dependencies of demoted scalars within statements. Statements load all required values, operate on the copied-in registers, and then write back the changed values to the demoted memory. Lifetime analysis within statements becomes unnecessary.
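
The copy-in/copy-out scheme described in this summary can be pictured with a small self-contained model. This is only a sketch under stated assumptions, not Polly's code: `ScalarSlots`, `Registers` and `runStatement` are hypothetical names, and plain `std::map`s stand in for the demoted memory slots and the LLVM values.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: demoted scalars live in named memory slots,
// and a statement works on a private register copy of them.
using ScalarSlots = std::map<std::string, int>;
using Registers = std::map<std::string, int>;
using Block = std::function<void(Registers &)>;

// Run one statement: load every read scalar before the first block,
// execute all blocks on registers only, and store the written scalars
// when leaving the statement.
inline void runStatement(ScalarSlots &Memory,
                         const std::vector<std::string> &Reads,
                         const std::vector<std::string> &Writes,
                         const std::vector<Block> &Blocks) {
  Registers Regs;
  for (const std::string &Name : Reads) // copy-in at the statement entry
    Regs[Name] = Memory.at(Name);
  for (const Block &B : Blocks) // no slot access inside the statement
    B(Regs);
  for (const std::string &Name : Writes) // copy-out at the statement exit
    Memory[Name] = Regs.at(Name);
}
```

In this model no block ever touches `Memory` directly, which is the property the summary relies on: the slots are only accessed at the statement boundaries.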

Diff Detail

Repository: rL LLVM

Event Timeline

Meinersbur updated this revision to Diff 36675.Oct 6 2015, 4:40 PM

Meinersbur retitled this revision from to [Polly] Load/Store scalar accesses before/after the statement itself.

Meinersbur updated this object.

Meinersbur added reviewers: jdoerfert, grosser.

Meinersbur added a project: Restricted Project.

Meinersbur added subscribers: llvm-commits, pollydev.

jdoerfert added inline comments.Oct 6 2015, 11:28 PM

lib/CodeGen/BlockGenerators.cpp
1176 ↗	(On Diff #36675)	This part and the one below seem tricky and I have to understand the change better to get why/what happens here. The idea of a single value map seems reasonable though. Are these two changes coupled?

Meinersbur added inline comments.Oct 7 2015, 1:34 PM

lib/CodeGen/BlockGenerators.cpp
1176 ↗	(On Diff #36675)
> This part and the one below seem tricky and I have to understand the change better to get why/what happens here.

These conditions should become unnecessary once I do the work to remove redundant accesses earlier in the Polly pipeline. Until then, they are required to keep the unit tests happy.

> The idea of a single value map seems reasonable though. Are these two changes coupled?

Yes. Before, it was possible that scalar loads were created in one BB, and a BB processed later would look them up and use them although the two BBs might be unrelated in the dominator tree. (In practice I think generateScalarLoads would just overwrite the older entry in ValueMap, but I am not brave enough to try it.) After this patch, all scalar loads are inserted into the EntryBB, which is guaranteed to dominate everything in the subregion.
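
The dominance argument can be checked on a toy control-flow graph. The following is a minimal sketch, assuming a hypothetical diamond-shaped subregion; `CFG` and `dominates` are illustrative names and not Polly or LLVM APIs (the real code would query LLVM's DominatorTree). Block A dominates block B if B can no longer be reached from the entry once A is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: block I lists its successors in Succs[I];
// block 0 is the entry of the subregion.
using CFG = std::vector<std::vector<std::size_t>>;

// Returns true if every path from block 0 to block B passes through A.
// Implemented as a reachability search from the entry that skips A.
inline bool dominates(const CFG &Succs, std::size_t A, std::size_t B) {
  if (A == B)
    return true;
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<std::size_t> Work;
  if (A != 0) { // if A is the entry itself, nothing is reachable without it
    Seen[0] = true;
    Work.push_back(0);
  }
  while (!Work.empty()) {
    std::size_t Cur = Work.back();
    Work.pop_back();
    for (std::size_t Next : Succs[Cur]) {
      if (Next == A || Seen[Next])
        continue;
      Seen[Next] = true;
      Work.push_back(Next);
    }
  }
  return !Seen[B];
}
```

For a diamond with entry 0, branches 1 and 2, and exit 3, the entry dominates every block, while branch 1 dominates neither branch 2 nor the exit. A load placed in the entry is therefore usable everywhere, whereas a load placed in one branch is not.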

As discussed yesterday, the direction of this patch is good. I leave it to you two to finish the patch review.

There are 3 major points to this patch:

1) The code movement between the copyBB methods seems unnecessary and should be avoided to simplify the insertion point requirements.
2) The dominance check (as well as the isPHI check before it) is complicated, and I still do not follow how it differs from what was happening before. I think these checks somehow simplify the generated code, but that only hides the underlying issue.
3) The general idea of one map and the early reload of scalars seems reasonable and looks good, except for what is mentioned in points 1) and 2).

lib/CodeGen/BlockGenerators.cpp
341 ↗	(On Diff #36675)	If we keep the SetInsertPoint and the generateScalarXXX calls in the other copyBB function we don't need to set the insert point manually outside (noted somewhere below). If there is a reason you moved it here please explain.
1035 ↗	(On Diff #36675)	BlockMap seems to be obsolete if we store the mapping in the ValueMap too. If so, we could remove it now or later.
1038 ↗	(On Diff #36675)	As mentioned above, this complicates the copyBB logic and could be hidden.
1195 ↗	(On Diff #36675)	As mentioned in the phone call, we should never need to check dominance here as we didn't do it before. If this patch somehow requires that, it might not be the best way to do it. Can you illustrate why this is needed (with an example) and why it is beneficial not to store the scalars as before? We simply generated scalar stores in the blocks they were defined and did not need to reason about insertion points and dominance at all.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20 ↗	(On Diff #36675)	While the result looks nice, I think the way we achieved it is not what we want. We should not generate the scalar write accesses in the SCoP if they are not needed, but now we apparently filter them in the code generation.

One LNT test was failing (consumer-lame). In some cases the SCEV expander can generate code at the current position. This yields invalid code if a neighboring block that is not dominated by the former block tries to expand the same SCEV.

The update undoes the use of a single ValueMap for the entire subregion and therefore adds another map that contains all values that are potentially visible in other ScopStmts. Redundant maps are to be cleaned up in a separate patch.

I would have liked a handwritten test case, but I could not easily determine under which conditions the SCEV expander does this.

In D13487#264224, @jdoerfert wrote:

The code movement between the copyBB methods seem not necessary and should be avoided to simplify the insertion point requirements needed.

Undone in update

The dominance check (as well as the isPHI check before that) are complicated and I still do not follow what/how they are different to what was happening before. I think they somehow simplify the generated code but that only hides the underlying issue

The dominance check will go away in a separate patch that depends on this one.

lib/CodeGen/BlockGenerators.cpp
341 ↗	(On Diff #36675)	SetInsertPoint needs to be set before generateScalarLoads. RegionGenerator does not call generateScalarLoads before each copyBB. If moving Builder.SetInsertPoint(CopyBB->begin()) into copyBB, this overload would still need to call it before generateScalarStores.
1035 ↗	(On Diff #36675)	I agree, this should be a separate change.
1195 ↗	(On Diff #36675)
> As mentioned in the phone call, we should never need to check dominance here as we didn't do it before. If this patch somehow requires that, it might not be the best way to do it.

As mentioned in the phone call and in a message to the mailing list, the current code does generate such MemoryAccesses. Some later planned change will hopefully make this check go away.

> Can you illustrate why this is needed (with an example) and why it is beneficial not to store the scalars as before? We simply generated scalar stores in the blocks they were defined and did not need to reason about insertion points and dominance at all.

Explained in the summary message. It is not beneficial by itself, but required for later changes.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20 ↗	(On Diff #36675)	This is a side-effect because val0 and val1 are dominating the non-affine region's exit, i.e. it is not possible to have such a store in the exit block. A planned change will clean up the logic that generates MemoryAccesses. I did not understand why we check for instructions that shouldn't even be there, so I just removed the check.

Meinersbur added a child revision: D13676: [Polly] Do not store scalar accesses in InstructionToAccess.Oct 13 2015, 8:20 AM

Hi Michael,

to (hopefully) resolve some open misunderstandings and to get this patch in quickly, I went through these changes myself. As I said earlier, the general direction seems clear, but when trying to understand your change in detail (and looking at Johannes' questions) some implementation details remained unclear to me. Sorry for bringing these up so late, but answering them would really help my understanding.

Best,
Tobias

lib/CodeGen/BlockGenerators.cpp
341 ↗	(On Diff #37133)	@johannes: I think the point is that for region statements, Michael wants to generate scalar stores not before and after each basic block, but only once at the beginning and at the end of the entire ScopStmt. To my understanding "keep[ing] the SetInsertPoint and the generateScalarXXX in the other copyBB function" would allow the former, but not the latter. This is why Michael has difficulties to address your comment. I have been thinking about this myself, but I do not see how to implement this better besides documenting that copyBB requires generateScalarLoads/Stores to be run explicitly before/after copyBB is called for all BBs in a ScopStmt.
438 ↗	(On Diff #37133)	Nice simplification!
1048 ↗	(On Diff #37133)	@johannes: I have no idea how this could be hidden assuming we want indeed "scalar copies only once at the beginning and at the end of the entire ScopStmt" as described in an earlier comment. Johannes, did you write this assuming "scalar copies for each basic block" or have you a solution in mind that works also for "scalar copies only once"?
1150 ↗	(On Diff #37133)	It is not yet clear why we need a special implementation here. If I replace this code with just a call to BlockGenerator::generateScalarLoads(Stmt, BBMap), all tests pass. Your comment here says:

> Only generate loads for PHIs in the entry block. Intra-ScopStmt PHIs are copied directly.

but this comment does not seem to be related to the if-condition that follows, as inter-ScopStmt PHIs are not even modeled and consequently do not need to be skipped. Did you assume we model Intra-ScopStmt PHI nodes, or was there another reason for including this code that I missed?
1210 ↗	(On Diff #37133)
> As mentioned in the phone call and a message to the mailing list, the current code does generate such MemoryAccesses. Some later planned change will hopefully make this check go away.

Michael, without an example this is hard to understand. I commented out the code and tried to generate all scalar writes in the exit block, but this failed as you promised. Here is an example:

    bb1:
      br non-affine bb2, bb3

    bb2:              bb3:
      a =               b =
      br exit           br exit

    exit:
      val = phi [a, %bb2], [b, %bb3]

For this example we create the RegionStmt bb1->exit, which has two PHI writes. It is important that these writes happen in bb2 and bb3. Neither do the values written necessarily dominate the exit block, nor can we choose which value to write without knowing whether the control flow passed through bb2 or bb3. Hence, PHI-node writes actually need to be created in the basic block they are created in (this will always be an exiting block of the region). Am I correct that this is the reason for these PHI-node special cases? I tried to simplify the conditions above and came up with the following code, which still passes all tests:

    for (MemoryAccess *MA : Stmt) {
      if (MA->isExplicit() || MA->isRead())
        continue;

      Instruction *ScalarInst = MA->getAccessInstruction();
      Value *Val = MA->getAccessValue();

      // In case we add the store into an exiting block, we need to restore the
      // position for stores in the exit node.
      auto SavedInsertionPoint = Builder.GetInsertPoint();
      BasicBlock *ExitingBB = ScalarInst->getParent();
      BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];
      Builder.SetInsertPoint(ExitingBBCopy->getTerminator());

      auto Address = getOrCreateAlloca(*MA);
      Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);
      Builder.CreateStore(Val, Address);

      // Restore the insertion point if necessary.
      Builder.SetInsertPoint(SavedInsertionPoint);
    }

Did I miss something important, or are these simplifications correct?

Also, is it correct that we code-generate all PHI nodes at their normal statement locations? Hence, the copy-out code for PHI nodes does not introduce any functional change?
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20 ↗	(On Diff #37133)
> This is a side-effect because val0 and val1 are dominating the non-affine region's exit, i.e. it is not possible to have such a store in the exit block. A planned change will clean up the logic that generates MemoryAccesses.

I have trouble following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?

> I did not understand why we check for instructions that shouldn't even be there, so I just removed the check.

Which check are you talking about? I could not find a check in Polly's source code that was dropped.

Independent of the comments above, your test seems to be unnecessarily weak, i.e. it would also pass if the original output were generated. I think if the new output is indeed preferable, we should check that it is really generated.
test/Isl/CodeGen/scev_expansion_in_nonaffine.ll
18 ↗	(On Diff #37133)	This test case currently fails for me because the instruction number has changed from %64 to %39. Using FileCheck variables [1], this dependence on the instruction number can be removed:

    ; CHECK-LABEL: polly.stmt.if.then.110:
    ; CHECK: [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30
    ; CHECK: [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevgep{{.*}}, i64 [[REGISTER]]
    ; CHECK: store i32 0, i32* [[GEP]]

    ; CHECK-LABEL: polly.stmt.if.else:
    ; CHECK: [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30
    ; CHECK: [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevgep{{.*}}, i64 [[REGISTER]]
    ; CHECK: store i32 21, i32* [[GEP]]

Also, I am not fully sure what you check here. The stores you are CHECKing for seem to be due to array stores. Stmt_for_body_101TOfor_inc_114 does not have any scalar stores modeled in -polly-scops. In your comments you mention dominance, so maybe this is related to the condition "!DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB)", but as this condition is only checked for scalar writes, this does not seem to be the case. I also checked whether your patch affects this test, but in my experiments it passes with and without the remaining patch applied. Could you help me understand how this test case is affected by your change?

[1] http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-variables

Meinersbur added inline comments.Oct 13 2015, 10:58 AM

lib/CodeGen/BlockGenerators.cpp
1150 ↗	(On Diff #37133)	Considering that I encountered intra-scopstmt writes I assumed that there will also be intra-scop reads. If this assumption is wrong we could indeed fall through BlockGenerator::generateScalarLoads directly.
1210 ↗	(On Diff #37133)	There can be (unnecessary) writes to a.s2a and b.s2a that are not necessarily in the exit(ing) block, i.e. somewhere in the middle. This violates my intention not to write to (not yet existing) MAPPED scalars in the middle of a ScopStmt, which would make intra-ScopStmt lifetime analysis necessary.

Stores in the BBs appear in the reverse order. Not a problem from my POV though.

I don't know whether it is possible that there already is code in the ExitingBBCopy when it has been split up to insert the non-affine region. If yes, code built at the terminator might end up after instructions that were there before.

I wouldn't think too much about this code; I will make it go away in the patch I am working on at the moment.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20 ↗	(On Diff #37133)
> I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?

Because this patch moves the store from the end of the BB where the value is defined to the region exit, where the value's definition is not necessarily dominating. Taking your example:

    bb1:
      br non-affine bb2, bb3

    bb2:                      bb3:
      a =                       b =
      <store %a, %a.s2a>
      br bb4                    br exit

    bb4:
      br exit

    exit:
      val = phi [a, %bb4], [b, %bb3]
      <store %a, %a.s2a>

The store can be in bb2, but not in exit. It cannot be in bb2 because of my intention to not have any writes in the middle of ScopStmts. It doesn't need to be written anyway because there can be no use of %a (i.e. without PHIs) beyond bb2/4.

> Which check are you talking about? I could not find a check in Polly's source code that was dropped.

    ; CHECK-NEXT: store float %p_val1, float* %val1.s2a
    ; CHECK-NEXT: store float %p_val2, float* %val2.s2a

> Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.

What test are you talking about? This is not meant as a unit test of any new functionality, just an adaptation to changed behavior.
test/Isl/CodeGen/scev_expansion_in_nonaffine.ll
18 ↗	(On Diff #37133)	It is intended to fail with the version before the diff update, where a single ScalarMap was used instead of RegionMaps. Johannes and I thought that would be okay, but this one test was failing. I added it to ensure that, when trying to remove RegionMaps again, a unit test fails before an LNT test does. You could argue that this check is unrelated. I might commit it separately.

Hi Michael,

thank you for this quick and detailed answer. I added my comments inline. One general question I have is whether the original reason for this patch was to somehow handle the inter-non-affine-scop statement scalar dependences visible e.g. in non-affine-phi-node-expansion-4.ll. Meaning: if we do not generate them (as they are clearly not needed), would this possibly even remove the need for the entire patch? Or is the patch needed even if we drop them?

Best,
Tobias

lib/CodeGen/BlockGenerators.cpp
1150 ↗	(On Diff #37133)	I might be wrong (even though I do not yet understand why), so let's try to get it right.

What do you mean by intra-scopstmt writes? Are these scalar write statements that are read in the same scopstmt? Did they appear only in RegionStmts and with PHI nodes? I am looking for an example that makes me understand what you mean. Does this pattern appear in one of the test cases that changed? Or are you talking about the unnecessary read-write chains we see in non-affine-phi-node-expansion-4.ll?

What would be an intra-scopstmt read? 'intra' suggests at least two elements that are somehow related. So maybe two scalar reads from the same address? Such reads may indeed happen. The only thing that will not happen is a PHI-read of a PHI-node that is part of the same non-affine region statement.
1210 ↗	(On Diff #37133)
> There can be (unnecessary) writes to a.s2a and b.s2a which are not necessarily in the exit(ing) block, i.e. somewhere in the middle. This violated the intention I had to not write to (not yet existing) MAPPED scalars in the middle of ScopStmt, making intra-ScopStmt lifetime analysis necessary.

Probably this was already explained in earlier discussions, but I do not yet fully understand what "MAPPED scalars" will be (probably the ones that will be folded into array loads). To get a better feeling, let me try to understand the intention of your changes. As far as I understand, not inserting a.s2a and b.s2a somewhere in the middle of a non-affine region statement is important, but dropping the .s2a is not, right? Is the following code sufficient for what you need? It should generate all s2a stores in the exit block, but still generates all s2a statements that we currently model.

    for (MemoryAccess *MA : Stmt) {
      if (MA->isExplicit() || MA->isRead())
        continue;

      Instruction *ScalarInst = MA->getAccessInstruction();
      Value *Val = MA->getAccessValue();

      // In case we add the store into an exiting block, we need to restore the
      // position for stores in the exit node.
      auto SavedInsertionPoint = Builder.GetInsertPoint();
      if (MA->isPHI()) {
        BasicBlock *ExitingBB = ScalarInst->getParent();
        BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];
        Builder.SetInsertPoint(ExitingBBCopy->getTerminator());
      }

      auto Address = getOrCreateAlloca(*MA);
      Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);
      Builder.CreateStore(Val, Address);

      // Restore the insertion point if necessary.
      if (MA->isPHI()) {
        Builder.SetInsertPoint(SavedInsertionPoint);
      }
    }

(I am mostly trying to understand what the requirements for your later patches are.)

> I don't know whether it is possible that there already is code in the ExitingBBCopy when it has been split up to insert the non-affine region. If yes, building code at the terminator might be after the instruction that have been before.

Good point, but there should be no such code (besides branch statements and other control-flow conditions that would be unrelated).

> I wouldn't think too much about this code, I will make it go away in the patch I am working on at the moment.

OK. Though I still try to be a little detailed to make sure I get a better feeling for the direction you are aiming at. I was a little high-level last week and I think that may have caused some confusion. If I know where you are going, it is easier for me to follow the upcoming patches.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20 ↗	(On Diff #37133)
> I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?
>
> Because this patch moves the store from the end of the BB where it has been defined to the region exit, where the value's definition is not necessarily dominating. Taking your example:
>
>     bb1:
>       br non-affine bb2, bb3
>
>     bb2:                      bb3:
>       a =                       b =
>       <store %a, %a.s2a>
>       br bb4                    br exit
>
>     bb4:
>       br exit
>
>     exit:
>       val = phi [a, %bb4], [b, %bb3]
>       <store %a, %a.s2a>
>
> The store can be in bb2, but not in exit. It cannot be in bb2 because of my intention to not have any writes in the middle of ScopStmts. It doesn't need to be written anyway because there can be no use of %a (i.e. without PHIs) beyond bb2/4.

As far as I understand, the test case that triggered this discussion is not affected by this issue. I just committed test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll, where the value passed to the PHI node indeed does not dominate the non-affine region's exit. Looking at the code your unmodified patch creates, I see the following suspicious statements in the generated code that were not created before your patch:

    polly.stmt.loop.entry:                            ; preds = %polly.loop_header
      %val1.s2a.reload = load float, float* %val1.s2a
      %val2.s2a.reload = load float, float* %val2.s2a
      br label %polly.stmt.loop

    polly.stmt.branch2:                               ; preds = %polly.stmt.branch1
      store float %val2.s2a.reload, float* %merge.phiops
      br label %polly.stmt.backedge.exit

Is this a bug in your patch?

Now that I understand the issue, it seems what we want is to ensure we do not create scalar loads in case the (PHI-node) use is within the same region. I think this is indeed something we should avoid, but as Johannes pointed out, we probably want to fix this already at the place where we model the scop statement. Specifically, the following code should be modified to take non-affine regions into account:

    if (OpI) {
      BasicBlock *OpIBB = OpI->getParent();
      // As we pretend there is a use (or more precise a write) of OpI in OpBB
      // we have to insert a scalar dependence from the definition of OpI to
      // OpBB if the definition is not in OpBB.
      if (OpIBB != OpBB) {
        addScalarReadAccess(OpI, PHI, OpBB);
        addScalarWriteAccess(OpI);
      }
    }

The above seems to be an independent change that should go in before this patch.

> Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.
>
> What test are you talking about? This is not meant as a unit test of any new functionality, just an adaption to changed behavior.

This test case. It is important that the .s2a accesses are removed; the test should fail if this is not the case. I would just make this a CHECK-NEXT and also check for the (branch?) instruction that follows the .phiops store. (I already committed this hardening when adding non-affine-phi-node-expansion-4.ll.)
test/Isl/CodeGen/scev_expansion_in_nonaffine.ll
18 ↗	(On Diff #37133)	OK. I see. Good that you added the test case. It would indeed be better to commit it ahead of time to make clear that this test has been working before and still keeps working. Feel free to commit as is without further review (but please include the regex stuff).

Rebase to r250411

Dominance check still needed or phi_in_exit_early_lnt_failure_2.ll fails.

I could further investigate why the condition in generateScalarStores cannot be simplified further, but I don't think it's worth bothering. D13762 should clean it all up.

lib/CodeGen/BlockGenerators.cpp
1170 ↗	(On Diff #37490)	Your proposed code fails Isl/CodeGen/phi_in_exit_early_lnt_failure_2.ll with assertion: Assertion `!verifyGeneratedFunction(S, *EnteringBB->getParent()) && "Verification of generated function failed"' failed.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20–21 ↗	(On Diff #37490)
    %val1.s2a.reload = load float, float* %val1.s2a
    %val2.s2a.reload = load float, float* %val2.s2a

appeared because I modified the check in generateScalarLoads that originally was `isa<PHINode>(Inst)`. The new condition also let through loads that are not supposed to be there in the first place. I reinstated the original condition, although I am not convinced it is correct: there could be a legitimate PHI in the subregion's entry block. Just as a reminder, this is going to be cleaned up in D13762 anyway.

Hi Michael,

it seems that after r250411 the parts of the patch that caused concerns are not needed any more. Even though they might be replaced soon, I think it is worth committing a minimal patch. I am very bad at keeping track of things that should be removed, so while we are at it, let's try to avoid adding unnecessarily complicated code (including the PHI node stuff that does not seem to be well understood). Maybe you could rebase the patch once again?

Best,
Tobias

lib/CodeGen/BlockGenerators.cpp
1170 ↗	(On Diff #37490)	I just retested after 250411 and I do not see any failure.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
20–21 ↗	(On Diff #37490)	After 250411 I do not see such loads being generated, even with just this simple implementation:

    void RegionGenerator::generateScalarLoads(ScopStmt &Stmt, ValueMapT &BBMap) {
      return BlockGenerator::generateScalarLoads(Stmt, BBMap);
    }

Rebase to r250439; removed the now unnecessary override of generateScalarLoads.

Meinersbur added inline comments.Oct 15 2015, 3:54 PM

lib/CodeGen/BlockGenerators.cpp
1151 ↗	(On Diff #37533)	I still see the failure. I will look tomorrow into it.
test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll
19–23 ↗	(On Diff #37533)	Mmmh, I first tried to fix the suspicious instructions in non-affine-phi-node-expansion-4.ll without noticing that they are gone with 250411 as well.

Rebase to r250518; remove now unnecessary code to detect useless MemoryAccesses.

Hi Michael,

besides one minor comment, this patch now looks good to me.

Best,
Tobias

lib/CodeGen/BlockGenerators.cpp
1114 ↗	(On Diff #37595)	Was there a specific reason you moved away from MA->isPHI()? Your new code seems to be functionally equivalent, as TerminatorInstructions are only used as memory access instructions for PHI nodes, but I find this less clear than MA->isPHI().

Forgot to accept the revision.

This revision is now accepted and ready to land.Oct 16 2015, 3:24 PM

Meinersbur added inline comments.Oct 17 2015, 8:18 AM

lib/CodeGen/BlockGenerators.cpp
1114 ↗	(On Diff #37595)	Exit PHI node handling creates SCALAR access for incoming values, to handle them as escaping values. Effectively, it means that the created alloca has the suffix .phiops instead of .s2a. Also see ScopInfo::addPHIWriteAccess()
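
The naming difference mentioned here can be illustrated with a toy version of a get-or-create lookup. This is a sketch under assumptions, not the Polly implementation: `Slot`, `SlotMap` and `getOrCreateSlot` are hypothetical stand-ins for the alloca, the scalar-to-alloca map and getOrCreateAlloca, and keeping one map per access kind is assumed for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an alloca: only its name matters here.
struct Slot {
  std::string Name;
};

using SlotMap = std::map<std::string, Slot>;

// Return the slot for ScalarBase from Map, creating it with the suffix
// NameExt on first use. Later lookups ignore NameExt and reuse the slot.
inline Slot &getOrCreateSlot(const std::string &ScalarBase, SlotMap &Map,
                             const char *NameExt) {
  auto It = Map.find(ScalarBase);
  if (It == Map.end())
    It = Map.emplace(ScalarBase, Slot{ScalarBase + NameExt}).first;
  return It->second;
}
```

Looking up the same value in two different maps yields two distinct slots, for example "val.s2a" from a map of ordinary scalars and "val.phiops" from a map of PHI operands, so the suffix alone tells which kind of access produced the slot.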

Meinersbur added inline comments.

Comment at: lib/CodeGen/BlockGenerators.cpp:1114
@@ +1113,3 @@
+ // Implicit writes induced by PHIs must be written in the incoming blocks.
+ if (isa<TerminatorInst>(ScalarInst)) {

+ BasicBlock *ExitingBB = ScalarInst->getParent();

grosser wrote:

Was there a specific reason you moved away from the MA->isPHI()? Your new code seems to be functional equivalent, as TerminatorInstructions are only used as memory access instructions for PHI nodes, but I find this less clear than MA->isPHI().

Exit PHI node handling creates SCALAR access for incoming values, to handle them as escaping values. Effectively, it means that the created alloca has the suffix .phiops instead of .s2a.

Also see ScopInfo::addPHIWriteAccess()

OK. Maybe add a test case that would have failed with the code I proposed.

Best,
Tobias

Meinersbur added a parent revision: D13848: [Polly] Avoid unnecessary .s2a write access when used only in PHIs. Oct 17 2015, 11:17 AM

Rebase to D13848

In D13487#269745, @grosser wrote:

OK. Maybe add a test case that would have failed with the code I proposed.

Isl/CodeGen/phi_in_exit_early_lnt_failure_2.ll already fails with that code (just tried it). Do you want a dedicated test case?

No, I just did not realize it would fail. (It did not in my earlier experiments, but does now.) Then it's fine as it is.

Best,
Tobias

Closed by commit rL250625: Load/Store scalar accesses before/after the statement itself (authored by Meinersbur). Oct 17 2015, 2:38 PM

This revision was automatically updated to reflect the committed changes.

Hi Michael,

the latest version of this patch does not automatically apply as it has 'polly/trunk/' in the prefix. Could you possibly fix this?

grosser removed a child revision: D13676: [Polly] Do not store scalar accesses in InstructionToAccess.Dec 4 2015, 11:17 AM

It seems I got confused. This patch was already committed, but it was still kept as a dependent patch in Phabricator. When trying to apply the patch that was registered as depending on this one, 'arc patch' tried to reapply the patch that was already committed and failed. I resolved this by removing the dependency on the already committed patch.

OK, thanks. Because I have this patch in my own repository, I never used arc patch myself.

Revision Contents

Path	Size
polly/trunk/include/polly/CodeGen/BlockGenerators.h	23 lines
polly/trunk/include/polly/ScopInfo.h	6 lines
polly/trunk/lib/Analysis/ScopInfo.cpp	6 lines
polly/trunk/lib/CodeGen/BlockGenerators.cpp	94 lines
polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-2.ll	2 lines
polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll	2 lines
polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll	2 lines
polly/trunk/test/Isl/CodeGen/phi_loop_carried_float.ll	2 lines
polly/trunk/test/Isl/CodeGen/phi_loop_carried_float_escape.ll	2 lines
polly/trunk/test/Isl/CodeGen/read-only-scalars.ll	2 lines

Diff 37690

polly/trunk/include/polly/CodeGen/BlockGenerators.h

  protected:
    /// @param ScalarBase The demoted scalar value.
    /// @param Map The map we should look for a mapped alloca value.
    /// @param NameExt The suffix we add to the name of a new created alloca.
    ///
    /// @returns The alloca for @p ScalarBase.
    Value *getOrCreateAlloca(Value *ScalarBase, ScalarAllocaMapTy &Map,
                             const char *NameExt);

-   /// @brief Generate reload of scalars demoted to memory and needed by @p Inst.
+   /// @brief Generate reload of scalars demoted to memory and needed by @p Stmt.
    ///
    /// @param Stmt The statement we generate code for.
-   /// @param Inst The instruction that might need reloaded values.
    /// @param BBMap A mapping from old values to their new values in this block.
-   virtual void generateScalarLoads(ScopStmt &Stmt, const Instruction *Inst,
-                                    ValueMapT &BBMap);
+   void generateScalarLoads(ScopStmt &Stmt, ValueMapT &BBMap);

    /// @brief Generate the scalar stores for the given statement.
    ///
    /// After the statement @p Stmt was copied all inner-SCoP scalar dependences
    /// starting in @p Stmt (hence all scalar write accesses in @p Stmt) need to
    /// be demoted to memory.
    ///
    /// @param Stmt The statement we generate code for.
-   /// @param BB The basic block we generate code for.
    /// @param LTS A mapping from loops virtual canonical induction
    /// variable to their new values
    /// (for values recalculated in the new ScoP, but not
    /// within this basic block)
    /// @param BBMap A mapping from old values to their new values in this block.
-   virtual void generateScalarStores(ScopStmt &Stmt, BasicBlock *BB,
+   virtual void generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
LoopToScevMapT &LTS, ValueMapT &BBMap);		ValueMapT &BBMap);

/// @brief Handle users of @p Inst outside the SCoP.		/// @brief Handle users of @p Inst outside the SCoP.
///		///
/// @param R The current SCoP region.		/// @param R The current SCoP region.
/// @param Inst The current instruction we check.		/// @param Inst The current instruction we check.
/// @param Address If given it is used as the escape address for @p Inst.		/// @param Address If given it is used as the escape address for @p Inst.
void handleOutsideUsers(const Region &R, Instruction *Inst,		void handleOutsideUsers(const Region &R, Instruction *Inst,
Value *Address = nullptr);		Value *Address = nullptr);
[...] private:
/// @param PHI The original PHI we copy.		/// @param PHI The original PHI we copy.
/// @param PHICopy The copy of @p PHI.		/// @param PHICopy The copy of @p PHI.
/// @param IncomingBB An incoming block of @p PHI.		/// @param IncomingBB An incoming block of @p PHI.
/// @param LTS A map from old loops to new induction variables as		/// @param LTS A map from old loops to new induction variables as
/// SCEVs.		/// SCEVs.
void addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI, PHINode *PHICopy,		void addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI, PHINode *PHICopy,
BasicBlock *IncomingBB, LoopToScevMapT &LTS);		BasicBlock *IncomingBB, LoopToScevMapT &LTS);

/// @brief Generate reload of scalars demoted to memory and needed by @p Inst.
///
/// @param Stmt The statement we generate code for.
/// @param Inst The instruction that might need reloaded values.
/// @param BBMap A mapping from old values to their new values in this block.
virtual void generateScalarLoads(ScopStmt &Stmt, const Instruction *Inst,
ValueMapT &BBMap) override;

/// @brief Generate the scalar stores for the given statement.		/// @brief Generate the scalar stores for the given statement.
///		///
/// After the statement @p Stmt was copied all inner-SCoP scalar dependences		/// After the statement @p Stmt was copied all inner-SCoP scalar dependences
/// starting in @p Stmt (hence all scalar write accesses in @p Stmt) need to		/// starting in @p Stmt (hence all scalar write accesses in @p Stmt) need to
/// be demoted to memory.		/// be demoted to memory.
///		///
/// @param Stmt The statement we generate code for.		/// @param Stmt The statement we generate code for.
/// @param BB The basic block we generate code for.
/// @param LTS A mapping from loops virtual canonical induction variable to		/// @param LTS A mapping from loops virtual canonical induction variable to
/// their new values (for values recalculated in the new ScoP,		/// their new values (for values recalculated in the new ScoP,
/// but not within this basic block)		/// but not within this basic block)
/// @param BBMap A mapping from old values to their new values in this block.		/// @param BBMap A mapping from old values to their new values in this block.
virtual void generateScalarStores(ScopStmt &Stmt, BasicBlock *BB,		virtual void generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
LoopToScevMapT &LTS,
ValueMapT &BBMAp) override;		ValueMapT &BBMAp) override;

/// @brief Copy a single PHI instruction.		/// @brief Copy a single PHI instruction.
///		///
/// This copies a single PHI instruction and updates references to old values		/// This copies a single PHI instruction and updates references to old values
/// with references to new values, as defined by GlobalMap and BBMap.		/// with references to new values, as defined by GlobalMap and BBMap.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
[...]
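
The interface change above (one generateScalarLoads call per statement instead of one per instruction, and generateScalarStores no longer tied to a basic block) can be illustrated with a toy model. This is only a sketch under stated assumptions: Memory, Registers, ToyStmt and runStmt are hypothetical names invented for illustration and are not part of Polly.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these types exist in Polly.
using Memory = std::map<std::string, double>;    // demoted scalars ("allocas")
using Registers = std::map<std::string, double>; // values local to one statement

struct ToyStmt {
  std::vector<std::string> Reads;        // implicit read accesses
  std::vector<std::string> Writes;       // implicit write accesses
  std::function<void(Registers &)> Body; // the copied instructions
};

// Run one statement: reload every required scalar first, compute on registers
// only, then write the changed values back when leaving the statement.
void runStmt(const ToyStmt &S, Memory &Mem) {
  Registers Regs;
  for (const std::string &Name : S.Reads) // plays the role of generateScalarLoads
    Regs[Name] = Mem.at(Name);
  S.Body(Regs);                           // the body never touches Mem directly
  for (const std::string &Name : S.Writes) // plays the role of generateScalarStores
    Mem[Name] = Regs.at(Name);
}
```

In this model no access to the demoted memory happens between the first instruction and the last instruction of the body, which is the property the summary asks for.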

polly/trunk/include/polly/ScopInfo.h

[...] private:
/// @brief Updated access relation read from JSCOP file.		/// @brief Updated access relation read from JSCOP file.
isl_map *NewAccessRelation;		isl_map *NewAccessRelation;
// @}		// @}

unsigned getElemSizeInBytes() const { return ElemBytes; }		unsigned getElemSizeInBytes() const { return ElemBytes; }

bool isAffine() const { return IsAffine; }		bool isAffine() const { return IsAffine; }

/// @brief Is this MemoryAccess modeling special PHI node accesses?
bool isPHI() const { return Origin == PHI; }

__isl_give isl_basic_map *createBasicAccessMap(ScopStmt *Statement);		__isl_give isl_basic_map *createBasicAccessMap(ScopStmt *Statement);

void assumeNoOutOfBound();		void assumeNoOutOfBound();

/// @brief Compute bounds on an over approximated access relation.		/// @brief Compute bounds on an over approximated access relation.
///		///
/// @param ElementSize The size of one element accessed.		/// @param ElementSize The size of one element accessed.
void computeBoundsOnAccessRelation(unsigned ElementSize);		void computeBoundsOnAccessRelation(unsigned ElementSize);
[...] public:

/// @brief Whether this is an access of an explicit load or store in the IR.		/// @brief Whether this is an access of an explicit load or store in the IR.
bool isExplicit() const { return Origin == EXPLICIT; }		bool isExplicit() const { return Origin == EXPLICIT; }

/// @brief Whether this access represents a register access or models PHI		/// @brief Whether this access represents a register access or models PHI
/// nodes.		/// nodes.
bool isImplicit() const { return !isExplicit(); }		bool isImplicit() const { return !isExplicit(); }

		/// @brief Is this MemoryAccess modeling special PHI node accesses?
		bool isPHI() const { return Origin == PHI; }

/// @brief Get the statement that contains this memory access.		/// @brief Get the statement that contains this memory access.
ScopStmt *getStatement() const { return Statement; }		ScopStmt *getStatement() const { return Statement; }

/// @brief Get the reduction type of this access		/// @brief Get the reduction type of this access
ReductionType getReductionType() const { return RedType; }		ReductionType getReductionType() const { return RedType; }

/// @brief Set the updated access relation read from JSCOP file.		/// @brief Set the updated access relation read from JSCOP file.
void setNewAccessRelation(__isl_take isl_map *NewAccessRelation);		void setNewAccessRelation(__isl_take isl_map *NewAccessRelation);
[...]
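
The only change in this header is that isPHI() moves from the private to the public section, so the code generator can query it. A minimal sketch of how the three predicates relate, assuming an origin enum with the kinds EXPLICIT, SCALAR and PHI (SCALAR is inferred from a comment in the ScopInfo.cpp hunk; ToyAccess and the two filter functions are hypothetical):

```cpp
#include <cassert>

// Hypothetical miniature of the access classification; not Polly's MemoryAccess.
enum class Origin { EXPLICIT, SCALAR, PHI };

struct ToyAccess {
  Origin Kind;
  bool IsRead;
  bool isExplicit() const { return Kind == Origin::EXPLICIT; }
  bool isImplicit() const { return !isExplicit(); }
  bool isPHI() const { return Kind == Origin::PHI; }
  bool isRead() const { return IsRead; }
  bool isWrite() const { return !IsRead; }
};

// Filters mirroring the loops in generateScalarLoads and generateScalarStores:
// explicit accesses are skipped, implicit reads are reloaded before the
// statement, implicit writes are stored after it.
bool needsReloadBeforeStmt(const ToyAccess &A) {
  return A.isImplicit() && A.isRead();
}
bool needsStoreAfterStmt(const ToyAccess &A) {
  return A.isImplicit() && A.isWrite();
}
```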

polly/trunk/lib/Analysis/ScopInfo.cpp

[...] for (User *U : Inst->users()) {
// dependence.		// dependence.
// Hence bail out, before we register an "out-of-region" use for this		// Hence bail out, before we register an "out-of-region" use for this
// definition.		// definition.
if (isa<PHINode>(UI) && UI->getParent() == R->getExit() &&		if (isa<PHINode>(UI) && UI->getParent() == R->getExit() &&
!R->getExitingBlock())		!R->getExitingBlock())
continue;		continue;

// Check whether or not the use is in the SCoP.		// Check whether or not the use is in the SCoP.
if (!R->contains(UseParent)) {		// If there is single exiting block, the single incoming value exit for node
		// PHIs are handled like any escaping SCALAR. Otherwise, as if the PHI
		// belongs to the the scop region.
		bool IsExitNodePHI = isa<PHINode>(UI) && UI->getParent() == R->getExit();
		if (!R->contains(UseParent) && (R->getExitingBlock() || !IsExitNodePHI)) {
AnyCrossStmtUse = true;		AnyCrossStmtUse = true;
continue;		continue;
}		}

// If the instruction can be synthesized and the user is in the region		// If the instruction can be synthesized and the user is in the region
// we do not need to add scalar dependences.		// we do not need to add scalar dependences.
if (canSynthesizeInst)		if (canSynthesizeInst)
continue;		continue;
[...]
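
The new condition in this hunk is easier to read as a predicate over three booleans. The sketch below restates it; isEscapingUse and its parameter names are hypothetical and only mirror the expression `!R->contains(UseParent) && (R->getExitingBlock() || !IsExitNodePHI)` from the diff.

```cpp
#include <cassert>

// Returns true when a use is treated as a cross-statement (escaping) use.
//   UseInRegion           : R->contains(UseParent)
//   HasSingleExitingBlock : R->getExitingBlock() != nullptr
//   IsExitNodePHI         : the user is a PHI node in the region's exit block
bool isEscapingUse(bool UseInRegion, bool HasSingleExitingBlock,
                   bool IsExitNodePHI) {
  return !UseInRegion && (HasSingleExitingBlock || !IsExitNodePHI);
}
```

The one case that changes relative to the old `!R->contains(UseParent)` check is a use outside the region by an exit-node PHI when there is no single exiting block: it is no longer flagged here and is handled as if the PHI belonged to the region.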

polly/trunk/lib/CodeGen/BlockGenerators.cpp

[...] RuntimeDebugBuilder::createCPUPrinter(Builder, "Store to ", NewPointer,
": ", ValueOperand, "\n");		": ", ValueOperand, "\n");

Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());		Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());
}		}

void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,		void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {

// First check for possible scalar dependences for this instruction.
generateScalarLoads(Stmt, Inst, BBMap);

// Terminator instructions control the control flow. They are explicitly		// Terminator instructions control the control flow. They are explicitly
// expressed in the clast and do not need to be copied.		// expressed in the clast and do not need to be copied.
if (Inst->isTerminator())		if (Inst->isTerminator())
return;		return;

Loop *L = getLoopForInst(Inst);		Loop *L = getLoopForInst(Inst);
if ((Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&		if ((Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&
canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion())) {		canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion())) {
[...] BasicBlock *BlockGenerator::splitBB(BasicBlock *BB) {
CopyBB->setName("polly.stmt." + BB->getName());		CopyBB->setName("polly.stmt." + BB->getName());
return CopyBB;		return CopyBB;
}		}

BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,		BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
BasicBlock *CopyBB = splitBB(BB);		BasicBlock *CopyBB = splitBB(BB);
		Builder.SetInsertPoint(CopyBB->begin());
		generateScalarLoads(Stmt, BBMap);

copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);		copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);

		// After a basic block was copied store all scalars that escape this block in
		// their alloca.
		generateScalarStores(Stmt, LTS, BBMap);
return CopyBB;		return CopyBB;
}		}

void BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *CopyBB,		void BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB, BasicBlock *CopyBB,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
Builder.SetInsertPoint(CopyBB->begin());
EntryBB = &CopyBB->getParent()->getEntryBlock();		EntryBB = &CopyBB->getParent()->getEntryBlock();

for (Instruction &Inst : *BB)		for (Instruction &Inst : *BB)
copyInstruction(Stmt, &Inst, BBMap, LTS, NewAccesses);		copyInstruction(Stmt, &Inst, BBMap, LTS, NewAccesses);

// After a basic block was copied store all scalars that escape this block
// in their alloca. First the scalars that have dependences inside the SCoP,
// then the ones that might escape the SCoP.
generateScalarStores(Stmt, BB, LTS, BBMap);
}		}

Value *BlockGenerator::getOrCreateAlloca(Value *ScalarBase,		Value *BlockGenerator::getOrCreateAlloca(Value *ScalarBase,
ScalarAllocaMapTy &Map,		ScalarAllocaMapTy &Map,
const char *NameExt) {		const char *NameExt) {
// If no alloca was found create one and insert it in the entry block.		// If no alloca was found create one and insert it in the entry block.
if (!Map.count(ScalarBase)) {		if (!Map.count(ScalarBase)) {
auto *Ty = ScalarBase->getType();		auto *Ty = ScalarBase->getType();
[...] void BlockGenerator::handleOutsideUsers(const Region &R, Instruction *Inst,

// Get or create an escape alloca for this instruction.		// Get or create an escape alloca for this instruction.
auto *ScalarAddr = Address ? Address : getOrCreateScalarAlloca(Inst);		auto *ScalarAddr = Address ? Address : getOrCreateScalarAlloca(Inst);

// Remember that this instruction has escape uses and the escape alloca.		// Remember that this instruction has escape uses and the escape alloca.
EscapeMap[Inst] = std::make_pair(ScalarAddr, std::move(EscapeUsers));		EscapeMap[Inst] = std::make_pair(ScalarAddr, std::move(EscapeUsers));
}		}

void BlockGenerator::generateScalarLoads(ScopStmt &Stmt,		void BlockGenerator::generateScalarLoads(ScopStmt &Stmt, ValueMapT &BBMap) {
const Instruction *Inst,		for (MemoryAccess *MA : Stmt) {
ValueMapT &BBMap) {		if (MA->isExplicit() || MA->isWrite())
auto *MAL = Stmt.lookupAccessesFor(Inst);

if (!MAL)
return;

for (MemoryAccess *MA : *MAL) {
if (MA->isExplicit() || !MA->isRead())
continue;		continue;

auto *Address = getOrCreateAlloca(*MA);		auto *Address = getOrCreateAlloca(*MA);
BBMap[MA->getBaseAddr()] =		BBMap[MA->getBaseAddr()] =
Builder.CreateLoad(Address, Address->getName() + ".reload");		Builder.CreateLoad(Address, Address->getName() + ".reload");
}		}
}		}

[...] Value *BlockGenerator::getNewScalarValue(Value *ScalarValue, const Region &R,

// Case (3b)		// Case (3b)
Value *Address = getOrCreateScalarAlloca(ScalarValueInst);		Value *Address = getOrCreateScalarAlloca(ScalarValueInst);
ScalarValue = Builder.CreateLoad(Address, Address->getName() + ".reload");		ScalarValue = Builder.CreateLoad(Address, Address->getName() + ".reload");

return ScalarValue;		return ScalarValue;
}		}

void BlockGenerator::generateScalarStores(ScopStmt &Stmt, BasicBlock *BB,		void BlockGenerator::generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
LoopToScevMapT &LTS,
ValueMapT &BBMap) {		ValueMapT &BBMap) {
const Region &R = Stmt.getParent()->getRegion();		const Region &R = Stmt.getParent()->getRegion();

assert(Stmt.isBlockStmt() && BB == Stmt.getBasicBlock() &&		assert(Stmt.isBlockStmt() && "Region statements need to use the "
"Region statements need to use the generateScalarStores() "		"generateScalarStores() function in the "
"function in the RegionGenerator");		"RegionGenerator");

for (MemoryAccess *MA : Stmt) {		for (MemoryAccess *MA : Stmt) {
if (MA->isExplicit() || MA->isRead())		if (MA->isExplicit() || MA->isRead())
continue;		continue;

Value *Val = MA->getAccessValue();		Value *Val = MA->getAccessValue();
auto *Address = getOrCreateAlloca(*MA);		auto *Address = getOrCreateAlloca(*MA);

[...] void RegionGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,
assert(Stmt.isRegionStmt() &&		assert(Stmt.isRegionStmt() &&
"Only region statements can be copied by the region generator");		"Only region statements can be copied by the region generator");

// Forget all old mappings.		// Forget all old mappings.
BlockMap.clear();		BlockMap.clear();
RegionMaps.clear();		RegionMaps.clear();
IncompletePHINodeMap.clear();		IncompletePHINodeMap.clear();

		// Collection of all values related to this subregion.
		ValueMapT ValueMap;

// The region represented by the statement.		// The region represented by the statement.
Region *R = Stmt.getRegion();		Region *R = Stmt.getRegion();

// Create a dedicated entry for the region where we can reload all demoted		// Create a dedicated entry for the region where we can reload all demoted
// inputs.		// inputs.
BasicBlock *EntryBB = R->getEntry();		BasicBlock *EntryBB = R->getEntry();
BasicBlock *EntryBBCopy =		BasicBlock *EntryBBCopy =
SplitBlock(Builder.GetInsertBlock(), Builder.GetInsertPoint(), &DT, &LI);		SplitBlock(Builder.GetInsertBlock(), Builder.GetInsertPoint(), &DT, &LI);
EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");		EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");
Builder.SetInsertPoint(EntryBBCopy->begin());		Builder.SetInsertPoint(EntryBBCopy->begin());

		generateScalarLoads(Stmt, RegionMaps[EntryBBCopy]);

for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)		for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)
if (!R->contains(*PI))		if (!R->contains(*PI))
BlockMap[*PI] = EntryBBCopy;		BlockMap[*PI] = EntryBBCopy;

// Iterate over all blocks in the region in a breadth-first search.		// Iterate over all blocks in the region in a breadth-first search.
std::deque<BasicBlock *> Blocks;		std::deque<BasicBlock *> Blocks;
SmallPtrSet<BasicBlock *, 8> SeenBlocks;		SmallPtrSet<BasicBlock *, 8> SeenBlocks;
Blocks.push_back(EntryBB);		Blocks.push_back(EntryBB);
SeenBlocks.insert(EntryBB);		SeenBlocks.insert(EntryBB);

while (!Blocks.empty()) {		while (!Blocks.empty()) {
BasicBlock *BB = Blocks.front();		BasicBlock *BB = Blocks.front();
Blocks.pop_front();		Blocks.pop_front();

// First split the block and update dominance information.		// First split the block and update dominance information.
BasicBlock *BBCopy = splitBB(BB);		BasicBlock *BBCopy = splitBB(BB);
BasicBlock *BBCopyIDom = repairDominance(BB, BBCopy);		BasicBlock *BBCopyIDom = repairDominance(BB, BBCopy);

// In order to remap PHI nodes we store also basic block mappings.		// In order to remap PHI nodes we store also basic block mappings.
BlockMap[BB] = BBCopy;		BlockMap[BB] = BBCopy;

// Get the mapping for this block and initialize it with the mapping		// Get the mapping for this block and initialize it with the mapping
// available at its immediate dominator (in the new region).		// available at its immediate dominator (in the new region).
ValueMapT &RegionMap = RegionMaps[BBCopy];		ValueMapT &RegionMap = RegionMaps[BBCopy];
		if (BBCopy != EntryBBCopy)
RegionMap = RegionMaps[BBCopyIDom];		RegionMap = RegionMaps[BBCopyIDom];

// Copy the block with the BlockGenerator.		// Copy the block with the BlockGenerator.
		Builder.SetInsertPoint(BBCopy->begin());
copyBB(Stmt, BB, BBCopy, RegionMap, LTS, IdToAstExp);		copyBB(Stmt, BB, BBCopy, RegionMap, LTS, IdToAstExp);

// In order to remap PHI nodes we store also basic block mappings.		// In order to remap PHI nodes we store also basic block mappings.
BlockMap[BB] = BBCopy;		BlockMap[BB] = BBCopy;

// Add values to incomplete PHI nodes waiting for this block to be copied.		// Add values to incomplete PHI nodes waiting for this block to be copied.
for (const PHINodePairTy &PHINodePair : IncompletePHINodeMap[BB])		for (const PHINodePairTy &PHINodePair : IncompletePHINodeMap[BB])
addOperandToPHI(Stmt, PHINodePair.first, PHINodePair.second, BB, LTS);		addOperandToPHI(Stmt, PHINodePair.first, PHINodePair.second, BB, LTS);
IncompletePHINodeMap[BB].clear();		IncompletePHINodeMap[BB].clear();

// And continue with new successors inside the region.		// And continue with new successors inside the region.
for (auto SI = succ_begin(BB), SE = succ_end(BB); SI != SE; SI++)		for (auto SI = succ_begin(BB), SE = succ_end(BB); SI != SE; SI++)
if (R->contains(*SI) && SeenBlocks.insert(*SI).second)		if (R->contains(*SI) && SeenBlocks.insert(*SI).second)
Blocks.push_back(*SI);		Blocks.push_back(*SI);

		// Remember value in case it is visible after this subregion.
		ValueMap.insert(RegionMap.begin(), RegionMap.end());
}		}

// Now create a new dedicated region exit block and add it to the region map.		// Now create a new dedicated region exit block and add it to the region map.
BasicBlock *ExitBBCopy =		BasicBlock *ExitBBCopy =
SplitBlock(Builder.GetInsertBlock(), Builder.GetInsertPoint(), &DT, &LI);		SplitBlock(Builder.GetInsertBlock(), Builder.GetInsertPoint(), &DT, &LI);
ExitBBCopy->setName("polly.stmt." + R->getExit()->getName() + ".exit");		ExitBBCopy->setName("polly.stmt." + R->getExit()->getName() + ".exit");
BlockMap[R->getExit()] = ExitBBCopy;		BlockMap[R->getExit()] = ExitBBCopy;

[...] for (BasicBlock *BB : SeenBlocks) {

for (auto *PredBBCopy : make_range(pred_begin(BBCopy), pred_end(BBCopy)))		for (auto *PredBBCopy : make_range(pred_begin(BBCopy), pred_end(BBCopy)))
if (LoopPHI->getBasicBlockIndex(PredBBCopy) < 0)		if (LoopPHI->getBasicBlockIndex(PredBBCopy) < 0)
LoopPHI->addIncoming(NullVal, PredBBCopy);		LoopPHI->addIncoming(NullVal, PredBBCopy);

LTS[L] = SE.getUnknown(LoopPHI);		LTS[L] = SE.getUnknown(LoopPHI);
}		}

// Reset the old insert point for the build.		// Continue generating code in the exit block.
Builder.SetInsertPoint(ExitBBCopy->begin());		Builder.SetInsertPoint(ExitBBCopy->getFirstInsertionPt());
}

void RegionGenerator::generateScalarLoads(ScopStmt &Stmt,
const Instruction *Inst,
ValueMapT &BBMap) {

// Inside a non-affine region PHI nodes are copied not demoted. Once the		// Write values visible to other statements.
// phi is copied it will reload all inputs from outside the region, hence		generateScalarStores(Stmt, LTS, ValueMap);
// we do not need to generate code for the read access of the operands of a
// PHI.
if (isa<PHINode>(Inst))
return;

return BlockGenerator::generateScalarLoads(Stmt, Inst, BBMap);
}		}

void RegionGenerator::generateScalarStores(ScopStmt &Stmt, BasicBlock *BB,		void RegionGenerator::generateScalarStores(ScopStmt &Stmt, LoopToScevMapT &LTS,
LoopToScevMapT &LTS,
ValueMapT &BBMap) {		ValueMapT &BBMap) {
const Region &R = Stmt.getParent()->getRegion();		const Region &R = Stmt.getParent()->getRegion();

assert(Stmt.getRegion() &&		assert(Stmt.getRegion() &&
"Block statements need to use the generateScalarStores() "		"Block statements need to use the generateScalarStores() "
"function in the BlockGenerator");		"function in the BlockGenerator");

for (MemoryAccess *MA : Stmt) {		for (MemoryAccess *MA : Stmt) {

if (MA->isExplicit() || MA->isRead())		if (MA->isExplicit() || MA->isRead())
continue;		continue;

Instruction *ScalarInst = MA->getAccessInstruction();		Instruction *ScalarInst = MA->getAccessInstruction();

// Only generate accesses that belong to this basic block.
if (ScalarInst->getParent() != BB)
continue;

Value *Val = MA->getAccessValue();		Value *Val = MA->getAccessValue();

		// In case we add the store into an exiting block, we need to restore the
		// position for stores in the exit node.
		auto SavedInsertionPoint = Builder.GetInsertPoint();

		// Implicit writes induced by PHIs must be written in the incoming blocks.
		if (isa<TerminatorInst>(ScalarInst)) {
		BasicBlock *ExitingBB = ScalarInst->getParent();
		BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];
		Builder.SetInsertPoint(ExitingBBCopy->getTerminator());
		}

auto Address = getOrCreateAlloca(*MA);		auto Address = getOrCreateAlloca(*MA);

Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);		Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);
Builder.CreateStore(Val, Address);		Builder.CreateStore(Val, Address);

		// Restore the insertion point if necessary.
		if (isa<TerminatorInst>(ScalarInst))
		Builder.SetInsertPoint(SavedInsertionPoint);
}		}
}		}

void RegionGenerator::addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI,		void RegionGenerator::addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI,
PHINode *PHICopy, BasicBlock *IncomingBB,		PHINode *PHICopy, BasicBlock *IncomingBB,
LoopToScevMapT &LTS) {		LoopToScevMapT &LTS) {
Region *StmtR = Stmt.getRegion();		Region *StmtR = Stmt.getRegion();

[...]
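
RegionGenerator::copyStmt walks the blocks of a non-affine subregion breadth-first, using a worklist plus a set of already seen blocks, before the stores are emitted at the region exit. The traversal pattern on its own can be sketched as follows; Block, CFG and visitRegion are hypothetical names, and blocks are modeled as plain strings rather than LLVM BasicBlocks.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using CFG = std::map<Block, std::vector<Block>>; // successors of each block

// Visit every block reachable from Entry that lies inside Region, in
// breadth-first order, visiting each block exactly once.
std::vector<Block> visitRegion(const CFG &G, const std::set<Block> &Region,
                               const Block &Entry) {
  std::deque<Block> Worklist{Entry};
  std::set<Block> Seen{Entry};
  std::vector<Block> Order;
  while (!Worklist.empty()) {
    Block BB = Worklist.front();
    Worklist.pop_front();
    Order.push_back(BB);
    auto It = G.find(BB);
    if (It == G.end())
      continue; // block without recorded successors
    for (const Block &Succ : It->second)
      // Only follow successors inside the region that were not queued before.
      if (Region.count(Succ) && Seen.insert(Succ).second)
        Worklist.push_back(Succ);
  }
  return Order;
}
```

Because the entry block is visited first and dominates every other block of the region, values reloaded there are available to all blocks visited later, which matches the reasoning in the summary.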

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-2.ll

	; RUN: opt %loadPolly -polly-codegen \			; RUN: opt %loadPolly -polly-codegen \
	; RUN: -S < %s | FileCheck %s			; RUN: -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"


	; CHECK: polly.stmt.bb3: ; preds = %polly.stmt.bb3.entry			; CHECK: polly.stmt.bb3: ; preds = %polly.stmt.bb3.entry
	; CHECK: %tmp6_p_scalar_ = load double, double* %arg11, !alias.scope !0, !noalias !2			; CHECK: %tmp6_p_scalar_ = load double, double* %arg1{{[0-9]*}}, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp7 = fadd double 1.000000e+00, %tmp6_p_scalar_			; CHECK: %p_tmp7 = fadd double 1.000000e+00, %tmp6_p_scalar_
	; CHECK: %p_tmp8 = fcmp olt double 1.400000e+01, %p_tmp7			; CHECK: %p_tmp8 = fcmp olt double 1.400000e+01, %p_tmp7
	; CHECK: br i1 %p_tmp8, label %polly.stmt.bb9, label %polly.stmt.bb10			; CHECK: br i1 %p_tmp8, label %polly.stmt.bb9, label %polly.stmt.bb10

	; CHECK: polly.stmt.bb9: ; preds = %polly.stmt.bb3			; CHECK: polly.stmt.bb9: ; preds = %polly.stmt.bb3
	; CHECK: store double 1.000000e+00, double* %tmp12.phiops			; CHECK: store double 1.000000e+00, double* %tmp12.phiops
	; CHECK: br label %polly.stmt.bb11.exit			; CHECK: br label %polly.stmt.bb11.exit

	[...]
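
The only change in this test replaces the hard-coded name `%arg11` with the pattern `%arg1{{[0-9]*}}`, presumably so the check no longer depends on the exact numeric suffix of the generated value name. As a rough illustration of what the relaxed line accepts, the sketch below uses std::regex instead of FileCheck; matchesRelaxedCheck is a hypothetical helper, and std::regex only approximates FileCheck's matching rules.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if Line contains "double* %arg1" followed by an optional run of digits
// and a comma, approximating the relaxed CHECK pattern.
bool matchesRelaxedCheck(const std::string &Line) {
  static const std::regex Pattern(R"(double\* %arg1[0-9]*,)");
  return std::regex_search(Line, Pattern);
}
```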

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll

[...] loop:
%val2 = fadd float 1.0, 2.0		%val2 = fadd float 1.0, 2.0
br i1 %cond0, label %branch1, label %backedge		br i1 %cond0, label %branch1, label %backedge

; CHECK-LABEL: polly.stmt.loop:		; CHECK-LABEL: polly.stmt.loop:
; CHECK-NEXT: %polly.subregion.iv = phi i32 [ 0, %polly.stmt.loop.entry ]		; CHECK-NEXT: %polly.subregion.iv = phi i32 [ 0, %polly.stmt.loop.entry ]
; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: store float %p_val0, float* %merge.phiops
; CHECK-NEXT: %polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1		; CHECK-NEXT: %polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
		; CHECK-NEXT: store float %p_val0, float* %merge.phiops
; CHECK-NEXT: br i1		; CHECK-NEXT: br i1

branch1:		branch1:
br i1 %cond1, label %branch2, label %backedge		br i1 %cond1, label %branch2, label %backedge

; CHECK-LABEL: polly.stmt.branch1:		; CHECK-LABEL: polly.stmt.branch1:
; CHECK-NEXT: store float %p_val1, float* %merge.phiops		; CHECK-NEXT: store float %p_val1, float* %merge.phiops
; CHECK-NEXT: br i1		; CHECK-NEXT: br i1
[...]

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll

[...] loop:
%val0 = fadd float 1.0, 2.0		%val0 = fadd float 1.0, 2.0
%val1 = fadd float 1.0, 2.0		%val1 = fadd float 1.0, 2.0
br i1 %cond0, label %branch1, label %backedge		br i1 %cond0, label %branch1, label %backedge

; CHECK-LABEL: polly.stmt.loop:		; CHECK-LABEL: polly.stmt.loop:
; CHECK-NEXT: %polly.subregion.iv = phi i32 [ 0, %polly.stmt.loop.entry ]		; CHECK-NEXT: %polly.subregion.iv = phi i32 [ 0, %polly.stmt.loop.entry ]
; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val0 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00		; CHECK-NEXT: %p_val1 = fadd float 1.000000e+00, 2.000000e+00
; CHECK-NEXT: store float %p_val0, float* %merge.phiops
; CHECK-NEXT: %polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1		; CHECK-NEXT: %polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
		; CHECK-NEXT: store float %p_val0, float* %merge.phiops
; CHECK-NEXT: br i1		; CHECK-NEXT: br i1

; The interesting instruction here is %val2, which does not dominate the exit of		; The interesting instruction here is %val2, which does not dominate the exit of
; the non-affine region. Care needs to be taken when code-generating this write.		; the non-affine region. Care needs to be taken when code-generating this write.
; Specifically, at some point we modeled this scalar write, which we tried to		; Specifically, at some point we modeled this scalar write, which we tried to
; code generate in the exit block of the non-affine region.		; code generate in the exit block of the non-affine region.
branch1:		branch1:
%val2 = fadd float 1.0, 2.0		%val2 = fadd float 1.0, 2.0
[...]

polly/trunk/test/Isl/CodeGen/phi_loop_carried_float.ll

	[...]
	; CHECK-LABEL: polly.merge2:			; CHECK-LABEL: polly.merge2:
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]*]] = load float, float* %tmp.0.phiops			; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]*]] = load float, float* %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a			; CHECK: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a

	; CHECK-LABEL: polly.stmt.bb4:			; CHECK-LABEL: polly.stmt.bb4:
	; CHECK: %tmp[[R5:[0-9]*]]_p_scalar_ = load float, float* %scevgep, align 4, !alias.scope !0, !noalias !2
	; CHECK: %tmp.0.s2a.reload[[R3:[0-9]*]] = load float, float* %tmp.0.s2a			; CHECK: %tmp.0.s2a.reload[[R3:[0-9]*]] = load float, float* %tmp.0.s2a
				; CHECK: %tmp[[R5:[0-9]*]]_p_scalar_ = load float, float* %scevgep, align 4, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_			; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_
	; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops			; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]*]] = load float, float* %tmp.0.phiops			; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]*]] = load float, float* %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a			; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	[...]

polly/trunk/test/Isl/CodeGen/phi_loop_carried_float_escape.ll

	[...]
	; CHECK-NEXT: %tmp.0.final_reload = load float, float* %tmp.0.s2a			; CHECK-NEXT: %tmp.0.final_reload = load float, float* %tmp.0.s2a
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]*]] = load float, float* %tmp.0.phiops			; CHECK-NEXT: %tmp.0.phiops.reload[[R1:[0-9]*]] = load float, float* %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a			; CHECK: store float %tmp.0.phiops.reload[[R1]], float* %tmp.0.s2a

	; CHECK-LABEL: polly.stmt.bb4:			; CHECK-LABEL: polly.stmt.bb4:
	; CHECK: %tmp[[R5:[0-9]*]]_p_scalar_ = load float, float* %scevgep, align 4, !alias.scope !0, !noalias !2
	; CHECK: %tmp.0.s2a.reload[[R3:[0-9]*]] = load float, float* %tmp.0.s2a			; CHECK: %tmp.0.s2a.reload[[R3:[0-9]*]] = load float, float* %tmp.0.s2a
				; CHECK: %tmp[[R5:[0-9]*]]_p_scalar_ = load float, float* %scevgep, align 4, !alias.scope !0, !noalias !2
	; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_			; CHECK: %p_tmp[[R4:[0-9]*]] = fadd float %tmp.0.s2a.reload[[R3]], %tmp[[R5]]_p_scalar_
	; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops			; CHECK: store float %p_tmp[[R4]], float* %tmp.0.phiops

	; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:			; CHECK-LABEL: polly.stmt.bb1{{[0-9]*}}:
	; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]*]] = load float, float* %tmp.0.phiops			; CHECK-NEXT: %tmp.0.phiops.reload[[R2:[0-9]*]] = load float, float* %tmp.0.phiops
	; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a			; CHECK: store float %tmp.0.phiops.reload[[R2]], float* %tmp.0.s2a

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

polly/trunk/test/Isl/CodeGen/read-only-scalars.ll

	; RUN: opt %loadPolly -polly-analyze-read-only-scalars=false -polly-codegen \			; RUN: opt %loadPolly -polly-analyze-read-only-scalars=false -polly-codegen \
	; RUN: \			; RUN: \
	; RUN: -S < %s | FileCheck %s			; RUN: -S < %s | FileCheck %s
	; RUN: opt %loadPolly -polly-analyze-read-only-scalars=true -polly-codegen \			; RUN: opt %loadPolly -polly-analyze-read-only-scalars=true -polly-codegen \
	; RUN: \			; RUN: \
	; RUN: -S < %s | FileCheck %s -check-prefix=SCALAR			; RUN: -S < %s | FileCheck %s -check-prefix=SCALAR

	; CHECK-NOT: alloca			; CHECK-NOT: alloca

	; SCALAR-LABEL: entry:			; SCALAR-LABEL: entry:
	; SCALAR-NEXT: %scalar.s2a = alloca float			; SCALAR-NEXT: %scalar.s2a = alloca float

	; SCALAR-LABEL: polly.start:			; SCALAR-LABEL: polly.start:
	; SCALAR-NEXT: store float %scalar, float* %scalar.s2a			; SCALAR-NEXT: store float %scalar, float* %scalar.s2a

	; SCALAR-LABEL: polly.stmt.stmt1:			; SCALAR-LABEL: polly.stmt.stmt1:
	; SCALAR-NEXT: %val_p_scalar_ = load float, float* %A,
	; SCALAR-NEXT: %scalar.s2a.reload = load float, float* %scalar.s2a			; SCALAR-NEXT: %scalar.s2a.reload = load float, float* %scalar.s2a
				; SCALAR-NEXT: %val_p_scalar_ = load float, float* %A,
	; SCALAR-NEXT: %p_sum = fadd float %val_p_scalar_, %scalar.s2a.reload			; SCALAR-NEXT: %p_sum = fadd float %val_p_scalar_, %scalar.s2a.reload

	define void @foo(float* noalias %A, float %scalar) {			define void @foo(float* noalias %A, float %scalar) {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%indvar = phi i64 [0, %entry], [%indvar.next, %loop.backedge]			%indvar = phi i64 [0, %entry], [%indvar.next, %loop.backedge]


Revision Contents

Diff 37690

polly/trunk/include/polly/CodeGen/BlockGenerators.h

polly/trunk/include/polly/ScopInfo.h

polly/trunk/lib/Analysis/ScopInfo.cpp

polly/trunk/lib/CodeGen/BlockGenerators.cpp

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-2.ll

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll

polly/trunk/test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll

polly/trunk/test/Isl/CodeGen/phi_loop_carried_float.ll

polly/trunk/test/Isl/CodeGen/phi_loop_carried_float_escape.ll

polly/trunk/test/Isl/CodeGen/read-only-scalars.ll
