This is an archive of the discontinued LLVM Phabricator instance.

[ForwardOpTree] Allow forwarding in the presence of region statements
ClosedPublic

Authored by grosser on Aug 30 2017, 8:13 AM.

Event Timeline

grosser created this revision.Aug 30 2017, 8:13 AM
Meinersbur added inline comments.Aug 30 2017, 8:33 AM
lib/Analysis/ScopInfo.cpp
1740–1756 ↗(On Diff #113268)

[Suggestion] Wouldn't is be easier to enumerate over Scop::getBoxedLoops() and check those whether they are contained in the stmt?

1749–1753 ↗(On Diff #113268)

I assume this is for the case L==nullptr. Could be put into the else branch. of if (L)

lib/Transform/ZoneAlgo.cpp
379 ↗(On Diff #113268)

[Serious] Boxed loops are not the only problematic case. Consider within a region stmt:

  br label %StmtB

StmtA:
  %a= load %A
  br label %StmtC  

StmtB:
  store %A 
  br label %StmtA  

StmtC:
  ...

Iteration order might be StmtA first, then StmtB, although StmtB is executed before StmtA. It should be rejected, but is not.

381–384 ↗(On Diff #113268)

Message should be changed to the new meaning.

test/DeLICM/reject_storeinsubregion.ll
78–85 ↗(On Diff #113268)

This was a check the case when it does _not_ work. Please ensure that we still have a test case for the not-allowed case.

test/ForwardOpTree/forward_region.ll
10 ↗(On Diff #113268)

This condition does not correspond to the IR below.

test/ForwardOpTree/noforward_from_region.ll
1

Why is this deleted? There are still cases in regions that are not allowed.

Thank you. Looking into it.

lib/Analysis/ScopInfo.cpp
1740–1756 ↗(On Diff #113268)

This would grow linearly in the size of the scop, whereas the current approach hopefully would not.

lib/Transform/ZoneAlgo.cpp
379 ↗(On Diff #113268)

OK. What would you suggest to check instead?

381–384 ↗(On Diff #113268)

OK.

test/ForwardOpTree/forward_region.ll
10 ↗(On Diff #113268)

I just moved the old test case. Will cross-check.

test/ForwardOpTree/noforward_from_region.ll
1

OK.

Meinersbur added inline comments.Aug 30 2017, 10:04 AM
lib/Analysis/ScopInfo.cpp
1740–1756 ↗(On Diff #113268)

You can check whether a given BB is in a boxed loop in constant time (assuming hashtable lookup is constant time):

LI.getLoopFor(BB) != Stmt.getSurroundingLoop()

I think closer to what is actually needed, no?

lib/Transform/ZoneAlgo.cpp
379 ↗(On Diff #113268)

If you want to keep it easy, leave it as it is. It would only have an effect for load-forwarding from region stmts (not to region stmt, which seems to be the case you are interested in) (and DeLICM to some extent).
In contrast to my previous assumption this is possible without crossing a phi node: if the load to be forwarded is in the region's entry block (or in strange cases of irreducible loops)

If you want to make it work, you can check the following:

  • Allow reads and writes in the entry block (if not part of a boxed loop which can be checked in constant time as show above; to also cover the case of an irreducible loop which is I am not sure is possible: check that all incoming edges are from outside the region; the point is only to ensure that the BB is not executed multiple times in the same statement instance) like it is done for block stmts (which never have boxed loops).
  • Disallow any writes in non-entry blocks, as it is now for all BBs in region stmts.

For really advanced analysis, find a topological order of access execution order an iterator over that. (Keep in mind that there is no topological order if there is a loop). Can be rejected in that case. getAccessesInOrder() unfortunately is not advances enough for this.

test/ForwardOpTree/forward_region.ll
10 ↗(On Diff #113268)

Might be that I wasn't as accurate myself in the old test case that was probably created by myself. Problably started with an undef, but wasn't accepted by ScopDetection.

test/ForwardOpTree/noforward_from_region.ll
1

I just noticed that this was renamed, not deleted. Sorry for the unjustified comment.

However, I think we have no test case for forwarding a load from a region yet, which would be interesting.

grosser updated this revision to Diff 113373.Aug 31 2017, 2:06 AM

It seems the change in collectIncompatibleElts is not even needed. I also have
a hard time to reproduce a test case where the forwarding is refused due to
incompatible statements. Michael, do you have an idea?

Meinersbur edited edge metadata.Aug 31 2017, 5:02 AM

Your test cases check that the movement of non-load scalars to/from region works now (Which is amazing!). A test that checks that a load is not forwarded in case there is an (unpredictable) write in a region is missing. It might load the content written the store instead of the original value.

Simplify.cpp markAndSweep needs to know about the prepended instructions. Otherwise it doesn't see their operands and doesn't mark them as used, hence remove them.

include/polly/ScopInfo.h
2369–2370

Why this move?

test/ForwardOpTree/forward_into_region.ll
51–57

Can you use CHECK: here? We might want to add additional stats which would cause this to break unnecessarily.

test/ForwardOpTree/noforward_from_region.ll
32–35

These don't seem to be relevant.

grosser updated this revision to Diff 113396.Aug 31 2017, 6:40 AM

Addressed Michael's comment

The getLI is moved to make it public. It was not public before.

Meinersbur accepted this revision.Aug 31 2017, 7:54 AM

LGTM, I have no more correctness concerns.

The getLI is moved to make it public. It was not public before.

I don't see any new uses of it that'd require it to be public.

lib/Support/VirtualInstruction.cpp
186–189 ↗(On Diff #113396)

This is correct, but did you consider not adding the instruction list instructions as roots? Simplify would be able to remove them and their dependencies if they are not considered roots.

test/ForwardOpTree/forward_from_region.ll
55–60

Using CHECK: would make the test less fragile if we want to add more stats.

test/ForwardOpTree/noforward_from_region.ll
3

The comment is inaccurate. The load is not moved. Mentioning the reason for this would also be nice.

17

Array %B is not used.

33

The pseudocode summary says it's A[0] = 42, but this is A[0] = A[0]. I'd prefer the former because it's one less (irrelevant) memory accesses.

This revision is now accepted and ready to land.Aug 31 2017, 7:54 AM
This revision was automatically updated to reflect the committed changes.
grosser marked 2 inline comments as done.