This is an archive of the discontinued LLVM Phabricator instance.

[Polly][DeLICM] Known knowledge.
ClosedPublic

Authored by Meinersbur on Mar 22 2017, 10:24 AM.

Details

Summary

Extend the Knowledge class to store information about the contents of array elements and which values are written. Two knowledges do not conflict the known content is the same. The content information if computed from writes to and loads from the array elements, and represented by "ValInst": isl spaces that compare equal if the value represented is the same.

This patch contains changes that I try to commit independently (introduction of distributeDomain/removal of AllElements, VirtualUse(?)).

Event Timeline

Meinersbur created this revision.Mar 22 2017, 10:24 AM
Meinersbur updated this revision to Diff 92696.Mar 22 2017, 1:17 PM
  • Rebase to r298543
  • Add switch to disable known information
Meinersbur updated this revision to Diff 92707.Mar 22 2017, 2:16 PM
  • Rebase to r298546
  • Removed/activate commented code.
Meinersbur updated this revision to Diff 92718.Mar 22 2017, 2:40 PM
  • Remove spurious semicolon
grosser edited edge metadata.Mar 23 2017, 5:37 AM

Hi Michael,

thank you for pushing this forward. This is again a pretty large patch, even though the kind of different changes are now a lot smaller. Still, for proper and timely review, it would really help to get these patches smaller and upstream as much as possible independently. I did a first round over this patch, and trying to understand which parts are needed to get the tests working that you contributed. It seems indeed that e.g. VirtualUse is not really needed and to my surprise also the computeKnownFromWrite / computeKnownFromRead functions do not seem to be needed. At least all test cases pass even when I just return an empty known set. Did I make some wrong assumptions or is this functionality not tested and can be added later together with the corresponding test cases. (I stop now my review, to get first some feedback from you).

Best,
Tobias

include/polly/ScopInfo.h
1334 ↗(On Diff #92718)

This can probably be committed separately and directly be used and tested in RegionGenerator::addOperandToPHI. No further review needed for this.

include/polly/Support/VirtualInstruction.h
38

An -> A

lib/Transform/DeLICM.cpp
391

This move also not seem to be needed without VirtualUse.

868

Not needed without VirtualUse?

905

Not needed without virtual use?

1213

Grammar: "look get"

1216

I prefer comments to be aligned to the start of the first line (otherwise they mix with the parameters).

1240

The only case currently tested seems to be VirtualUse::Constant. If I add a "break" at the beginning of each other case, all tests still pass as expected. Maybe we can first introduce the constant case, without virtual use and then look again at the virtual use?

1242

on the

1966

If I revert this change and use the original getInputAccessOf function, all tests still pass. This and the use where we check for Val to be a constant are the only uses of VirtualUse. If we drop them, this patch would become more targeted.

2042

If I add return makeEmptyUnionMap(); all tests still pass.

2055

If I add return makeEmptyUnionMap(); all tests still pass.

2102

Not needed without virtual Statement.

2235

Not needed without VirtualStmt.

2255

Not needed without VirtualStmt.

unittests/DeLICM/DeLICMTest.cpp
127

I assume we do not need this commented code to be committed.

Meinersbur marked 4 inline comments as done.Mar 23 2017, 10:03 AM

thank you for pushing this forward. This is again a pretty large patch, even though the kind of different changes are now a lot smaller. Still, for proper and timely review, it would really help to get these patches smaller and upstream as much as possible independently. I did a first round over this patch, and trying to understand which parts are needed to get the tests working that you contributed. It seems indeed that e.g. VirtualUse is not really needed

See inline comment.

and to my surprise also the computeKnownFromWrite / computeKnownFromRead functions do not seem to be needed.

That was my laziness by making minimal changes to the reduction.ll test case to make it work. By doing this, only a particular kind of conflict occurs, namely conflicts with previously mapped values. This is not typical for GVE PRE-optimized code. I'll come up with better ones with the next update.

Michael

include/polly/Support/VirtualInstruction.h
38

It is pronounced "El El Vee Em", hence the article is correct.

Also compare
https://www.google.com/search?q="An+LLVM"
vs
https://www.google.com/search?q="A+LLVM"

lib/Transform/DeLICM.cpp
1240

You may have missed that the "VirtualUse::InterValue" case breaks as well because it is the longest and "main" case, to avoid handling it in a switch case. All other cases return, that is, for the two test cases we need to distinguish at least these two and if its none of them (to return "value unknown"), makes three cases.

The other cases are not that interesting, so no test case for those yet. It doesn't come up because the test cases were crafted that way. For instance, VirtualUse::IntraValue never occurs because there are never two instructions depending on each other in a single basic block. The same computational pattern is employed with only one instruction, so the test case uses the smaller one.

Meinersbur updated this revision to Diff 92967.Mar 24 2017, 9:45 AM
  • Not all synthesizable values are affine (e.g. undef)
  • Not all affine values are synthezizable (e.g. load-hoisted values)
  • Addresses some of Tobias' comments.
  • More test cases.
  • Remove computeKnownFromRead, seems not to be useful here.

Note this currently miscompiles External\SPEC\CINT2006\401.bzip2, looking into it.

401.bzip2 fails even without its patch, look similar to http://llvm.org/PR32251 .

Hi Michael,

thanks for upstreaming already so much of this patch. I believe after this patch is rebased onto trunk, the diff should be small enough to review it as a whole.

Best,
Tobias

include/polly/Support/VirtualInstruction.h
38

You are right

Meinersbur updated this revision to Diff 97972.May 5 2017, 9:30 AM

Rebase to r302234

grosser accepted this revision.May 5 2017, 2:32 PM

Hi Michael,

this looks good to me. I just have a couple of minor typos. I also tried this of some of the polybench kernels, but it seems not yet enough to make them really faster. The one thing we need for a simple gemm kernel is -polly-delicm-overapproximate-writes=true, the second is an option to remove redundant writes. I assume these will be handled in later changes, so no need to block this patch.

Best,
Tobias

lib/Transform/DeLICM.cpp
119

A synthesizable

1395

uses.

1428

no need for comma here.

test/DeLICM/reduction_looprotate.ll
14

to DO in

This revision is now accepted and ready to land.May 5 2017, 2:32 PM
This revision was automatically updated to reflect the committed changes.
Meinersbur marked 4 inline comments as done.

Hi Michael,

this looks good to me. I just have a couple of minor typos. I also tried this of some of the polybench kernels, but it seems not yet enough to make them really faster. The one thing we need for a simple gemm kernel is -polly-delicm-overapproximate-writes=true, the second is an option to remove redundant writes. I assume these will be handled in later changes, so no need to block this patch.

For really being effective, we also need partial writes (MemoryAccesses that do not write in an entire's statement domain). This makes -polly-delicm-overapproximate-writes=true kind of redundant.

Depending on the benchmark, an extension determining that a PHINode is identical to one of their incoming values could be useful as well.

About the redundant writes, I still have to think about to do it correctly. the PHINode extension could be helpful.

Cool. Maybe let's pull out this discussion from the review thread to the ML. This seems to be go beyond what this patch coveres.