This is an archive of the discontinued LLVM Phabricator instance.

[Polly] De-LICM and De-GVN (WIP)
AbandonedPublic

Authored by Meinersbur on Sep 18 2015, 10:28 AM.

Details

Summary

Undo the effects of LICM on loop dependencies. This is done by mapping the scalars such as those that are created by LICM and GVN-PRE back to the array elements they were promoted from.

Exec time results (with -polly-position=before-vectorizer)

3mm -71.63%
2mm -71.56%
gemm -71.03%
syrk -44.84%
doitgen -43.18%
syr2k -42.73%
mvt -28.57%
gemver -26.32%
gramschmidt -12.71%

bmm 2.03%
CrystalMk 3.14%
FloatMM 4.35%
cholesky 73.39%
trmm 226.78%

Do not try to apply this patch directly. It requires some other patches to make this patch smaller and that I am going to commit/upload independently. Get the latest from my Github branch.

Diff Detail

Event Timeline

Meinersbur retitled this revision from to [Polly] De-LICM (WIP).
Meinersbur updated this object.
Meinersbur added reviewers: grosser, jdoerfert.
Meinersbur added a project: Restricted Project.
Meinersbur added subscribers: llvm-commits, pollydev.

Current status:
Works fine on the licm_reduction.ll test case, but has one redundant MustWriteAccess which already has been there before.
None of the other tests or code generation has been worked on yet.
Currently only code for the case that the value is written back to the array. Hoisting of reads must be handled with additional code.
There are many operations based on DenseSets of edges, but I I don't know how to check lifetime overlaps more efficiently.

Meinersbur updated this revision to Diff 35518.Sep 23 2015, 9:52 AM

WIP update:

Implemented CodeGeneration part and inter alia.

"De-LICM" tries to map emulated scalar to an array element that is not used because it will be overwritten unconditionally (no-use zone). BBs will read and write to this mapped location instead of space from a dedicated Alloca. Array element locations are represented using the StoreInst that writes to this. For this reason De-LICM of LoadInst-only zones are currently unsupported.

BBs are modeled s.t. the array elements are loaded into scalars before its instructions and written back after them if the value is modified. This is to avoid to track live values within basic blocks. Live values are tracked between BBs. Two scalars conflict (cannot be mapped to the same array element) if they would write different scalars at the end of the block. The algorithm is greedy, i.e. values that are first encountered to not conflict with any other previously mapped lifetimes are used.

Afterwards memory accesses are created unless they are identified to be redundant. Reads are created on-the-fly while mapped writes are added at the end. As a result, explicit LoadInst/StoreInst in the BB may have no corresponding MemoryAccess.

Dirty code-disclaimer: auto declarations, unused variables, outcommented or dead code, implementations in header file, incomplete merge with r247928 (Multi-dimensional accesses with bitcasts), ignored options, 64 failed unit tests (of which 3 are crashes), etc to be corrected.

Some changes could be committed separately:

  • There is a map that stores whether there is already a MemoryAccess for a given scalar to avoid adding redundant accesses.
  • Explicit MemoryAccess types (AccessOrigin in the code): Explicit (from LoadInst/StoreInst), Scalar (.s2a) and PHI (.phiops)
  • Dedicated addMemoryAccess-equivalents for each of the types.

Intended but not yet implemented:

  • Have separate lists of MemoryAccesses for leading reloads and trailing stores of scalars. Would restore the 1-to-1 relationship of instructions and explicit MemoryAccesses.
  • Remove printing of former TempScopInfo's IRAccesses and change the test cases to check of output of scop->dump() instead.
Meinersbur updated this revision to Diff 38021.Oct 21 2015, 8:54 AM

Mostly rebase to D13762

Meinersbur updated this revision to Diff 38133.Oct 22 2015, 8:43 AM

Pass unit tests

grosser edited edge metadata.Nov 23 2015, 10:18 PM

Hi Michael,

I wonder what the current status of this patch is. As your patch has probably evolved a lot since this original submission, it would be great if you could push an updated version for people to have a look and to provide feedback on how to merge this code or algorithm best.

Best,
Tobias

Meinersbur edited edge metadata.

Update, based on r251962

Meinersbur retitled this revision from [Polly] De-LICM (WIP) to [Polly] De-LICM and De-GVN (WIP).Nov 25 2015, 12:22 PM
Meinersbur updated this object.

Rebase to r254343

Meinersbur updated this revision to Diff 48950.Feb 24 2016, 9:01 AM
Meinersbur updated this object.
Meinersbur added reviewers: etherzhhb, sebpop, zinob.

Rebase; Fix all LNT tests except llvm.org/PR26718

This is not yet a clean patch.

TODOs:

  • Change analysis/transformation to work with exiting MemoryAccesses (instead of modifying their creation) in order to benefit from changes in D13611.
  • Per-edge MemoryAccesses required for many tests cases to work properly. D15722 might be useful.
  • Add systematic unit tests
etherzhhb edited edge metadata.Feb 26 2016, 3:53 AM

Can we put the algorithm to a ScopPass or another cpp file?

include/polly/CodeGen/BlockGenerators.h
134–136

Comments to explain what is going on here? (because the dirty code-disclaimer?)

362

Also talk about @param NewAccesses? (because the dirty code-disclaimer?)

381

Here too

496–499

Comments to explain what is the different between this one and the above one?

isl_id_to_ast_expr *NewAccesses should be an __isl_keep?

826

@param NewAccesses

include/polly/ScopInfo.h
74

The MemAccInst here can only be StoreInst?

110

will the same statement map to multiple values?

2295

It would be good to put explanation for the difference between Inst and AddrAlias.

2348

Stored at the location indicated by @param Addr?

2399

Can we move this to a ScopPass? Or move the algorithm implementation to another cpp file?

lib/Analysis/ScopInfo.cpp
3839

[Unrelated to this patch] Can we put a early return here to reduce the indent levels?

if (!(isa<GetElementPtrInst>(Address) || isa<BitCastInst>(Address)))
  return nullptr;

...
Meinersbur marked an inline comment as done.Feb 26 2016, 8:16 AM

Thanks for your first look.

Can we put the algorithm to a ScopPass or another cpp file?

Explained inline.

include/polly/CodeGen/BlockGenerators.h
362

Before the code is stable enough I don't bother to keep docs up-to-date that don't explain anything new.

Generally I think, we should not propagate such values by arguments through a lot of calls. The doxygen comments get very repetitive. Passing using as class member would be better. If the value is only available a limited time (here: during copyStmt), we could create a class that is valid during this period.

include/polly/ScopInfo.h
74

Kind of.

Yes in contexts of mapping targets. Altough it is only the result of the implementation in greedyCollapse that only StoreInsts are chosen.

No in context where we check if a LoadInst would read the same value, either to see that the value is still used, or whether the LoadInst would be redundant.

110

No for OutgoingValueMapTy. There is a OutgoingValueMapTy per mapping location and only one value can be mapped to that location at a Stamt's exit.

2348

Yes.

2399

I already suggested this for invariant load hoisting, but I don't remember what Johannes' answer was.

It does not make sense for the algorithm in this implementation. We'd need to split ScopInfo into two parts: before and MemoryAccess creation. A ScopInfo is not useful for outsiders without its MemoryAccesses.

It may become feasible when I move the implementation that modifies exiting MemoryAccesses. If we want such a split of passes, we should also split: Invariant load hoisting, D13611, and this.

I need to read this again tomorrow

sebpop resigned from this revision.Sep 19 2016, 1:20 PM
sebpop removed a reviewer: sebpop.
Meinersbur abandoned this revision.Mar 7 2017, 1:25 PM