This is an archive of the discontinued LLVM Phabricator instance.


Differential D34982

[Polly][WIP] Fully-Indexed static expansion
ClosedPublic

Authored by niosega on Jul 4 2017, 7:18 AM.


Details

Reviewers

simbuerg
Meinersbur
bollu

Commits

rG81fb6b3e4085: [Polly] Fully-Indexed static expansion
rPLO310304: [Polly] Fully-Indexed static expansion
rL310304: [Polly] Fully-Indexed static expansion

Summary

This patch implements a mechanism for fully-indexed static expansion.

The goal is to expand every memory access into a fully indexed one. For example, starting from this source code:

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[j] = j;
T: A[i] = B[i]

After the pass, we want this :

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[i, j] = j;
T: A[i] = B[i, i]
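
To make the effect of the expansion concrete, the two loop nests above can be modeled on ordinary C++ containers. This is only a sketch, not what the pass generates: it assumes that statement T sits inside the `i` loop (as the `B[i, i]` access suggests), and the function names `runOriginal` and `runExpanded` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop nest: every outer iteration i overwrites the same B[j].
std::vector<int> runOriginal(std::size_t Ni) {
  std::vector<int> A(Ni), B(Ni);
  for (std::size_t i = 0; i < Ni; i++) {
    for (std::size_t j = 0; j < Ni; j++)
      B[j] = static_cast<int>(j); // S
    A[i] = B[i];                  // T (assumed to be inside the i loop)
  }
  return A;
}

// Expanded loop nest: B gains one dimension per enclosing loop of S, so
// each statement instance S[i, j] writes its own element B[i][j].
std::vector<int> runExpanded(std::size_t Ni) {
  std::vector<int> A(Ni);
  std::vector<std::vector<int>> B(Ni, std::vector<int>(Ni));
  for (std::size_t i = 0; i < Ni; i++) {
    for (std::size_t j = 0; j < Ni; j++)
      B[i][j] = static_cast<int>(j); // S
    A[i] = B[i][i];                  // T
  }
  return A;
}
```

Both functions compute the same `A`, but in the expanded version no two instances of S write the same memory location, which is what removes the write-after-write dependences between outer iterations.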

For now, expansion cannot be applied in the following cases:

Scalar access
Multiple writes per SAI
MayWrite Access
Expansion that leads to an access to the original array


Event Timeline

niosega created this revision. Jul 4 2017, 7:18 AM

niosega created this object with visibility "Custom Policy".

niosega created this object with edit policy "Custom Policy".

niosega edited the summary of this revision. Jul 5 2017, 2:04 AM

niosega added a reviewer: Meinersbur.

niosega changed the visibility from "Custom Policy" to "Public (No Login Required)".

niosega changed the edit policy from "Custom Policy" to "All Users".

niosega added subscribers: pollydev, llvm-commits.

niosega added inline comments. Jul 5 2017, 2:10 AM

lib/Transform/MaximalStaticExpansion.cpp
68	Reading the work done in DeLICM and JSONImporter::importAccesses helped me begin the implementation of my pass, but I am stuck on building the new map. For example, given the following C code:

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
 S:     tmp = A[N+i]*tmp;

After the pass, we want this:

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
 T:     tmp[i, j] = A[N+i]*tmp;

That is, I want to transform the map of S, [N] -> {S[i,j] -> tmp}, into the map of T, [N] -> {S[i,j] -> tmp[i, j]}, but I was not able to find a way to do that. Does anybody have an idea how to proceed?
test/MaximalStaticExpansion/no_optim.ll
1 ↗	(On Diff #105173)	I tried to create a test case from the following C code:

 #define Ni 2000
 double mse(double A[Ni], int N) {
   int i;
   double tmp = 2;
   for (i = 0; i < Ni; i++) {
     for (int j = 0; j<Ni; j++) {
       tmp = A[N+i]*tmp;
     }
   }
   return tmp;
 }

I use the following commands to generate the IR:

 clang -O2 -S -emit-llvm pure_c_main.c
 opt -S -mem2reg -polly-scops -polly-export-jscop pure_c_main.ll

and this command to detect the SCoP:

 opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-scops -analyze -polly-export-jscop no_optim.ll

With these commands, Polly detects a SCoP, but the IR is optimized because of the -O2 passed to clang. I would like a non-optimized version of the IR so that it maps directly to the C code, but when I remove -O2, Polly does not detect any SCoP. Does anybody know why?

Fully-Indexed expansion of the write accesses :

Build the new access map from the current access map
Create a new SAI for the expanded version of the access array or scalar
Modify the memory access to the new SAI

This seems to work for both scalar and non-scalar accesses.

For now, the test is based only on dump comparison because the generated code is broken: the reads still target the old SAI.

Herald added a reviewer: bollu. Jul 12 2017, 12:28 PM

Herald added a subscriber: mgorny.

Meinersbur added inline comments. Jul 13 2017, 3:40 AM

lib/Support/RegisterPasses.cpp
336–337	Suggestion: Put it behind DeadCodeElimination? It uses DependenceInfo and should have the same results even before MSE.
lib/Transform/MaximalStaticExpansion.cpp
2	Remove `- Expand the memory access`. No other header does this.
13	To be consistent with the other files, add an empty line before `//===------`

Meinersbur added inline comments. Jul 13 2017, 3:40 AM

lib/Transform/MaximalStaticExpansion.cpp
41	Typo: transformations
86	Check whether this returns `nullptr`; at least add an assertion. Suggestion: Get the domain sizes from the writing statement's domain.
106–109	`isl_dim_in` and `isl_dim_out` dimensions have no dim_id, so this is not necessary.
112	At this point, in the regression test, we have

 { Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[o0] }

(i0 and o0 are unconnected). That is, every statement instance accesses all array elements. It should be something like:

 { Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[i0] }
136	Use inline declarations: bool CorrectWrite = expandWrite(S, MA);
138	We cannot return early with transformations only partially applied. The SCoP representation will be inconsistent; at best, passes after this will crash, at worst we miscompile. Please check in advance whether a transformation can be applied before trying to apply it. In this case, you may want to check for each `ScopArrayInfo` whether it is applicable. The class `ScalarDefUseChains` in DeLICM.cpp might be helpful to get the accesses of a SAI. I think we will sooner or later integrate it into ScopInfo.
148	`runOnScop` returns true if and only if the IR has been modified. We only modify the SCoP representation, therefore we only return `false`.
152	In `printScop`, you must print to `OS`. It will then print to `stdout` instead of `stderr`. This should also make the regression test simpler.
156	You will need to add `AU.addRequired<DependenceInfo>();` here at some point.
test/MaximalStaticExpansion/mse___%for.cond1.preheader---%for.end8.jscop
1 ↗	(On Diff #106279)	What is this file needed for?

In this revision, we have done :

Use of c++ bindings for isl instead of direct isl
Implementation of the reads expansion
Add a constraint to the write map so that the in dims are linked to the out dims
Remove useless JSCoP file
Create a new expanded SAI for each statement
Add a check method before doing the expansion to avoid partial expansion
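
The write-map constraint mentioned in the list above (linking the in dims to the out dims) can be illustrated without isl by enumerating a relation as explicit pairs. This is a toy model under stated assumptions: a single statement dimension `i0`, a single array dimension `o0`, both ranging over `0..N-1`; `Relation`, `unconstrainedMap` and `equatedMap` are hypothetical names, not Polly or isl API.

```cpp
#include <cassert>
#include <set>
#include <utility>

// A relation { Stmt[i0] -> MemRef[o0] } enumerated as explicit (i0, o0) pairs.
using Relation = std::set<std::pair<int, int>>;

// No constraint between i0 and o0: every statement instance is related to
// every array element.
Relation unconstrainedMap(int N) {
  Relation R;
  for (int i = 0; i < N; i++)
    for (int o = 0; o < N; o++)
      R.insert(std::make_pair(i, o));
  return R;
}

// Adding the constraint i0 = o0 keeps exactly one element per instance.
Relation equatedMap(int N) {
  Relation R;
  for (const std::pair<int, int> &P : unconstrainedMap(N))
    if (P.first == P.second)
      R.insert(P);
  return R;
}
```

For `N = 4`, the unconstrained relation holds 16 pairs while the equated one holds 4, which mirrors the difference between `MemRef[o0]` and `MemRef[i0]` pointed out in the review.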

Problems :

Do I have to expand an array that is already fully indexed (for example, the first write of B in the test case "too-many-writes.ll")?

For now, we can not expand :

Scalar access
MayWrite access
SAI with more than one write
SAI with read that can cause partial read access (because polly can not handle partial read access)

niosega marked 6 inline comments as done. Jul 20 2017, 7:52 AM

[Suggestion] Add a regression test where generated LLVM-IR is tested.

include/polly/ScopInfo.h
840–842 ↗	(On Diff #107505)	[Style] Why was this moved?
lib/Transform/MaximalStaticExpansion.cpp
48	[Style] Instead of a set of non-expandable arrays, why not use a set of expandable arrays? It feels safer because if an array is missing from the set for whatever reason, it would not be expanded by default. [Style] `std::set` is rarely used in LLVM. There are alternatives: see http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallset-h .
52	[Typo] checkExpandability
60–62	[Style] Please make them doxygen comments (`///`). The doxygen style for parameters we usually use is:

 @param Dependences The RAW dependences of the SCoP @p S.
78	[Style] Negation in variable name and double negation. Why not `bool Expandable = true`?
125–130	[Style] `NumerElementMap = isl_union_map_n_map(CurrentReadWriteDependences.get())`
144–145	[Serious] In normal operation, do not print anything. Users expect only clang warnings and errors. Alternatives are:
 `DEBUG(dbgs() << "")` (discouraged for regression tests; this is meant for debugging)
 Print remarks using `-pass-remarks-missed`
 Print information at `-analyze`
 `STATISTIC`
215–219	[Serious] This is not a sufficient condition for full expansion. E.g. one dimension can be a static `0`. There is also more than one possibility for full expansion (e.g. one starting at 0, another at 1). The reads must access the correct element. So if you want to implement this as a heuristic that expansion is not worth it in this case, implement it in `checkExpandability` such that reads are also not modified.
250–251	[Serious] I think `getMaxBackedgeTakenCount` can fail in cases where Polly is still able to detect affine loops. Please add at least an `assert` that the SCEV is not `SCEVCouldNotCompute`.
256	[Suggestion] Should this be `getLatestScopArrayInfo()` ?
298–299	[Remark] The structure I had in mind was

 for (array : S.arrays()) {
   if (!checkExpandability(array))
     continue;
   for (ScopStmt &Stmt : S)
     for (MemoryAccess *MA : Stmt) {
       if (MA->isRead())
         expandRead(MA);
       if (MA->isWrite())
         expandWrite(MA);
     }
 }

This does not need a `NotExpandable` set, but has worse asymptotic runtime. So I guess your version has an advantage.
303–305	[Style] We usually do not use braces around single statements:

 if (isExpandable(SAI))
   continue;
314	[Style] In Polly's coding style, all sentences end with a dot (but I personally don't care).

Thank you for this mostly working version. I hope my comments are not too daunting.

lib/Transform/MaximalStaticExpansion.cpp
91	[Suggestion] One could skip the check for the current array once it is known to be unexpandable and continue with the next one.
189–200	[Suggestion] Instead of searching for the correct id, you could derive the name as in `expandWrite` and look it up in `ScopArrayNameMap`. [Serious] What if the `Id` is not found? Please add an assertion for that case.
251	[Suggestion] Or use `ScalarEvolution::getAddExpr(SCEV, ScalarEvolution::getConstant(1))` [Serious] I don't see where +1 is added to the ISL version.
test/MaximalStaticExpansion/working_expansion.ll
25	Shouldn't this be MemRef_B_Stmt_for_body3_expanded[2000][3000] ?

niosega added inline comments. Jul 21 2017, 9:18 AM

include/polly/ScopInfo.h
840–842 ↗	(On Diff #107505)	This was moved from private to public section because I need this method to find the expanded SAI name during read expansion. But if I use the solution you suggest, I will not need it anymore.
lib/Transform/MaximalStaticExpansion.cpp
91	That is what I wanted to do in the beginning, but I did not find a clean way to "escape" from the two innermost loops and go to the next iteration of the loop that iterates over the SAIs.
251	During the last phone call we discussed another solution that uses methods from FlattenAlgo.cpp to get the bounds of the loop iteration variables. That is what I meant by "ISL version". For now, I cannot access the methods from FlattenAlgo.cpp because they are defined inside an unnamed namespace.
298–299	If I am not mistaken, we must expand all writes before expanding the reads. Otherwise, looking up the expanded SAI during read expansion can fail.

Meinersbur added inline comments. Jul 21 2017, 2:03 PM

include/polly/ScopInfo.h
840–842 ↗	(On Diff #107505)	ok.
lib/Transform/MaximalStaticExpansion.cpp
91	A possible solution is to refactor the inner two loops into a function, which can `return true/false` for a `ScopArrayInfo` at any point.

 bool isExpandable(SAI) {
   for (ScopStmt &Stmt : S)
     for (MemoryAccess *MA : Stmt) {
       if (MA->isMayWrite())
         return false;
       ...
     }
 }

 for (auto &SAI : S.arrays()) {
   if (!isExpandable(SAI))
     NotExpandables.insert(SAI);
 }
251	You may need to modify them anyway, so don't hesitate to copy them over, especially for a prototype. If later we find they share a significant amount of code, we can find a common file for them.
298–299	Let me refine what I had in mind some time ago.

 for (array : S.arrays()) {
   MemoryAccess *TheWrite = nullptr;
   List<MemoryAccess *> AllReads;
   if (!isExpandable(array, TheWrite, AllReads))
     continue;
   assert(TheWrite);
   assert(AllReads.size() > 0);
   ScopArrayInfo ExpandedArray = expandWrite(TheWrite);
   for (MemoryAccess *MA : AllReads)
     expandRead(MA, ExpandedArray);
 }
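
The check-first, then expand-the-write, then expand-the-reads structure discussed above can be sketched in self-contained C++. Everything here is a hypothetical stand-in: `Access`, `AccessKind`, `isExpandable` and `expandAll` are not Polly types or functions, and "expansion" is reduced to renaming the accessed array. The expandability rule (no may-write, exactly one write, at least one read) is an assumption taken from the limitations and assertions quoted in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class AccessKind { Read, MustWrite, MayWrite };

// Hypothetical stand-in for a memory access: which array it touches and how.
struct Access {
  std::string Array;
  AccessKind Kind;
};

// Collects the single write and all reads of Array. Returns false as soon
// as a may-write or a second write is found, so nothing is modified for
// arrays that cannot be expanded.
bool isExpandable(const std::string &Array, std::vector<Access> &Accesses,
                  Access *&TheWrite, std::vector<Access *> &AllReads) {
  for (Access &MA : Accesses) {
    if (MA.Array != Array)
      continue;
    if (MA.Kind == AccessKind::MayWrite)
      return false;
    if (MA.Kind == AccessKind::MustWrite) {
      if (TheWrite)
        return false; // More than one write to this array.
      TheWrite = &MA;
    } else {
      AllReads.push_back(&MA);
    }
  }
  return TheWrite != nullptr && !AllReads.empty();
}

// Expands every expandable array: first the write, then all reads.
// Returns the number of arrays that were expanded.
std::size_t expandAll(const std::vector<std::string> &Arrays,
                      std::vector<Access> &Accesses) {
  std::size_t NumExpanded = 0;
  for (const std::string &Array : Arrays) {
    Access *TheWrite = nullptr;
    std::vector<Access *> AllReads;
    if (!isExpandable(Array, Accesses, TheWrite, AllReads))
      continue;
    const std::string Expanded = Array + "_expanded";
    TheWrite->Array = Expanded; // "expandWrite"
    for (Access *MA : AllReads)
      MA->Array = Expanded;     // "expandRead"
    ++NumExpanded;
  }
  return NumExpanded;
}
```

Because the check runs to completion before any access is touched, an array is either fully redirected to its expanded version or left entirely unchanged, which avoids the partially transformed state the earlier review comments warn about.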

niosega marked 23 inline comments as done. Jul 24 2017, 12:18 PM

Take into account remarks from Michael.

Change the structure of the expansion. Now iterate over SAI of the Scop.
Get the boundaries of the loop iterations variables with ISL.
Style modifications.

Implementation for remarks is in place but not working. One test case is broken due to change in structure. But the output seems to be correct. I will correct it as soon as possible.

Remove debug if condition.

Meinersbur added inline comments. Jul 25 2017, 8:49 AM

lib/Transform/MaximalStaticExpansion.cpp
194	I get a compile error here. `MA->getAccessRelation()` has been updated to use C++ object. Please update to Polly trunk.
340	[Suggestion] Pass string as `llvm::StringRef` (or `const std::string &` to avoid a copy)
test/MaximalStaticExpansion/partial_access.ll
1 ↗	(On Diff #107946)	[Style] Please remove trailing whitespace.

simbuerg added inline comments. Jul 25 2017, 9:17 AM

lib/Transform/MaximalStaticExpansion.cpp
182	Reminder: These should/will become diagnostics
253	As far as I remember, this will fail, if there are more than one map in the union_map. I would check that with at least an assert.
259	Where do you use this name?
315	This takes a const string &, no need to go over the c_str().
343	Why 'AssumpRestrict'?

niosega marked 7 inline comments as done. Jul 26 2017, 4:01 AM

Take into account Michael and Andreas comments.

Diagnostic still does not work.

Emit remarks instead of stderr printing. Test case works.

To have any effect, we need to clear the DependenceInfo; otherwise -polly-opt-isl will still use the unexpanded dependence info and the MSE will have been useless.

DependenceInfo has no facility yet to reset a previous analysis. You might want to add one in this patch or in a follow-up.

lib/Support/RegisterPasses.cpp
153	[Suggestion] To be consistent with other switches that add passes, name this `-polly-enable-mse` and the pass itself `-polly-mse` (instead of `-polly-opt-mse`)?
lib/Transform/MaximalStaticExpansion.cpp
68	[Style] Use `SmallPtrSetImpl<MemoryAccess *>` to not require the small size in the parameter.
188	[Typo] extand -> expand
219	The consequence would not be a partial read access, but it would need to read the original value the element had before entering the SCoP. That's a special case similar to having more than one write.
308	[Nit] The UpperBound could overflow a long. Add an assertion for that?
329–332	[Style] This could be simpler using NewAccessMap = NewAccessMap->equate(isl::dim::in, dim, isl::dim::out, dim); or, even better, use `basic_map::equal`.
348	[Nit] `OptimizationRemarkEmitterWrapperPass`
358–360	[Style] No braces around single statements. Also possible: SmallPtrSet<ScopArrayInfo *, 4> CurrentSAI(S.array_begin(), S.array_end());
test/MaximalStaticExpansion/partial_access.ll
1 ↗	(On Diff #108256)	The interleaving of stdout (`-analyze`) and stderr (`-pass-remarks-analysis`) is undefined. It is better to have two separate RUN lines, one checking `analyze`, the other `-pass-remarks-analysis`.

This revision is now accepted and ready to land. Jul 26 2017, 7:10 AM

niosega marked 8 inline comments as done. Jul 26 2017, 8:01 AM

niosega added inline comments.

lib/Transform/MaximalStaticExpansion.cpp
219	Are you sure? If I remember correctly, say we are analyzing this code:

 for (i = 0; i < Ni; i++) {
   B[j] = i;
   for (int j = 0; j<Nj; j++) {
     B[j] = j;
   }
   A[i] = B[i];
 }

When I try to set the new access relation for the read of B, the setNewAccessRelation method of class MemoryAccess fails with the assertion "Partial READ accesses not supported".
308	How can I efficiently check that there is an overflow ?
329–332	I would like to use isl_basic_map_equal, but I did not find documentation for it in the online isl documentation, and there is no example of its use in Polly. Can you explain how it works?

Meinersbur added inline comments. Jul 27 2017, 4:44 AM

lib/Transform/MaximalStaticExpansion.cpp
219	How were you able to do that? setNewAccessRelation accepts only an isl_map, not an isl_union_map. Let me explain in more detail.

 for (int k = 0; k < M; k+=1) {
   for (int i = 0; i<= N/2; i+=1) {
 S:    B[i] = i;
   }
   for (int i = 0; i<N; i+=1) {
 T:    = ... B[i]
   }
 }

The flow dependence would look something like:

 { T[k, i] -> S[k, i] : 0 < i <= N/2 }

We could naively expand B to B_expanded:

 for (int k = 0; k < M; k+=1) {
   for (int i = 0; i<= N/2; i+=1) {
 S:    B_expanded[k][i] = i;
   }
   for (int i = 0; i<N; i+=1) {
 T:    = ... B_expanded[k][i]
   }
 }

The problem here is that `B_expanded[k][i]` for i > N/2 never gets written (and T would read uninitialized memory). The flow dependence doesn't tell which instance of S wrote it in the first place! If you try to apply it naively anyway using setNewAccessRelation, we need a source of the value for all instances of S, but we don't have one for `i > N/2`. This is why partial read accesses are unsupported. The correct thing to do would be to read the value from the original array B (which then becomes read-only):

 { T[k,i] -> B_expanded[k,i] : i < N/2; T[k,i] -> B[i] : i>=N/2 }

This again is an isl_union_map (NOT a partial access, since it is defined for all instances of T), which we currently do not support. Please try to understand what the problem with partial read accesses is. The partial read accesses themselves are not the problem; the problem is the reason why you would want to use one.
308	`UpperBound.le(INT_MAX)` (I think there is no implicit conversion from int to isl::val, but you get the idea)
329–332	isl_map_space(SpaceMap.copy(), SpaceMap.dim(isl::in)) should get you a basic_map of that space where the `n_equal = SpaceMap.dim(isl::in)` in- and out-dimensions are equal. Something like:

 { Stmt[i0, i1] -> MemRef[o0, o1] : i0 = o0 and i1 = o1 }

However, no documentation could mean that the function was not intended to be public.
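
The partial-read problem described in the comment on line 219 above can be replayed on plain containers. This is a sketch for a single outer iteration `k`, not Polly code: `ExpandedRow`, `runS` and `runT` are hypothetical names, and the `Written` flags exist only to make visible which elements statement S never defines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row (fixed k) of the hypothetical expanded array: the values plus a
// flag recording whether statement S ever wrote the element.
struct ExpandedRow {
  std::vector<int> Value;
  std::vector<bool> Written;
};

// Statement S: writes B_expanded[k][i] = i for 0 <= i <= N/2 only.
ExpandedRow runS(int N) {
  ExpandedRow Row{std::vector<int>(N, 0), std::vector<bool>(N, false)};
  for (int i = 0; i <= N / 2 && i < N; i++) {
    Row.Value[i] = i;
    Row.Written[i] = true;
  }
  return Row;
}

// Statement T: reads element i for all 0 <= i < N. Elements that S wrote
// come from the expanded row; all others fall back to the original array B.
// B must have as many elements as the row.
std::vector<int> runT(const ExpandedRow &Row, const std::vector<int> &B) {
  std::vector<int> Read(B.size());
  for (std::size_t i = 0; i < B.size(); i++)
    Read[i] = Row.Written[i] ? Row.Value[i] : B[i];
  return Read;
}
```

With `N = 6`, elements 4 and 5 of the expanded row are never written, so a read that only looked at the expanded array would have no defined source for them. The fallback to `B` corresponds to the second piece of the union map in the comment, which is exactly the kind of access relation the review says is not supported yet.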

Take Michael comments into account.

Meinersbur added inline comments. Jul 27 2017, 8:57 AM

lib/Support/RegisterPasses.cpp
153	I'm ok with the switch name, but doesn't the e in "mse" already stand for "expansion" (therefore expand-mse is short for "expand-maximal-array-expansion")
lib/Transform/MaximalStaticExpansion.cpp
308–309	This assertion fails on Windows (and 32-bit platforms): the `isl::val` constructor takes a `long`, which is 32 bits on these platforms. UINT_MAX exceeds its range.
310	It has been tested for the range of an `unsigned int`.
311	If `UpperBound.get_num_si()` is `UINT_MAX`, you get an overflow when adding +1.
test/MaximalStaticExpansion/read_from_original.ll
2	Nice new testcase!

niosega added inline comments. Jul 29 2017, 2:03 PM

lib/Transform/MaximalStaticExpansion.cpp
308–309	It's a mistake from my side to compare it with UINT_MAX. If I replace std::numeric_limits<unsigned>::max() with std::numeric_limits<long>::max() it should work on every platform, right ?

Meinersbur added inline comments. Jul 30 2017, 11:02 AM

lib/Transform/MaximalStaticExpansion.cpp
308–309	Except that it is stored into a `std::vector<unsigned>`. Storing a `long` as an `unsigned int` may get you another overflow. My suggestion is to stick with the lowest common maximum: `std::numeric_limits<int>::max()`. I don't think you would want to allocate memory larger than that anyway.
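
The overflow concerns raised in this thread (a `long` upper bound, a `+ 1` to get an extent, and storage as `unsigned`) can be combined into one guarded conversion. This is a sketch under stated assumptions, not the code that was committed: `extentFromUpperBound` is a hypothetical helper, and it takes the bound as a plain `long` instead of an `isl::val`.

```cpp
#include <cassert>
#include <limits>

// Converts the inclusive upper bound of a loop variable into an array
// extent (bound + 1). Returns false and leaves Extent untouched when the
// bound is negative or when bound + 1 would exceed the range of int, the
// lowest common maximum suggested in the review.
bool extentFromUpperBound(long UpperBound, unsigned &Extent) {
  if (UpperBound < 0 || UpperBound >= std::numeric_limits<int>::max())
    return false;
  Extent = static_cast<unsigned>(UpperBound) + 1u;
  return true;
}
```

Comparing against `std::numeric_limits<int>::max()` before adding one means the check behaves the same whether `long` is 32 or 64 bits wide, and the result always fits in the `unsigned` element type of the size vector.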

Take into account Michael's comments.
Update setNewAccessRelation call (isl::map as parameter instead of isl_map *)

niosega edited the summary of this revision. Aug 3 2017, 3:20 PM

niosega edited the summary of this revision. Aug 3 2017, 3:23 PM

LGTM.

Andreas, do you want to commit?

lib/Transform/MaximalStaticExpansion.cpp
308	Why `- 1`?

Closed by commit rL310304: [Polly] Fully-Indexed static expansion (authored by simbuerg). Aug 7 2017, 1:55 PM

This revision was automatically updated to reflect the committed changes.

niosega mentioned this in D36647: [Polly][WIP] Scalar fully indexed expansion. Aug 12 2017, 1:50 PM

simbuerg mentioned this in rL311619: [Polly][WIP] Scalar fully indexed expansion. Aug 23 2017, 5:05 PM

Revision Contents

Path	Size
include/polly/LinkAllPasses.h	3 lines
lib/CMakeLists.txt	1 line
lib/Support/RegisterPasses.cpp	9 lines
lib/Transform/MaximalStaticExpansion.cpp	394 lines
test/MaximalStaticExpansion/read_from_original.ll	105 lines
test/MaximalStaticExpansion/too_many_writes.ll	111 lines
test/MaximalStaticExpansion/working_expansion.ll	101 lines

Diff 109640

include/polly/LinkAllPasses.h

[49 lines not shown]
 llvm::Pass *createCodeGenerationPass();
 #ifdef GPU_CODEGEN
 llvm::Pass *createPPCGCodeGenerationPass(GPUArch Arch = GPUArch::NVPTX64,
                                          GPURuntime Runtime = GPURuntime::CUDA);
 #endif
 llvm::Pass *createIslScheduleOptimizerPass();
 llvm::Pass *createFlattenSchedulePass();
 llvm::Pass *createDeLICMPass();
+llvm::Pass *createMaximalStaticExpansionPass();

 extern char &CodePreparationID;
 } // namespace polly

 namespace {
 struct PollyForcePassLinking {
   PollyForcePassLinking() {
     // We must reference the passes in such a way that compilers will not
[17 lines not shown; enclosing context: PollyForcePassLinking() {]
     polly::createPollyCanonicalizePass();
     polly::createPolyhedralInfoPass();
     polly::createIslAstInfoWrapperPassPass();
     polly::createCodeGenerationPass();
 #ifdef GPU_CODEGEN
     polly::createPPCGCodeGenerationPass();
 #endif
     polly::createIslScheduleOptimizerPass();
+    polly::createMaximalStaticExpansionPass();
     polly::createFlattenSchedulePass();
     polly::createDeLICMPass();
     polly::createDumpModulePass("", true);
     polly::createSimplifyPass();
     polly::createPruneUnprofitablePass();
   }
 } PollyForcePassLinking; // Force link by creating a global definition.
 } // namespace

 namespace llvm {
 class PassRegistry;
 void initializeCodePreparationPass(llvm::PassRegistry &);
 void initializeDeadCodeElimPass(llvm::PassRegistry &);
 void initializeJSONExporterPass(llvm::PassRegistry &);
 void initializeJSONImporterPass(llvm::PassRegistry &);
 void initializeIslAstInfoWrapperPassPass(llvm::PassRegistry &);
 void initializeCodeGenerationPass(llvm::PassRegistry &);
 #ifdef GPU_CODEGEN
 void initializePPCGCodeGenerationPass(llvm::PassRegistry &);
 #endif
 void initializeIslScheduleOptimizerPass(llvm::PassRegistry &);
+void initializeMaximalStaticExpanderPass(llvm::PassRegistry &);
 void initializePollyCanonicalizePass(llvm::PassRegistry &);
 void initializeFlattenSchedulePass(llvm::PassRegistry &);
 void initializeDeLICMPass(llvm::PassRegistry &);
 } // namespace llvm

 #endif

lib/CMakeLists.txt

[55 lines not shown; enclosing context: add_library(PollyCore OBJECT]
   Transform/CodePreparation.cpp
   Transform/DeadCodeElimination.cpp
   Transform/ScheduleOptimizer.cpp
   Transform/FlattenSchedule.cpp
   Transform/FlattenAlgo.cpp
   Transform/ForwardOpTree.cpp
   Transform/DeLICM.cpp
   Transform/Simplify.cpp
+  Transform/MaximalStaticExpansion.cpp
   ${POLLY_HEADER_FILES}
   )
 set_target_properties(PollyCore PROPERTIES FOLDER "Polly")

 # Create the library that can be linked into LLVM's tools and Polly's unittests.
 # It depends on all library it needs, such that with
 # LLVM_POLLY_LINK_INTO_TOOLS=ON, its dependencies like PollyISL are linked as
 # well.
[87 lines not shown]

lib/Support/RegisterPasses.cpp

[143 lines not shown; enclosing context: static cl::opt<polly::VectorizerChoice, true> Vectorizer(]
     cl::location(PollyVectorizerChoice), cl::init(polly::VECTORIZER_NONE),
     cl::ZeroOrMore, cl::cat(PollyCategory));

 static cl::opt<bool> ImportJScop(
     "polly-import",
     cl::desc("Import the polyhedral description of the detected Scops"),
     cl::Hidden, cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

+static cl::opt<bool> FullyIndexedStaticExpansion(
+    "polly-enable-mse",
+    cl::desc("Fully expand the memory accesses of the detected Scops"),
+    cl::Hidden, cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

 static cl::opt<bool> ExportJScop(
     "polly-export",
     cl::desc("Export the polyhedral description of the detected Scops"),
     cl::Hidden, cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

 static cl::opt<bool> DeadCodeElim("polly-run-dce",
                                   cl::desc("Run the dead code elimination"),
                                   cl::Hidden, cl::init(false), cl::ZeroOrMore,
[86 lines not shown; enclosing context: #ifdef GPU_CODEGEN]
   LLVMInitializeNVPTXAsmPrinter();
 #endif
   initializeCodePreparationPass(Registry);
   initializeDeadCodeElimPass(Registry);
   initializeDependenceInfoPass(Registry);
   initializeDependenceInfoWrapperPassPass(Registry);
   initializeJSONExporterPass(Registry);
   initializeJSONImporterPass(Registry);
+  initializeMaximalStaticExpanderPass(Registry);
   initializeIslAstInfoWrapperPassPass(Registry);
   initializeIslScheduleOptimizerPass(Registry);
   initializePollyCanonicalizePass(Registry);
   initializePolyhedralInfoPass(Registry);
   initializeScopDetectionWrapperPassPass(Registry);
   initializeScopInfoRegionPassPass(Registry);
   initializeScopInfoWrapperPassPass(Registry);
   initializeCodegenCleanupPass(Registry);
[60 lines not shown; enclosing context: void registerPollyPasses(llvm::legacy::PassManagerBase &PM) {]
   if (EnableDeLICM)
     PM.add(polly::createDeLICMPass());
   if (EnableSimplify)
     PM.add(polly::createSimplifyPass());

   if (ImportJScop)
     PM.add(polly::createJSONImporterPass());

   if (DeadCodeElim)
     PM.add(polly::createDeadCodeElimPass());

+  if (FullyIndexedStaticExpansion)
+    PM.add(polly::createMaximalStaticExpansionPass());

   if (EnablePruneUnprofitable)
     PM.add(polly::createPruneUnprofitablePass());

 #ifdef GPU_CODEGEN
   if (Target == TARGET_HYBRID)
     PM.add(
         polly::createPPCGCodeGenerationPass(GPUArchChoice, GPURuntimeChoice));
 #endif
[344 lines not shown]

lib/Transform/MaximalStaticExpansion.cpp

This file was added.

				//===---------------- MaximalStaticExpansion.cpp -------------------------===//
				//
				MeinersburUnsubmitted Done Reply Inline Actions Remove `- Expand the memory access`. No other header does this. Meinersbur: Remove `- Expand the memory access`. No other header does this.
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass fully expand the memory accesses of a Scop to get rid of
				// dependencies.
				//
				//===----------------------------------------------------------------------===//
				MeinersburUnsubmitted Done Reply Inline Actions To be consistent with the other files, add an empty line before `//===------` Meinersbur: To be consistent with the other files, add an empty line before `//===------`

				#include "polly/DependenceInfo.h"
				#include "polly/FlattenAlgo.h"
				#include "polly/LinkAllPasses.h"
				#include "polly/Options.h"
				#include "polly/ScopInfo.h"
				#include "polly/Support/GICHelper.h"
				#include "polly/Support/ISLOStream.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;
				using namespace polly;

				#define DEBUG_TYPE "polly-mse"

				namespace {
				class MaximalStaticExpander : public ScopPass {
				public:
				static char ID;
				explicit MaximalStaticExpander() : ScopPass(ID) {}

				~MaximalStaticExpander() {}

				/// Expand the accesses of the SCoP.
				///
				/// @param S The SCoP that must be expanded.
				bool runOnScop(Scop &S) override;
				Meinersbur (Done): Typo: transformations

				/// Print the SCoP.
				///
				/// @param OS The stream where to print.
				/// @param S The SCop that must be printed.
				void printScop(raw_ostream &OS, Scop &S) const override;

				Meinersbur (Done): [Style] Instead of a set of not expandable arrays, why not use a set of expandable arrays? It feels safer because if an array is just missing in the set for whatever reason, it would not default to expanding it. [Style] `std::set` is rarely used in LLVM. There are alternatives: see http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallset-h .
				/// Register all analyses and transformations required.
				void getAnalysisUsage(AnalysisUsage &AU) const override;

				private:
				Meinersbur (Done): [Typo] checkExpandability
				/// OptimizationRemarkEmitter object for displaying diagnostic remarks
				OptimizationRemarkEmitter *ORE;

				/// Emit remark
				void emitRemark(StringRef Msg, Instruction *Inst);

				/// Return true if the SAI in parameter is expandable.
				///
				/// @param SAI the SAI that need to be checked.
				/// @param Writes A set that will contains all the write accesses.
				Meinersbur (Done): [Style] Please make them doxygen comments (`///`). The doxygen style for parameters we usually use is: `@param Dependences The RAW dependences of the SCoP @p S.`
				/// @param Reads A set that will contains all the read accesses.
				/// @param S The SCop in which the SAI is in.
				/// @param Dependences The RAW dependences of the SCop.
				bool isExpandable(const ScopArrayInfo *SAI,
				SmallPtrSetImpl<MemoryAccess *> &Writes,
				SmallPtrSetImpl<MemoryAccess *> &Reads, Scop &S,
				niosega (Author, Not Done): By reading and trying to understand the work done in DeLICM and JSONImporter::importAccesses, I began the implementation of my pass. But I am stuck in building the new map. For example, if we have the following C code: for(int i = 0; i<Ni; i++) for(int j = 0; j<Ni; j++) S: tmp = A[N+i]tmp; After the pass, we want this: for(int i = 0; i<Ni; i++) for(int j = 0; j<Ni; j++) T: tmp[i, j] = A[N+i]tmp; This means that I want to transform the map of S, [N] -> {S[i,j] -> tmp}, to the map of T, [N] -> {S[i,j] -> tmp[i, j]}. But I was not able to find a way to do that. Does anybody have an idea on how to proceed?
				Meinersbur (Done): [Style] Use `SmallPtrSetImpl<MemoryAccess *>` to not require the small size in the parameter.
				isl::union_map &Dependences);

				/// Expand a write memory access.
				///
				/// @param S The SCop in which the memory access appears in.
				/// @param MA The memory access that need to be expanded.
				ScopArrayInfo *expandWrite(Scop &S, MemoryAccess *MA);

				/// Expand the read memory access.
				///
				Meinersbur (Done): [Style] Negation in variable name and double negation. Why not `bool Expandable = true`?
				/// @param S The SCop in which the memory access appears in.
				/// @param MA The memory access that need to be expanded.
				/// @param Dependences The RAW dependences of the SCop.
				/// @param ExpandedSAI The expanded SAI created during write expansion.
				void expandRead(Scop &S, MemoryAccess *MA, isl::union_map &Dependences,
				ScopArrayInfo *ExpandedSAI);
				};
				} // namespace
				Meinersbur (Done): Check whether this returns `nullptr`. At least an assertion. Suggestion: Get the domain sizes from the writing statement's domain.

				namespace {

				/// Whether a dimension of a set is bounded (lower and upper) by a constant,
				/// i.e. there are two constants Min and Max, such that every value x of the
				Meinersbur (Done): [Suggestion] One could skip the check for the current array once it is known to be unexpandable and continue with the next one.
				niosega (Author, Done): That is what I wanted to do in the beginning. But I didn't find a clean solution to "escape" from the two innermost loops and go to the next iteration of the loop that iterates over the SAI.
				Meinersbur (Done): A possible solution is to refactor the inner two loops into a function, which can `return true/false` for a `ScopArrayInfo` at any point.

				    bool isExpandable(SAI) {
				      for (ScopStmt &Stmt : S)
				        for (MemoryAccess *MA : Stmt) {
				          if (MA->isMayWrite())
				            return false;
				          ...
				        }
				    }

				    for (auto &SAI : S.arrays()) {
				      if (!isExpandable())
				        NotExpandables.insert(SAI);
				    }
				/// chosen dimensions is Min <= x <= Max.
				bool isDimBoundedByConstant(isl::set Set, unsigned dim) {
				auto ParamDims = Set.dim(isl::dim::param);
				Set = Set.project_out(isl::dim::param, 0, ParamDims);
				Set = Set.project_out(isl::dim::set, 0, dim);
				auto SetDims = Set.dim(isl::dim::set);
				Set = Set.project_out(isl::dim::set, 1, SetDims - 1);
				return bool(Set.is_bounded());
				}

				/// If @p PwAff maps to a constant, return said constant. If @p Max/@p Min, it
				/// can also be a piecewise constant and it would return the minimum/maximum
				/// value. Otherwise, return NaN.
				isl::val getConstant(isl::pw_aff PwAff, bool Max, bool Min) {
				assert(!Max || !Min);
				isl::val Result;
				PwAff.foreach_piece([=, &Result](isl::set Set, isl::aff Aff) -> isl::stat {
				if (Result && Result.is_nan())
				Meinersbur (Done): `isl_dim_in` and `isl_dim_out` dimensions have no dim_id, so this is not necessary.
				return isl::stat::ok;

				// TODO: If Min/Max, we can also determine a minimum/maximum value if
				Meinersbur (Done): At this point, in the regression test, we have `{ Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[o0] }` (i0 and o0 are unconnected). That is, every statement instance accesses all array elements. It should be something like: `{ Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[i0] }`
				// Set is constant-bounded.
				if (!Aff.is_cst()) {
				Result = isl::val::nan(Aff.get_ctx());
				return isl::stat::error;
				}

				auto ThisVal = Aff.get_constant_val();
				if (!Result) {
				Result = ThisVal;
				return isl::stat::ok;
				}

				if (Result.eq(ThisVal))
				return isl::stat::ok;

				if (Max && ThisVal.gt(Result)) {
				Result = ThisVal;
				return isl::stat::ok;
				Meinersbur (Done): [Style] `NumerElementMap = isl_union_map_n_map(CurrentReadWriteDependences.get())`
				}

				if (Min && ThisVal.lt(Result)) {
				Result = ThisVal;
				return isl::stat::ok;
				}
				Meinersbur (Done): Use inline declarations: `bool CorrectWrite = expandWrite(S, MA);`

				// Not compatible
				Meinersbur (Not Done): We cannot return early with transformations only partially applied. The SCoP representation will be inconsistent and at best passes after this will crash, at worst we miscompile. Please check in advance if a transformation can be applied before trying to apply it. In this case, you may want to check for each `ScopArrayInfo` whether it is applicable. The class `ScalarDefUseChains` in DeLICM.cpp might be helpful to get the accesses of a SAI. I think we will sooner or later integrate it into ScopInfo.
				Result = isl::val::nan(Aff.get_ctx());
				return isl::stat::error;
				});
				return Result;
				}
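
Editor's note: the piece-folding rules of `getConstant` above can be tried out without isl. The following is a minimal sketch, assuming a hypothetical `Piece` struct in place of the `(isl::set, isl::aff)` pairs and `double` in place of `isl::val`; `foldConstant` is not a function of the patch. It mirrors the control flow of the listing as written, so a later piece that is neither equal to the current result nor an improvement in the requested direction ends in NaN.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-in for one piece of an isl::pw_aff: whether the piece is
// a constant and, if so, its value.
struct Piece {
  bool IsConstant;
  double Value;
};

// Folds the pieces in the same order of checks as getConstant in the listing.
// Assumption of this sketch: an empty piece list yields NaN, whereas the
// listing would return a null isl::val.
double foldConstant(const std::vector<Piece> &Pieces, bool Max, bool Min) {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  bool HaveResult = false;
  double Result = NaN;
  for (const Piece &P : Pieces) {
    if (!P.IsConstant)
      return NaN; // A non-constant piece makes the whole result NaN.
    if (!HaveResult) {
      Result = P.Value;
      HaveResult = true;
      continue;
    }
    if (Result == P.Value)
      continue;
    if (Max && P.Value > Result) {
      Result = P.Value;
      continue;
    }
    if (Min && P.Value < Result) {
      Result = P.Value;
      continue;
    }
    return NaN; // Not compatible.
  }
  return Result;
}
```

With `Max` set, the pieces `1, 3` fold to `3`, while the same pieces with neither flag set fold to NaN because they disagree.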

				} // namespace
				Meinersbur (Done): [Serious] In normal operation, do not print anything. Users expect only clang warnings and errors. Alternatives are: `DEBUG(dbgs() << "")` (discouraged for regression tests, this is meant for debugging); print remarks using `-pass-remarks-missed`; print information at `-analysis`; `STATISTIC`.

				char MaximalStaticExpander::ID = 0;

				Meinersbur (Done): `runOnScop` returns true if and only if the IR has been modified. We only modify the SCoP representation, therefore we only return `false`.
				bool MaximalStaticExpander::isExpandable(
				const ScopArrayInfo *SAI, SmallPtrSetImpl<MemoryAccess *> &Writes,
				SmallPtrSetImpl<MemoryAccess *> &Reads, Scop &S,
				isl::union_map &Dependences) {
				Meinersbur (Done): In `printScop`, you must print to `OS`. It will print to `stdout` instead of `stderr`. This should also make the regression test simpler.

				int NumberWrites = 0;
				for (ScopStmt &Stmt : S) {
				for (MemoryAccess *MA : Stmt) {
				Meinersbur (Done): You will need to add `AU.addRequired<DependenceInfo>();` here at some point.

				// Check if the current MemoryAccess involved the current SAI.
				if (SAI != MA->getLatestScopArrayInfo())
				continue;

				// For now, we are not able to expand Scalar.
				if (MA->isLatestScalarKind()) {
				emitRemark(SAI->getName() + " is a Scalar access.",
				MA->getAccessInstruction());
				return false;
				}

				// For now, we are not able to expand MayWrite.
				if (MA->isMayWrite()) {
				emitRemark(SAI->getName() + " has a maywrite access.",
				MA->getAccessInstruction());
				return false;
				}

				// For now, we are not able to expand SAI with more than one write.
				if (MA->isMustWrite()) {
				Writes.insert(MA);
				NumberWrites++;
				if (NumberWrites > 1) {
				emitRemark(SAI->getName() + " has more than 1 write access.",
				MA->getAccessInstruction());
				simbuerg (Done): Reminder: These should/will become diagnostics.
				return false;
				}
				}

				// Check if it is possible to expand this read.
				if (MA->isRead()) {
				Meinersbur (Done): [Typo] extand -> expand

				// Get the domain of the current ScopStmt.
				auto StmtDomain = isl::give(Stmt.getDomain());

				// Get the domain of the future Read access.

				Meinersbur (Done): I get a compile error here. `MA->getAccessRelation()` has been updated to use C++ object. Please update to Polly trunk.
				auto ReadDomainSet = MA->getAccessRelation().domain();
				auto ReadDomain = isl::union_set(ReadDomainSet);
				auto CurrentReadWriteDependences =
				Dependences.reverse().intersect_domain(ReadDomain);
				auto DepsDomain = CurrentReadWriteDependences.domain();

				Meinersbur (Done): [Suggestion] Instead of searching for the correct id, you could derive the name as in `expandWrite` and look it up in `ScopArrayNameMap`. [Serious] What if the `Id` is not found? Please add an assertion for that case.
				unsigned NumberElementMap =
				isl_union_map_n_map(CurrentReadWriteDependences.get());

				// If there are multiple maps in the Deps, we cannot handle this case
				// for now.
				if (NumberElementMap != 1) {
				emitRemark(SAI->getName() +
				" has too many dependences to be handle for now.",
				MA->getAccessInstruction());
				return false;
				}

				auto DepsDomainSet = isl::set(DepsDomain);

				// For now, read from the original array is not possible.
				if (!StmtDomain.is_subset(DepsDomainSet)) {
				emitRemark("The expansion of " + SAI->getName() +
				" would lead to a read from the original array.",
				MA->getAccessInstruction());
				Meinersbur (Not Done): [Serious] This is not a sufficient condition for full expansion. E.g. one dimension can be a static `0`. There is also more than one possibility for full expansion (e.g. one starting at 0, another at 1). The reads must access the correct element. So if you want to implement this as a heuristic that expansion is not worth it in this case, implement it in `checkExpandability` such that reads are also not modified.
				Meinersbur (Done): The consequence would not be a partial read access, but it would need to read the original value the element had before entering the SCoP. That's a special case similar to having more than one write.
				niosega (Author, Done): Are you sure? Because if I remember well. Let's say that we are analyzing this code:

				    for (i = 0; i < Ni; i++) {
				      B[j] = i;
				      for (int j = 0; j<Nj; j++) {
				        B[j] = j;
				      }
				      A[i] = B[i];

				When I try to set the new access relation for the B read, the setNewAccessRelation method of class MemoryAccess fails with the assert "Partial READ accesses not supported".
				Meinersbur (Done): How were you able to do that? setNewAccessRelation accepts only an isl_map, not isl_union_map. Let me explain in more detail.

				    for (int k = 0; k < M; k+=1) {
				      for (int i = 0; i<= N/2; i+=1) {
				    S:  B[i] = i;
				      }
				      for (int i = 0; i<N; i+=1) {
				    T:  = ... B[i]
				      }
				    }

				The flow dependence would look something like:

				    { T[k, i] -> S[k, i] : 0 < i <= N/2 }

				We could naively expand B to B_expanded:

				    for (int k = 0; k < M; k+=1) {
				      for (int i = 0; i<= N/2; i+=1) {
				    S:  B_expanded[k][i] = i;
				      }
				      for (int i = 0; i<N; i+=1) {
				    T:  = ... B_expanded[k][i]
				      }
				    }

				The problem here is that `B_expanded[k][i]` for i > N/2 never gets written (and T would read uninitialized memory). The flow dependence doesn't tell which instance of S wrote it in the first place! If you try to apply it naively anyway using setNewAccessRelation, we need a source of the value for all instances of S, but we don't have one for `i > N/2`! This is why partial read accesses are unsupported. The correct thing to do would be to read the value from the original array B (which then becomes read-only).

				    { T[k,i] -> B_expanded[k,i] : i < N/2; T[k,i] -> B[i] : i>=N/2 }

				This again is an isl_union_map (NOT a partial access since it is defined for all instances of T), which we currently do not support. Please try to understand what the problem with partial read accesses is. Not the partial read accesses are the problem, but the reason why you would want to use one.
				return false;
				}

				Reads.insert(MA);
				}
				}
				}

				// No need to expand SAI with no write.
				if (NumberWrites == 0) {
				emitRemark(SAI->getName() + " has 0 write access.",
				S.getEnteringBlock()->getFirstNonPHI());
				return false;
				}

				return true;
				}
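
Editor's note: the "read from the original array" rejection in `isExpandable` above guards against the situation described in the review thread, where a naive expansion leaves some read instances without a writer. The following is a small simulation of the reviewer's `S`/`T` loop nest in plain C++, offered as a sketch only. `countUninitializedReads` is a hypothetical name, and `M` and `N` are assumed to be non-negative.

```cpp
#include <vector>

// Counts the instances of T that would read a cell of the naively expanded
// array B_expanded[k][i] that no instance of S wrote.
// S writes for 0 <= i <= N/2 and T reads for 0 <= i < N, as in the reviewer's
// example. Assumes M >= 0 and N >= 0.
int countUninitializedReads(int M, int N) {
  // N + 1 columns so that the write at i == N/2 is in range even for N == 0.
  std::vector<std::vector<bool>> Written(M, std::vector<bool>(N + 1, false));
  int Uninitialized = 0;
  for (int k = 0; k < M; ++k) {
    for (int i = 0; i <= N / 2; ++i)
      Written[k][i] = true; // S: B_expanded[k][i] = i;
    for (int i = 0; i < N; ++i)
      if (!Written[k][i]) // T: ... = B_expanded[k][i];
        ++Uninitialized;
  }
  return Uninitialized;
}
```

A non-zero count means that some instances of `T` have no writing instance of `S`, which is the case the pass refuses to expand for now.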

				void MaximalStaticExpander::expandRead(Scop &S, MemoryAccess *MA,
				isl::union_map &Dependences,
				ScopArrayInfo *ExpandedSAI) {

				// Get the current AM.
				auto CurrentAccessMap = MA->getAccessRelation();

				// Get RAW dependences for the current WA.
				auto WriteDomainSet = MA->getAccessRelation().domain();
				auto WriteDomain = isl::union_set(WriteDomainSet);

				auto CurrentReadWriteDependences =
				Dependences.reverse().intersect_domain(WriteDomain);

				Meinersbur (Done): [Serious] I think `getMaxBackedgeTakenCount` can fail in cases where Polly is still able to detect affine loops. Please add at least an `assert` that SCEV is not `SCEVCouldNotCompute`.
				Meinersbur (Done): [Suggestion] Or use `ScalarEvolution::getAddExpr(SCEV, ScalarEvolution::getConstant(1))`. [Serious] I don't see where +1 is added to the ISL version.
				niosega (Author, Done): During the last phone call we discussed another solution that involves methods from FlattenAlgo.cpp to get the boundary of the loop iteration variables. That's what I meant by "ISL version". But for now, I cannot access the methods from FlattenAlgo.cpp because they are defined inside an unnamed namespace.
				Meinersbur (Done): You may need to modify them anyway, so don't hesitate to copy them over, especially for a prototype. If later we find they share a significant amount of code, we can find a common file for them.
				// If no dependences, no need to modify anything.
				if (CurrentReadWriteDependences.is_empty()) {
				simbuerg (Done): As far as I remember, this will fail if there is more than one map in the union_map. I would check that with at least an assert.
				return;
				}

				Meinersbur (Done): [Suggestion] Should this be `getLatestScopArrayInfo()`?
				assert(isl_union_map_n_map(CurrentReadWriteDependences.get()) == 1 &&
				"There are more than one RAW dependencies in the union map.");
				auto NewAccessMap = isl::map::from_union_map(CurrentReadWriteDependences);
				simbuerg (Done): Where do you use this name?

				auto Id = ExpandedSAI->getBasePtrId();

				// Replace the out tuple id with the one of the access array.
				NewAccessMap = NewAccessMap.set_tuple_id(isl::dim::out, Id);

				// Set the new access relation.
				MA->setNewAccessRelation(NewAccessMap);
				}

				ScopArrayInfo *MaximalStaticExpander::expandWrite(Scop &S, MemoryAccess *MA) {

				// Get the current AM.
				auto CurrentAccessMap = MA->getAccessRelation();

				unsigned in_dimensions = CurrentAccessMap.dim(isl::dim::in);

				// Get domain from the current AM.
				auto Domain = CurrentAccessMap.domain();

				// Create a new AM from the domain.
				auto NewAccessMap = isl::map::from_domain(Domain);

				// Add dimensions to the new AM according to the current in_dim.
				NewAccessMap = NewAccessMap.add_dims(isl::dim::out, in_dimensions);

				// Create the string representing the name of the new SAI.
				// One new SAI for each statement so that each write go to a different memory
				// cell.
				auto CurrentStmtDomain = isl::give(MA->getStatement()->getDomain());
				auto CurrentStmtName = CurrentStmtDomain.get_tuple_name();
				auto CurrentOutId = CurrentAccessMap.get_tuple_id(isl::dim::out);
				std::string CurrentOutIdString =
				MA->getScopArrayInfo()->getName() + "_" + CurrentStmtName + "_expanded";

				// Set the tuple id for the out dimension.
				NewAccessMap = NewAccessMap.set_tuple_id(isl::dim::out, CurrentOutId);

				// Create the size vector.
				std::vector<unsigned> Sizes;
				Meinersbur (Done): [Remark] The structure I had in mind was

				    for (array : S.arrays()) {
				      if (!checkExpendability(array))
				        continue;
				      for (ScopStmt &Stmt : S)
				        for (MemoryAccess *MA : Stmt) {
				          if (MA->isRead())
				            expandRead(MA);
				          if (MA->isWrite())
				            expandRead(MA);
				        }
				    }

				This does not need a `NotExpandable` set, but has worse asymptotic runtime. So I guess your version has an advantage.
				niosega (Author, Done): If I am not wrong, we must first expand all writes before expanding the reads. Otherwise problems can happen when trying to find the expanded SAI during read expansion.
				Meinersbur (Done): Let me refine what I had in mind some time ago.

				    for (array : S.arrays()) {
				      MemoryAccess *TheWrite = nullptr;
				      List<MemoryAccess *> AllReads;
				      if (!isExpandable(array, TheWrite, AllReads))
				        continue;
				      assert(TheWrite);
				      assert(AllReads.size() > 0);
				      ScopArrayInfo *ExpandedArray = expandWrite(TheWrite);
				      for (MemoryAccess *MA : AllReads)
				        expandRead(MA, ExpandedArray);
				    }
				for (unsigned i = 0; i < in_dimensions; i++) {
				assert(isDimBoundedByConstant(CurrentStmtDomain, i) &&
				"Domain boundary are not constant.");
				auto UpperBound = getConstant(CurrentStmtDomain.dim_max(i), true, false);
				assert(!UpperBound.is_null() && UpperBound.is_pos() &&
				!UpperBound.is_nan() &&
				Meinersbur (Done): [Style] We usually do not use braces around single statements: `if (isExpandable(SAI)) continue;`
				"The upper bound is not a positive integer.");
				assert(UpperBound.le(isl::val(CurrentAccessMap.get_ctx(),
				std::numeric_limits<int>::max() - 1)) &&
				Meinersbur (Done): [Nit] The UpperBound could overflow a long. Add an assertion for that?
				niosega (Author, Done): How can I efficiently check that there is an overflow?
				Meinersbur (Done): `UpperBound.le(INT_MAX)` (I think there is no implicit conversion from int to isl::val, but you get the idea)
				Meinersbur (Not Done): Why `- 1`?
				"The upper bound overflow a int.");
				Meinersbur (Not Done): This assertion fails on Windows (and 32 bit platforms): The `isl::val` constructor takes a `long`, which is 32 bit on these platforms. UINT_MAX exceeds its range.
				niosega (Author, Not Done): It's a mistake on my side to compare it with UINT_MAX. If I replace std::numeric_limits<unsigned>::max() with std::numeric_limits<long>::max() it should work on every platform, right?
				Meinersbur (Not Done): Except that it is stored into a `std::vector<unsigned>`. Storing a `long` as an `unsigned int` may get you another overflow. My suggestion is to stick with the lowest common maximum: `std::numeric_limits<int>::max()`. I don't think you would want to allocate memory larger than that anyway.
				Sizes.push_back(UpperBound.get_num_si() + 1);
				Meinersbur (Not Done): It has been tested for the range of an `unsigned int`.
				}
				Meinersbur (Not Done): If `UpperBound.get_num_si()` is `UINT_MAX`, you get an overflow when adding +1.

				// Get the ElementType of the current SAI.
				auto ElementType = MA->getLatestScopArrayInfo()->getElementType();
				Meinersbur (Done): [Style] In Polly's coding style, all sentences end with a dot (but I personally don't care).

				simbuerg (Done): This takes a const string &, no need to go over the c_str().
				// Create (or get if already existing) the new expanded SAI.
				auto ExpandedSAI =
				S.createScopArrayInfo(ElementType, CurrentOutIdString, Sizes);
				ExpandedSAI->setIsOnHeap(true);

				// Get the out Id of the expanded Array.
				auto NewOutId = ExpandedSAI->getBasePtrId();

				// Set the out id of the new AM to the new SAI id.
				NewAccessMap = NewAccessMap.set_tuple_id(isl::dim::out, NewOutId);

				// Add constraints to linked output with input id.
				auto SpaceMap = NewAccessMap.get_space();
				auto ConstraintBasicMap =
				isl::basic_map::equal(SpaceMap, SpaceMap.dim(isl::dim::in));
				NewAccessMap = isl::map(ConstraintBasicMap);

				Meinersbur (Done): [Style] This could be simpler using `NewAccessMap = NewAccessMap->equate(isl::dim::in, dim, isl::dim::out, dim);` or, even better, use `basic_map::equal`.
				niosega (Author, Done): I'd like to use isl_basic_map_equal but I did not find the documentation of this method in the online isl doc. There is also no example of its use in Polly. Can you explain to me how it works?
				Meinersbur (Done): `isl_map_space(SpaceMap.copy(), SpaceMap.dim(isl::in))` should get you a basic_map of that space where the `n_equal = SpaceMap.dim(isl::in)` in- and out-dimensions are equal. Something like `{ Stmt[i0, i1] -> MemRef[o0, o1] : i0 = o0 and i1 = o1 }`. However, no documentation could mean that the function was not intended to be public.
				// Set the new access relation map.
				MA->setNewAccessRelation(NewAccessMap);

				return ExpandedSAI;
				}
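
Editor's note: the size computation in `expandWrite` above stores `UpperBound + 1` into a `std::vector<unsigned>`, and the review thread discusses which range check makes that safe. One way to express the check in plain C++ is sketched below. `toDimensionSize` is a hypothetical helper, not part of the patch, and unlike the `is_pos()` assertion in the listing it accepts an upper bound of 0 (a single-iteration loop), which is an assumption of this sketch.

```cpp
#include <climits>

// Hypothetical helper: converts the maximal value of a loop index into an
// array dimension size (maximal index + 1). Returns false, leaving Size
// untouched, when the bound is negative or larger than INT_MAX.
bool toDimensionSize(long long UpperBound, unsigned &Size) {
  if (UpperBound < 0 || UpperBound > INT_MAX)
    return false;
  // The addition is done in unsigned arithmetic. INT_MAX + 1 is representable
  // as unsigned on the usual platforms where UINT_MAX == 2 * INT_MAX + 1.
  Size = static_cast<unsigned>(UpperBound) + 1u;
  return true;
}
```

Checking against `INT_MAX` before converting, and adding 1 only after the conversion to `unsigned`, avoids both the signed overflow and the dependence on the width of `long` raised in the thread.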

				void MaximalStaticExpander::emitRemark(StringRef Msg, Instruction *Inst) {
				ORE->emit(OptimizationRemarkAnalysis(DEBUG_TYPE, "ExpansionRejection", Inst)
				Meinersbur (Done): [Suggestion] Pass string as `llvm::StringRef` (or `const std::string &` to avoid a copy)
				<< Msg);
				}

				simbuerg (Done): Why 'AssumpRestrict'?
				bool MaximalStaticExpander::runOnScop(Scop &S) {

				// Get the ORE from OptimizationRemarkEmitterWrapperPass.
				ORE = &(getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE());

				Meinersbur (Done): [Nit] `OptimizationRemarkEmitterWrapperPass`
				// Get the RAW Dependences.
				auto &DI = getAnalysis<DependenceInfo>();
				auto &D = DI.getDependences(Dependences::AL_Statement);
				auto Dependences = isl::give(D.getDependences(Dependences::TYPE_RAW));

				SmallPtrSet<ScopArrayInfo *, 4> CurrentSAI(S.arrays().begin(),
				S.arrays().end());

				for (auto SAI : CurrentSAI) {
				SmallPtrSet<MemoryAccess *, 4> AllWrites;
				SmallPtrSet<MemoryAccess *, 4> AllReads;
				if (!isExpandable(SAI, AllWrites, AllReads, S, Dependences))
				Meinersbur (Done): [Style] No braces around single statements. Also possible: `SmallPtrSet<ScopArrayInfo *, 4> CurrentSAI(S.array_begin(), array_end());`
				continue;

				assert(AllWrites.size() == 1);

				auto TheWrite = *(AllWrites.begin());
				ScopArrayInfo *ExpandedArray = expandWrite(S, TheWrite);

				for (MemoryAccess *MA : AllReads)
				expandRead(S, MA, Dependences, ExpandedArray);
				}

				return false;
				}
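
Editor's note: `runOnScop` above follows the "check first, then transform" structure requested in the review, so that a rejected array is never left half-expanded. The sketch below reproduces that structure with simplified stand-in types. `Access`, `isExpandable` and `expandAll` are hypothetical names for illustration and do not model Polly's `MemoryAccess` or `ScopArrayInfo`; the dependence-based checks of the real pass are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a memory access of a SCoP.
struct Access {
  std::string Array; // Name of the accessed array.
  bool IsWrite;      // True for a write, false for a read.
  bool IsMayWrite;   // True if the write is only a may-write.
  bool Expanded;     // Set by the transformation phase.
};

// Check phase: collects the single write and all reads of Array. Returns
// false without modifying any access if the array cannot be expanded.
bool isExpandable(const std::string &Array, std::vector<Access> &Accesses,
                  Access *&TheWrite, std::vector<Access *> &Reads) {
  TheWrite = nullptr;
  Reads.clear();
  for (Access &A : Accesses) {
    if (A.Array != Array)
      continue;
    if (A.IsMayWrite)
      return false; // May-writes are not handled.
    if (A.IsWrite) {
      if (TheWrite)
        return false; // More than one write.
      TheWrite = &A;
    } else {
      Reads.push_back(&A);
    }
  }
  return TheWrite != nullptr; // No write means nothing to expand.
}

// Transformation phase: runs only for arrays that passed the check, and
// expands the write before the reads. Returns the number of expanded arrays.
int expandAll(const std::vector<std::string> &Arrays,
              std::vector<Access> &Accesses) {
  int NumExpanded = 0;
  for (const std::string &Array : Arrays) {
    Access *TheWrite = nullptr;
    std::vector<Access *> Reads;
    if (!isExpandable(Array, Accesses, TheWrite, Reads))
      continue;
    TheWrite->Expanded = true;
    for (Access *Read : Reads)
      Read->Expanded = true;
    ++NumExpanded;
  }
  return NumExpanded;
}
```

Because every rejection happens inside the check phase, an array with two writes keeps all of its accesses untouched while the other arrays are still expanded.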

				void MaximalStaticExpander::printScop(raw_ostream &OS, Scop &S) const {
				S.print(OS, false);
				}

				void MaximalStaticExpander::getAnalysisUsage(AnalysisUsage &AU) const {
				ScopPass::getAnalysisUsage(AU);
				AU.addRequired<DependenceInfo>();
				AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
				}

				Pass *polly::createMaximalStaticExpansionPass() {
				return new MaximalStaticExpander();
				}

				INITIALIZE_PASS_BEGIN(MaximalStaticExpander, "polly-mse",
				"Polly - Maximal static expansion of SCoP", false, false);
				INITIALIZE_PASS_DEPENDENCY(DependenceInfo);
				INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass);
				INITIALIZE_PASS_END(MaximalStaticExpander, "polly-mse",
				"Polly - Maximal static expansion of SCoP", false, false)

test/MaximalStaticExpansion/read_from_original.ll

This file was added.

				; RUN: opt -polly-canonicalize %loadPolly -polly-mse -analyze < %s | FileCheck %s
				; RUN: opt -polly-canonicalize %loadPolly -polly-mse -pass-remarks-analysis="polly-mse" -analyze < %s 2>&1 | FileCheck %s --check-prefix=MSE
				Meinersbur (Not Done): Nice new testcase!
				;
				; Verify that Polly detects problems and does not expand the array
				;
				; Original source code :
				;
				; #define Ni 2000
				; #define Nj 3000
				;
				; double mse(double A[Ni], double B[Nj]) {
				; int i;
				; double tmp = 6;
				; for (i = 0; i < Ni; i++) {
				; for (int j = 2; j<Nj; j++) {
				; B[j-1] = j;
				; }
				; A[i] = B[i];
				; }
				; return tmp;
				; }
				;
				; Check that the pass detects a read from the original array after expansion.
				;
				; MSE: The expansion of MemRef_B would lead to a read from the original array.
				;
				; CHECK-NOT: double MemRef_B_Stmt_for_body3_expanded[2000][3000]; // Element size 8
				;
				; Check that the memory accesses are not modified
				;
				; CHECK-NOT: new: { Stmt_for_body3[i0, i1] -> MemRef_B_Stmt_for_body3_expanded[i0, i1] };
				; CHECK-NOT: new: { Stmt_for_end[i0] -> MemRef_B_Stmt_for_body3_expanded

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define double @mse(double* %A, double* %B) {
				entry:
				%A.addr = alloca double*, align 8
				%B.addr = alloca double*, align 8
				%i = alloca i32, align 4
				%tmp = alloca double, align 8
				%j = alloca i32, align 4
				store double* %A, double** %A.addr, align 8
				store double* %B, double** %B.addr, align 8
				store double 6.000000e+00, double* %tmp, align 8
				store i32 0, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc8, %entry
				%0 = load i32, i32* %i, align 4
				%cmp = icmp slt i32 %0, 2000
				br i1 %cmp, label %for.body, label %for.end10

				for.body: ; preds = %for.cond
				store i32 2, i32* %j, align 4
				br label %for.cond1

				for.cond1: ; preds = %for.inc, %for.body
				%1 = load i32, i32* %j, align 4
				%cmp2 = icmp slt i32 %1, 3000
				br i1 %cmp2, label %for.body3, label %for.end

				for.body3: ; preds = %for.cond1
				%2 = load i32, i32* %j, align 4
				%conv = sitofp i32 %2 to double
				%3 = load double, double* %B.addr, align 8
				%4 = load i32, i32* %j, align 4
				%sub = sub nsw i32 %4, 1
				%idxprom = sext i32 %sub to i64
				%arrayidx = getelementptr inbounds double, double* %3, i64 %idxprom
				store double %conv, double* %arrayidx, align 8
				br label %for.inc

				for.inc: ; preds = %for.body3
				%5 = load i32, i32* %j, align 4
				%inc = add nsw i32 %5, 1
				store i32 %inc, i32* %j, align 4
				br label %for.cond1

				for.end: ; preds = %for.cond1
				%6 = load double, double* %B.addr, align 8
				%7 = load i32, i32* %i, align 4
				%idxprom4 = sext i32 %7 to i64
				%arrayidx5 = getelementptr inbounds double, double* %6, i64 %idxprom4
				%8 = load double, double* %arrayidx5, align 8
				%9 = load double, double* %A.addr, align 8
				%10 = load i32, i32* %i, align 4
				%idxprom6 = sext i32 %10 to i64
				%arrayidx7 = getelementptr inbounds double, double* %9, i64 %idxprom6
				store double %8, double* %arrayidx7, align 8
				br label %for.inc8

				for.inc8: ; preds = %for.end
				%11 = load i32, i32* %i, align 4
				%inc9 = add nsw i32 %11, 1
				store i32 %inc9, i32* %i, align 4
				br label %for.cond

				for.end10: ; preds = %for.cond
				%12 = load double, double* %tmp, align 8
				ret double %12
				}
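To see why this test must be rejected, note that the inner loop writes `B[j-1]` only for `j` from 2 to `Nj-1`, so `B[0]` is never written inside the SCoP while `A[i] = B[i]` still reads it at `i = 0`. The sketch below enumerates such uncovered reads by brute force. It is an illustration only: `readsFromOriginal` is a hypothetical helper, and the actual pass reasons on dependence information rather than enumerating iterations.

```cpp
#include <cassert>
#include <vector>

// Return every i in [0, Ni) for which `A[i] = B[i]` reads an element of B
// that the inner loop `for (j = 2; j < Nj; j++) B[j-1] = j;` never writes.
std::vector<int> readsFromOriginal(int Ni, int Nj) {
  assert(Ni >= 0 && Nj >= 0);

  // Mark the elements of B written by the inner loop.
  std::vector<bool> Written(Nj, false);
  for (int j = 2; j < Nj; ++j)
    Written[j - 1] = true;

  // Collect the read indices that no write covers.
  std::vector<int> Uncovered;
  for (int i = 0; i < Ni; ++i)
    if (i >= Nj || !Written[i])
      Uncovered.push_back(i);
  return Uncovered;
}
```

For the sizes used in the test (`Ni = 2000`, `Nj = 3000`), the only uncovered read is at index 0, which is enough to make the expansion unsafe.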

test/MaximalStaticExpansion/too_many_writes.ll

This file was added.

				; RUN: opt -polly-canonicalize %loadPolly -polly-mse -analyze < %s | FileCheck %s
				; RUN: opt -polly-canonicalize %loadPolly -polly-mse -pass-remarks-analysis="polly-mse" -analyze < %s 2>&1 | FileCheck %s --check-prefix=MSE
				;
				; Verify that Polly detects problems and does not expand the array
				;
				; Original source code :
				;
				; #define Ni 2000
				; #define Nj 2000
				;
				; double mse(double A[Ni], double B[Nj]) {
				; int i;
				; double tmp = 6;
				; for (i = 0; i < Ni; i++) {
				; B[i] = 2;
				; for (int j = 0; j<Nj; j++) {
				; B[j] = j;
				; }
				; A[i] = B[i];
				; }
				; return tmp;
				; }
				;
				; Check that the pass detects that there is more than one write access per array.
				;
				; MSE: MemRef_B has more than 1 write access.
				;
				; Check that the SAI is not expanded
				;
				; CHECK-NOT: double MemRef_B_Stmt_for_body3_expanded[2000][2000]; // Element size 8
				;
				; Check that the memory accesses are not modified
				;
				; CHECK-NOT: new: { Stmt_for_body3[i0, i1] -> MemRef_B_Stmt_for_body3_expanded[i0, i1] };
				; CHECK-NOT: new: { Stmt_for_end[i0] -> MemRef_B_Stmt_for_body3_expanded

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define double @mse(double* %A, double* %B) {
				entry:
				%A.addr = alloca double*, align 8
				%B.addr = alloca double*, align 8
				%i = alloca i32, align 4
				%tmp = alloca double, align 8
				%j = alloca i32, align 4
				store double* %A, double** %A.addr, align 8
				store double* %B, double** %B.addr, align 8
				store double 6.000000e+00, double* %tmp, align 8
				store i32 0, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc10, %entry
				%0 = load i32, i32* %i, align 4
				%cmp = icmp slt i32 %0, 2000
				br i1 %cmp, label %for.body, label %for.end12

				for.body: ; preds = %for.cond
				%1 = load double, double* %B.addr, align 8
				%2 = load i32, i32* %i, align 4
				%idxprom = sext i32 %2 to i64
				%arrayidx = getelementptr inbounds double, double* %1, i64 %idxprom
				store double 2.000000e+00, double* %arrayidx, align 8
				store i32 0, i32* %j, align 4
				br label %for.cond1

				for.cond1: ; preds = %for.inc, %for.body
				%3 = load i32, i32* %j, align 4
				%cmp2 = icmp slt i32 %3, 2000
				br i1 %cmp2, label %for.body3, label %for.end

				for.body3: ; preds = %for.cond1
				%4 = load i32, i32* %j, align 4
				%conv = sitofp i32 %4 to double
				%5 = load double, double* %B.addr, align 8
				%6 = load i32, i32* %j, align 4
				%idxprom4 = sext i32 %6 to i64
				%arrayidx5 = getelementptr inbounds double, double* %5, i64 %idxprom4
				store double %conv, double* %arrayidx5, align 8
				br label %for.inc

				for.inc: ; preds = %for.body3
				%7 = load i32, i32* %j, align 4
				%inc = add nsw i32 %7, 1
				store i32 %inc, i32* %j, align 4
				br label %for.cond1

				for.end: ; preds = %for.cond1
				%8 = load double, double* %B.addr, align 8
				%9 = load i32, i32* %i, align 4
				%idxprom6 = sext i32 %9 to i64
				%arrayidx7 = getelementptr inbounds double, double* %8, i64 %idxprom6
				%10 = load double, double* %arrayidx7, align 8
				%11 = load double, double* %A.addr, align 8
				%12 = load i32, i32* %i, align 4
				%idxprom8 = sext i32 %12 to i64
				%arrayidx9 = getelementptr inbounds double, double* %11, i64 %idxprom8
				store double %10, double* %arrayidx9, align 8
				br label %for.inc10

				for.inc10: ; preds = %for.end
				%13 = load i32, i32* %i, align 4
				%inc11 = add nsw i32 %13, 1
				store i32 %inc11, i32* %i, align 4
				br label %for.cond

				for.end12: ; preds = %for.cond
				%14 = load double, double* %tmp, align 8
				ret double %14
				}

test/MaximalStaticExpansion/working_expansion.ll

This file was added.

				; RUN: opt -polly-canonicalize %loadPolly -polly-mse -analyze < %s | FileCheck %s
				;
				; Verify that the accesses are correctly expanded
				;
				; Original source code :
				;
				; #define Ni 2000
				; #define Nj 3000
				;
				; double mse(double A[Ni], double B[Nj]) {
				; int i;
				; double tmp = 6;
				; for (i = 0; i < Ni; i++) {
				; for (int j = 0; j<Nj; j++) {
				; B[j] = j;
				; }
				; A[i] = B[i];
				; }
				; return tmp;
				; }
				;
				; Check that the expanded SAI is created
				;
				; CHECK: double MemRef_B_Stmt_for_body3_expanded[2000][3000]; // Element size 8
				;
				Meinersbur [Done]: Shouldn't this be MemRef_B_Stmt_for_body3_expanded[2000][3000]?
				; Check that the memory accesses are modified
				;
				; CHECK: new: { Stmt_for_body3[i0, i1] -> MemRef_B_Stmt_for_body3_expanded[i0, i1] };
				; CHECK: new: { Stmt_for_end[i0] -> MemRef_B_Stmt_for_body3_expanded[i0, i0] };

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define double @mse(double* %A, double* %B) {
				entry:
				%A.addr = alloca double*, align 8
				%B.addr = alloca double*, align 8
				%i = alloca i32, align 4
				%tmp = alloca double, align 8
				%j = alloca i32, align 4
				store double* %A, double** %A.addr, align 8
				store double* %B, double** %B.addr, align 8
				store double 6.000000e+00, double* %tmp, align 8
				store i32 0, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc8, %entry
				%0 = load i32, i32* %i, align 4
				%cmp = icmp slt i32 %0, 2000
				br i1 %cmp, label %for.body, label %for.end10

				for.body: ; preds = %for.cond
				store i32 0, i32* %j, align 4
				br label %for.cond1

				for.cond1: ; preds = %for.inc, %for.body
				%1 = load i32, i32* %j, align 4
				%cmp2 = icmp slt i32 %1, 3000
				br i1 %cmp2, label %for.body3, label %for.end

				for.body3: ; preds = %for.cond1
				%2 = load i32, i32* %j, align 4
				%conv = sitofp i32 %2 to double
				%3 = load double, double* %B.addr, align 8
				%4 = load i32, i32* %j, align 4
				%idxprom = sext i32 %4 to i64
				%arrayidx = getelementptr inbounds double, double* %3, i64 %idxprom
				store double %conv, double* %arrayidx, align 8
				br label %for.inc

				for.inc: ; preds = %for.body3
				%5 = load i32, i32* %j, align 4
				%inc = add nsw i32 %5, 1
				store i32 %inc, i32* %j, align 4
				br label %for.cond1

				for.end: ; preds = %for.cond1
				%6 = load double, double* %B.addr, align 8
				%7 = load i32, i32* %i, align 4
				%idxprom4 = sext i32 %7 to i64
				%arrayidx5 = getelementptr inbounds double, double* %6, i64 %idxprom4
				%8 = load double, double* %arrayidx5, align 8
				%9 = load double, double* %A.addr, align 8
				%10 = load i32, i32* %i, align 4
				%idxprom6 = sext i32 %10 to i64
				%arrayidx7 = getelementptr inbounds double, double* %9, i64 %idxprom6
				store double %8, double* %arrayidx7, align 8
				br label %for.inc8

				for.inc8: ; preds = %for.end
				%11 = load i32, i32* %i, align 4
				%inc9 = add nsw i32 %11, 1
				store i32 %inc9, i32* %i, align 4
				br label %for.cond

				for.end10: ; preds = %for.cond
				%12 = load double, double* %tmp, align 8
				ret double %12
				}
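The two CHECK lines for the new access relations say that the write in the inner loop becomes `B_expanded[i][j]` and the read in `A[i] = B[i]` becomes `B_expanded[i][i]`. The sketch below runs both forms of the loop nest at source level and lets the results be compared. It is a plain simulation under the assumption `Ni <= Nj`; `runOriginal` and `runExpanded` are hypothetical names, and nothing here reflects how Polly generates code.

```cpp
#include <cassert>
#include <vector>

// Original loop nest: every outer iteration overwrites the same 1-D array B.
std::vector<double> runOriginal(int Ni, int Nj) {
  assert(0 <= Ni && Ni <= Nj); // A[i] = B[i] requires i < Nj
  std::vector<double> A(Ni), B(Nj);
  for (int i = 0; i < Ni; ++i) {
    for (int j = 0; j < Nj; ++j)
      B[j] = j;
    A[i] = B[i];
  }
  return A;
}

// Fully indexed form: each outer iteration writes its own row of BExp,
// and the read selects the element written at inner iteration j == i.
std::vector<double> runExpanded(int Ni, int Nj) {
  assert(0 <= Ni && Ni <= Nj);
  std::vector<std::vector<double>> BExp(Ni, std::vector<double>(Nj));
  std::vector<double> A(Ni);
  for (int i = 0; i < Ni; ++i) {
    for (int j = 0; j < Nj; ++j)
      BExp[i][j] = j;
    A[i] = BExp[i][i];
  }
  return A;
}
```

Both functions return the same `A` for any sizes satisfying the assertion, which is the property the expansion has to preserve. The expanded version trades `Ni * Nj` elements of storage for the removal of the write-after-write conflicts on `B` between outer iterations.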