This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Heap allocation for new arrays
ClosedPublic

Authored by niosega on May 30 2017, 12:42 PM.

Details

Summary

This patch aims to implement the option of allocating new arrays created by polly on heap instead of stack.
To enable this option, a key named 'allocation' must be written in the imported json file with the value 'heap'.

We need such a feature because in a next iteration, we will implement a mechanism of maximal static expansion which will need a way to allocate arrays on heap. Indeed, the expansion is very costly in terms of memory and doing the allocation on stack is not worth considering.

The malloc and the free are added respectively at polly.start and polly.exiting such that there is no use-after-free (for instance in case of Scop in a loop) and such that all memory cells allocated with a malloc are free'd when we don't need them anymore.

We also add :

  • In the class ScopArrayInfo, we add a boolean as member called IsOnHeap which represents the fact that the array in allocated on heap or not.
  • A new branch in the method allocateNewArrays in the ISLNodeBuilder for the case of heap allocation. allocateNewArrays now takes a BBPair containing polly.start and polly.exiting. allocateNewArrays takes this two blocs and add the malloc and free calls respectively to polly.start and polly.exiting.
  • As IntPtrTy for the malloc call, we use the DalaLayout one.

To do that, we have modified :

  • CreateScopArrayInfo and getOrCreateScopArrayInfo such that it return a non-const SAI, in order to be able to call setIsOnHeap in the JSONImporter.
  • executeScopConditionnaly such that it return both start block and end block of the scop, because we need this two blocs to be able to add the malloc and the free calls at the right position.

Diff Detail

Event Timeline

niosega created this revision.May 30 2017, 12:42 PM
niosega created this object with visibility "Custom Policy".
niosega created this object with edit policy "Custom Policy".
niosega added a reviewer: simbuerg.
niosega updated this revision to Diff 101574.Jun 6 2017, 8:40 AM
niosega retitled this revision from Heap allocation for new arrays to [Polly] Heap allocation for new arrays.
niosega edited the summary of this revision. (Show Details)
niosega added reviewers: Meinersbur, grosser.
niosega changed the visibility from "Custom Policy" to "Public (No Login Required)".
niosega changed the edit policy from "Custom Policy" to "Custom Policy".
niosega added subscribers: pollydev, llvm-commits.
simbuerg edited edge metadata.Jun 6 2017, 8:46 AM

General Note: Can you reduce the size of the test-cases? (Remove the debug metadata with opt -strip-debug, unnecessary attributes).

simbuerg added inline comments.Jun 6 2017, 8:54 AM
lib/CodeGen/IslNodeBuilder.cpp
1417

Question for my own understanding: We can pin this to 64bits only if we know which malloc implementation we have available on the target machine, right?

So far it looks good. I would add tests that take care of error cases: negative sizes, overflow in the size calculation.

lib/CodeGen/IslNodeBuilder.cpp
1413

You already wrote it in your diff-summary: I would make this a property of the SAI and fill this information from the JSON side.

philip.pfaffe added inline comments.
lib/CodeGen/IslNodeBuilder.cpp
1417

Why not get the real IntPtrTy from the current DataLayout?

niosega updated this revision to Diff 101577.Jun 6 2017, 9:13 AM
simbuerg added inline comments.Jun 6 2017, 9:19 AM
lib/CodeGen/IslNodeBuilder.cpp
1417

Well it's not the problem that we don't know the size of the IntPtr. We don't know the argument type that the system's malloc function expects (whatever size_t might be on the target).

Normally you would expect 'unsigned int', then we could just get the size from the DataLayout.
By default CreateMalloc would generate a call to 'void malloc(IntPtrTy)'.

Meinersbur edited edge metadata.Jun 6 2017, 11:44 AM

I am wondering why you decided for a global stack-or-heap property. I'd assumed that this is a per-array decision. For instance, Roman's gemm optimization is designed to fit on the stack, there is no reason to allocate it on the heap.

lib/CodeGen/IslNodeBuilder.cpp
1423–1427

Where is the memory free'd?

niosega added a comment.EditedJun 6 2017, 2:43 PM

I am wondering why you decided for a global stack-or-heap property. I'd assumed that this is a per-array decision. For instance, Roman's gemm optimization is designed to fit on the stack, there is no reason to allocate it on the heap.

The global stack-or-heap property is just for debugging purpose (as explained in the summary). I publish my code in phabricator so that I can have feedback and remarks on the malloc / free implementation. I am working in parallel on adding the possibility to specify in the json file if we want heap or stack array, as Andreas described in an inline comment.

Thanks for explaining it (again).

You can mark your patch with "WIP" (Work in Progress) if you do not intend to have it committed as-is.

For being commit-ready, I think the test case should be smaller (as already mentioned by Andres), and the malloc'ed memory must also be released.

simbuerg added inline comments.Jun 7 2017, 5:12 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

Good Catch. We just discussed possible locations for the free. The easiest way would be the exit(s) of the SCoP. However, to keep it simple, we would have to do a copy-in/-out to the original array base pointer, right? Everything else (e.g. calculating lexicographic maximal accesses) would give us trouble with non-polyhedral accesses between SCoPs.

philip.pfaffe added inline comments.Jun 8 2017, 8:48 AM
lib/CodeGen/IslNodeBuilder.cpp
1417

You are right of course. There is no way to obtain the real size_t in the middle end.

I don't have a strong opinion on this, but I'd consider IntPtrTy to be the better default over hard-coding some int type, as AFAICS (u)intptr_t and size_t are the same on basically all platforms.

1423–1427

Couldn't that create use-after-free if the SCoP is within a loop?

niosega updated this revision to Diff 101926.Jun 8 2017, 9:04 AM

Release the memory.

Remove the cli option and now use a JSON property to enable heap allocation.

Remove metadatas in the test case.

Meinersbur added inline comments.
include/polly/ScopInfo.h
361

Please add the information that the property is only relevant if the array is allocated by Polly instead of pre-existing. Also, that when it is false, it is allocated using alloca (instead of malloca)

424

... is allocated on the heap. The "False otherwise" does not add any information, it's a boolean.

2610

Please document the new parameter.

Instead of passing to every constructor, you could also defeault-initialize it to false and add a setIsOnHeap accessor (or similar name) to change that property after creation.

2619

Please document the new parameter.

lib/CodeGen/IslNodeBuilder.cpp
1398

On 32-bit platforms and 64-but windows, long has only 32 bits. You test case failsdue to an overflow:

polly\test\Isl\CodeGen\MemAccess\create_arrays_heap.ll:35:12: error: expected string not found in input
; CODEGEN: %malloccall1 = tail call i8* @malloc(i64 432537600000)
           ^
<stdin>:12:2: note: scanning from here
 %malloccall1 = tail call i8* @malloc(i64 20220739584)
 ^

error: command failed with exit status: 1
1423–1427

copy-in/out is not specific to heap arrays. Such things can be done using with additional copy statements.

For scalar expansion, the scalar are not available after the SCoP anyway, with the exception of 'escaping' scalars. In that case only a single value is available to the outside. I suggest to exclude escaping scalars at the moment.

Here, alloca/malloc are generated when entering the Scop. I think free'ing it when exiting it is the most obvious choice.

@gareevroman
Btw, for alloca this location looks to be the wrong choice. alloca's should be in a function's entry block. The SCoP itself could be within the loop, meaning that everytime the SCoP is executed, the stack grows, up to a possible stack overflow.

1423–1427

@gareevroman
I was wrong, the alloca is actually added to the entry block.

This means that also the malloc is in the entry block, but the call to free below is added into last block of the orginal region (where it is not even used). We either have to

  1. Add the malloc to the entry block and the free at every ret (or non-returning function call)
  • or -
  1. Add the malloc to the start of the generated code (polly.start) and the free at the end of it (polly.exiting).
1431

The variable FreedArray isn't used anywhere.

lib/Exchange/JSONExporter.cpp
718

There is no necessity to store temporary StringRef for a literal string. AllocationString == "head" should just work.

test/Isl/CodeGen/MemAccess/create_arrays_heap.ll
2

mem2reg does not need to be part of a test. You can invoke opt create_arrays_heap.ll -mem2reg -S and use that output for the test case.

Does this test require 2>&1?

simbuerg added inline comments.Jun 9 2017, 2:03 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

Wait, if you malloc at function entry and free at every ret of the function you will end up with problems in code that can't be modeled as a SCoP because you cannot map to the correct
'last write' in the expanded array that is required for a memory access outside of the SCoP. Therefore, you will have to do copy-in/-out before/after the SCoP to stay correct, hence you can malloc/free right at the SCoP boundary (because you need to do the copying anyway).

Use-After-Free wouldn't become an issue as soon as you malloc/free at the SCoP boundary.

simbuerg added inline comments.Jun 9 2017, 2:06 AM
include/polly/ScopInfo.h
2610

I would also recommend going for default initialization to false and provide a setter for the property.

philip.pfaffe added inline comments.Jun 9 2017, 2:57 AM
lib/CodeGen/IslNodeBuilder.cpp
1432

Scop::getExitingBlock() may return nullptr, for instance when it refers to the TopLevelRegion.

Meinersbur added inline comments.Jun 9 2017, 7:50 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

copy-in/out is not specific to heap arrays. Such things can be done using with additional copy statements.

For scalar expansion, the scalar are not available after the SCoP anyway, with the exception of 'escaping' scalars. In that case only a single value is available to the outside. I suggest to exclude escaping scalars at the moment.

Here, alloca/malloc are generated when entering the Scop. I think free'ing it when exiting it is the most obvious choice.

@gareevroman
Btw, for alloca this location looks to be the wrong choice. alloca's should be in a function's entry block. The SCoP itself could be within the loop, meaning that everytime the SCoP is executed, the stack grows, up to a possible stack overflow.

1423–1427

We have no control about memory accesses outside of a SCoP and should not try to modify them to read from the expanded array itself.

For scalar uses, a single LoadInst is enough for outside uses.

For array uses, one has to copy the data to the original array in any case. It might captured by a function call and/or used after the function containing the SCoP.

void function_with_scop(float A[]) {
#pragma scop
for (int i = 0; i < 128; i+=1) 
  for (int j = 0; j < 128; j+=1) 
    A[j] = ...
#pragma endscop
  print(A[getIndexToPrint()]);
}

int main() {
  float A[128];
  function_with_scop(A[128]);
  print(A[5]);
}

How do you expand A to two dimensions without a copy-back?

I suggest to consider only uses where the array/scalar is entire contained within the SCoP at first. Escaping scalars is easy to add. Copy-back can be implemented afterwards. For special cases we could avoid copy-back like this:

void function_with_scop() {
{ // generated
A_expaned = malloc(128*128*sizeof(float))
for (int i = 0; i < 128; i+=1) 
  for (int j = 0; j < 128; j+=1) 
    A_expaned[i][j] = ...
A = &A[127];
}
  use(A[5]);
}

for which we have to be able to replace all base ptrs of A.

Choice of alternatives:

void scop_in_loop() {
  for (int i = 0; i < 128; i+=1) {
#pragma scop
    for (int j = 0; j < 128; j+=1) 
      A[i] = ...
#pragma endscops
  }
}

we could generate

void scop_in_loop() {
  for (int i = 0; i < 128; i+=1)
{ // generated
  A_expanded = malloc(...);
  ...
  free(A_generated);
}
}

or

void scop_in_loop() {
  A_expanded = malloc(...);
  for (int i = 0; i < 128; i+=1)
{ // generated
  ...
}
  free(A_generated);
}
  1. allocates memory multiple times.
  2. allocates memory even if it is never used (e.g. SCoP is in an if-conditional). The memory necessary might also depend on a parameter, which is not known at the entry of a function;
void scop_in_loop() {
  int n = array->size();
  for (int i = 0; i < n; i+=1) {
#pragma scop
    for (int j = 0; j < 128; j+=1) 
      A[i] = ...
#pragma endscops
  }
}

There is also alternative 3 where we create a global of the required size:

static float A_expanded[128][128];

The would be memory inside an executable's .bss segment which most operating systems do not assign physical memory to until its first use. No free'ing required here, but we also cannot make the size dependent on some parameter. You are also polluting the host's virtual address space and get issues with threading.

I tend to go towards alternative 2.

niosega added inline comments.Jun 23 2017, 2:27 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

I am currently working on inserting the malloc and the free at the right place (polly.start and polly.exiting) .
But I am having trouble to find how to get the BasicBloc or the Instruction before which I must insert these two calls.
Does anybody know how to get a pointer to them ?

Meinersbur added inline comments.Jun 26 2017, 2:44 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

polly.start: Either pass StartBlock as an argument to allocateNewArrays or, I think the IRBuilder should still be at that position when allocateNewArrays is called.

polly.exiting: PerfMonitoring also has to get the end of the SCoP. It does so in an not-so-elegant way in CodeGeneration.cpp:195

BasicBlock *MergeBlock = SplitBlock->getTerminator()
                             ->getSuccessor(0)
                             ->getUniqueSuccessor()
                             ->getUniqueSuccessor();

At the point when allocateNewArrays is called, polly.exiting should also be just the successor of polly.start (I think). So StartBlock->getUniqueSuccessor() should be enough.

simbuerg added inline comments.Jun 26 2017, 3:00 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

Maybe it would be a smart idea to refactor
executeScopConditionally to return a BBPair with the polly.start, polly.exiting blocks that were created? It returns polly.start already.

simbuerg added inline comments.Jun 26 2017, 3:12 AM
lib/CodeGen/IslNodeBuilder.cpp
1423–1427

If the malloc/free take place in the freshly inserted blocks polly.start/polly.exiting? That shouldn't be possible, right?

niosega updated this revision to Diff 103990.Jun 26 2017, 10:41 AM

The changes made in this update are the following :

  • Default initialization of isOnHeap to false and implementation of a setter
  • Add a test case to JSONImporter to check the size of an imported array
  • Use the IntPtrTy of the DataLayout
  • Remove useless StringRef in JSONImporter
  • Remove useless FreedArray variable in ISLNodeBuilder
  • Remove useless -mem2reg and 2>1 in test case
  • Call to malloc and free at the correct position (polly.start and polly.exiting)

For the malloc and free positions, we have made the following modifications :

  • Take into account the patch that modify the signature of executeScopConditionnally (now returning both start and end block)
  • Set the InsertPoint before AST building to the end of the block so that the split between newly created branch and old branch is made such that the malloc call is in polly.start
niosega added inline comments.Jun 26 2017, 10:52 AM
lib/Exchange/JSONExporter.cpp
728

I need to have a pointer to the newly created ScopArrayInfo to call the setter.
But the method createScopArrayInfo returns only a const SAI *. The solution
I found is to query the Scop to obtain the SAI by name.

An alternative would be to pass a parameter to createScopArrayInfo then to
getOrCreateScopArrayInfo that represents the value that isOnHeap must take.

simbuerg added inline comments.Jun 27 2017, 1:45 AM
lib/CodeGen/CodeGeneration.cpp
241

Maybe something like this:

Explicitly set the insert point to the end of the block to avoid that a split at the builder's current
insert position would move the malloc calls to the wrong BasicBlock. Ideally we would just split
the block during allocation of the new arrays, but this would break the assumption that there are
no blocks between polly.start and polly.exiting (at this point).

No need to mention that the creation on the heap fails, because this is not the problem, this is the
result. Furthermore, polly.loop_exit has nothing to do with this. This is just the block the malloc end up in,
if you do not set the insert location to the end.

What you want to achieve is simply: Preserve the mallocs in the polly.start block. The correct solution would be
to split the BasicBlock, however that would touch all places that assume that there are only polly.start and polly.exiting.
So, this solution works and interferes with as few locations as possible.

simbuerg added inline comments.Jun 27 2017, 4:41 AM
lib/Exchange/JSONExporter.cpp
728

This is just to circumvent an inconvenient API that you want to add with this patch. I would suggest a simpler way:
Instead of depending on the setter, just add the IsOnHeap property as a function argument to
createScopArrayInfo and pass it through to getOrCreateScopArrayInfo. There you can pass it to the constructor or use
your setter.

niosega added inline comments.Jun 27 2017, 4:51 AM
lib/Exchange/JSONExporter.cpp
728

To pass the parameter through createScopArrayInfo then getOrCreateScopArrayInfo then to the constructor is what I did in the previous version of the patch. Michael and you agreed that it was not the prettiest solution. Should I change it back ?

simbuerg added inline comments.Jun 27 2017, 5:10 AM
lib/Exchange/JSONExporter.cpp
728

Well the pass-through is ugly too :-).
So far, I see 3 options:

  1. Pass it through everything.
  2. Remove the const-ness of createScopArrayInfo
  3. Make IsOnHeap mutable.

I just dislike the lookup by name to get a non-const ScopArrayInfo. Michael what do you think?
If nobody is objecting the name-lookup, I'm fine with it as well.

Meinersbur added inline comments.Jun 27 2017, 5:53 AM
lib/CodeGen/IslNodeBuilder.cpp
1419

There is ScopArrayInfo::getElemSizeInBytes() which should be the standard way to get an element's size, unless you have a reason why you need something different.

lib/Exchange/JSONExporter.cpp
718

The additional range check could be committed separately, it look unrelated.
(Andreas can just commit it with its test case)

728

I prefer removing the const which IMHO serves no purpose. Some of its methods like updateElementType already do modify the SAI, and it should not matter how one gets the reference to it.

Scop::updateAccessDimensionality() already does an const_cast in order to be able to call updateElementType.

niosega updated this revision to Diff 104157.Jun 27 2017, 7:13 AM

This update modify two things :

  • Modifying the comment in CodeGeneration about the changes of insert point.
  • Remove the constness of CreateScopArrayInfo and getOrCreateScopArrayInfo and call directly the setter setIsOnHeap in JSONImporter.

We're getting close IMHO. I will commit the small range-check patch tonight.

lib/Exchange/JSONExporter.cpp
718

I will tonight.

741

You don't need to redirect from CString to StringRef for a simple string comparison.

Try:

for (; ArrayIdx < Arrays.size(0; ArrayIdx++) {
  auto &Array = Arrays[ArrayIdx];
  ... (Replace usage of Arrays[ArrayIdx] with Array ...
  if (Array.isMember("allocation") {
    NewSAI->setIsOnHeap(Array["allocation"].asString() == "heap");
  }

Once you extract the new ImportArrays test case (either create another review or ask Andreas to commit directly), IMHO this is ready to be committed as well.

test/Isl/CodeGen/MemAccess/create_arrays_heap.ll
34–39

Since we had a discussion about this, could you also CHECK for the name of the basic block this is inserted to (such as

CODEGEN: polly.start:

)

43–48

Here as well, e.g.,

CODEGEN: polly.exiting:
test/JSONExporter/ImportArrays/create_arrays_heap___%for.cond1.preheader---%for.end18.jscop.transformed
15 ↗(On Diff #104157)

Coul you clean-up the test case by only reducing the size (I think only this line is relevant. the others can be removed by also removing theit accesses in the .ll file)
Also, try to give it a more meaningful name (e.g. rename the function to "ImportArrays_Negative_size") and add the original, unmodified .jscop file as written by -export-jscop.

niosega updated this revision to Diff 104238.Jun 27 2017, 1:11 PM
  • Refactor code in JSONExporter for heap allocation detection.
  • Remove the test case concerning
  • Add CHECK in the test case for malloc/free to check if the inserting positions are correct.
niosega updated this revision to Diff 104250.Jun 27 2017, 1:29 PM

Allright, I just committed the size-check with test. From my side this patch is ready as soon as:

  • Rebase is done
  • Commit message gives a usefull description of what this patch does.

Currently the message is as follows:

Add the option to allocate arrays on heap instead of stack while creating new arrays.

For now, a cli option is used to enable the heap allocation. But at the end, this information will be included in the imported json file.

To allocate on the heap, I use the CreateMalloc function with the following parameters :

Instruction * InsertBefore : The same instruction used for stack allocation.
Type * IntPtrTy : For now, a int64.
Type * AllocTy : The type of an element of the array.
Value * AllocSize : The size of an element of the array.
Value * ArraySize : The product of the size of all dimensions.
Function * MallocF : nullptr to use the default malloc function.
const Twine &Name : The name of the SAI.

What we need here is a description of the patch, something that
answers:

  • Where do we add what?
  • When is it freed?
  • Why do we need this?
niosega updated this revision to Diff 104356.Jun 28 2017, 1:10 AM

Rebase with master to take into account the commit of the check size for JSONImporter.

niosega edited the summary of this revision. (Show Details)Jun 28 2017, 4:14 AM

Could you change getPrimitiveSizeInBits() to getElemSizeInBytes() or explain why getPrimitiveSizeInBits() is needed here? Othewise, LGTM.

niosega updated this revision to Diff 104387.Jun 28 2017, 5:39 AM

There are no valid reasons not to use getElemSizeInBytes instead of getPrimitiveSizeInBits. This solution (with getElemSizeInBytes) is much more cleaner.

LGTM, I am going to commit...

Meinersbur accepted this revision.Jun 28 2017, 6:01 AM
This revision is now accepted and ready to land.Jun 28 2017, 6:01 AM
This revision was automatically updated to reflect the committed changes.