This is an archive of the discontinued LLVM Phabricator instance.

BlockGenerators: Replace getNewScalarValue with getNewValue
ClosedPublic

Authored by grosser on Jan 24 2016, 10:36 AM.

Details

Summary

Both functions implement the same functionality, with the difference that
getNewScalarValue assumes that globals and out-of-scop scalars can be directly
reused without loading them from their corresponding stack slot. This is correct
for sequential code generation, but causes issues with outlining code e.g. for
OpenMP code generation. getNewValue handles such cases correctly.

Hence, we can replace getNewScalarValue with getNewValue. This is not only more
future proof, but also eliminates a bunch of code.

The only functionality that was available in getNewScalarValue that is lost
is the on-demand creation of scalar values. However, this is not necessary any
more as scalars are always loaded at the beginning of each basic block and will
consequently always be available when scalar stores are generated. As this was
not the case in older versions of Polly, it seems the on-demand loading is just
some older code that has not yet been removed.

Finally, generateScalarLoads also generated loads for values that are loop
invariant and available in GlobalMap; the GlobalMap values are preferred over
the loads generateScalarLoads would emit. Hence, we can simply skip code
generation for such scalar values and avoid generating dead code.
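
A minimal sketch of the behavioural difference (hypothetical, heavily
simplified signatures; the real functions take a ScopStmt, an LLVM builder and
per-statement maps):

  // Hypothetical, simplified sketch of the two lookup strategies; not the
  // actual Polly interface. It only illustrates why directly reusing an
  // out-of-scop value breaks once the generated code is outlined into a
  // subfunction (e.g. for OpenMP). Needs llvm/ADT/DenseMap.h, llvm/IR/Value.h.
  using ValueMapT = llvm::DenseMap<llvm::Value *, llvm::Value *>;

  // Old strategy (getNewScalarValue-like): anything not remapped in BBMap is
  // returned as-is -- only valid while code stays in the original function.
  llvm::Value *oldLookup(llvm::Value *Old, ValueMapT &BBMap) {
    if (llvm::Value *New = BBMap.lookup(Old))
      return New;
    return Old; // Breaks inside an outlined subfunction.
  }

  // New strategy (getNewValue-like): also consult GlobalMap, which the OpenMP
  // code generation fills with the values passed into the subfunction.
  llvm::Value *newLookup(llvm::Value *Old, ValueMapT &BBMap,
                         ValueMapT &GlobalMap) {
    if (llvm::Value *New = GlobalMap.lookup(Old))
      return New;
    if (llvm::Value *New = BBMap.lookup(Old))
      return New;
    return Old; // Constants and function-global values stay directly usable.
  }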

Diff Detail

Repository
rL LLVM

Event Timeline

grosser updated this revision to Diff 45828.Jan 24 2016, 10:36 AM
grosser retitled this revision from to BlockGenerators: Replace getNewScalarValue with getNewValue.
grosser updated this object.
grosser added reviewers: jdoerfert, Meinersbur.
grosser added subscribers: llvm-commits, pollydev.
Meinersbur accepted this revision.Jan 24 2016, 2:26 PM
Meinersbur edited edge metadata.

By coincidence, I was working on the very same patch. Since rebasing D15687, a new bug appeared which this change is needed to solve. I was still in the process of testing whether there are any other problems. My version of this patch is here.

lib/CodeGen/BlockGenerators.cpp
412 ↗(On Diff #45828)

This is fixing a symptom only. The MemoryAccess shouldn't exist in the first place if we are going to ignore it.

In fact, invariant loads do not need any MemoryAccesses inserted at all. One can just use the loaded llvm::Value as it is inserted before the generated code. In case the load is conditional, inserting a phi with an undef on one side is simple.
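
A small sketch of the conditional case, using only standard IRBuilder calls
(the helper and block names are made up for illustration and are not Polly
code):

  // Sketch: guard the invariant load and merge with undef on the untaken
  // path. Needs llvm/IR/IRBuilder.h.
  llvm::Value *emitGuardedLoad(llvm::IRBuilder<> &Builder, llvm::Value *Cond,
                               llvm::Value *Ptr, llvm::Type *Ty) {
    llvm::Function *F = Builder.GetInsertBlock()->getParent();
    llvm::BasicBlock *PreBB = Builder.GetInsertBlock();
    llvm::BasicBlock *LoadBB =
        llvm::BasicBlock::Create(Builder.getContext(), "do.load", F);
    llvm::BasicBlock *JoinBB =
        llvm::BasicBlock::Create(Builder.getContext(), "join", F);

    // Only execute the load when its original condition holds.
    Builder.CreateCondBr(Cond, LoadBB, JoinBB);

    Builder.SetInsertPoint(LoadBB);
    llvm::Value *Loaded = Builder.CreateLoad(Ty, Ptr, "inv.load");
    Builder.CreateBr(JoinBB);

    // Merge the loaded value with undef from the untaken edge.
    Builder.SetInsertPoint(JoinBB);
    llvm::PHINode *Merged = Builder.CreatePHI(Ty, 2, "inv.merge");
    Merged->addIncoming(Loaded, LoadBB);
    Merged->addIncoming(llvm::UndefValue::get(Ty), PreBB);
    return Merged;
  }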

1222 ↗(On Diff #45828)

This is wrong; we don't need the loop where the value is defined, but the loop where it is used. The difference comes into play when the value is defined in the loop, but we write its scalar value after the loop and thus use the value after the last loop iteration.

test/Isl/CodeGen/invariant_load_scalar_escape_alloca_sharing.ll
14 ↗(On Diff #45828)

This store became useless as well, didn't it?

This revision is now accepted and ready to land.Jan 24 2016, 2:26 PM

Hi Michael,

thank you for this fast review.

lib/CodeGen/BlockGenerators.cpp
412 ↗(On Diff #45828)

Right.

However, I am not sure that dropping the memory access is the solution we want, as this would mean we do not represent this read-access in our polyhedral access functions (http://llvm.org/PR25107). This causes trouble in case we want to know the precise set of data-locations read e.g. for kernel outlining or other things (even though it currently works with our OpenMP code generation). I think the right solution is to not add the InvariantLoads to GlobalMap, but to instead model them as normal read/write memory accesses (which are code-generated accordingly).

As this is a little bit more involved and probably also requires some discussion, I would prefer to discuss and address this issue independently. Regarding this patch, I could just leave out the change above and add a FIXME saying that we currently generate redundant memory accesses as we
directly read data from the invariant-load-motioned register.

1222 ↗(On Diff #45828)

Is this not what is happening? ScalarInst comes from getAccessInstruction(), which is defined to return:

/// For memory accesses of kind MK_Value the access instruction of a load      
/// access is the instruction that uses the load. The access instruction of    
/// a write access is the instruction that defines the llvm::Value.

Now, the original code was slightly different. It was using getLoopForInst(ScalarValueInst), where ScalarValueInst was the instruction defining the scalar value.

So it seems the behavior was changed (maybe this needs a test case?), but the result should be more correct? In case I got it wrong, could you suggest which value to use instead?

(In case we drop the MA->getAccessInstruction() mapping for scalar values, we can probably just use the loop around the entry block.)

test/Isl/CodeGen/invariant_load_scalar_escape_alloca_sharing.ll
14 ↗(On Diff #45828)

Right. It is also redundant.

Meinersbur added inline comments.Jan 25 2016, 4:49 AM
lib/CodeGen/BlockGenerators.cpp
412 ↗(On Diff #45828)

OK

1222 ↗(On Diff #45828)

This is true for accesses of type MK_Value (where the definition must necessarily be in the same Stmt), but not for MK_PHI or MK_ExitPHI. Here, getAccessInstruction() returns the incoming value (at least before D15681), which might be defined anywhere earlier. E.g.:

loop:
  %i = phi i32 [ 0, %entry ], [ %i.inc, %loop ]
  %i.inc = add nsw i32 %i, 1
  %cmp5 = icmp slt i32 %i.inc, 2
  br i1 %cmp5, label %loop, label %exit

exit:
  br label %join

join:
  %phi = phi i32 [ %i.inc, %exit ]

There'd be a PHI write in Stmt_exit, writing the value of %i.inc. Hence, the AccessInstruction of that access would be %i.inc. Synthesizing it within the loop (getLoopForInst(%i.inc)) will expand to something dependent on %i, whereas at %exit the correct value is 2.
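
This is also what ScalarEvolution's scope argument expresses. A short sketch
of the difference, where SE, L and IInc are placeholders for the
ScalarEvolution analysis, the loop containing %i.inc and the %i.inc
instruction:

  // Sketch only: how the chosen scope changes the expanded value.
  const llvm::SCEV *S = SE.getSCEV(IInc);            // {1,+,1}<%loop>

  // Scope = the defining loop: the result still depends on the induction
  // variable %i.
  const llvm::SCEV *InLoop = SE.getSCEVAtScope(S, L);

  // Scope = outside the loop (its parent loop, or nullptr): with a known
  // trip count ScalarEvolution folds this to the value after the final
  // iteration, here the constant 2.
  const llvm::SCEV *AfterLoop = SE.getSCEVAtScope(S, L->getParentLoop());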

You might look into my version of the patch for how I solved the issue.

Test case.

This might currently not be exploitable because getNewScalarValue prioritizes
values from BBMap over synthesizable ones. D15706 should remove such memory
accesses for synthesizable values.

Michael

grosser updated this revision to Diff 45860.Jan 25 2016, 6:05 AM
grosser edited edge metadata.

Address Michael's review comments

Hi Michael,

thanks for the comments. I added your test case and addressed the PHI/EXITING_PHI issue. Could you confirm I did not miss anything and this is now ready to go?

Best,
Tobias

lib/CodeGen/BlockGenerators.cpp
1222 ↗(On Diff #45828)

I looked at your code and now derive the loop from getRegion()->getEntry().

You derive it from getExit(). This might be incorrect if the region is on a backedge and the exit is the header of a loop that dominates the original region. This is highly unlikely, though. getEntry(), on the other hand, seems to be safe in general, as it is itself within the scop region.

Meinersbur added inline comments.Jan 25 2016, 7:12 AM
lib/CodeGen/BlockGenerators.cpp
1216 ↗(On Diff #45860)

I see the problem with getExit(), but I am not yet convinced it is wrong. If it is the backedge, that means we did not leave that loop, i.e. passing that loop to the SCEV expander is correct.

On the other hand, getEntry() might be wrong as well. With -polly-allow-nonaffine-loops, there might be a loop inside the non-affine subregion, and the SCEV may depend on it. If the entry is part of that loop, the SCEV expander will generate an expression depending on its induction variable, but the point of evaluation is the edge to the exit block, which is not part of the contained loop.

ASCII-Art:

  +--> Entry
  |      |
  +--- loop
         |
    loop.after
       |    \
       |   side
       |    /
       Exit

Let's say the loop Entry=>loop has a non-affine exit condition (such as "i*i < n", which AFAIK is supported by SCEV). With -polly-allow-nonaffine-loops, the non-affine subregion should be Entry=>Exit. Exit has a phi:

Exit:
  %phi = phi i32 [ %i.inc, %loop.after ], [ 0, %side ]

%i.inc needs to be evaluated in %loop.after for the MK_PHI MemoryAccess, but if we pass the loop Entry=>loop to the SCEV expander, we expand to an expression that depends on that loop's induction variable.

This revision was automatically updated to reflect the committed changes.
grosser added inline comments.Jan 26 2016, 2:13 AM
lib/CodeGen/BlockGenerators.cpp
1216 ↗(On Diff #45860)

Hi Michael,

for now I took your getExit() version and committed the patch. I am not 100% certain this is the right choice, but currently have difficulties writing working test cases, as the non-affine loop domain generation is broken (http://llvm.org/PR26309, http://llvm.org/PR26310). As this code does not yet seem to be tested, is not enabled by default, and seems to have been broken (for this corner case) before, I don't think this should block this patch. Hence, I pushed this out in https://llvm.org/svn/llvm-project/polly/trunk@258799.

I also opened a bug report to track this issue until the domain generation is fixed: http://llvm.org/PR26311

Thanks for committing. I will go ahead and commit my patches on top of it.

lib/CodeGen/BlockGenerators.cpp
1216 ↗(On Diff #45860)

Hi Tobias,

given that loops within non-affine subregions are currently unsupported, I think it might have been safer to go with the getEntry() version, because then we can assume there is no additional loop that contains the entry block but not all of the region.

Rethinking my argument, it is probably wrong too: we could enter a new loop (header) that is not part of the non-affine subregion.

What we are really looking for is the loop of the subregion's exiting edges. We could get it e.g.:

  • Take the top entry of ScopStmt's NestLoops (if not empty)
  • Get the innermost loop that contains both the subregion's exiting node and the exit node (a sketch of this option follows below).
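
A sketch of the second option, using only the generic LoopInfo interface
(ExitingBB and ExitBB are placeholders for the subregion's exiting block and
the region exit; the helper name is made up):

  // Sketch: innermost loop containing both the subregion's exiting block and
  // the region exit; returns nullptr if the exiting edge leaves all
  // surrounding loops. Needs llvm/Analysis/LoopInfo.h.
  llvm::Loop *getLoopForExitingEdge(llvm::LoopInfo &LI,
                                    llvm::BasicBlock *ExitingBB,
                                    llvm::BasicBlock *ExitBB) {
    llvm::Loop *L = LI.getLoopFor(ExitingBB);
    while (L && !L->contains(ExitBB))
      L = L->getParentLoop();
    return L;
  }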