This is an archive of the discontinued LLVM Phabricator instance.

Scalar/PHI code genration
ClosedPublic

Authored by jdoerfert on Feb 9 2015, 10:34 AM.

Details

Summary
Added scalar and phi code generation to the backend, hence making the polly
prepare and independent block pass obsolete.

Initial statistics with the LLVM test suite show that we detect now more
SCoPs and while they are smaller (wrt. #basic blocks), they still cover the
same source lines. Additionally, the compile time was decreased for various
benchmarks while the execution time varies only very little.

TODO: - add more of my unit tests
      - create unit tests for the special cases that occured in lnt
      - add more explaination in the commit message
      - do a more comprehensive comparison with the polly prepare and
        independent blocks passes.

FIXME: There is/was one failing lnt test (MultiSource/Applications/oggenc),
       however the problematic SCoP doesn't even contain scalar
       accesses but only real memory accesses with simple __and__
       existentially quantified access functions. For now it's safe to
       assume this is just a new, failing, SCoP, not a error in this
       patch.

Diff Detail

Repository
rL LLVM

Event Timeline

jdoerfert updated this revision to Diff 19598.Feb 9 2015, 10:34 AM
jdoerfert retitled this revision from to Scalar/PHI code genration.
jdoerfert edited the test plan for this revision. (Show Details)
jdoerfert updated this object.
jdoerfert added subscribers: Restricted Project, Unknown Object (MLST).
jdoerfert added inline comments.Feb 9 2015, 10:38 AM
include/polly/CodeGen/BlockGenerators.h
267 ↗(On Diff #19598)

The LTS param was deleted by accident.

grosser edited edge metadata.Feb 9 2015, 11:35 PM

Hi Johannes,

the general structure of the patch looks very good. I started directly to go through it in detail to try to understand what finer details and the corner cases/difficulties you have addressed.

I did not yet manage to finish the patch, but wanted to already send you my first comments.

include/polly/CodeGen/BlockGenerators.h
121 ↗(On Diff #19598)

Is there a reason you use two different maps. As an instruction is either a
scalar or a phi, you could just store them in a single map, no? This might avoid the need to later choose between the two.

While reading further through the source code, it seems a PHI instruction may have two allocations. One for the incoming values and one for the outcoming values. Maybe it would be worth a comment and an example here that explains which values belong where and why PHI nodes can require two different allocs.

lib/CodeGen/BlockGenerators.cpp
461 ↗(On Diff #19598)

The non-PHI case is pretty straightforward, but it becomes unclear due to the mix of PHI and non-PHI code. My comments below try to point out how those code paths could be made more separate.

463 ↗(On Diff #19598)

This could be folded into the if(ScalarBasePHI) condition, no?

465 ↗(On Diff #19598)

artificial.

476 ↗(On Diff #19598)

A single map might avoid this ternary condition on PHIOpMap, no?

Also, moving this below the if(ScalarBasePHI) block, would allow to move the continue for artificial self write accesses into the ScalarBasePHI block as well. Like this there is just a single block that contains all the PHI special cases.

482 ↗(On Diff #19598)

You could possibly put this into an

if (!ScalarBasePHI) {
ScalarValue = ScalarInst;
} else {

}

This would make clear that these are two exclusive code paths. One for the common case, one for the PHI.

484 ↗(On Diff #19598)

PHI? You sometimes write it uppercase, otherwise lowercase.

509 ↗(On Diff #19598)

typo.

512 ↗(On Diff #19598)

Why would we reload a value to directly store it back? Could we instead just neither load nor store the value?

543 ↗(On Diff #19598)

PHI?

562 ↗(On Diff #19598)

typo

594 ↗(On Diff #19598)

typo

answers

include/polly/CodeGen/BlockGenerators.h
121 ↗(On Diff #19598)

For almost all corner cases and most of the decisions I need to add unit tests (I know).

Nevertheless, I am quite sure you need two allocations for a PHI node, the general test case structure looks like:

x1 = ...
for (i=0...N) {
  x2 = phi(x1, add)
  add = x2 + A[i];
}

print(x1)
print(x2)
print(add)
lib/CodeGen/BlockGenerators.cpp
463 ↗(On Diff #19598)

It is used twice so I moved it out of the two if cases.

476 ↗(On Diff #19598)

We could extract the phi path probably, but I still need we need 2 maps.

482 ↗(On Diff #19598)

ok

484 ↗(On Diff #19598)

True. Which do you prefer?

512 ↗(On Diff #19598)

Unit test idea:

bb1:
  %x = 
  br %cond, label %then, label %else

then:
  br label %merge

else:
  %y = 

merge:
  %p = phi[(%x, %then), (%y, %else)]

Now we have a write access in the %then block which needs to reload %x and store it in the operand location of %p.

Second set of comments. Again, mostly comments to help me understand better.

include/polly/CodeGen/BlockGenerators.h
121 ↗(On Diff #19598)

Right. I agree that two different allocations might be needed. If you could comment in the code (possibly with example), why PHI nodes need two allocations that would be great. Also, I wonder if you can somehow explain which of the two allocations is "normal" and ends up in the ScalarMap and which is special and ends up in the PHIOpMap, that would help my understanding.

153 ↗(On Diff #19598)

You explain that this function generates both the loads and the stores of a PHI node, but it is not 100% clear to me why these to need to be generated together, instead of just using the same code-generating for generating their loads/stores as is used for normal scalar loads & stores. Is this for correctness reasons or something else? (In fact, for some PHIs you seem to do so, only self reverencing PHIs seem to be special).

lib/CodeGen/BlockGenerators.cpp
321 ↗(On Diff #19598)

I do not understand why this condition (and the forwarding of the instructions in copyInstruction) is necessary. If I remove the full if (InstCopy) block, all test cases still pass for me. Is the BBMap not already updated when the statements are copied?

For example:

 // Compute NewStore before its insertion in BBMap to make the insertion
 // deterministic.
​ BBMap[Store] = NewStore;
​ return NewStore;
463 ↗(On Diff #19598)

I missed the second use.

It is in the special case where you check if a PHI is self-referencing. (I commented earlier that I do not yet understand why stores caused by self-referencing PHIs can not be handled in generateScalarStores. If they could, the second use would not be needed any more).

484 ↗(On Diff #19598)

I do not have a preference. Maybe just check:

grep -R " PHI " lib/ | wc
grep -R " phi " lib/ | wc

and use the one that is more common in Polly.

512 ↗(On Diff #19598)

Ah, I see. This is also PHI node specific. This indeed makes a lot of sense. mentioning this in the comment and/or folding this into the PHI block might make this more obvious

Initial comments.

lib/CodeGen/BlockGenerators.cpp
321 ↗(On Diff #19598)

During the development this looked different and when I changed it I assumed that if there is a inst copy there is a mapping, however for synthezisable instructions there is none (currently), the same holds for phi nodes (but then there is no inst copy either).

461 ↗(On Diff #19598)

I will seperate them as far as possible.

Hi Johannes,

a last minor comment.

Also, to make it clear. I think the general structure of the patch is great. There are some parts where I try to give some comments to better understand the patch or to improve readability, but none is affecting the general structure of the patch.

Looking forward to an updated version,
Tobias

lib/CodeGen/BlockGenerators.cpp
321 ↗(On Diff #19598)

I understand partially. ;-)
Are you saying this is needed for PHI nodes and synthezisable instructions, but we just don't have a test case? Or this was needed at some point, but is not needed any more?

jdoerfert updated this revision to Diff 24874.May 4 2015, 3:55 AM
jdoerfert edited edge metadata.

New Version that handles loops in non-affine regions. Passes all lnt tests with and without scalar/phi modeling.

grosser accepted this revision.May 5 2015, 10:00 AM
grosser edited edge metadata.

Hi Johannes,

this patch looks great. Thanks for pushing this ahead.

I added a couple of comments. Mostly minor typos. There is one comment on stuff that could be split off into a separate patch (would be nice, but not essential) as well as one possible correctness issue in the linked-list implementation.

Otherwise, this LGTM.

Also, we should try to enable this quickly. I think just having LNT performance numbers for the latest patch that confirm we do not badly regress should be enough to justify this switch.

Best,
Tobias

include/polly/CodeGen/BlockGenerators.h
297 ↗(On Diff #24874)

This change is not needed. I would suggest to remove it from the patch to not distract from the core changes.

342 ↗(On Diff #24874)

@returns missing

include/polly/CodeGen/IslNodeBuilder.h
107 ↗(On Diff #24874)

This (and below)) seems like an unrelated change. Maybe commit separately.

lib/Analysis/ScopInfo.cpp
888 ↗(On Diff #24874)

Is this correct? In case we add three memory accesses A,B,C. Would this code not first add A to the map, than add a link from A to B, and then overwrite the link to B with C, which results in us loosing the link to B entirely?

I somehow feel the manually implemented linked list adds unnecessary complexity and also somehow mixes the MemoryAccess definition and
the InstructionToAccess mapping. I wonder if it might not be clearer to make the InstructionToAccess Map a mapping from instructions to a
vector of accesses.

lib/CodeGen/BlockGenerators.cpp
278 ↗(On Diff #24874)

As this function only returns a Value, there is no need to change this to 'Instruction' (similar comment below). To not distract from the core change, I would suggest to avoid this change (or perform it in a separate patch).

360 ↗(On Diff #24874)

This patch modifies various functions to pass the newly created instruction on. I somehow believe to remember you earlier passed these on for some other use
case, but the current patch only seems to need these changes for the assert.

If the assert has shown helpful during development, it seems reasonable to commit it (and the associated changes). However, in the optimal case, this would be a separate patch committed ahead of time. If that is to much work, I do not insist on separating this out.

Also, a general phabricator issue is that the updated commit messages are not visible. It would be nice if you could add a comment in the commit message that explains the intention of these set of changes (to help people understand that they are not strictly necessary for the core goal of this patch).

419 ↗(On Diff #24874)

an

483 ↗(On Diff #24874)

an

489 ↗(On Diff #24874)

out of the region

494 ↗(On Diff #24874)

grammar?

536 ↗(On Diff #24874)

Use a range-based for loop?

1220 ↗(On Diff #24874)

Unfinished sentence.

lib/CodeGen/IslCodeGeneration.cpp
141 ↗(On Diff #24874)

leftover debugging stuff?

This revision is now accepted and ready to land.May 5 2015, 10:00 AM

Quick comment about lnt:

We mostly benefit from this (as 5 runs on my laptop show) and do not regress to badly anywhere.

My last concern:

I haven't tested this with parallel code generation yet. We need to do that before enabling it (but not before commiting it though).
include/polly/CodeGen/BlockGenerators.h
121 ↗(On Diff #19598)

I added a comment for both maps to describe what they are used for.

153 ↗(On Diff #19598)

I removed this. I use generateScalarStorePHI now from generateScalarStore and the ge cnerateScalarLoad code is easy for both cases.

lib/CodeGen/BlockGenerators.cpp
321 ↗(On Diff #19598)

I put synthezised instructions in the BBMap and removed the conditional you asked about.

Quick comment about lnt:

We mostly benefit from this (as 5 runs on my laptop show) and do not regress to badly anywhere.

My last concern:

I haven't tested this with parallel code generation yet. We need to do that before enabling it (but not before commiting it though).

I will change the stuff according to the comments (including the spelling) and then commit it.

lib/Analysis/ScopInfo.cpp
888 ↗(On Diff #24874)

We add A and MA is null.

We add B and MA is now A (as it was already in the Inst2Acc map)
As MA is not null we link B (the last access) to A:

B -> A

and change the mapping in Inst2Acc to B:

Inst2Acc[AccessInst] ==> B

This way nothing is lost and in the end it looks like:

Inst2Acc[AccessInst] ==> C -> B -> A

We could map it to a vector but I thought the overhead would be unproportional. If you think it is worth it I can change it again.

lib/CodeGen/BlockGenerators.cpp
278 ↗(On Diff #24874)

Ok.

360 ↗(On Diff #24874)

This might be true, I'll check if I can revert the return Instruction/Value part.

I will change the stuff according to the comments (including the spelling) and then commit it.

OK. I commented on the last issue that could need some feedback.

Tobias

lib/Analysis/ScopInfo.cpp
888 ↗(On Diff #24874)

Alright! I got a little confused by the use of MemAccs.back() as well as the *&MA items, which works both as a pointer that is passed to setNextMA() as well as a way to update Inst2Acc.

Instead of using a vector we can also use std::forward_list, which should have a cost similar to what we have today, but it makes it explicit that Inst2Acc is indeed mapping to a list of Accesses.

auto NewAccess = new MemoryAccess(Access, AccessInst, this, SAI);
MemAccs.push_back(NewAccess);

if (!InstructionToAccess.count(AccessInst))
  InstructionToAccess[AccessInst] = {NewAccess};
else
  InstructionToAccess[AccessInst].push_front(NewAccess);

In addition, the small, but needed changes to lookupAccessFor() would clarify that there are indeed multiple accesses per instruction. On the other side, all the manual list-management code would not be needed and we get some proper iterators for this as well.

I'll use some standard container but I won't change it today. Thanks!

Hi Johannes,

I just got fresh performance numbers for this patch. To me it looks as if it gives both, nice performance and nice compile time improvements:

I saw a couple of errs() that you still left in and there is the standard-container thing, but both seem minor issues. Any plans to commit this patch soon?

Best,
Tobias

This revision was automatically updated to reflect the committed changes.