This is an archive of the discontinued LLVM Phabricator instance.

[LoopFlatten] Widen IV, cont'd
ClosedPublic

Authored by SjoerdMeijer on Nov 18 2020, 2:07 AM.

Details

Summary

I disabled the widening in fa5cb4b because it run in an assert. I forgot that an extend could also be a zero-extend, which I have added now.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Nov 18 2020, 2:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2020, 2:07 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
SjoerdMeijer requested review of this revision.Nov 18 2020, 2:07 AM
dmgreen added inline comments.Nov 18 2020, 11:06 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
343

dyn_cast -> isa

344

dyn_cast -> cast

551–552

Some of the formatting is still a little off here.

557

I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ?

Can you add a test where it (the o*I+i value) is not zext or sext?

SjoerdMeijer added inline comments.Nov 18 2020, 1:20 PM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ?

After widening we have e.g. this pattern:

%indvar = phi i64 [ %indvar.next, %for.body3.us ], [ 0, %for.cond1.preheader.us ]
%3 = trunc i64 %indvar to i32
%add.us = add i32 %3, %mul.us
%idxprom.us = zext i32 %add.us to i64

The linear IV user in this example is:

%add.us = add i32 %3, %mul.us

We don't want to be replacing this %add.us value which is a i32 value, because it will indeed be replaced by OuterInductionPhi, which is i64 value after widening. This was the assertion is was talking about.
After widening, the value that we should be replacing is zext user which is %idxprom.us in this case.
After widening, we have this IV -> Trunc->LinearIV ->Ext pattern, which is what we are matching here.

I will add a comment to clarify this.

Can you add a test where it (the o*I+i value) is not zext or sext?

I think these cases are present in the original test test/Transforms/LoopFlatten/loop-flatten.ll.

Fixed casts, formatting and added some comments.

dmgreen added inline comments.Nov 18 2020, 2:11 PM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

Hmm. But, don't we start with something that looks like:

for i32 outer = 0..n
  for i32 inner = 0..m
    use(outer * m + inner)

We widen the IV's so they become:

for i64 outer = 0..zext(n)
  for i64 inner = 0..zext(m)
    use(trunc(outer) * m + trunc(inner))

And we want to replace that with a single

for i64 outer = 0..zext(n)*zext(m)
  use(trunc(outer))

We have not proved that the original did not overflow, so if it does we need to use the original truncated i32 value, not the i64 version of it directly.

xbolva00 added inline comments.
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
551–552

isa?

SjoerdMeijer added inline comments.Nov 18 2020, 3:13 PM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

Do you mean overflow in the original

outer * m + inner

Expression? Is that relevant?

Will check when I am back at my desk, but I think after widening we have:

Use(outer)

Without the trunc.

dmgreen added inline comments.Nov 19 2020, 1:05 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

Hmm. But Use in this case is an i32. We need to do something to outer (an i64) to get is back to an i32. This patch seems to be assuming that Use will be either a sext or a zext to the widened type. I think if it's not, it will still hit the assert (and if it is, would use the wrong value once outer * m + inner doesn't fit into the smaller type.)

SjoerdMeijer added inline comments.Nov 19 2020, 2:00 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

This pass is very restrictive in what it currently supports; it is pattern matching very specific patterns. If there are other users this pass will bail, for example

Found use of inner induction variable:   
Did not match expected pattern, bailing

or

Found use of outer induction variable
Did not match expected pattern, bailing

The assumptions are covered with checks. And when it comes to replacing values, we are safe because we are only replacing values in the loop update which have been widened.

Thanks, fixed isa, looks like I keep forgetting about that.

dmgreen added inline comments.Nov 19 2020, 4:33 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

What happens in the zext test if it is changed from:

%arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us

to

%arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

Also, consider a simpler example where we have:

for i8 outer = 0..n
  for i8 inner = 0..m
    use(i32 zext(outer * m + inner))

If we widen the IV's to i32's for example, the call to use() should still get a value between [0..255], even if the n*m was higher than that (and so outer * m + inner overflows). It would wrap in the original, and still needs to wrap in the final version. For i32->i64 the values will be a lot higher, but the same principle applies.

I'm guessing that if it was a trunc(outer * m + inner) instead, the sext(trunc(..)) in the original case would not naturally simplify nicely?

SjoerdMeijer added inline comments.Nov 19 2020, 5:26 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
557

What happens in the zext test if it is changed from:

%arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us

to

%arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

We will end up with:

%indvar = phi i64 [ 0, %for.cond1.preheader.us ]
%3 = trunc i64 %indvar to i32
%add.us = add i32 %3, %mul.us
%idxprom.us = zext i32 %add.us to i64
%arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

if this is the snippet you're interested in, because
.

 Replacing:   %idxprom.us = zext i32 %add.us to i64
with:        %indvar1 = phi i64 [ %indvar.next2, %for.cond1.for.inc7_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]

Which leaves %idxprom.us dead.

Also, consider a simpler example where we have:

for i8 outer = 0..n

for i8 inner = 0..m
 use(i32 zext(outer * m + inner))

Like I said, the pass is very restrictive, and we match very specific patterns. The ZExts are in the way here, we don't recognise the increment and we bail.

SjoerdMeijer updated this revision to Diff 306520.EditedNov 19 2020, 1:21 PM

Sorry Dave, think I took a wrong exit somewhere, but hopefully back on track now.
I took your example and have added that as a test case. That was actually also asserting because of values with different types, another confirmation that something was wrong. Anyway, when we replace values, we now replace that with trunc(outerPhi) if the phi has been widened, as you suggested earlier I think. Let me know what you think while I double check and run some more code.

dmgreen added inline comments.Nov 20 2020, 1:18 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
551–552

Is is better to introduce a new trunc of FI.OuterInductionPHI to the correct bitwidth? I'm a little worried that this is just finding _some_ trunc, not necessarily one that it should. It may introduce more truncs but they should get cleared up. It would also prevent using something that did not dominate.

Maybe it is fine like this, if it is know that the widening will have introduced a trunc. Can it at least check the type of the trunc is correct? And add a comment saying it should have been added by widening.

557

Finding the Value *OuterValue = FI.OuterInductionPHI; if (... can be outside of the loop.

SjoerdMeijer added inline comments.Nov 20 2020, 8:01 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
551–552

Since we promote the IV there has to be a trunc back to its users. What I see is that there is 1 trunc instruction, and then different zexts instructions of the IV value that to different users if they have different types. This means there is 1 trunc instruction, but you're right that this is not the whole story and something is missing at the moment. So, we will need a generic way to map the different values with the different users, if there are any. I think, for now, I will add a check a bit earlier in the pipeline to bail if we find more than 1 zext user of that trunc. That won't be optimal, but I am keen to start somewhere with this. And of course this patch in its current shape is still running in an assert when I just tried it out with different trunc users slightly modifying your example case:

void test(char n, char m) {
for(char i = 0; i < n; i++)
  for(char j = 0; j < m; j++) {
    char x = i*m+j;
    use_32(x);
    use_16(x);
  }
}

Insert a trunc of the outer loop IV for each use, and use that to replace values. Added various tests.

dmgreen accepted this revision.Nov 23 2020, 12:22 AM

Thanks. LGTM

This revision is now accepted and ready to land.Nov 23 2020, 12:22 AM
This revision was automatically updated to reflect the committed changes.