I disabled the widening in fa5cb4b because it run in an assert. I forgot that an extend could also be a zero-extend, which I have added now.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
343 | dyn_cast -> isa | |
344 | dyn_cast -> cast | |
551–552 | Some of the formatting is still a little off here. | |
557 | I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ? Can you add a test where it (the o*I+i value) is not zext or sext? |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 |
After widening we have e.g. this pattern: %indvar = phi i64 [ %indvar.next, %for.body3.us ], [ 0, %for.cond1.preheader.us ] %3 = trunc i64 %indvar to i32 %add.us = add i32 %3, %mul.us %idxprom.us = zext i32 %add.us to i64 The linear IV user in this example is: %add.us = add i32 %3, %mul.us We don't want to be replacing this %add.us value which is a i32 value, because it will indeed be replaced by OuterInductionPhi, which is i64 value after widening. This was the assertion is was talking about. I will add a comment to clarify this.
I think these cases are present in the original test test/Transforms/LoopFlatten/loop-flatten.ll. |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 | Hmm. But, don't we start with something that looks like: for i32 outer = 0..n for i32 inner = 0..m use(outer * m + inner) We widen the IV's so they become: for i64 outer = 0..zext(n) for i64 inner = 0..zext(m) use(trunc(outer) * m + trunc(inner)) And we want to replace that with a single for i64 outer = 0..zext(n)*zext(m) use(trunc(outer)) We have not proved that the original did not overflow, so if it does we need to use the original truncated i32 value, not the i64 version of it directly. |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
551–552 | isa? |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 | Do you mean overflow in the original outer * m + inner Expression? Is that relevant? Will check when I am back at my desk, but I think after widening we have: Use(outer) Without the trunc. |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 | Hmm. But Use in this case is an i32. We need to do something to outer (an i64) to get is back to an i32. This patch seems to be assuming that Use will be either a sext or a zext to the widened type. I think if it's not, it will still hit the assert (and if it is, would use the wrong value once outer * m + inner doesn't fit into the smaller type.) |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 | This pass is very restrictive in what it currently supports; it is pattern matching very specific patterns. If there are other users this pass will bail, for example Found use of inner induction variable: Did not match expected pattern, bailing or Found use of outer induction variable Did not match expected pattern, bailing The assumptions are covered with checks. And when it comes to replacing values, we are safe because we are only replacing values in the loop update which have been widened. |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 | What happens in the zext test if it is changed from: %arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us to %arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us Also, consider a simpler example where we have: for i8 outer = 0..n for i8 inner = 0..m use(i32 zext(outer * m + inner)) If we widen the IV's to i32's for example, the call to use() should still get a value between [0..255], even if the n*m was higher than that (and so outer * m + inner overflows). It would wrap in the original, and still needs to wrap in the final version. For i32->i64 the values will be a lot higher, but the same principle applies. I'm guessing that if it was a trunc(outer * m + inner) instead, the sext(trunc(..)) in the original case would not naturally simplify nicely? |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
557 |
We will end up with: %indvar = phi i64 [ 0, %for.cond1.preheader.us ] %3 = trunc i64 %indvar to i32 %add.us = add i32 %3, %mul.us %idxprom.us = zext i32 %add.us to i64 %arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us if this is the snippet you're interested in, because Replacing: %idxprom.us = zext i32 %add.us to i64 with: %indvar1 = phi i64 [ %indvar.next2, %for.cond1.for.inc7_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ] Which leaves %idxprom.us dead.
Like I said, the pass is very restrictive, and we match very specific patterns. The ZExts are in the way here, we don't recognise the increment and we bail. |
Sorry Dave, think I took a wrong exit somewhere, but hopefully back on track now.
I took your example and have added that as a test case. That was actually also asserting because of values with different types, another confirmation that something was wrong. Anyway, when we replace values, we now replace that with trunc(outerPhi) if the phi has been widened, as you suggested earlier I think. Let me know what you think while I double check and run some more code.
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
551–552 | Is is better to introduce a new trunc of FI.OuterInductionPHI to the correct bitwidth? I'm a little worried that this is just finding _some_ trunc, not necessarily one that it should. It may introduce more truncs but they should get cleared up. It would also prevent using something that did not dominate. Maybe it is fine like this, if it is know that the widening will have introduced a trunc. Can it at least check the type of the trunc is correct? And add a comment saying it should have been added by widening. | |
557 | Finding the Value *OuterValue = FI.OuterInductionPHI; if (... can be outside of the loop. |
llvm/lib/Transforms/Scalar/LoopFlatten.cpp | ||
---|---|---|
551–552 | Since we promote the IV there has to be a trunc back to its users. What I see is that there is 1 trunc instruction, and then different zexts instructions of the IV value that to different users if they have different types. This means there is 1 trunc instruction, but you're right that this is not the whole story and something is missing at the moment. So, we will need a generic way to map the different values with the different users, if there are any. I think, for now, I will add a check a bit earlier in the pipeline to bail if we find more than 1 zext user of that trunc. That won't be optimal, but I am keen to start somewhere with this. And of course this patch in its current shape is still running in an assert when I just tried it out with different trunc users slightly modifying your example case: void test(char n, char m) { for(char i = 0; i < n; i++) for(char j = 0; j < m; j++) { char x = i*m+j; use_32(x); use_16(x); } } |
Insert a trunc of the outer loop IV for each use, and use that to replace values. Added various tests.
dyn_cast -> isa