The early return seems to be missed. This causes a radical and wrong loop optimization on powerpc. It isn't reproducible on x86_64, because "UseInterleaved" is false.
Details
Diff Detail
Event Timeline
This test case is a bit large, can it be made smaller?
test/Analysis/LoopAccessAnalysis/interleave_innermost.ll | ||
---|---|---|
2 | You can shorten this pipeline with: |
Done.
In order to reproduce the bug, it seems to require a nested loop structure. I've reduced the test file to 54 lines, in which the nested loops are pretty obvious to me. Do you think that's simple enough?
The change itself looks good, but there are several things about the test that could be improved:
- I think you don't need to run the entire O2 pipeline on the test. For instance, if the offending optimization is "loop-vectorizer", we could extract IR before it, and replace opt -O2 with opt -loop-vectorize on this IR (to find the list of the O2 passes, run opt -O2 -debug-pass=Arguments).
- Do you use CHECK + XFAIL only to make sure that the specified lines aren't present in the output? Could you use CHECK-NOT instead? Also, if you run only one pass, it will be much easier to check that nothing unexpected happened with the IR.
- Could you please run opt -instnamer on the IR to avoid %123 and <label>:123 names? It's very painful to edit tests with such variable names (as the numbers must be consecutive).
- Minor point: you probably could remove function attributes.
Thanks,
Michael
FWIW, the following also triggers the missing early return with opt -loop-vectorize
target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" define void @TestFoo(i1 %X, i1 %Y) { bb: br label %.loopexit5.outer .loopexit5.outer: br label %.lr.ph12 .loopexit: br i1 %X, label %.loopexit5.outer, label %.lr.ph12 .lr.ph12: %f.110 = phi i32* [ %tmp1, %.loopexit ], [ null, %.loopexit5.outer ] %tmp1 = getelementptr inbounds i32, i32* %f.110, i64 -1 br i1 %Y, label %bb4, label %.loopexit bb4: %j.27 = phi i32 [ 0, %.lr.ph12 ], [ %tmp7, %bb4 ] %tmp5 = load i32, i32* %f.110, align 4 %tmp7 = add nsw i32 %j.27, 1 %exitcond = icmp eq i32 %tmp7, 0 br i1 %exitcond, label %.loopexit, label %bb4 }
test/Analysis/LoopAccessAnalysis/interleave_innermost.ll | ||
---|---|---|
2 | Is -instcombine necessary here? |
Yes, your test case is much simpler, thanks! :) I tweaked a bit (stride needs to be at least 2 to trigger interleave) to reveal the wrong behavior of the old opt. In such case it has to be a XFAIL, since what we hope is "interleave should not fire" - testing the correct code may be fragile.
I'm not sure - can I use CHECK-NOT(s) for a multi-line pattern?
A bit thinking: semantically I want not (LINE_PATTERN_1 and LINE_PATTERN_2 and LINE_PATTERN3), which means I may do something roughly like (not LINE_PATTERN_1) or (not LINE_PATTERN_2) or (not LINE_PATTERN_3), not LINE_PATTERNx can be expressed as CHECK-NOT, but I'm not sure if we have a way to express or.
Removed XFAIL and CHECK-NOT for interleave related value names - if XFAIL doesn't seem to be used in such a scenario.
I don't think that you can CHECK-NOT two things in a row like this. You'll need a check between. In general for this you should be able to just check that there aren't any vector types in the program? Or you could add something in between as well.
-eric
You can shorten this pipeline with:
; RUN: opt -O2 -S < %s | FileCheck %s