Given the limited range of potential vector transforms, this is an acceptable way to iterate? If this is or becomes too inefficient, we could use a worklist strategy like instcombine, but that would require altering more code. The motivation comes from PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174
...although we don't have the underlying scalarization-with-constant-operand fold yet. If we add that transform 1st, it would only scalarize 1 instruction instead of the entire chain of insert-insert-binops.
Details
- Reviewers
nikic lebedev.ri efriedma RKSimon
Diff Detail
Event Timeline
My only concern here would be whether this can degenerate quadratically. Without particular familiarity with this pass, and looking at example @ins1_ins1_iterate, it seems like this would work by scalarizing one operation on each iteration. Is that right? If so, that seems potentially problematic, as the pass becomes quadratic for longer instruction chains.
Yes, in the worst case it could become quadratic, and you're seeing the test correctly. I think we could undo the reverse basic-block walk in this patch to make that particular case more efficient.
If I'm seeing it correctly, other iterative passes like SimplifyCFG and CodeGenPrepare allow for quadratic-time possibility too. I'm not sure if there's a theoretical way to draw the line on that, or if we just accept that potential risk (assume that anything this pass will ever do is rare, so it can't be too expensive). As I wrote in the description, I think the alternative is to revise things to be more like InstCombine's user-based worklist. We could also put in a cl::opt flag to bail out and/or assert if we go overboard. Any other suggestions?
Speaking of which: If I replace this with a forward walk, then the ins1_ins1_iterate test folds, and there are no other test changes in VectorCombine or PhaseOrdering. If there are cases that benefit from the backwards walk, there doesn't seem to be test coverage for them.
If I'm seeing it correctly, other iterative passes like SimplifyCFG and CodeGenPrepare allow for quadratic-time possibility too. I'm not sure if there's a theoretical way to draw the line on that, or if we just accept that potential risk (assume that anything this pass will ever do is rare, so it can't be too expensive). As I wrote in the description, I think the alternative is to revise things to be more like InstCombine's user-based worklist. We could also put in a cl::opt flag to bail out and/or assert if we go overboard. Any other suggestions?
I agree that this is unlikely to become a problem in practice for this pass, because vector code is uncommon. And you are right that this problem also exists in other passes, though it is probably not always as easy to find actually quadratic cases. Personally I'm fine with your approach here, and we can always pivot to a worklist if necessary. Though per my comment above, it seems like for now just flipping the iteration order might be sufficient.
Excellent point. Let's make the small change until we have evidence for the need for a costlier solution:
rG81e9ede3a2db
Over in D79078, we're trying to figure out how to properly deal with the reduction test diffs seen in that commit. I'm not sure what the answer will be.
I'll put this patch on hold for now rather than abandon.