Page MenuHomePhabricator

[VectorCombine] add loop to enable iterative folding
Changes PlannedPublic

Authored by spatel on May 12 2020, 12:48 PM.

Details

Summary

Given the limited range of potential vector transforms, this is an acceptable way to iterate? If this is or becomes too inefficient, we could use a worklist strategy like instcombine, but that would require altering more code. The motivation comes from PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174
...although we don't have the underlying scalarization-with-constant-operand fold yet. If we add that transform 1st, it would only scalarize 1 instruction instead of the entire chain of insert-insert-binops.

Diff Detail

Event Timeline

spatel created this revision.May 12 2020, 12:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2020, 12:48 PM
nikic added a comment.May 15 2020, 9:49 AM

My only concern here would be whether this can degenerate quadratically. Without particular familiarity with this pass, and looking at example @ins1_ins1_iterate, it seems like this would work by scalarizing one operation on each iteration. Is that right? If so, that seems potentially problematic, as the pass becomes quadratic for longer instruction chains.

My only concern here would be whether this can degenerate quadratically. Without particular familiarity with this pass, and looking at example @ins1_ins1_iterate, it seems like this would work by scalarizing one operation on each iteration. Is that right? If so, that seems potentially problematic, as the pass becomes quadratic for longer instruction chains.

Yes, in the worst case it could become quadratic, and you're seeing the test correctly. I think we could undo the reverse basic-block walk in this patch to make that particular case more efficient.

If I'm seeing it correctly, other iterative passes like SimplifyCFG and CodeGenPrepare allow for quadratic-time possibility too. I'm not sure if there's a theoretical way to draw the line on that, or if we just accept that potential risk (assume that anything this pass will ever do is rare, so it can't be too expensive). As I wrote in the description, I think the alternative is to revise things to be more like InstCombine's user-based worklist. We could also put in a cl::opt flag to bail out and/or assert if we go overboard. Any other suggestions?

nikic added a comment.May 16 2020, 8:47 AM

My only concern here would be whether this can degenerate quadratically. Without particular familiarity with this pass, and looking at example @ins1_ins1_iterate, it seems like this would work by scalarizing one operation on each iteration. Is that right? If so, that seems potentially problematic, as the pass becomes quadratic for longer instruction chains.

Yes, in the worst case it could become quadratic, and you're seeing the test correctly. I think we could undo the reverse basic-block walk in this patch to make that particular case more efficient.

Speaking of which: If I replace this with a forward walk, then the ins1_ins1_iterate test folds, and there are no other test changes in VectorCombine or PhaseOrdering. If there are cases that benefit from the backwards walk, there doesn't seem to be test coverage for them.

If I'm seeing it correctly, other iterative passes like SimplifyCFG and CodeGenPrepare allow for quadratic-time possibility too. I'm not sure if there's a theoretical way to draw the line on that, or if we just accept that potential risk (assume that anything this pass will ever do is rare, so it can't be too expensive). As I wrote in the description, I think the alternative is to revise things to be more like InstCombine's user-based worklist. We could also put in a cl::opt flag to bail out and/or assert if we go overboard. Any other suggestions?

I agree that this is unlikely to become a problem in practice for this pass, because vector code is uncommon. And you are right that this problem also exists in other passes, though it is probably not always as easy to find actually quadratic cases. Personally I'm fine with your approach here, and we can always pivot to a worklist if necessary. Though per my comment above, it seems like for now just flipping the iteration order might be sufficient.

spatel planned changes to this revision.May 16 2020, 10:37 AM

My only concern here would be whether this can degenerate quadratically. Without particular familiarity with this pass, and looking at example @ins1_ins1_iterate, it seems like this would work by scalarizing one operation on each iteration. Is that right? If so, that seems potentially problematic, as the pass becomes quadratic for longer instruction chains.

Yes, in the worst case it could become quadratic, and you're seeing the test correctly. I think we could undo the reverse basic-block walk in this patch to make that particular case more efficient.

Speaking of which: If I replace this with a forward walk, then the ins1_ins1_iterate test folds, and there are no other test changes in VectorCombine or PhaseOrdering. If there are cases that benefit from the backwards walk, there doesn't seem to be test coverage for them.

If I'm seeing it correctly, other iterative passes like SimplifyCFG and CodeGenPrepare allow for quadratic-time possibility too. I'm not sure if there's a theoretical way to draw the line on that, or if we just accept that potential risk (assume that anything this pass will ever do is rare, so it can't be too expensive). As I wrote in the description, I think the alternative is to revise things to be more like InstCombine's user-based worklist. We could also put in a cl::opt flag to bail out and/or assert if we go overboard. Any other suggestions?

I agree that this is unlikely to become a problem in practice for this pass, because vector code is uncommon. And you are right that this problem also exists in other passes, though it is probably not always as easy to find actually quadratic cases. Personally I'm fine with your approach here, and we can always pivot to a worklist if necessary. Though per my comment above, it seems like for now just flipping the iteration order might be sufficient.

Excellent point. Let's make the small change until we have evidence for the need for a costlier solution:
rG81e9ede3a2db

Over in D79078, we're trying to figure out how to properly deal with the reduction test diffs seen in that commit. I'm not sure what the answer will be.

I'll put this patch on hold for now rather than abandon.