After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready.
We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this will hopefully restore that for SLP.
There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). I am planning to look at fixing FMF propagation next.
There's also an open cost model question: should we prefer the getIntrinsicInstrCost() API (as is currently shown in the patch) or use getMinMaxReductionCost() (as is currently shown for the integer min/max ops)? There are no test differences with the current regression tests, but that will need to be examined in more detail to make sure we are getting accurate costs.
Details
Details
Diff Detail
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Patch updated: I changed some variable names with d1c4e85 and missed rebasing this diff.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7106–7108 | Probably better to use getMinMaxReductionCost rather than the cost of the intrinsic call, it is more precise. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
7106–7108 | Ok - I'll update. The current patch is actually wrong because I did not use the correct reduction IDs (Intrinsic::vector_reduce_fmax) for the vector cost. |
Comment Actions
Patch updated:
- Change cost calc to use getMinMaxReductionCost() (this is copied from the existing integer min/max code).
- Added test coverage/comments to demonstrate FMF constraints.
Probably better to use getMinMaxReductionCost rather than the cost of the intrinsic call, it is more precise.