I'm somewhat unsure here, but gating vectorizing reductions on all
fastmath flags seems unnecessary; reassoc should be sufficient.
Details
- Reviewers
tvvikram mkuper kristof.beyls sdesmalen Ayal - Commits
- rG3f5ce18658f0: Reland "Relax constraints for reduction vectorization"
rL355889: Reland "Relax constraints for reduction vectorization"
rG93f8cc186ace: Relax constraints for reduction vectorization
rL355868: Relax constraints for reduction vectorization
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 28784 Build 28783: arc lint + arc unit
Event Timeline
lib/Analysis/IVDescriptors.cpp | ||
---|---|---|
556 | Is there a reason for the condition I->hasAllowContract()? As far as I can tell only hasAllowReassoc() seems required and thus hasAllowContract() would be overly restrictive. As long as hasAllowReassoc() is true, I think the resulting reduction operations should have the same properties as the original instruction. For example, if an instruction has hasNoNaNs() == hasNoInfs() == false, the vectorised reduction retain those properties under reassocation. When trying out this patch, the reduction block seems to assume fast instead of just reassoc, e.g. middle.block: %bin.rdx = fadd fast <4 x float> %9, %8 You'll probably want to fix that and extend the test to ensure the flags are retained. |
Propagate only correct FP fast-math flags.
I'd appreciate a careful review here since I'm not sure all of the ways we
codegen reductions in the loop vectorizer.
Thanks for these changes to your patch @sanjoy! The patch is looking in good shape, just a few remaining nits mostly.
include/llvm/Analysis/IVDescriptors.h | ||
---|---|---|
244 | nit: recurrenct -> recurrent | |
lib/Analysis/IVDescriptors.cpp | ||
254 | Is it worth adding a comment describing that FMF will be an intersection of the FastMathFlags from all the reduction operations (and thus needs to start with the full set of flags)? | |
302 | nit: unnecessary curly braces | |
563 | nit: since 'CanVectorizeReduction()' is only used once, it probably makes more sense to just expand it here and remove the function. | |
lib/CodeGen/ExpandReductions.cpp | ||
124 | Rather than choosing 'getFast() and going through 'IsOrdered' (which is set when the full 'fast' property is set), do we want to take the FastMathFlags from the IntrinsicInst directly? | |
124 | Note that there is a test missing for this change. | |
lib/Transforms/Utils/LoopUtils.cpp | ||
675 | nit: unnecessary curly braces. | |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
322 | nit: unnecessary curly braces. | |
329 | nit: unnecessary curly braces. | |
test/Transforms/LoopVectorize/reduction-fastmath.ll | ||
48 | Do you want to add a check for the 'fast' attribute on the reductions in the middle.block, similar to what you've done for reduction_sum_float_only_reassoc_and_contract ? |
- Address review comments
lib/CodeGen/ExpandReductions.cpp | ||
---|---|---|
124 |
My rationale was that I wanted to keep this as obviously NFC as possible -- previously we would implicitly tag the instructions as fast, and this change just makes it explicit.
This change is not supposed to change any behavior (assuming no bugs :) ). Do you want me to add a test case to check existing behavior? CodeGen/Generic/expand-experimental-reductions.ll already tests for fast flags in the expansion. |
Thanks for updating your patch @sanjoy ! LGTM (with a side-note on a follow up patch to take more specific flags in expandReductions).
You may want to update the commit message before you commit :)
lib/CodeGen/ExpandReductions.cpp | ||
---|---|---|
124 |
In a way that feels a bit artificial since the patch is already not NFC and the purpose of this patch is to be more specific about passing the exact flags. However, if you think it makes more sense to do this change in a separate patch, that's fine with me. |
lib/CodeGen/ExpandReductions.cpp | ||
---|---|---|
124 | As I said above, we already have tests for this behavior (that we propagate fast-math flags into getShuffleReduction. |
nit: recurrenct -> recurrent