Page MenuHomePhabricator

[SVE][LoopVectorize] Enable vectorization of fmin/fmax with nnan
Needs ReviewPublic

Authored by kmclaughlin on Feb 9 2021, 9:09 AM.



The fmin/fmax tests added by D95245 use the no-nans-fp-math function
attribute, and fail to vectorize when the attribute is removed in favour of using
nnan directly in the instructions. This patch changes isRecurrenceInstr
to also check if the no-NaNs flag is set on the fcmp/select.

I'm not sure if there are any problems with this approach, which is why I've
split this out from D95245.

Diff Detail

Event Timeline

kmclaughlin created this revision.Feb 9 2021, 9:09 AM
kmclaughlin requested review of this revision.Feb 9 2021, 9:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2021, 9:09 AM
david-arm added inline comments.

Hi @kmclaughlin, I've not looked into this in the same detail as you have, but I wonder if the reason we only checked the function attribute previously is because the reduction is used outside the loop? Not saying that's a good reason for not vectorising though. :) Just that perhaps it was technically the easier solution since we might have to also look outside the loop to make sure users of the value also don't care about NaNs? Perhaps @dmgreen or @spatel have an idea?

spatel added a comment.Feb 9 2021, 9:54 AM

Thanks for working on this. I was headed this direction after D95690.
Using FMF instead of function attributes should be ok here, but we need to be careful about at least 2 things before we make this change:

  1. The existing predicate is inadequate for fmin/fmax reductions. We should be requiring nsz too (or the function-level "no-signed-zeros-fp-math"="true"). The code as-is can miscompile because it is missing that check.
  2. IR-level FMF are currently not fully propagated as we would like. They don't appear on load/store or function arguments. Because of that, we should do a union of flags between the fcmp and select (as shown in D95690 too).

I think the use of FP function attributes is historical. The instruction-level FMF were introduced later, and unfortunately they still are not adequate for all use cases:

Hi @spatel, thanks for the explanation. I've created D96604 to try and address the missing check for no-signed-zeros at the function-level.

spatel added a comment.EditedFeb 17 2021, 4:55 AM

I added a comment about this patch to:
Let me know if you see any other blockers.
(I didn't find the bugzilla ID for @kmclaughlin to cc on the bug report.)