This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Expand select+masked_load combine to include FP splats of -0.0
AbandonedPublic

Authored by david-arm on May 31 2022, 5:20 AM.

Details

Summary

When tail-folding the vectoriser sometimes generates a select instruction
that selects between data and a splat of -0.0. We already have a transform
in InstCombine that folds a select of a masked load and zeroinitializer
into a single masked load with a zeroinitializer passthru value. We can
do something similar for splats of -0.0 when the function has the
attribute no-signed-zeros-fp-math set to true. In this case we can
just replace a splat of -0.0 with zeroinitializer, i.e. a splat of 0.0.

Diff Detail

Event Timeline

david-arm created this revision.May 31 2022, 5:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2022, 5:20 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
david-arm requested review of this revision.May 31 2022, 5:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2022, 5:20 AM
nikic added a subscriber: nikic.

Can this be based on nsz FMF instead?

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

nikic added a comment.May 31 2022, 5:36 AM

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

Selects (and phis) do support FMF nowadays.

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

Selects (and phis) do support FMF nowadays.

OK. The loop vectoriser is kicking out selects that don't have any FMF and there is no other pass afterwards that adds them. So I guess an alternative would be to automatically add FMF flags to all select instructions even though it's actually a bitwise operation? If that's acceptable to everyone of course.

In general how safe it is to use the function's no-signed-zeros-fp-math attribute to infer it's safe to convert one bit pattern to another? Fair enough if we're dealing with floating point operations then we can ignore the signedness of zero but this is just doing a select of some loaded data, which could be immediately stored out. So it feels dangerous to just change the data in this way.

In general how safe it is to use the function's no-signed-zeros-fp-math attribute to infer it's safe to convert one bit pattern to another? Fair enough if we're dealing with floating point operations then we can ignore the signedness of zero but this is just doing a select of some loaded data, which could be immediately stored out. So it feels dangerous to just change the data in this way.

This is a good point, so I wonder if I can restrict this to single uses of the select where the user is a FP operator? We can even check the user's FMF to see how the data is going to be interpreted.

Can this be based on nsz FMF instead?

I guess it depends what other reviewers think about this patch and whether it's useful or not. Another way of achieving the same result is to ensure the vectoriser uses a splat of +0.0 for the select when the nsz flag is set.

Hi @nikic, looking again at this I realised that it's not the vectoriser generating the splat of -0.0. I think it's instcombine itself! A sample output from the vectoriser looks like this:

%wide.masked.load = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr %11, i32 4, <vscale x 4 x i1> %active.lane.mask, <vscale x 4 x float> poison), !tbaa !10
%12 = fadd fast <vscale x 4 x float> %wide.masked.load, %vec.phi
%13 = select <vscale x 4 x i1> %active.lane.mask, <vscale x 4 x float> %12, <vscale x 4 x float> %vec.phi

and I think that instcombine moves the select between the fadd and masked.load. In doing so it creates a new select that selects between %wide.masked.load and a splat of -0.0. So perhaps the real solution here is to teach instcombine to create a sensible splat value?

Can this be based on nsz FMF instead?

I guess it depends what other reviewers think about this patch and whether it's useful or not. Another way of achieving the same result is to ensure the vectoriser uses a splat of +0.0 for the select when the nsz flag is set.

If you can get the vectorizer to produce the better IR in the first place, that would be ideal. Function attributes are the legacy annotations for fast-math; the goal is to rely on FMF whenever possible in the optimizer and codegen.

We should probably canonicalize -0.0 to +0.0 in "select nsz":
https://alive2.llvm.org/ce/z/Z_-yGP

We already canonicalize -0.0 to 0.0 in fcmp (no FMF needed in that case).

david-arm abandoned this revision.Jun 1 2022, 5:40 AM

Based on the comments on this patch I am abandoning it in favour of a different approach (D126774) where we use the nsz flag on a SelectInst to control the identity value used in certain folds.

I also created D126778 for the loop vectoriser to add the fast math flags to select instructions involved in tail-folding + reductions.