This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
InstCombineSelect.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
select-masked_load.ll

Differential D126692

[InstCombine] Expand select+masked_load combine to include FP splats of -0.0
AbandonedPublic

Authored by david-arm on May 31 2022, 5:20 AM.

Download Raw Diff

Details

Reviewers

sdesmalen
kmclaughlin
CarolineConcatto
MattDevereau
aqjune
spatel

Summary

When tail-folding the vectoriser sometimes generates a select instruction
that selects between data and a splat of -0.0. We already have a transform
in InstCombine that folds a select of a masked load and zeroinitializer
into a single masked load with a zeroinitializer passthru value. We can
do something similar for splats of -0.0 when the function has the
attribute no-signed-zeros-fp-math set to true. In this case we can
just replace a splat of -0.0 with zeroinitializer, i.e. a splat of 0.0.

Diff Detail

Unit TestsFailed

	Time	Test
	60,110 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp

Event Timeline

david-arm created this revision.May 31 2022, 5:20 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2022, 5:20 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

david-arm requested review of this revision.May 31 2022, 5:20 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2022, 5:20 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Can this be based on nsz FMF instead?

In D126692#3547055, @nikic wrote:

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

In D126692#3547061, @david-arm wrote:

In D126692#3547055, @nikic wrote:

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

Selects (and phis) do support FMF nowadays.

In D126692#3547065, @nikic wrote:

In D126692#3547061, @david-arm wrote:

In D126692#3547055, @nikic wrote:

Can this be based on nsz FMF instead?

I was hoping to do that originally, but I think that select instructions do not have any fast math flags attached?

Selects (and phis) do support FMF nowadays.

OK. The loop vectoriser is kicking out selects that don't have any FMF and there is no other pass afterwards that adds them. So I guess an alternative would be to automatically add FMF flags to all select instructions even though it's actually a bitwise operation? If that's acceptable to everyone of course.

In general how safe it is to use the function's no-signed-zeros-fp-math attribute to infer it's safe to convert one bit pattern to another? Fair enough if we're dealing with floating point operations then we can ignore the signedness of zero but this is just doing a select of some loaded data, which could be immediately stored out. So it feels dangerous to just change the data in this way.

In D126692#3547103, @paulwalker-arm wrote:

In general how safe it is to use the function's no-signed-zeros-fp-math attribute to infer it's safe to convert one bit pattern to another? Fair enough if we're dealing with floating point operations then we can ignore the signedness of zero but this is just doing a select of some loaded data, which could be immediately stored out. So it feels dangerous to just change the data in this way.

This is a good point, so I wonder if I can restrict this to single uses of the select where the user is a FP operator? We can even check the user's FMF to see how the data is going to be interpreted.

In D126692#3547055, @nikic wrote:

Can this be based on nsz FMF instead?

I guess it depends what other reviewers think about this patch and whether it's useful or not. Another way of achieving the same result is to ensure the vectoriser uses a splat of +0.0 for the select when the nsz flag is set.

Harbormaster completed remote builds in B167023: Diff 433056.May 31 2022, 5:55 AM

Hi @nikic, looking again at this I realised that it's not the vectoriser generating the splat of -0.0. I think it's instcombine itself! A sample output from the vectoriser looks like this:

%wide.masked.load = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0(ptr %11, i32 4, <vscale x 4 x i1> %active.lane.mask, <vscale x 4 x float> poison), !tbaa !10
%12 = fadd fast <vscale x 4 x float> %wide.masked.load, %vec.phi
%13 = select <vscale x 4 x i1> %active.lane.mask, <vscale x 4 x float> %12, <vscale x 4 x float> %vec.phi

and I think that instcombine moves the select between the fadd and masked.load. In doing so it creates a new select that selects between %wide.masked.load and a splat of -0.0. So perhaps the real solution here is to teach instcombine to create a sensible splat value?

In D126692#3547119, @david-arm wrote:

In D126692#3547055, @nikic wrote:

Can this be based on nsz FMF instead?

I guess it depends what other reviewers think about this patch and whether it's useful or not. Another way of achieving the same result is to ensure the vectoriser uses a splat of +0.0 for the select when the nsz flag is set.

If you can get the vectorizer to produce the better IR in the first place, that would be ideal. Function attributes are the legacy annotations for fast-math; the goal is to rely on FMF whenever possible in the optimizer and codegen.

We should probably canonicalize -0.0 to +0.0 in "select nsz":
https://alive2.llvm.org/ce/z/Z_-yGP

We already canonicalize -0.0 to 0.0 in fcmp (no FMF needed in that case).

Based on the comments on this patch I am abandoning it in favour of a different approach (D126774) where we use the nsz flag on a SelectInst to control the identity value used in certain folds.

I also created D126778 for the loop vectoriser to add the fast math flags to select instructions involved in tail-folding + reductions.

david-arm mentioned this in rG997ecb0036a5: [LoopVectorize] Add FastMathFlags to the select used for reductions with tail….Jun 7 2022, 2:21 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstCombineSelect.cpp

21 lines

test/