This is an archive of the discontinued LLVM Phabricator instance.

InstCombine: Fold and (fcmp), (is.fpclass) into is.fpclass
ClosedPublic

Authored by arsenm on Dec 1 2022, 11:11 AM.

Details

Summary

Fold class test performed by an fcmp into another class. For now this
avoids introducing new class calls then there isn't one that already
exists.

Diff Detail

Event Timeline

arsenm created this revision.Dec 1 2022, 11:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 11:11 AM
arsenm requested review of this revision.Dec 1 2022, 11:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 11:11 AM
Herald added a subscriber: wdng. · View Herald Transcript

Looks pretty clean to me overall.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1243

Using != in the comment is misleading since that is true for nans. Maybe use <> instead or just spell out what you mean.

1276

fcNan | is redundant here.

1473

Do you need to worry about IsLogicalSelect here? (@nikic, @craig.topper?)

arsenm updated this revision to Diff 482333.Dec 12 2022, 6:46 PM
arsenm marked an inline comment as done.
arsenm retitled this revision from InstCombine: Fold and/or of fcmp into class to InstCombine: Fold and (fcmp), (is.fpclass) into is.fpclass.
arsenm edited the summary of this revision. (Show Details)

Resplit the patches differently. For now, only handle fcmp+is.fpclass case, such that it will not introduce is.fpclass in situations it didn't already exist. If we have 2 or more compares that can turn into a class, I think we should introduce them after codegen is improved a bit for it.

arsenm added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1243

It's not misleading, that this is true for nans is the point. It's the negation of the usual syntactic ordered compare

foad added inline comments.Dec 13 2022, 6:26 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1243

This comment is misleading: case FCmpInst::FCMP_UEQ: // match !(x != 0.0)

!(x != 0.0) (in C) is the same as x == 0.0 (in C) which is OEQ not UEQ.

arsenm added inline comments.Dec 13 2022, 8:07 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1243

No, they aren't the same in C. This whole conversation is why this comment is necessary

https://alive2.llvm.org/ce/z/gtzLUh

arsenm updated this revision to Diff 482492.Dec 13 2022, 8:10 AM

Add more comments

foad added inline comments.Dec 13 2022, 8:15 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1243

Yes they are the same in C: https://godbolt.org/z/1rq9GTjEM

!= in C is UNE not ONE.

arsenm updated this revision to Diff 482498.Dec 13 2022, 8:29 AM

Comments

Some mechanism must exists that applies this transformation or not depending on target. If instruction FCLASS is available, the transformation is profitable. But on targets that do not have such instruction, the replacement may produce inefficient code.

Also it is desirable to have a codegen test for target without FCLASS instruction (like x86), to check if the code generated for such comparisons does not become worse.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1229–1234

These checks should be made for more general case:

  • LHS is same as RHS,
  • RHS is a constant,

as you mention is the comment above.

1255

Is this comment relevant? This block executes for une, not ueq?

1261

Is this comparison always false?

1280

oeq seems irrelevant in this block.

1321–1322
1323–1324
1338
1340
arsenm updated this revision to Diff 482931.Dec 14 2022, 10:39 AM
arsenm marked 4 inline comments as done.

Comment fixes

Some mechanism must exists that applies this transformation or not depending on target. If instruction FCLASS is available, the transformation is profitable. But on targets that do not have such instruction, the replacement may produce inefficient code.

If you're doing 2 or more checks I think is.fpclass is a better canonical form so it doesn't matter what the target prefers, that's a lowering issue. This patch holds off on introducing new classes because we have a bit of codegen improvement to do. I don't think that should block this patch,
since if you're already using class you've already signed up for not-ideal codegen.

Also it is desirable to have a codegen test for target without FCLASS instruction (like x86), to check if the code generated for such comparisons does not become worse.

We already have fpclass legalization tests. I'll add more when I get to the codegen improvements (mostly we're missing all the cases where it can be converted back to FP compares, it only handles a couple of them. Plus it's mishandling denormals with DAZ)

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1229–1234

I don't know if it really should. The more non-canonical forms we try to match, the more edge cases that we won't actually encounter and won't have testing for.

1261

No, this is the same as fcmp ord

sepavloff added inline comments.Dec 20 2022, 12:52 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–843 ↗(On Diff #483957)

Replacement of llvm.is_fpclass with comparison hides class checks and make analysis of code passes harder. Does it have any benefit over creation of the comparison in selector?

arsenm added inline comments.Dec 20 2022, 4:02 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–843 ↗(On Diff #483957)

Using fcmp makes analysis easier. fcmp has fewer constraints, and will always be better understood by generic passes than a special case intrinsic. It's a better canonical form; otherwise every single place that considers compares would have to perform the exact same checks for a class that performs a compare.

I also just noticed when I rebased I attached the wrong patch here.

arsenm updated this revision to Diff 484213.Dec 20 2022, 4:03 AM

Attach correct patch

arsenm added inline comments.Dec 20 2022, 4:05 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–843 ↗(On Diff #483957)

fcmp also supports fast math flags, class does not

sepavloff added inline comments.Dec 20 2022, 9:03 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–835 ↗(On Diff #483957)

is_fpclass should not be replaced by compare instruction otherwise it can break valid programs. Consider the code:

define i1 @check_isnan(float %x) strictfp {
  %call = call i1 @func_isnan(float %x)
  ret i1 %call
}

define i1 @func_isnan(float %x) "always_inline" {
entry:
  %call = i1 call @llvm.is.fpclass.f32(float %x, i32 3)
  ret i1 %call
}

if InstCombiner replaces the call of @llvm.is.fpclass.f32 with fcmp uno, then during inlining fcmp is replaced with a call to @llvm.experimental.constrained.fcmp.f32, which has different behavior than is_fpclass.

sepavloff added inline comments.Dec 20 2022, 9:18 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–843 ↗(On Diff #483957)

is_fpclass does not produce float value, fast math flags does not make sense for it.

arsenm added inline comments.Dec 20 2022, 12:28 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
829–835 ↗(On Diff #483957)

How is it different? If there's a difference in behavior, there's an inliner bug. We have separate constrained.fcmp and constrained.fcmps for compares which may or may not signal

829–843 ↗(On Diff #483957)

Fast math flags apply to the inputs and outputs, which is one of the problems with them. fcmp nnan can trivially fold to true/false for example

jcranmer-intel added inline comments.Dec 20 2022, 1:27 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1300–1303

This block is handling fcmp une, so these latter 4 cases should be using fcmp une.

1319–1321
1323

You're missing 4 examples here--one with +inf and ueq with -inf.

arsenm updated this revision to Diff 484388.Dec 20 2022, 2:50 PM
arsenm marked an inline comment as done.

Try to get the comments right again

foad added inline comments.Jan 10 2023, 3:37 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1259

Remove the #if 0 block, just leave a TODO.

1271

What does "would" mean here?

1298

Should be is_fpclass x, 0 aka false.

1303

Should be is_fpclass x, ~0 aka true.

1308

Should be Mask = 0.

Can you imagine any way of fixing the FIXME? If not I wouldn't bother marking it as FIXME.

1319–1321

Agree with @jcranmer-intel. It looks like you might have "fixed" the wrong line here?

1349

Simpler to write this as fcFinite|fcNegInf, as on the line below.

1518–1519

Can remove one of the many sets of parens here.

arsenm updated this revision to Diff 495186.Feb 6 2023, 9:47 AM
arsenm marked 3 inline comments as done.

Address comments

arsenm added inline comments.Feb 6 2023, 9:54 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1280

it's relevant but now moved to the other patch for zero handling

1308

This was mostly for me when I was trying to make sure all the paths were tested. I think it's even harder now that I did the fneg/fabs fold into the test mask combine

foad accepted this revision.Feb 7 2023, 1:53 AM

LGTM with comment fixes

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1310–1313
1339

Simplify like the line below

This revision is now accepted and ready to land.Feb 7 2023, 1:53 AM
arsenm closed this revision.Feb 8 2023, 5:40 PM
arsenm marked an inline comment as done.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1349

Not sure where this comment is pointing anymore

foad added inline comments.Feb 9 2023, 12:09 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1349

Line 1339 in the current patch. I left a later comment saying the same thing with suggested diff.