Fold class test performed by an fcmp into another class. For now this
avoids introducing new class calls then there isn't one that already
exists.
Details
Diff Detail
Event Timeline
Looks pretty clean to me overall.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1243 ↗ | (On Diff #479367) | Using != in the comment is misleading since that is true for nans. Maybe use <> instead or just spell out what you mean. |
1276 ↗ | (On Diff #479367) | fcNan | is redundant here. |
1351 ↗ | (On Diff #479367) | Do you need to worry about IsLogicalSelect here? (@nikic, @craig.topper?) |
Resplit the patches differently. For now, only handle fcmp+is.fpclass case, such that it will not introduce is.fpclass in situations it didn't already exist. If we have 2 or more compares that can turn into a class, I think we should introduce them after codegen is improved a bit for it.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1243 ↗ | (On Diff #479367) | It's not misleading, that this is true for nans is the point. It's the negation of the usual syntactic ordered compare |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1243 ↗ | (On Diff #479367) | This comment is misleading: case FCmpInst::FCMP_UEQ: // match !(x != 0.0) !(x != 0.0) (in C) is the same as x == 0.0 (in C) which is OEQ not UEQ. |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1243 ↗ | (On Diff #479367) | No, they aren't the same in C. This whole conversation is why this comment is necessary |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1243 ↗ | (On Diff #479367) | Yes they are the same in C: https://godbolt.org/z/1rq9GTjEM != in C is UNE not ONE. |
Some mechanism must exists that applies this transformation or not depending on target. If instruction FCLASS is available, the transformation is profitable. But on targets that do not have such instruction, the replacement may produce inefficient code.
Also it is desirable to have a codegen test for target without FCLASS instruction (like x86), to check if the code generated for such comparisons does not become worse.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1231–1236 ↗ | (On Diff #482498) | These checks should be made for more general case:
as you mention is the comment above. |
1257 ↗ | (On Diff #482498) | Is this comment relevant? This block executes for une, not ueq? |
1263 ↗ | (On Diff #482498) | Is this comparison always false? |
1282 ↗ | (On Diff #482498) | oeq seems irrelevant in this block. |
1323–1324 ↗ | (On Diff #482498) | |
1325–1326 ↗ | (On Diff #482498) | |
1340 ↗ | (On Diff #482498) | |
1342 ↗ | (On Diff #482498) |
If you're doing 2 or more checks I think is.fpclass is a better canonical form so it doesn't matter what the target prefers, that's a lowering issue. This patch holds off on introducing new classes because we have a bit of codegen improvement to do. I don't think that should block this patch,
since if you're already using class you've already signed up for not-ideal codegen.
Also it is desirable to have a codegen test for target without FCLASS instruction (like x86), to check if the code generated for such comparisons does not become worse.
We already have fpclass legalization tests. I'll add more when I get to the codegen improvements (mostly we're missing all the cases where it can be converted back to FP compares, it only handles a couple of them. Plus it's mishandling denormals with DAZ)
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1231–1236 ↗ | (On Diff #482498) | I don't know if it really should. The more non-canonical forms we try to match, the more edge cases that we won't actually encounter and won't have testing for. |
1263 ↗ | (On Diff #482498) | No, this is the same as fcmp ord |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–843 | Replacement of llvm.is_fpclass with comparison hides class checks and make analysis of code passes harder. Does it have any benefit over creation of the comparison in selector? |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–843 | Using fcmp makes analysis easier. fcmp has fewer constraints, and will always be better understood by generic passes than a special case intrinsic. It's a better canonical form; otherwise every single place that considers compares would have to perform the exact same checks for a class that performs a compare. I also just noticed when I rebased I attached the wrong patch here. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–843 | fcmp also supports fast math flags, class does not |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–835 | is_fpclass should not be replaced by compare instruction otherwise it can break valid programs. Consider the code: define i1 @check_isnan(float %x) strictfp { %call = call i1 @func_isnan(float %x) ret i1 %call } define i1 @func_isnan(float %x) "always_inline" { entry: %call = i1 call @llvm.is.fpclass.f32(float %x, i32 3) ret i1 %call } if InstCombiner replaces the call of @llvm.is.fpclass.f32 with fcmp uno, then during inlining fcmp is replaced with a call to @llvm.experimental.constrained.fcmp.f32, which has different behavior than is_fpclass. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–843 | is_fpclass does not produce float value, fast math flags does not make sense for it. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
829–835 | How is it different? If there's a difference in behavior, there's an inliner bug. We have separate constrained.fcmp and constrained.fcmps for compares which may or may not signal | |
829–843 | Fast math flags apply to the inputs and outputs, which is one of the problems with them. fcmp nnan can trivially fold to true/false for example |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1300–1303 ↗ | (On Diff #484213) | This block is handling fcmp une, so these latter 4 cases should be using fcmp une. |
1319–1321 ↗ | (On Diff #484213) | |
1323 ↗ | (On Diff #484213) | You're missing 4 examples here--one with +inf and ueq with -inf. |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1259 ↗ | (On Diff #484388) | Remove the #if 0 block, just leave a TODO. |
1271 ↗ | (On Diff #484388) | What does "would" mean here? |
1298 ↗ | (On Diff #484388) | Should be is_fpclass x, 0 aka false. |
1303 ↗ | (On Diff #484388) | Should be is_fpclass x, ~0 aka true. |
1308 ↗ | (On Diff #484388) | Should be Mask = 0. Can you imagine any way of fixing the FIXME? If not I wouldn't bother marking it as FIXME. |
1349 ↗ | (On Diff #484388) | Simpler to write this as fcFinite|fcNegInf, as on the line below. |
1522–1523 ↗ | (On Diff #484388) | Can remove one of the many sets of parens here. |
1319–1321 ↗ | (On Diff #484213) | Agree with @jcranmer-intel. It looks like you might have "fixed" the wrong line here? |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1308 ↗ | (On Diff #484388) | This was mostly for me when I was trying to make sure all the paths were tested. I think it's even harder now that I did the fneg/fabs fold into the test mask combine |
1282 ↗ | (On Diff #482498) | it's relevant but now moved to the other patch for zero handling |
LGTM with comment fixes
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1310–1313 ↗ | (On Diff #495186) | |
1339 ↗ | (On Diff #495186) | Simplify like the line below |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1349 ↗ | (On Diff #484388) | Not sure where this comment is pointing anymore |
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1349 ↗ | (On Diff #484388) | Line 1339 in the current patch. I left a later comment saying the same thing with suggested diff. |
is_fpclass should not be replaced by compare instruction otherwise it can break valid programs. Consider the code:
if InstCombiner replaces the call of @llvm.is.fpclass.f32 with fcmp uno, then during inlining fcmp is replaced with a call to @llvm.experimental.constrained.fcmp.f32, which has different behavior than is_fpclass.