This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Guard maxnum/minnum conversions with a TTI query
AbandonedPublic

Authored by jonpa on Nov 29 2019, 5:52 AM.

Details

Summary

After https://reviews.llvm.org/D62414 "[InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics", the test-suite no longer builds on earlier subtargets. This was commented on here: https://github.com/llvm/llvm-project/commit/ebf9bf2cbc8fa68d536e481e370c4ba40ce61a8a.

This is an attempt to fix this by guarding this transformation with a query to TTI so that if this would result in a libcall, it is skipped.

I was expecting InstCombine to use TargetTransforminfo, but found that I had to add the TTI member. Is there a reason it should not be used?

With this patch the test-suite now builds again on z10...

Diff Detail

Event Timeline

jonpa created this revision.Nov 29 2019, 5:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2019, 5:52 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

This seems like a lowering problem in SystemZ backend.

This seems like a lowering problem in SystemZ backend.

So is the backend supposed to lower minnum/maxnum always (to the obvious select sequence, I guess?), even if there's no instruction for it? Why would that be the job of the backend?

This seems like a lowering problem in SystemZ backend.

So is the backend supposed to lower minnum/maxnum always (to the obvious select sequence, I guess?), even if there's no instruction for it? Why would that be the job of the backend?

Because it is a non-target-specific (i.e. not e.g. @llvm.x86.) llvm instruction.
They are considered canonical, and supposed to be handled by all backends,
even if via expansion into some other instructions.

For other examples, i think pretty much no backend has native funnel shift
(not rotate!) instruction, many don't have saturating math,
probably most don't have native fixed point math, not sure about strict-fp;
treating them as legal *in middle-end* only if particular backend supports
them is somewhat contrary to having canonical representation..

But in those cases like funnel shifts, common code (specifically SelectionDAGBuilder::visitIntrinsicCall) will handle the expansion if the target doesn't have an instruction.

So if we do need to do that for minnum/maxnum, that would probably be the better place.

However, this is still a bit different, because minnum/maxnum can also be the result of an explicit call to fmin/fmax in the source, and it would seem surprising to have that translated into a select sequence as well. If we originally had an fmin/fmax call, which was translated into minnum/maxnum that we then couldn't further optimize, the current expansion (back to the fmin/fmax call) seems correct to me.

It's only the case where we originally had a conditional that was translated into minnum/maxnum that expansion into fmin/fmax is incorrect (because those routines are only available when linking with -lm which the user might not do since there's no actual call to a libm routine in the original source).

The problem really is IMO that there's no way to tell these two cases apart at instruction selection time.

But in those cases like funnel shifts, common code (specifically SelectionDAGBuilder::visitIntrinsicCall) will handle the expansion if the target doesn't have an instruction.

So if we do need to do that for minnum/maxnum, that would probably be the better place.

However, this is still a bit different, because minnum/maxnum can also be the result of an explicit call to fmin/fmax in the source, and it would seem surprising to have that translated into a select sequence as well. If we originally had an fmin/fmax call, which was translated into minnum/maxnum that we then couldn't further optimize, the current expansion (back to the fmin/fmax call) seems correct to me.

It's only the case where we originally had a conditional that was translated into minnum/maxnum that expansion into fmin/fmax is incorrect (because those routines are only available when linking with -lm which the user might not do since there's no actual call to a libm routine in the original source).

The problem really is IMO that there's no way to tell these two cases apart at instruction selection time.

FWIW I agree with Ulrich here, either we can lower this in the middle end or we depend upon either libc/compiler-rt/libgcc/etc. Depending upon libm is historically different here and would require some additional discussion.

spatel added a comment.Dec 2 2019, 7:23 AM

But in those cases like funnel shifts, common code (specifically SelectionDAGBuilder::visitIntrinsicCall) will handle the expansion if the target doesn't have an instruction.

So if we do need to do that for minnum/maxnum, that would probably be the better place.

However, this is still a bit different, because minnum/maxnum can also be the result of an explicit call to fmin/fmax in the source, and it would seem surprising to have that translated into a select sequence as well. If we originally had an fmin/fmax call, which was translated into minnum/maxnum that we then couldn't further optimize, the current expansion (back to the fmin/fmax call) seems correct to me.

Handling this in SDAG sounds like the right solution to me. We don't want instcombine to rely on TTI because it's supposed to be canonicalizing target-independently.
It's worth noting that the transform that is being done in IR depends on FMF:

// Canonicalize select of FP values where NaN and -0.0 are not valid as
// minnum/maxnum intrinsics.

So I think SDAG can also detect the difference between IEEE-compliant source code that uses libm calls vs. loose/fast source code that got converted to the LLVM intrinsic. Ie, we can expand using fcmp+select vs. libcall based on the FMF.

spatel added a comment.Dec 2 2019, 7:46 AM

I was worried that our FMF plumbing in DAG was wrong, but I just checked how x86 deals with this, and it seems to be working:

declare float @llvm.maxnum.f32(float, float) #0

define float @maxnum_fast(float %arg1, float %arg2) {
  %r = call fast float @llvm.maxnum.f32(float %arg1, float %arg2)
  ret float %r
}

define float @maxnum_strict(float %arg1, float %arg2) {
  %r = call float @llvm.maxnum.f32(float %arg1, float %arg2)
  ret float %r
}

Debug output for llc shows that we propagated FMF from the calls to the DAG nodes correctly:

t5: f32 = fmaxnum nnan ninf nsz arcp contract afn reassoc t2, t4

vs.

t5: f32 = fmaxnum t2, t4

And x86 then converts the generic nodes to target-specific nodes and instruction selection produces the right asm:

maxss	%xmm1, %xmm0

vs.

movaps	%xmm0, %xmm2
cmpunordss	%xmm0, %xmm2
movaps	%xmm2, %xmm3
andps	%xmm1, %xmm3
maxss	%xmm0, %xmm1
andnps	%xmm1, %xmm2
orps	%xmm3, %xmm2
movaps	%xmm2, %xmm0

x86 is choosing not to use a libcall because inline code should be faster, but if you add 'minsize', you'll get a libcall to 'fmaxf'. So all possibilities should be covered, and SystemZ should be able to copy/share that logic.

Indeed, it looks like this will work; thanks for the suggestion!

However, I still think that this should be done by default in common code; if targets prefer something else they can always override it, but the default behavior of common code should be to not create dependencies on libm out of thin air ...

I've now implemented this as D70965, and it seems to work for me.

I've now committed D70965, so I think this can be closed.

jonpa abandoned this revision.Dec 5 2019, 5:16 PM