This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][X86] Simplify demanded elts in SSE intrinsics with repeated args (PR24523)
Changes PlannedPublic

Authored by RKSimon on Mar 27 2020, 8:03 AM.

Details

Summary

If we are repeating the same argument in SSE cmp/min/max ops and the op is the only user of that arg value, then we know that these are the only demanded elts we require and can be more aggressive with the recursive SimplifyDemandedElts call.

This required the addition of a Value::isOnlyUserOf helper, similar in function to the SDNode version.

Fixes PR24523.

Diff Detail

Event Timeline

RKSimon created this revision.Mar 27 2020, 8:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2020, 8:03 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
spatel added a comment.Apr 1 2020, 9:36 AM

These tests show a set of missed optimizations, so I recommend taking a step back and separate this into a few patches:

  1. Fold x86 min/max intrinsics better - if operands are identical, the min/max simplifies away.
  2. Fold x86 cmp intrinsics better (thought we had a bug report for this, but I don't see it now) - if operands are identical, the compare can simplify away (see SimplifyFCmpInst()) or change predicate.
  3. Improve demanded elements analysis with isOnlyUserOf() - use generic opcode like 'mul' to show that improvement (independent of x86).
  4. Improve demanded elements analysis of x86 min/max/cmp - the x86 part of this patch, but with different tests to show the win with different operands.

The first 3 are independent/parallel. The first 2 raise a potential problem that I don't know the answer to: what happens to target-specific intrinsics in a strict FP environment? Do we need to bypass the folds in that case? Is there some existing code that we can look at that deals with that situation?

spatel added a comment.Apr 1 2020, 9:52 AM
  1. Improve demanded elements analysis of x86 min/max/cmp - the x86 part of this patch, but with different tests to show the win with different operands.

On 2nd thought, I doubt there will be anything left to do for this if the other patches are in place...

The first 3 are independent/parallel. The first 2 raise a potential problem that I don't know the answer to: what happens to target-specific intrinsics in a strict FP environment? Do we need to bypass the folds in that case? Is there some existing code that we can look at that deals with that situation?

I think that this is an existing problem based on the similar demand-based simplifications near the code diff in this patch and transforms in InstCombiner::visitCallInst(), so those patches don't need to be gated on an answer.

The first 2 raise a potential problem that I don't know the answer to: what happens to target-specific intrinsics in a strict FP environment? Do we need to bypass the folds in that case?

In general, we need new intrinsics for accessing the FP environment, I think; the existing ones are readnone.

kpn added a subscriber: kpn.Apr 3 2020, 12:34 PM

Didn't Andy Kaylor have a plan to pass strict FP metadata through using (I think he called it) bundles?

RKSimon planned changes to this revision.Jun 22 2020, 12:15 PM