This is an archive of the discontinued LLVM Phabricator instance.

[X86] Scalar mask and scalar move optimizations
ClosedPublic

Authored by tkrupa on May 17 2018, 5:10 AM.

Details

Summary
  1. Introduction of mask scalar TableGen patterns.
  2. Introduction of new scalar move TableGen patterns and refactoring of existing ones.
  3. Folding of pattern created by introducing scalar masking in Clang header files.

Diff Detail

Event Timeline

tkrupa created this revision.May 17 2018, 5:10 AM
tkrupa edited the summary of this revision. (Show Details)May 17 2018, 5:12 AM
rja accepted this revision.May 17 2018, 1:37 PM
rja added a subscriber: rja.

looks fine

This revision is now accepted and ready to land.May 17 2018, 1:37 PM
craig.topper added inline comments.May 17 2018, 1:51 PM
lib/Target/X86/X86ISelLowering.cpp
32206

use cast not dyn_cast

craig.topper requested changes to this revision.May 17 2018, 2:55 PM
This revision now requires changes to proceed.May 17 2018, 2:55 PM
echristo removed a reviewer: rja.May 17 2018, 2:57 PM
echristo added a subscriber: echristo.

Removing rja from the reviewers line here.

Is it possible to add tests for any patterns that are new? It's not good to add untested code.

Yes we could, but only for X86ISelLowering folding and FMA patterns - the other ones are multiclasses without any definitions, so they never alter anything at this point. They are intended to be a basis for several upcoming patches which will add defms for these patterns with the tests alongside them. I'm OOO tomorrow so I'll be able to add the tests for first two cases on Monday.

tkrupa updated this revision to Diff 147741.May 21 2018, 1:29 AM

Added tests for ISelLowering combining and FMA patterns.

tkrupa marked an inline comment as done.May 21 2018, 1:30 AM

Add these tests with the current codegen so the patch shows the diff improvement?

Add these tests with the current codegen so the patch shows the diff improvement?

As in a separate patch?

Just commit the test files to trunk with the current codegen. No need for a phabricator review. Then rebase this patch to show the diff with the new codegen

tkrupa updated this revision to Diff 147844.May 21 2018, 1:05 PM

Test diffs are now visible.
Note: there is no test for div in combine-select.ll - div pattern gets split into 3 basic block with a branch condition, so this optimization doesn't apply to it.

craig.topper added inline comments.May 23 2018, 10:22 AM
lib/Target/X86/X86ISelLowering.cpp
32212

I don't know if you can use ISD::SELECT here if you don't have the ANDing with 1. ISD::SELECT definition is that the condition value is either 0 or 1 regardless of how many bits it is. If you pass in the raw X you violate this rule.

tkrupa added inline comments.May 24 2018, 12:19 AM
lib/Target/X86/X86ISelLowering.cpp
32212

I think I can - X86TargetLowering::LowerSELECT takes care of that when operands are scalar and subtarget has AVX512 which is always the case here. I'm talking about this code:

// AVX512 fallback is to lower selects of scalar floats to masked moves.
if ((VT == MVT::f64 || VT == MVT::f32) && Subtarget.hasAVX512()) {
  SDValue Cmp = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v1i1, Cond);
  return DAG.getNode(X86ISD::SELECTS, DL, VT, Cmp, Op1, Op2);
}

It's not really a question of whether it will be lowered correctly. Target independent DAG combines call getBooleanContents and expect that the condition for a target independent ISD:;SELECT node obey that.

craig.topper added inline comments.May 24 2018, 12:09 PM
lib/Target/X86/X86ISelLowering.cpp
32204

This isScalarInteger check is unnecessary, ConstantSDNodes can only be scalar, and the condition for an ISD::SELECT can only be scalar which means the AND would have to be scalar.

32209

The tests all pass if I change this line to

SDValue Mask = AndNode;

We have a later combine that removes the mask once we've converted to v1i1.

tkrupa updated this revision to Diff 148551.May 25 2018, 12:14 AM
tkrupa edited the summary of this revision. (Show Details)
tkrupa edited the summary of this revision. (Show Details)
tkrupa marked 3 inline comments as done.
tkrupa updated this revision to Diff 148575.May 25 2018, 3:39 AM

Fixed fma mask3 pattern.

craig.topper added inline comments.May 25 2018, 10:55 AM
lib/Target/X86/X86ISelLowering.cpp
32207

You might still need a truncate on the AND if it isn't an i8

lib/Target/X86/X86InstrFMA.td
369

This should be [HasFMA, NoAVX512]. And the AVX512InstrInfo.td needs an equivalent set of patterns mapped to the unmasked EVEX instructions.

tkrupa updated this revision to Diff 148786.May 28 2018, 1:08 AM
tkrupa marked 2 inline comments as done.
This revision is now accepted and ready to land.May 28 2018, 11:43 AM
This revision was automatically updated to reflect the committed changes.