This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Wait with selection of VREPI and VGM until after DAGCombine2.
AbandonedPublic

Authored by jonpa on Feb 7 2019, 1:40 PM.

Details

Reviewers
uweigand
Summary

The lowerBUILD_VECTOR() handling of constant splats refactored into two methods: analyzeBVNForConstantReplication() is used both during lowering and in Select() to analyze the BVNs. SystemZDAGToDAGISel::tryReplicateConstantSplat() then performs the actual instruction selection.

This is a continuation of the handling of constant BVNs during legalization with the idea to expose more constant vectors to Combine2. The same problem with FP constants as was seen with VGBM persists, and this time it seems to have more impact on SPEC.

I see

spec-llvm_A_master/ spec-llvm
vrepg          :                 3279                 3621     +342
vgmg           :                  344                   25     -319
larl           :               153427               153664     +237
ldeb           :                 8511                 8728     +217
vl             :                22873                23025     +152
vst            :                24059                24131      +72
vlrepf         :                  688                  709      +21
vmrhg          :                 1154                 1170      +16
vmrhf          :                  728                  740      +12
vgbm           :                 3887                 3898      +11
...
Spill|Reload   :               189582               189793     +211

There are many more larls, which should be due to many more ConstantFP vectors loaded from the constant pool. It seems these are the FP splats present in SPEC:

206 BUILD_VECTOR ConstantFP:f64<1.000000e+00>, ConstantFP:f64<1.000000e+00>
 96 BUILD_VECTOR ConstantFP:f64<5.000000e-01>, ConstantFP:f64<5.000000e-01>
 17 BUILD_VECTOR ConstantFP:f64<2.000000e+00>, ConstantFP:f64<2.000000e+00>
 12 BUILD_VECTOR ConstantFP:f64<-2.000000e+00>, ConstantFP:f64<-2.000000e+00>
  8 BUILD_VECTOR undef:f64, ConstantFP:f64<2.000000e+00>
  8 BUILD_VECTOR ConstantFP:f32<nan>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>, ConstantFP:f32<0.000000e+00>
  4 BUILD_VECTOR ConstantFP:f64<1.250000e-01>, ConstantFP:f64<1.250000e-01>
  4 BUILD_VECTOR ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>
  4 BUILD_VECTOR ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>
  3 BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>
  2 BUILD_VECTOR ConstantFP:f32<nan>, ConstantFP:f32<nan>, ConstantFP:f32<nan>, ConstantFP:f32<nan>
  1 BUILD_VECTOR ConstantFP:f64<INF>, ConstantFP:f64<INF>
  1 BUILD_VECTOR ConstantFP:f64<7.812500e-03>, ConstantFP:f64<7.812500e-03>
  1 BUILD_VECTOR ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>
  1 BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<5.000000e-01>

Not sure what to do next - does this mean we should consider again improving the handling of ConstantFP nodes in the backend, or should we abandon this?

Tests with FP splats that are no longer supported have been deleted.

Note: tryReplicateConstantSplat() first calls SelectCode() on the bitcast and then on the REPLICATE / ROTATE_MASK. Not entirely sure if that's wise, or if getMachineNode() should be called instead.

Diff Detail

Event Timeline

jonpa created this revision.Feb 7 2019, 1:40 PM

Well, for replication we definitely need proper float support. For VGBM, we could ignore floats since (except for the all-zero and all-one pattern) there aren't really any common FP constants that can be created via a VGBM pattern. But that isn't true at all for replication ...

Why doesn't this work properly for ContantFP nodes, again?

jonpa added a comment.Feb 8 2019, 8:46 AM

Well, for replication we definitely need proper float support. For VGBM, we could ignore floats since (except for the all-zero and all-one pattern) there aren't really any common FP constants that can be created via a VGBM pattern. But that isn't true at all for replication ...

Ok

Why doesn't this work properly for ContantFP nodes, again?

Any ConstantFP nodes that are not giving 'true' from isFPImmLegal() seem to end up in the constant pool.

The options as we tried before seem to be:

  1. return true from isFPImmLegal() for the values that fit VREPI / VGM, and *either*
    • (a) Add handling so that also the scalar ConstantFP nodes can be selected
    • (b) Extend isFPImmLegal() to have an argument so that we only return true for the constant splat BVN case.
  1. Rebuild the BVN to have TargetConstantFP operands so that they are not touched by common-code.
  1. Rebuild the BVN to have integer constant operands and bitcast to the FP vector type.

I wonder if "1a" would also help scalar FP code a bit? In that case that might be an interesting option.

I don't think that any alternative is really simple and neat to implement, except maybe "1b".

I think 1a would be the best option, indeed.

jonpa updated this revision to Diff 186042.Feb 8 2019, 2:10 PM

I think 1a would be the best option, indeed.

OK, added improvement of isFPImmLegal(). It seems that VGM covers all cases for FP immediates on benchmarks, so VREPI is not considered in analyzeFPImm().

This is the change of instruction counts after just extending isFPImmLegal() to call analyzeFPImm() and to then have emitFPScalarImm() handle the new FP??ScalarImmPseudo:s:

vgmg           :                  344                 4825    +4481
larl           :               153427               150578    -2849
ldeb           :                 8511                 6092    -2419
ldr            :                25172                24061    -1111
ld             :                34810                34163     -647
vst            :                24059                24602     +543
vl             :                22873                23401     +528
std            :                23550                23038     -512
lde            :                 9404                 8970     -434
j              :               120185               120336     +151
lg             :               374157               374024     -133
ste            :                 9160                 9073      -87
cebr           :                 1450                 1535      +85
ceb            :                  526                  444      -82
aebr           :                 5217                 5290      +73
aeb            :                 2577                 2506      -71
wfsdb          :                 7471                 7408      -63
sdbr           :                 3062                 3123      +61
meeb           :                 3580                 3522      -58
...
Spill|Reload   :               189582               189140     -442

The further difference to then as well keeping the BUILD_VECTOR opcode through DAGCombine2 (as before), is now merely

vgbm           :                 3887                 3895       +8
vgmg           :                 4825                 4817       -8

This involves the tryReplicateConstantSplat() and analyzeBVNForConstantReplication() methods.

Would it help to split this patch into two parts per above?

jonpa added a comment.Feb 9 2019, 9:40 AM

I made a new post for just the handling of scalar FP immediates with VGM: https://reviews.llvm.org/D58003

It has updated tests as well as a new one, and it seems we could do that before handling the constant BUILD_VECTORS here.

jonpa updated this revision to Diff 186546.Feb 12 2019, 2:55 PM

Patch rebased.

As before, this gives very little change on benchmarks (only the same previously seen 8 x vgmg -> vgbm).

Removed FP tests that require replication (vec-const-11.ll and vec-const-12.ll), as well as tests in vec-const-18.ll that demand a VGMF for a <2 x double> vector. As before, this is lost functionality that does not affect benchmarks.

Again, I am somewhat hesitant to use the two SelectCode() calls in tryReplicateConstantSplat(), but it seems to work although I cannot find another example of this usage. The alternative is to select a machine opcode for Op instead.

lib/Target/SystemZ/SystemZISelLowering.cpp