This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Wait with VGBM selection until after DAGCombine2.
ClosedPublic

Authored by jonpa on Jan 24 2019, 6:14 AM.

Details

Reviewers
uweigand
Summary

I was looking back at the shouldScalarizeBinop() hook, and found that if I used the X86 implementation of it, nothing changed on SPEC.

I then also remembered your discussion that perhaps it would be better to keep BUILD_VECTOR nodes during combine2, instead of replacing with SYSTEMZ::BYTE_MASK during legalization. I decided to try this and this is the patch I have.

It seems to be not too complicated to do this, since apart from the handling in Select() it is enough to redefine the z_vzero and z_vones nodes to recognize BUILD_VECTORs instead, and the pattern matching will work as before.

I am not quite sure if this is the best solution, but as it is now tryBuildVectorByteMask() is used first during legalization to build a new BUILD_VECTOR with the right constants, and then again in Select() to get the same mask back again. I first thought it would be possible to just leave the BUILD_VECTORS during legalization, but then I found a case where this doesn't
work which involved ConstantFP<nan>, which ended up in the constant pool.

First observations on benchmarks is that just one file (462.libquantum/build/qec.s) changes like (inside a vectorized loop):

vno            :                  203                  193      -10
vnc            :                  149                  151       +2
vn             :                  557                  555       -2

This is a surprisingly small improvement. Perhaps some piece is missing to unlock more improvements?

With this patch in place, I tried the shouldScalarizeBinop() hook again (copied from X86), and now one additional file changed (454.calculix/build/InpMtx_init.s) like

xilf           :                 4922                 4942      +20
vno            :                  193                  190       -3
tmll           :                14386                14385       -1
jne            :                15139                15138       -1
la             :               199056               199055       -1
vlgvf          :                 1266                 1265       -1

It seems this is a loop with many extracts and test-under-mask:s, that now do a scalar xilf before each tmll...

The handling of replication of constant in lowerBUILD_VECTOR() is perhaps the next step after this to rework in a similar way...

Diff Detail

Event Timeline

jonpa created this revision.Jan 24 2019, 6:14 AM

I am not quite sure if this is the best solution, but as it is now tryBuildVectorByteMask() is used first during legalization to build a new BUILD_VECTOR with the right constants, and then again in Select() to get the same mask back again. I first thought it would be possible to just leave the BUILD_VECTORS during legalization, but then I found a case where this doesn't work which involved ConstantFP<nan>, which ended up in the constant pool.

Can you explain this further? If you leave the BUILD_VECTOR as-is, where is it pushed into the constant pool?

The handling of replication of constant in lowerBUILD_VECTOR() is perhaps the next step after this to rework in a similar way...

That would probably make sense.

It seems this is a loop with many extracts and test-under-mask:s, that now do a scalar xilf before each tmll...

Well, that's certainly not good.

jonpa added a comment.Jan 25 2019, 8:23 AM

Can you explain this further? If you leave the BUILD_VECTOR as-is, where is it pushed into the constant pool?

If Op is returned unchanged from lowerBUILD_VECTOR(), SelectionDAG::legalize() will later call Legalizer.LegalizeOp(N), with the ConstantFP operand of the BUILD_VECTOR. Since it is not a TargetConstant, this will go on to find the Action to be 'Expand'. The ExpandNode() case for ConstantFP will call TLI.isFPImmLegal() which return false (in most cases), and therefore ExpandFPConstant() is called, which returns a load from the constant pool.

It is in fact not just NaN ConstantFPs that is in question here, but rather any ConstantFP that is a candidate for VGBM. In order to select into VGBM later, it seems we need to detect this case here and produce a new BUILD_VECTOR, either with i8 operands, or perhaps with TargetConstantFP operands, which makes a bit more sense in a way.

I also discovered that it is not so good to replace undef operands with 0, as the patch currently does by using the Mask produced by tryBuildVectorByteMask(). In cases where some elements are all-ones and the rest undef, it is best to keep it that way so that SD::isBuildVectorAllOnes() can return true.

jonpa updated this revision to Diff 183548.Jan 25 2019, 8:37 AM

Patch updated per previous comment so that it returns all integer BVNs unmodified, and for FP a new BVN is build with TargetConstantFP operands, to avoid expansion into the constant pool.

I had to adjust two vllez patterns to make all tests pass.

Due to keeping the undef operands instead of making them explicitly 0, one more file on benchmarks is slightly improved, which gives 2 in total :-)

Not sure if it would simplify the code a bit to build integer (16xi8) BVNs instead of insisting on keeping the FP operands as TargetConstantFP:s.

I am still using tryBuildVectorByteMask() in two places which I think saves some duplicated code even though the Mask isn't used during legalization.

The ExpandNode() case for ConstantFP will call TLI.isFPImmLegal() which return false (in most cases), and therefore ExpandFPConstant() is called, which returns a load from the constant pool.

If we actually can load the FP constant using an immediate vector instruction, shouldn't we then return true from TLI.isFPImmLegal? Maybe that's the real underlying problem ...

jonpa added a comment.Jan 29 2019, 2:01 AM

If we actually can load the FP constant using an immediate vector instruction, shouldn't we then return true from TLI.isFPImmLegal? Maybe that's the real underlying problem ...

I actually tried that earlier, and found that it worked as you say in the case where BUILD_VECTOR can be selected into VGBM. However, an FP-immediate (outside any BUILD_VECTOR) then crashes isel, since there is no handling for it other than loading it from the constant pool.

I was also a bit skeptical about using isFPImmLegal since there is no way to tell if this is a scalar or vector element constant. However, I tried this now again, and found that this actually seems to work out fairly well in lower_BUILD_VECTOR:

  • If all fp-elements are legal, we can return it unmodified just as with an int vector and emit VGBM in Select().
  • If not all fp-elements are legal, we return SDValue(), so the entire vector constant is loaded from the constant pool.

This is just what we want, so the question now is what to do with the new legal scalar FP immediates. We could either decide to generate them with VGBM, or we could keep them in the constant pool where they are now. There are plenty of reg-mem FP instructions, and I suspect we would like to keep using them? In that case I think we could either extend isFPImmLegal with an argument that informs us to make the right decision depending on scalar/vector context, or we could possibly say that the scalar constant is legal, but we decide to load it from the constant pool in Select().

jonpa updated this revision to Diff 184585.Jan 31 2019, 1:09 PM

Handle the FP case by building a v16i8 vector instead of using TargetFPConstants (or isFPImmLegal).

NFC to using TargetFPConstants.

jonpa updated this revision to Diff 185393.Feb 5 2019, 1:57 PM

The support of VGBM for non-zero FP vectors has been removed, which greatly simplifies the patch. This is NFC on SPEC compared to previous version of patch, except for in one file where the floating point VGBM mask of 0xf0f0 is selected as VGMG insteadof VGBM (1 file, 8 places).

test functions in vec-const-05.ll and vec-const-06.ll for VGBM masks of non-zero FP vectors removed.

uweigand accepted this revision.Feb 6 2019, 2:26 AM

Yes, I think this version makes most sense. LGTM.

This revision is now accepted and ready to land.Feb 6 2019, 2:26 AM
jonpa closed this revision.Feb 6 2019, 11:02 AM

Thanks for review. r353325.