I was looking back at the shouldScalarizeBinop() hook and found that if I used the X86 implementation of it, none of the generated SPEC code changed.
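For reference, the X86 hook is, as far as I remember, roughly this: never scalarize target-specific opcodes, always scalarize if the vector op is not legal (or custom/promoted), and otherwise scalarize only if the scalar op is also supported. DAGCombiner consults it when an element is extracted from a binop that has a constant operand. Below is a stand-alone model of that decision, not LLVM code; the Opcode/ValueType enums and the LegalityTable are hypothetical stand-ins for the real isOperationLegalOrCustomOrPromote() queries.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical stand-ins for ISD opcodes and MVTs; not the LLVM types.
enum class Opcode { Add, Xor, And, TargetSpecific };
enum class ValueType { v4i32, i32, v2i64, i64 };

// Hypothetical legality table standing in for
// isOperationLegalOrCustomOrPromote().
struct LegalityTable {
  std::set<std::pair<Opcode, ValueType>> Legal;
  bool isLegal(Opcode Op, ValueType Ty) const {
    return Legal.count({Op, Ty}) != 0;
  }
};

ValueType scalarTypeOf(ValueType Ty) {
  switch (Ty) {
  case ValueType::v4i32: return ValueType::i32;
  case ValueType::v2i64: return ValueType::i64;
  default: return Ty; // already a scalar type
  }
}

// Models the X86-style heuristic described above.
bool shouldScalarizeBinop(const LegalityTable &T, Opcode Op, ValueType VecTy) {
  // Target-specific opcodes are assumed not to be scalarizable.
  if (Op == Opcode::TargetSpecific)
    return false;
  // If the vector op is not supported, scalarizing can only help.
  if (!T.isLegal(Op, VecTy))
    return true;
  // The vector op is supported: only scalarize if the scalar op is too.
  return T.isLegal(Op, scalarTypeOf(VecTy));
}
```

With a table where Xor is legal for both v4i32 and i32, the model returns true for a v4i32 xor, i.e. it prefers scalar xors on extracted elements; with Add legal only for v4i32 it keeps the vector op.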
I then also remembered your point from the discussion that it might be better to keep the BUILD_VECTOR nodes during combine2 instead of replacing them with SystemZISD::BYTE_MASK during legalization. I decided to try this, and this patch is the result.
This turned out not to be too complicated: apart from the handling in Select(), it is enough to redefine the z_vzero and z_vones nodes to recognize BUILD_VECTORs instead, and the pattern matching then works as before.
I am not sure this is the best solution, though. As it stands, tryBuildVectorByteMask() is used first during legalization to build a new BUILD_VECTOR with the right constants, and then again in Select() to recover the same mask. I first thought it would be possible to simply leave the BUILD_VECTORs as they are during legalization, but then I found a case where this doesn't work: it involved a ConstantFP<nan>, which ended up in the constant pool.
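To make the mask side concrete: as I understand it, tryBuildVectorByteMask() accepts a constant vector only if every byte is 0x00 or 0xff, and produces a 16-bit mask with one bit per byte (bit 15 for byte 0, since SystemZ is big-endian), so mask 0 corresponds to z_vzero and 0xffff to z_vones. The sketch below is a stand-alone model of that check, not the LLVM code; Elem, byteMaskFor(), floatBits() and floatFromBits() are hypothetical names, and treating undef elements as zero is an assumption. It also illustrates (this is not the actual failing case) why FP elements must be checked by bit pattern: an all-ones float is a NaN, yet it is still a valid byte-mask constant.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical model of one BUILD_VECTOR operand: undef, or a constant whose
// bits sit in the low BytesPerElement bytes of Bits.
struct Elem {
  bool IsUndef;
  uint64_t Bits;
};

// Returns the 16-bit byte mask for a 16-byte constant vector, or
// std::nullopt if some byte is neither 0x00 nor 0xff. Bit 15 of the mask
// stands for byte 0 of the (big-endian) vector. Undef elements count as 0.
std::optional<uint16_t> byteMaskFor(const std::vector<Elem> &Elems,
                                    unsigned BytesPerElement) {
  assert(Elems.size() * BytesPerElement == 16 && "expected a 128-bit vector");
  uint16_t Mask = 0;
  unsigned E = static_cast<unsigned>(Elems.size());
  for (unsigned I = 0; I < E; ++I) {
    if (Elems[I].IsUndef)
      continue;
    for (unsigned J = 0; J < BytesPerElement; ++J) {
      // J counts bytes from the least significant end of the element.
      uint64_t Byte = (Elems[I].Bits >> (J * 8)) & 0xff;
      if (Byte == 0xff)
        Mask |= uint16_t(1u << ((E - I - 1) * BytesPerElement + J));
      else if (Byte != 0)
        return std::nullopt;
    }
  }
  return Mask;
}

// Bit-level conversions for float elements (what a ConstantFP operand needs).
uint64_t floatBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

float floatFromBits(uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}
```

For example, the v4i32 constant <-1, 0, -1, 0> gives mask 0xf0f0, four all-ones (NaN) floats give 0xffff, and <1.0f, 0, 0, 0> is rejected.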
My first observation on the benchmarks is that just one file (462.libquantum/build/qec.s) changes, as follows (inside a vectorized loop):
  vno :  203  193  -10
  vnc :  149  151   +2
  vn  :  557  555   -2
This is a surprisingly small improvement. Perhaps some piece is still missing that would unlock further improvements?
With this patch in place, I tried the shouldScalarizeBinop() hook again (copied from X86), and now one additional file changed (454.calculix/build/InpMtx_init.s):
  xilf  :   4922   4942  +20
  vno   :    193    190   -3
  tmll  :  14386  14385   -1
  jne   :  15139  15138   -1
  la    : 199056 199055   -1
  vlgvf :   1266   1265   -1
This seems to be a loop with many extracts and test-under-mask instructions, which now do a scalar xilf before each tmll.
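As far as I understand DAGCombiner, the transform that shouldScalarizeBinop() gates is roughly extract_vector_elt (binop X, C), i -> binop (extract_vector_elt X, i), C[i]. A lane-level model in plain C++ (hypothetical names; tmll is simplified to "is any masked bit set") shows that both forms compute the same value, so the only question is cost: one vector op shared by all extracts, versus one scalar xilf per extracted element.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Vector form: one xor over all lanes, then extract lane I (vlgvf).
uint32_t extractOfXor(const V4 &V, uint32_t C, unsigned I) {
  V4 R{};
  for (unsigned L = 0; L < R.size(); ++L)
    R[L] = V[L] ^ C;
  return R[I];
}

// Scalarized form: extract lane I first, then xor with the constant
// (what the scalar xilf does in the InpMtx_init.s diff).
uint32_t xorOfExtract(const V4 &V, uint32_t C, unsigned I) {
  return V[I] ^ C;
}

// Simplified stand-in for a test-under-mask on the low 16 bits:
// true if any of the selected bits is set.
bool anyMaskedBitSet(uint32_t X, uint16_t Mask) {
  return (X & Mask) != 0;
}
```

Since the results are identical lane by lane, scalarizing only pays off when the vector op cannot be shared, which does not seem to be the case in this loop.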
The handling of constant replication in lowerBUILD_VECTOR() is perhaps the next thing to rework in a similar way.