(copied from original post:)
The patch checks the other operands of the single-lane (W) vector instruction. If they are all already allocated to an FP register (i.e. a vector register numbered below 16), a memfold pseudo for the mapped FP memory instruction is used. If any of them is not yet allocated, no fold is attempted. Since the register allocator has already begun to spill, it seems less promising to constrain the unallocated registers to FP registers just to make the fold possible in those cases as well.
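The operand check described above can be sketched as follows. This is a simplified model, not the patch's code: `Operand`, `canFoldToFPMemForm()` and the plain register numbering are hypothetical stand-ins for what the real implementation would query from the register allocator. The only facts it relies on are that SystemZ has 32 vector registers and that only V0-V15 overlap the FP registers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a register operand during register
// allocation (not LLVM's MachineOperand).
struct Operand {
  bool IsAllocated;   // true once a physical register has been assigned
  unsigned VecRegNum; // V0..V31, meaningful only if IsAllocated
};

// Only V0-V15 overlap the FP registers, so only they can be used by the
// FP (memory) form of the instruction.
constexpr unsigned NumFPRegs = 16;

// Returns true if every other operand of the W instruction is already
// allocated to a register usable as an FP register, so the fold can be
// done without constraining any not-yet-allocated register.
bool canFoldToFPMemForm(const std::vector<Operand> &OtherOps) {
  for (const Operand &Op : OtherOps) {
    if (!Op.IsAllocated)
      return false; // don't try to constrain unallocated registers
    assert(Op.VecRegNum < 32 && "SystemZ has 32 vector registers");
    if (Op.VecRegNum >= NumFPRegs)
      return false; // V16-V31 have no FP counterpart
  }
  return true;
}
```

Refusing as soon as one operand is unallocated mirrors the reasoning above: once spilling has started, the sketch prefers not folding over narrowing a register class.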
New mappings are added from W... to FP instructions. Since every instruction that getMemOpcode() maps to a memory instruction has to be handled correctly, I wanted to be sure that we only see expected opcodes: foldMemoryOperandImpl() recognizes all of them and then checks that no stray vector instructions show up. It now seems to me more reasonable to simply check any instruction at that point for vector registers allocated above V15, and to return nullptr in that case. In other words, trust the mapping in SystemZInstrFormats.td and allow any fold it permits as long as no V16-V31 registers are allocated. I am waiting for some feedback before changing this.
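The alternative proposed above ("trust the mapping, reject only high vector registers") could look roughly like this. It is a hedged sketch under stated assumptions: the `Opcode` enum, `Instr`, `memOpcodeFor()` and `foldToMemForm()` are hypothetical, and the opcode pairs in the switch are illustrative placeholders, not a copy of what getMemOpcode() or SystemZInstrFormats.td actually contain. `std::nullopt` plays the role of returning nullptr from foldMemoryOperandImpl().

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Illustrative opcodes only; VAG stands for an instruction without a mapping.
enum class Opcode { WFADB, WFMDB, ADB, MDB, VAG };

// Hypothetical instruction: opcode plus the vector register numbers
// (V0..V31) allocated to its register operands.
struct Instr {
  Opcode Opc;
  std::vector<unsigned> AllocatedVecRegs;
};

// Hypothetical stand-in for the getMemOpcode() mapping.
std::optional<Opcode> memOpcodeFor(Opcode Opc) {
  switch (Opc) {
  case Opcode::WFADB: return Opcode::ADB;
  case Opcode::WFMDB: return Opcode::MDB;
  default: return std::nullopt; // no memory form
  }
}

// Instead of whitelisting every expected opcode, accept whatever the
// mapping allows and refuse only if some operand lives in V16-V31.
std::optional<Opcode> foldToMemForm(const Instr &MI) {
  std::optional<Opcode> MemOpc = memOpcodeFor(MI.Opc);
  if (!MemOpc)
    return std::nullopt;
  for (unsigned Reg : MI.AllocatedVecRegs) {
    assert(Reg < 32 && "SystemZ has 32 vector registers");
    if (Reg >= 16)
      return std::nullopt; // like returning nullptr: no FP counterpart
  }
  return MemOpc;
}
```

The trade-off is that a wrong entry in the mapping would no longer be caught by an opcode check; the register test alone guards against folding an instruction that uses V16-V31.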
The MemFoldCopies statistic is still low and, I think, acceptable: 37 on a SPEC CPU 2017 build. It counts the cases where the register allocator evicts one of the registers of the memfold pseudo, so that a COPY later has to be built in SystemZPostRewrite.cpp.
I added commutation flags on WFA/WFM. I hope this is OK, and also that the patch has no unwanted implications for FP semantics.
LIS->getRegUnit() computes the LiveInterval for CC. Since this is now needed in two places, the call was moved to the top of the function, where it is always executed (this should not be a compile-time problem). As a consequence, the kill flags on CC in int-cmp-56.mir are removed (regalloc seems to always remove them anyway).
It is a bit odd that the vec->FP renaming is done here for Unary and Compare instructions, but not for Binary and Ternary ones, where it is instead done in the reverse direction by the MemFold pseudos. It would be nicer if this were symmetric.