This is applied on top of https://reviews.llvm.org/D78486.
Patch rebased again on top of https://reviews.llvm.org/D78486, and also reworked a bit to follow the latest changes to that patch.
The current heuristics used here emit a VLLEZ + VLE pair only in specific cases, for instance when there is no other use of the original load (see comments in tryVLLEZ). With them, this patch no longer seems to provide any benefit.
The original improvement I saw here was with imagick, but that is already improved by D78486, and it improves no further with this patch. Also, given that patch's heuristic that a single VPERM is better than two sequential instructions, it actually seems wrong to introduce a VLLEZ + VLE pair.
(To get imagick to change at all with this, I had to remove the checks for a single use of the load. I think that check is needed, though, to be sure that at most two loads are used instead of one. As it stands, only one file in SPEC'17 changes with this patch, and I don't think it should be more aggressive. For instance, the case of a single element loaded into a zero vector should already be handled, I hope. I am not quite sure whether there are other cases that might make sense to handle with different heuristics in tryVLLEZ.)
I think this can be abandoned once D78486 has been committed.