This slightly increases the cost of InsertElement instructions that are part of a vector splat sequence, i.e. a load, an InsertElement and a shuffle. The resulting LD1R is a high-latency instruction, and this slight cost increase avoids SLP vectorisation in a couple of cases where it isn't profitable.
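To make the pattern concrete, here is a small Python model of what the splat sequence computes. It is only a sketch of the semantics, not the cost-model change itself. The lane count, the helper names and the single-source `shuffle_vector` (real IR `shufflevector` takes two input vectors) are all illustrative assumptions. On AArch64 the whole load + insert + broadcast pattern can be selected as one LD1R, which loads a single element and replicates it to every lane.

```python
from typing import List, Sequence

LANES = 4  # assumption: e.g. a <4 x float> vector


def insert_element(vec: List[float], value: float, index: int) -> List[float]:
    """Hypothetical model of IR `insertelement`: copy vec and set lane `index`."""
    out = list(vec)
    out[index] = value
    return out


def shuffle_vector(vec: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Simplified single-source model of IR `shufflevector`: lane i takes vec[mask[i]]."""
    return [vec[m] for m in mask]


def splat_sequence(memory: Sequence[float], addr: int) -> List[float]:
    """load -> insertelement into lane 0 -> shuffle with an all-zero mask."""
    scalar = memory[addr]                           # the scalar load
    vec = insert_element([0.0] * LANES, scalar, 0)  # zeros stand in for undef/poison lanes
    return shuffle_vector(vec, [0] * LANES)         # broadcast lane 0 to every lane


print(splat_sequence([1.0, 2.5, 4.0], 1))  # [2.5, 2.5, 2.5, 2.5]
```

The all-zero shuffle mask is what turns the single inserted element into a splat; it is this combination that the slightly higher InsertElement cost is meant to account for.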
SPEC 2017 FP and INT performance results with this change are completely neutral, so it only seems to affect cases like those in the changed regression tests.
Ah, only after uploading this diff did I notice that the function names indicate these cases should be profitable... I had missed that.
Hmmm... I guess that needs looking into, then.