Integer vector extractions and insertions are accomplished with direct moves, which take about 5 cycles.
Single precision floating point values just need to be aligned and converted (and inserted with a vperm in the insert case).
Double precision floating point values just need to be aligned (and inserted using an xxpermdi in the insert case).
On Power9, 32-bit values can be inserted into a vector without a vperm, so they do not require loading a permute mask.
This patch reflects these aspects of the operations in getVectorInstrCost.
This seems to be the point where all of the interesting changes are, so I'll add my comments here.
I would recommend that some of these values (e.g., DirectMoveCost) be defined somewhere else, either as an enumeration or a #define. I expect that these costs will change over time (with the hardware), and it would be good to have a clear and convenient mechanism to represent that, as opposed to a bunch of condition checks in this function.
Off the top of my head, I can think of a few general ways to design this:
I'm sure there are other possibilities as well.
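To make the suggestion concrete, here is one rough sketch of a per-CPU cost table. It is only an illustration, not the patch's code: every name in it (Cpu, Elem, Op, VecOpCosts, costsFor, sketchVectorInstrCost) is hypothetical, and every cost except the ~5-cycle direct move from the summary is a placeholder value of 1. It also simplifies the Power9 case by applying the vperm-free insert to single precision values only.

```cpp
// Hypothetical sketch: these types and functions are NOT part of LLVM or the patch.
enum class Cpu { Power8, Power9 };
enum class Elem { Integer, Float32, Float64 };
enum class Op { Extract, Insert };

// All hardware-dependent numbers live in one table instead of being
// scattered through condition checks in the cost function.
struct VecOpCosts {
  unsigned DirectMove;     // ~5 cycles per the patch summary
  unsigned Align;          // placeholder value
  unsigned Convert;        // placeholder value
  unsigned Permute;        // vperm / xxpermdi, placeholder value
  unsigned MaskLoad;       // loading a permute mask, placeholder value
  unsigned DirectInsert32; // vperm-free 32-bit insert, placeholder value
  bool HasDirectInsert32;  // true on Power9 per the patch summary
};

inline VecOpCosts costsFor(Cpu C) {
  VecOpCosts T = {5, 1, 1, 1, 1, 1, false};
  if (C == Cpu::Power9)
    T.HasDirectInsert32 = true;
  return T;
}

inline unsigned sketchVectorInstrCost(Cpu C, Op O, Elem E) {
  const VecOpCosts T = costsFor(C);
  switch (E) {
  case Elem::Integer:
    // Extract and insert both go through a direct move.
    return T.DirectMove;
  case Elem::Float32: {
    // Align and convert, then insert if requested.
    unsigned Cost = T.Align + T.Convert;
    if (O == Op::Insert)
      Cost += T.HasDirectInsert32 ? T.DirectInsert32
                                  : (T.Permute + T.MaskLoad);
    return Cost;
  }
  case Elem::Float64:
    // Align only; an insert adds an xxpermdi, which needs no mask load.
    return T.Align + (O == Op::Insert ? T.Permute : 0u);
  }
  return 0;
}
```

With something along these lines, updating the model for new hardware would mean adding or editing a table entry rather than touching the logic in getVectorInstrCost.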