This patch fixes pr31144.
Power8 has MTVSRWZ but not LXSIBZX/LXSIHZX, so moving a 1- or 2-byte value into a VSR with MTVSRWZ is much faster than storing the extended value to the stack and reloading it with LXSIWZX.
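To make the trade-off concrete, here is a small illustrative model of the three instruction sequences involved. It is a sketch, not LLVM code: the `Strategy` enum and `sequenceFor` function are hypothetical, and it assumes the narrow value is first zero-extended into a GPR with `lbz`/`lhz` and, on the old path, spilled with `stw`. Real codegen also needs address computation for the stack slot, which is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the lowering strategies; not LLVM code.
enum class Strategy {
  StackRoundTrip, // pre-patch Power8 path: extend in a GPR, spill, reload
  DirectMove,     // this patch: extend in a GPR, then MTVSRWZ
  P9Load          // Power9: LXSIBZX/LXSIHZX load straight into a VSR
};

// Returns an illustrative mnemonic sequence for getting a zero-extended
// 1- or 2-byte value from memory into a VSR.
std::vector<std::string> sequenceFor(Strategy s, unsigned bytes) {
  assert((bytes == 1 || bytes == 2) && "only byte/halfword loads are modeled");
  const std::string gprLoad = bytes == 1 ? "lbz" : "lhz";
  switch (s) {
  case Strategy::StackRoundTrip:
    // Load into a GPR, store the word to a stack slot, reload it into a VSR.
    return {gprLoad, "stw", "lxsiwzx"};
  case Strategy::DirectMove:
    // Load into a GPR, then move the word directly GPR -> VSR.
    return {gprLoad, "mtvsrwz"};
  case Strategy::P9Load:
    // ISA 3.0 can load the narrow value into a VSR in one instruction.
    return {bytes == 1 ? "lxsibzx" : "lxsihzx"};
  }
  return {};
}
```

The direct-move sequence avoids the store/reload round trip through memory, which is where the speedup described above comes from.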
Differential D27287
[PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR
Authored by Carrot on Nov 30 2016, 4:11 PM.
Other than the inline comment, I think this patch is fine, but I'll let Hal have a look for the official stamp of approval.
This is really a property of the ISA rather than of the actual CPU. I think a better way to accomplish this would be to check that the subtarget does not have P9 vector support (`hasP9Vector()`), because that is where LXSIBZX/LXSIHZX become available, even if that means this check has to become a member function.
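The reviewer's suggestion can be sketched as a feature-based predicate. This is a hypothetical stand-in, not the LLVM `PPCSubtarget` class: `SubtargetFeatures`, its fields, and `preferDirectMove` are illustrative names. It assumes a direct-move feature flag (for MTVSRWZ, ISA 2.07) alongside the P9 vector flag named in the review.

```cpp
#include <cassert>

// Hypothetical stand-in for subtarget feature queries; field names mirror
// the ones discussed in review, but this is not LLVM's PPCSubtarget.
struct SubtargetFeatures {
  bool hasDirectMove; // assumed flag: MTVSRWZ available (ISA 2.07, e.g. Power8)
  bool hasP9Vector;   // LXSIBZX/LXSIHZX available (ISA 3.0, e.g. Power9)
};

// Decide on ISA features rather than on the CPU name: use a GPR load plus
// direct move only for 1- or 2-byte values, and only when direct move exists
// but the narrow VSR loads do not.
bool preferDirectMove(const SubtargetFeatures &st, unsigned bytes) {
  assert(bytes > 0 && "load width must be non-zero");
  const bool narrow = bytes == 1 || bytes == 2;
  return narrow && st.hasDirectMove && !st.hasP9Vector;
}
```

Keying on features means any future CPU that implements ISA 2.07 but not ISA 3.0 picks up the same lowering without a new CPU check.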