Hi,
This patch re-implements the work to disable the vzeroupper insertion pass
on PS4 based on review feedback from Hal and Sean.
I am not sure whether there are other processors that behave like Jaguar
when it comes to writing YMM registers.
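For readers following along, here is a minimal sketch of what gating the pass on a subtarget feature bit could look like. This is not the code from the patch: `SubtargetInfo`, `HasFastPartialYMMWrite`, and `shouldRunVZeroUpperPass` are hypothetical names chosen for illustration, and the only assumption carried over from the thread is that the bit would be set for btver2.

```cpp
// Hypothetical stand-in for a subtarget description; not LLVM's X86Subtarget.
struct SubtargetInfo {
  bool HasAVX;                 // target supports AVX, so YMM registers exist
  bool HasFastPartialYMMWrite; // assumed feature bit: partial YMM writes are cheap
};

// Returns true when running a vzeroupper insertion pass would be worthwhile.
// Without AVX there are no YMM upper halves to clear; with the assumed
// feature bit set (e.g. btver2), the pass is skipped entirely.
bool shouldRunVZeroUpperPass(const SubtargetInfo &ST) {
  if (!ST.HasAVX)
    return false;
  return !ST.HasFastPartialYMMWrite;
}
```

With this shape, a target like btver2 would be described as `SubtargetInfo{true, true}` and skip the pass, while a generic AVX target without the bit would still get vzeroupper inserted.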
Differential D16837
Disable the vzeroupper insertion pass on PS4. Authored by ygao on Feb 2 2016, 7:32 PM.
Event Timeline

As long as the consequence of running such code on a non-btver2 CPU is merely performance, not correctness.

My understanding is that this should only affect performance. The problem arises when you mix legacy SSE instructions with AVX instructions. Legacy SSE instructions do not affect the upper 128 bits of the YMM registers, which may cause false dependencies due to partial register writes. So, if a library is built for a non-AVX CPU (or if the library cannot avoid using legacy SSE code), the absence of vzeroupper in the code has the potential to cause stalls due to false dependencies (when there is an AVX-SSE transition).

On AMD Fam 15h processors (and Btver2) there is no penalty for AVX-SSE transitions. This is an important difference with respect to Intel processors where, for each SSE-AVX transition, the hardware saves and restores the upper 128 bits of the YMM registers. I think that is the reason why vzeroupper is very fast on Intel, while on btver2 it is microcoded (and extremely slow!).

I definitely remember there was some concern (or incident?) over correctness, In this patch I was setting the feature bit on btver2, but it probably also