Perfect shuffle was introduced into the PowerPC backend years ago and is only available on big-endian subtargets.
This optimization works well in simple cases, but has a serious negative impact on large programs with many shuffle instructions sharing the same mask. D116801 fixes the issue in those programs, but still causes a performance degradation similar to disabling perfect shuffle.
So I propose introducing a temporary hidden backend option to control it until we implement a better way to fix the gap in vector shuffle decomposition.