[X86][SSE] Add cpu feature for aggressive combining to variable shuffles
As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of having multiple stage 'fixed' shuffles which put more pressure on Port 5 (at the expense of extra shuffle mask loads).
This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles).
The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet.
Differential Revision: https://reviews.llvm.org/D41323