If we cannot prove that the f16 operands of a buildvector are canonicalized, then we cannot lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead, which uses an additional SGPR (or encodes the literal in the instruction itself) but has lower VALU latency.
Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945
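To illustrate why a single byte permute can stand in for the bitwise sequence, here is a minimal Python sketch of packing the low 16 bits of two 32-bit registers into one. It is a simplified model, not the patch itself. The function names are hypothetical, the arithmetic path shown is only one possible and/shift/or combination, and the permute models only selector values 0-7, assuming the usual V_PERM_B32 convention that selectors 0-3 pick bytes of the second source and 4-7 pick bytes of the first.

```python
MASK32 = 0xFFFFFFFF

def pack_with_and_shl_or(lo: int, hi: int) -> int:
    """Pack the low halves of lo and hi using plain bitwise ops (illustrative)."""
    return (((hi & 0xFFFF) << 16) | (lo & 0xFFFF)) & MASK32

def byte_permute(src0: int, src1: int, selector: int) -> int:
    """Simplified byte permute: each selector byte picks one byte of {src0, src1}.

    Assumption: bytes 0-3 of the combined value come from src1 and bytes 4-7
    from src0. Selector values above 7 (constants, sign fill) are not modeled.
    """
    combined = ((src0 & MASK32) << 32) | (src1 & MASK32)
    result = 0
    for i in range(4):
        sel = (selector >> (8 * i)) & 0xFF
        if sel > 7:
            raise ValueError("selector value not modeled in this sketch")
        result |= ((combined >> (8 * sel)) & 0xFF) << (8 * i)
    return result

# Result bytes 0-1 come from src1 bytes 0-1, result bytes 2-3 from src0 bytes 0-1.
PACK_LOW_HALVES = 0x05040100

def pack_with_permute(lo: int, hi: int) -> int:
    """Pack the low halves of lo and hi with one permute and a literal selector."""
    return byte_permute(hi, lo, PACK_LOW_HALVES)
```

The selector is the extra 32-bit constant that the commit message refers to: it has to live in an SGPR or be encoded as a literal, in exchange for replacing several dependent VALU operations with one.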
This could use a class or a foreach over the types to avoid repeating the same pattern twice.