The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1 (a single instruction) with a cost encoding of 0 in the upper 2 bits. All perfect shuffles with any cost are then marked as legal shuffles though (the maximum encoded cost is 3), which can confuse the DAG combiner into thinking the shuffles are cheaper than the should be.
Limiting legal shuffles to single instructions seems to do better in most case, producing less instructions for complex shuffles. There are some cases that now become tbl, which may be better or worse depending on whether the instruction is in a loop and the tbl load can be hoisted out.
This comment is a bit cryptic for me. The description of this ticket is clear though, perhaps you adopt some of that rationale here.