I also had to tweak one existing X86 combine to avoid a regression there. I don't think we need the IdxVal == 0 check on the outer insert_subvector. From the other index checks, we know that we had a subvector with some number of 0 elements above it. We can safely drop those 0 elements and insert just the smaller subvector anywhere into the larger zero vector.
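To illustrate the reasoning about dropping the zero elements, here is a toy model in plain C++. It is not LLVM code: vectors are modeled as std::vector<int>, and insertSubvector and foldHoldsForAllIndices are hypothetical helpers made up for this sketch. It ignores the index alignment rules of the real insert_subvector node and only checks, element by element, that inserting a zero-padded subvector into a zero vector matches inserting the unpadded subvector at the same index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Toy stand-in for insert_subvector: overwrite Base[Idx .. Idx+Sub.size())
// with the elements of Sub and return the result.
Vec insertSubvector(Vec Base, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Base.size() && "subvector must fit in base");
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Idx + I] = Sub[I];
  return Base;
}

// Checks the fold for every in-range outer index IdxVal:
//   insert(ZeroBig, insert(ZeroMid, Sub, 0), IdxVal) == insert(ZeroBig, Sub, IdxVal)
// Assumes Sub.size() <= MidSize <= BigSize (an assumption of this sketch).
bool foldHoldsForAllIndices(const Vec &Sub, std::size_t MidSize,
                            std::size_t BigSize) {
  assert(Sub.size() <= MidSize && MidSize <= BigSize);
  const Vec ZeroMid(MidSize, 0);
  const Vec ZeroBig(BigSize, 0);
  // Inner insert at index 0: Sub followed by zero elements above it.
  const Vec Mid = insertSubvector(ZeroMid, Sub, 0);
  for (std::size_t IdxVal = 0; IdxVal + MidSize <= BigSize; ++IdxVal) {
    const Vec Before = insertSubvector(ZeroBig, Mid, IdxVal);
    const Vec After = insertSubvector(ZeroBig, Sub, IdxVal);
    if (Before != After)
      return false;
  }
  return true;
}
```

For example, foldHoldsForAllIndices({1, 1}, 4, 16) compares the two forms at every outer index from 0 to 12. The padding zeros only ever overwrite elements of the big vector that are already zero, which is why the outer index does not matter in this model.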
This helps our vXi1 code see the full concat operation and allows it to optimize undef to zero if there is already a zero in the concat. This let us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts, which uses an additional concat. These changes weren't my original motivation, though.
I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector, since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression I saw in my experiments with that change.
Are we sure that this works in the general IdxVal case?