The proposal in D56796 may cross the line because we're trying to keep vectorization transforms out of generic DAG combining. So this patch is an alternative: an x86-specific version of that transform that runs later.
I've avoided all potentially controversial transforms, such as extraction from a non-zero element of a vector, so all test diffs here are clear wins AFAIK.