This is the IR optimizer follow-on to D8563, the x86 backend patch that converts this kind of shuffle back into a vperm2.
This is also a continuation of the transform that started in D8486. In that patch, Andrea suggested that we could convert vperm2 intrinsics that use zero masks into a single shuffle. This is an implementation of that suggestion.
I recognize that we could go even further into bit-twiddling hackery to make the code a line or two shorter, but I thought that would hurt readability.
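For reviewers who want the mapping spelled out, here is a minimal standalone sketch of how a vperm2 control byte could be decoded into a two-input shuffle. It is not the code in this patch: `Source`, `DecodedPerm2`, and `decodeVPerm2` are hypothetical names, and there is no dependency on LLVM. It assumes the documented immediate layout (bits [1:0] and [5:4] select a 128-bit half for the low and high halves of the result, bits 3 and 7 zero the corresponding half).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Which value feeds one input of the resulting two-input shuffle.
// (Hypothetical type, for illustration only.)
enum class Source { Op0, Op1, Zero };

struct DecodedPerm2 {
  Source first;          // shuffle input that supplies the low half of the result
  Source second;         // shuffle input that supplies the high half of the result
  std::vector<int> mask; // indices [0, N) pick from 'first', [N, 2N) from 'second'
};

// Decode a vperm2 immediate for a vector of numElts elements.
DecodedPerm2 decodeVPerm2(uint8_t imm, unsigned numElts) {
  assert(numElts >= 2 && numElts % 2 == 0 && "vector must split into two halves");
  const unsigned half = numElts / 2;

  // Zero-mask bits: when set, that half of the result is all zeros.
  const bool lowZero = (imm & 0x08) != 0;
  const bool highZero = (imm & 0x80) != 0;

  // High bit of each selection field: first or second source operand.
  const bool lowFromOp1 = (imm & 0x02) != 0;
  const bool highFromOp1 = (imm & 0x20) != 0;

  // Low bit of each selection field: low or high half of that operand.
  const bool lowTakesUpper = (imm & 0x01) != 0;
  const bool highTakesUpper = (imm & 0x10) != 0;

  DecodedPerm2 out;
  out.first = lowZero ? Source::Zero : (lowFromOp1 ? Source::Op1 : Source::Op0);
  out.second = highZero ? Source::Zero : (highFromOp1 ? Source::Op1 : Source::Op0);
  out.mask.resize(numElts);

  // Low half of the result reads from the first shuffle input.
  unsigned start = lowTakesUpper ? half : 0;
  for (unsigned i = 0; i < half; ++i)
    out.mask[i] = static_cast<int>(start + i);

  // High half reads from the second shuffle input, whose indices start at numElts.
  start = (highTakesUpper ? half : 0) + numElts;
  for (unsigned i = 0; i < half; ++i)
    out.mask[half + i] = static_cast<int>(start + i);

  return out;
}
```

For example, `decodeVPerm2(0x28, 8)` yields a zero first input, the second operand as the second input, and the mask `{0, 1, 2, 3, 8, 9, 10, 11}`. When both zero bits are set, both inputs decode to `Source::Zero`, so the whole result is simply a zero vector and no shuffle is needed.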
You can move this logic after line 235.