This is a follow-up patch for D87236 that addresses the comments about the test regressions.
The new code is even slightly better than it was originally because the first sign bit flip is now done before the first shuffle. Maybe the same should be done when lowering 8-bit signed minimums and maximums.
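To illustrate why the sign bit flips matter for chained operations, here is a minimal scalar sketch in Python. It is not the lowering code from the patch; the helper names (`smin16`, `umin16_via_flip`, and so on) are hypothetical, and it assumes 16-bit lanes where only a signed minimum is available natively. It shows that when two unsigned minimums are chained, the flip after the first operation and the flip before the second cancel, so each input needs to be flipped only once and the result flipped back once.

```python
SIGN = 0x8000  # sign bit of a 16-bit lane (assumed lane width for this sketch)


def to_signed(v):
    """Reinterpret a 16-bit pattern as a signed value."""
    return v - 0x10000 if v & SIGN else v


def smin16(a, b):
    """Signed 16-bit minimum operating on raw bit patterns."""
    return a if to_signed(a) <= to_signed(b) else b


def umin16_via_flip(a, b):
    """Unsigned minimum built from a signed minimum and sign bit flips."""
    return smin16(a ^ SIGN, b ^ SIGN) ^ SIGN


def umin16_chain_naive(a, b, c):
    """Two chained unsigned minimums, each with its own three flips."""
    return umin16_via_flip(umin16_via_flip(a, b), c)


def umin16_chain_hoisted(a, b, c):
    """Same result: flip every input once, flip the final result once."""
    return smin16(smin16(a ^ SIGN, b ^ SIGN), c ^ SIGN) ^ SIGN
```

Under these assumptions the naive chain performs six XORs and the hoisted chain four, and the saving grows with the length of the chain.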
An entirely different approach to fixing the regression would be to switch to the "sign bit flip" method based on a heuristic that considers the number of chained UMIN/UMAX instructions. For example, when calculating the median of 3 integers via max(min(x, y), min(max(x, y), z)), I think we would also want to use the old way. However, this is more complex to implement.
Is it worth pulling this in as well?
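For reference, the median-of-3 expression mentioned above can be checked with a small Python sketch. The function name `median3` is made up for illustration, and plain Python integers stand in for the vector lanes. The expression chains four min/max operations, which is the kind of pattern the proposed heuristic would have to recognize.

```python
def median3(x, y, z):
    """Median of three integers using two min and two max operations."""
    return max(min(x, y), min(max(x, y), z))
```

Comparing it against sorting the three inputs and taking the middle element is a quick way to confirm the identity.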