This patch started off much more general and ambitious, but it's been a nightmare seeing all the ways x86 vector codegen can go wrong.
So the code is still structured to allow extending easily, but it's currently limited in several ways:
- Only handle cases with an extending load.
- Only handle cases with a zero constant compare.
- Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.
The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary intermediate cast.
There's a clear regression in the last test (sgt_zero_fp_select) because we no longer recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present in sgt_zero, so I'm hoping we can fix that more generally in a follow-up. We need to match a sign-bit setcc from a sign-extended operand and remove it.
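
For illustration only, here is a scalar sketch of why a compare against zero can be done at either width once the operand is sign-extended, and why a sign-bit test on a sign-extended operand carries no extra information. It models a single lane with plain integers. The names (Pred, compareWithZero, and so on) are hypothetical and are not taken from the patch or from LLVM.

```cpp
#include <cstdint>

// Hypothetical scalar model of one vector lane; not LLVM code.
enum class Pred { EQ, SGT, SLT };

// Compare a value of any integer width against the constant zero.
template <typename T>
bool compareWithZero(T v, Pred p) {
  switch (p) {
  case Pred::EQ:  return v == 0;
  case Pred::SGT: return v > 0;
  case Pred::SLT: return v < 0;
  }
  return false;
}

// True if, for every 8-bit value, the zero compare gives the same answer
// on the narrow value and on its 32-bit sign extension.
bool zeroCompareSurvivesSext(Pred p) {
  for (int i = -128; i <= 127; ++i) {
    int8_t narrow = static_cast<int8_t>(i);
    int32_t wide = static_cast<int32_t>(narrow); // sign extension
    if (compareWithZero(narrow, p) != compareWithZero(wide, p))
      return false;
  }
  return true;
}

// True if the top bit of the sign-extended value always equals the sign of
// the narrow value, i.e. a sign-bit test after sext is redundant.
bool signBitOfSextMatchesNarrowSign() {
  for (int i = -128; i <= 127; ++i) {
    int8_t narrow = static_cast<int8_t>(i);
    int32_t wide = static_cast<int32_t>(narrow);
    bool wideSignBit = (static_cast<uint32_t>(wide) >> 31) != 0;
    if (wideSignBit != (narrow < 0))
      return false;
  }
  return true;
}
```

Both checks pass for all 8-bit inputs, which is the scalar analogue of the zero-compare restriction in the list above. This says nothing about which form x86 lowers more cheaply; that part still has to be judged from the test diffs.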