The motivating case is the set of changes in vector-reduce-add.ll, where
we were doing extra work in the scalar domain instead of shuffling.
There may be a one-use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.
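As a rough illustration of what "broadcasts that aren't really broadcasting" could mean, the sketch below treats a shuffle mask as a real broadcast only when the same source element lands in more than one defined lane. The helper name isRealBroadcast is hypothetical, and the check is an assumption made for illustration rather than the condition the patch actually implements. Undef lanes are written as -1.

```cpp
#include <vector>

// Hypothetical helper, not taken from the patch. Each mask entry is a source
// element index, or -1 for an undef lane.
bool isRealBroadcast(const std::vector<int> &Mask) {
  int SplatIdx = -1;    // source element seen so far, if any
  int DefinedLanes = 0; // number of non-undef lanes
  for (int M : Mask) {
    if (M < 0)
      continue; // undef lane, compatible with any splat
    if (SplatIdx >= 0 && M != SplatIdx)
      return false; // two different source elements: not a splat at all
    SplatIdx = M;
    ++DefinedLanes;
  }
  // With a single defined lane only one element moves, so nothing is
  // actually being replicated across lanes.
  return DefinedLanes > 1;
}
```

Under this assumed definition, a mask such as {0, -1, -1, -1} is rejected, which leaves it free to be matched as a shift or another cheaper shuffle, while {0, 0, 0, 0} still counts as a broadcast.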
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Is this going to interfere with folding AVX512 broadcast loads into an instruction at all?
More generally, broadcast is preferable if the input is a foldable load (immediate shifts can't fold), but I think combineX86ShuffleChain should handle this.
Comment Actions
Are you asking about v8i16 where I switched the shift priority?
I wish we knew the original VT. In the motivating case the shuffle has been widened, so there’s a bitcast that makes folding unlikely.
Comment Actions
Yes, but I think combineX86ShuffleChain will try to convert this to a foldable load + broadcast if it can, so we should be OK.
LGTM