With predicate masks, AVX512 can efficiently lower a variable-index vector insertion (insertelement) to 2 broadcasts + 1 comparison, avoiding much of the aliased memory traffic of going through a stack temporary.
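The lowering can be modeled lane by lane. The sketch below is a portable C++ model, not the actual X86ISelLowering code and not AVX512 intrinsics. It splats the index, compares it against the constant <0, 1, ..., N-1> to form a one-hot mask, and uses that mask to select between a splat of the new value and the original vector. The helper name `insert_var_index` is hypothetical. An out-of-range index yields an all-false mask here, so the vector comes back unchanged; in LLVM IR that case is poison, so any result is acceptable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scalar model of the AVX512 variable-index insert.
// Each loop stands in for one lane-wise vector operation.
template <typename T, std::size_t N>
std::array<T, N> insert_var_index(const std::array<T, N> &vec, T val,
                                  std::uint32_t idx) {
  // Broadcast 1: splat the index across all lanes.
  std::array<std::uint32_t, N> idx_splat;
  idx_splat.fill(idx);

  // Broadcast 2: splat the scalar value across all lanes.
  std::array<T, N> val_splat;
  val_splat.fill(val);

  // Comparison: idx_splat == <0, 1, ..., N-1> gives a one-hot predicate
  // mask (a k-register on AVX512); all-false if idx >= N.
  std::array<bool, N> mask{};
  for (std::size_t i = 0; i < N; ++i)
    mask[i] = (idx_splat[i] == static_cast<std::uint32_t>(i));

  // Masked select: take the broadcast value only where the mask is set.
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? val_splat[i] : vec[i];
  return result;
}
```

For example, `insert_var_index(std::array<int, 4>{10, 20, 30, 40}, 99, 2)` produces `{10, 20, 99, 40}`. No store/reload of the vector is involved, which is the point of the mask-based lowering.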
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/insertelement-var-index.ll | ||
---|---|---|
3–5 | Worth adding a shared prefix for "AVX1or2", so we don't get so much duplication? |
LGTM - see inline for a couple of minors.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
18829–18832 | Could use DAG.getSplatBuildVector() for both of these for slightly less code. |
18839 | We should have a code comment to describe the pattern: |