With predicate masks, AVX512 can efficiently perform a variable-index vector insertion with two broadcasts and one comparison, avoiding much of the aliased memory traffic of spilling the vector to the stack and reloading it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/test/CodeGen/X86/insertelement-var-index.ll | ||
|---|---|---|
| 3–5 | Worth adding a shared prefix for "AVX1or2", so we don't get so much duplication? | |
LGTM - see inline for a couple of minors.
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 18829–18832 | Could use DAG.getSplatBuildVector() for both of these for slightly less code. | |
| 18839 | We should have a code comment to describe the pattern: | |