Improves the 8 byte case from PR42674.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 36580 Build 36579: arc lint + arc unit
Event Timeline
llvm/lib/Target/X86/X86ISelLowering.cpp | |
---|---|
35451 | We can easily support v2i8/v4i8 as well by replacing this with an insertion into a zero v16i8 vector. |
That doesn’t seem profitable for v2i8; we’d be better off extracting both elements and doing a scalar add. For v4i8, I’m not sure. PSADBW is 5 cycles on some CPUs, if I remember right, so the normal expansion is probably faster on those CPUs.
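To make the suggestion concrete, here is a minimal scalar sketch of why inserting a short byte vector into a zero vector would still give the right sum. It models one 64-bit lane of PSADBW in portable C++ and is not the lowering code from this patch; the names `psadbwLane` and `reduceAddViaPsadbw` are hypothetical and exist only for illustration. It says nothing about cost, which is the open question for v2i8/v4i8.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar model of one 64-bit lane of PSADBW: the sum of absolute
// differences of 8 unsigned bytes, produced as a 16-bit value.
static std::uint16_t psadbwLane(const std::array<std::uint8_t, 8> &a,
                                const std::array<std::uint8_t, 8> &b) {
  unsigned sum = 0;
  for (std::size_t i = 0; i < 8; ++i)
    sum += static_cast<unsigned>(std::abs(int(a[i]) - int(b[i])));
  return static_cast<std::uint16_t>(sum);
}

// Hypothetical helper: add-reduce N bytes (N <= 8) by placing them in a
// zeroed lane and taking PSADBW against zero. The zero padding bytes
// contribute nothing, so the same path covers 2, 4, or 8 elements.
template <std::size_t N>
static std::uint8_t reduceAddViaPsadbw(const std::array<std::uint8_t, N> &v) {
  static_assert(N <= 8, "this sketch models a single 64-bit lane");
  std::array<std::uint8_t, 8> lane{}; // all zero
  for (std::size_t i = 0; i < N; ++i)
    lane[i] = v[i];
  const std::array<std::uint8_t, 8> zero{};
  // Truncating the 16-bit sum to 8 bits matches a wrapping i8 add reduction.
  return static_cast<std::uint8_t>(psadbwLane(lane, zero));
}
```

For example, reducing the two bytes 200 and 100 this way gives 44, the same as the wrapping scalar add `(200 + 100) mod 256` that the reply proposes for v2i8.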