According to https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-promote, elements not specified by the input index argument are undefined, so we don't need to set these elements to zero.
Details
- Reviewers: nemanjai, shchenz, stefanp, amyk
- Group Reviewers: Restricted Project
- Commits: rG1ceaec3e8104: [PowerPC][altivec] Optimize codegen of vec_promote
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
clang/lib/Headers/altivec.h | ||
---|---|---|
14662 | Using __builtin_shufflevector generates poison values, which are stronger than undef and, in my view, expose more optimizations. See https://llvm.org/docs/LangRef.html#id1781. | |
llvm/test/CodeGen/PowerPC/vec-promote.ll | ||
---|---|---|
1 | I prefer keeping this file. |
clang/lib/Headers/altivec.h | ||
---|---|---|
14662 | Also, using __builtin_shufflevector is explicit, and the builtin has a formal specification: https://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-shufflevector. |
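
The approach discussed in the inline comments above can be sketched roughly as follows. This is a minimal illustration, not the code from altivec.h: `v4si`, `promote_sketch`, and `SKETCH_HAVE_SHUFFLEVECTOR` are hypothetical names, and a generic `vector_size` type stands in for `vector signed int` so that the sketch builds on non-PowerPC hosts. It assumes a compiler that provides `__builtin_shufflevector` (Clang, or a recent GCC) and falls back to a zero-initialized vector otherwise.

```c
#include <stdint.h>

/* Generic 4 x int32 vector via the GCC/Clang vector_size extension.
   Assumption: used here in place of `vector signed int`. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Detect the builtin where the compiler lets us ask for it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define SKETCH_HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* Hypothetical helper modelling vec_promote for 4 x int32:
   only lane (b mod 4) has a defined value on return. */
static inline v4si promote_sketch(int32_t a, int b) {
  v4si zero = {0, 0, 0, 0};
#ifdef SKETCH_HAVE_SHUFFLEVECTOR
  /* An index of -1 means "don't care", so the compiler is free to leave
     these lanes unspecified instead of materializing zeros. */
  v4si res = __builtin_shufflevector(zero, zero, -1, -1, -1, -1);
#else
  /* Fallback without the builtin: start from an all-zero vector. */
  v4si res = zero;
#endif
  res[b & 3] = a; /* insert the scalar into the requested lane */
  return res;
}
```

Callers should read only the lane selected by the index; the contents of the other lanes are not meaningful in the don't-care variant.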
LGTM. This is a good idea and we should go ahead with this for anyone that uses vec_promote, but it might be a good idea to improve codegen for the insert which might be more common.
llvm/test/CodeGen/PowerPC/vec-promote.ll | ||
---|---|---|
43 | This code is absolutely terrible. Not only is the lfs super slow compared to lfiwzx/lxsiwzx that we actually want, but the two conversions and three permutes are super slow. I think the change to altivec.h to produce better code for something like that is a good thing, but I wonder if something like this might come up in other contexts. At least on Power9 and up, we can do much better than this. We don't do particularly well regardless of whether we're using a zero vector input or an arbitrary vector: https://godbolt.org/z/79fx8nsdP |
> but it might be a good idea to improve codegen for the insert which might be more common.
Yes. Usually insertelement is transformed to BUILD_VECTOR in SDAG, and currently we don't have much optimization for special BUILD_VECTOR patterns.
Could we just declare it without initialization? That would also produce an undefined vector.