The patch uses different expanding way for vp.powi from the method of powi.
Vector powi is unrolled to multiple powi() libary calls in SelectionDAG, but the
method is not work for scalable vectors.
To support scalable vectors, the patch expands vp.powi at IR level. The
expanding way of vp.powi is based on compiler-rt/__powidf2.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?
Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?
LLVM could not expanding scalable vector type powi now. So this pass is not only for vp.powi, but also expanding scalable vector type powi in the future.
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
157 | Sorry, they are my misuse. |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
17 | GlobalsModRef Probably uneeded? |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
83 | What's preventing using vp.icmp? |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
83 | It is only that I don't know how to construct vp.icmp/vp.fcmp instructions. |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
83 | I don't understand how make predicate to a pointer of Value. |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
83 | Should be something like this code from IRBuilder with the assert removed. Value *getConstrainedFPPredicate(CmpInst::Predicate Predicate) { assert(CmpInst::isFPPredicate(Predicate) && Predicate != CmpInst::FCMP_FALSE && Predicate != CmpInst::FCMP_TRUE && "Invalid constrained FP comparison predicate!"); StringRef PredicateStr = CmpInst::getPredicateName(Predicate); auto *PredicateMDS = MDString::get(Context, PredicateStr); return MetadataAsValue::get(Context, PredicateMDS); } |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
83 | Thank you for the recommendation. |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
58 | why "forward"? |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
58 | Sorry, it didn't make sense. I changed it to powi-expansion-loop. |
Apologies, I was away on holiday.
Thanks - I missed that the plan was also to support llvm.powi. I guess I just find ExpandPowi and ExpandVectorPredicationPass to be doing two very similar things (in this patch) with regards to vp.powi: expanding it into an equivalent set of operations; that seems unfortunate.
I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.
If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?
I'd basically like to know how this fits in with some longer-term strategy about what we want to support for illegal scalable-vector operations, rather than this specific powi use-case. If we start to open the door to specific intrinsics, I think it'd help to have a well-defined rationale and plan in mind.
I agree with you that only expanding powi is too restrictive. I think at least we should expand all the math function in a pass. But I don't have no idea that whether we should expand scalable operations for target without scalable vectors?
Note that this pass doesn't scalarize
How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls. But that makes it very different than what we're doing for powi here.
How do envision sharing this code for llvm.powi. A lot of this code creates VP intrinsics. Do you have an abstraction plan?
My plan is use same expanding function but use true mask for its mask and the elementcount for its evl.
No test for RISC-V?
llvm/test/CodeGen/Generic/expand-powi.ll | ||
---|---|---|
3 | This needs a REQUIRES: x86-registered-target or it needs to be moved into the X86 directory. |
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
128 | I think we should do a vp_icmp followed by a mask vp_reduce_or. |
All the existing tests for llvm.powi use a scalar exponent even when the result is a vector. Should vp.powi only accept scalar exponent?
This update does,
- Make vp.powi follows llvm.powi to only accept scalar exponent.
- Add tests for RISC-V.
- Update test cases.
But it still a test fail for ir unit test. I don't know how to debug it. I even
can not use gdb to trace it.
The below command about the test fails.
$ LLVM_SYMBOLIZER_PATH=./build/bin/llvm-symbolizer ./build/unittests/IR/./IRTests ... [ RUN ] VPIntrinsicTest.VPIntrinsicDeclarationForParams IRTests: /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:255: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed. #0 0x00005620c5f909ee llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:567:22 #1 0x00005620c5f90dc0 PrintStackTraceSignalHandler(void*) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:641:1 #2 0x00005620c5f8e4fe llvm::sys::RunSignalHandlers() /home/yeting/x86-riscv-llvm/llvm/lib/Support/Signals.cpp:104:20 #3 0x00005620c5f9033f SignalHandler(int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:412:1 #4 0x00007f3c933e0980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980) #5 0x00007f3c91d92e87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f3c91d947f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0 #7 0x00007f3c91d843fa __assert_fail_base /build/glibc-uZu3wS/glibc-2.27/assert/assert.c:89:0 #8 0x00007f3c91d84472 (/lib/x86_64-linux-gnu/libc.so.6+0x30472) #9 0x00005620c5b9b1c6 llvm::ArrayRef<llvm::Type*>::operator[](unsigned long) const /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:256:14 #10 0x00005620c5cde47b DecodeFixedType(llvm::ArrayRef<llvm::Intrinsic::IITDescriptor>&, llvm::ArrayRef<llvm::Type*>, llvm::LLVMContext&) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1401:37 #11 0x00005620c5cdeae6 llvm::Intrinsic::getType(llvm::LLVMContext&, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1480:21 #12 0x00005620c5cf70c8 llvm::Intrinsic::getDeclaration(llvm::Module*, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1505:21 #13 0x00005620c5d49863 llvm::VPIntrinsic::getDeclarationForParams(llvm::Module*, unsigned int, llvm::Type*, llvm::ArrayRef<llvm::Value*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/IntrinsicInst.cpp:594:39 #14 0x00005620c56b2283 (anonymous namespace)::VPIntrinsicTest_VPIntrinsicDeclarationForParams_Test::TestBody() /home/yeting/x86-riscv-llvm/llvm/unittests/IR/VPIntrinsicTest.cpp:367:72
llvm/docs/LangRef.rst | ||
---|---|---|
19934 ↗ | (On Diff #505490) | Predicated version of raising a vector of floating-point values to an integer power. |
I want to second this point. I think doing the fancy expansion here is a bad idea at this time. We can come back to that, but an initial implementation should scalarize via a loop. The lowering works for all of the lane-wise math routines. Only once we have correct lowering for the majority of the routines should we bother optimizing any of them.
Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.
llvm/lib/CodeGen/ExpandPowi.cpp | ||
---|---|---|
36 | This appears to correspond to the recently introduced IRBuilder::CreateElementCount. |
I want to mention that powi is weird and does not correspond to a real math routine. It's a fast math optimization for pow with an integer argument. The scalar version of powi is provided in libgcc/compiler-rt while pow itself is in libm. This almost makes it a compiler implementation detail. Should a vector math library provide this function?
One of the options which was mentioned in the recent compiler-rt thread on discourse was to have a weak definition defined in each object file so that the linker could pick one (including the runtime libs if available). I'd lean towards something like that.
GlobalsModRef Probably uneeded?