This is an archive of the discontinued LLVM Phabricator instance.

[VP] Add vp.powi and a pass for expanding vp.powi before DAG.
Needs ReviewPublic

Authored by fakepaper56 on Feb 8 2023, 6:09 AM.

Details

Summary

The patch uses different expanding way for vp.powi from the method of powi.
Vector powi is unrolled to multiple powi() libary calls in SelectionDAG, but the
method is not work for scalable vectors.
To support scalable vectors, the patch expands vp.powi at IR level. The
expanding way of vp.powi is based on compiler-rt/__powidf2.

Diff Detail

Unit TestsFailed

Event Timeline

fakepaper56 created this revision.Feb 8 2023, 6:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 6:09 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
fakepaper56 requested review of this revision.Feb 8 2023, 6:09 AM

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM could not expanding scalable vector type powi now. So this pass is not only for vp.powi, but also expanding scalable vector type powi in the future.

craig.topper added inline comments.Feb 9 2023, 7:32 PM
llvm/lib/CodeGen/ExpandPowi.cpp
35

expansion*

70

CreatePHI returns a PHINode*, can we use that to avoid casts?

124

support*

157

Why does this require AA?

Address Craig's comment and add missing test case.

fakepaper56 marked 3 inline comments as done.Feb 10 2023, 1:17 AM
fakepaper56 added inline comments.
llvm/lib/CodeGen/ExpandPowi.cpp
157

Sorry, they are my misuse.

craig.topper added inline comments.Feb 10 2023, 10:09 PM
llvm/lib/CodeGen/ExpandPowi.cpp
17

GlobalsModRef Probably uneeded?

craig.topper added inline comments.Feb 10 2023, 10:10 PM
llvm/lib/CodeGen/ExpandPowi.cpp
83

What's preventing using vp.icmp?

Cleanup headers.

fakepaper56 marked 2 inline comments as done.Feb 11 2023, 4:09 AM
fakepaper56 added inline comments.
llvm/lib/CodeGen/ExpandPowi.cpp
83

It is only that I don't know how to construct vp.icmp/vp.fcmp instructions.

fakepaper56 added inline comments.Feb 11 2023, 4:18 AM
llvm/lib/CodeGen/ExpandPowi.cpp
83

I don't understand how make predicate to a pointer of Value.

craig.topper added inline comments.Feb 11 2023, 11:58 AM
llvm/lib/CodeGen/ExpandPowi.cpp
83

Should be something like this code from IRBuilder with the assert removed.

Value *getConstrainedFPPredicate(CmpInst::Predicate Predicate) {               
  assert(CmpInst::isFPPredicate(Predicate) &&                                  
         Predicate != CmpInst::FCMP_FALSE &&                                   
         Predicate != CmpInst::FCMP_TRUE &&                                    
         "Invalid constrained FP comparison predicate!");                      
                                                                               
  StringRef PredicateStr = CmpInst::getPredicateName(Predicate);               
  auto *PredicateMDS = MDString::get(Context, PredicateStr);                   
                                                                               
  return MetadataAsValue::get(Context, PredicateMDS);                          
}

Use vp.icmp instead of icmp.

fakepaper56 marked an inline comment as done.Feb 11 2023, 11:50 PM
fakepaper56 added inline comments.
llvm/lib/CodeGen/ExpandPowi.cpp
83

Thank you for the recommendation.

fakepaper56 marked an inline comment as done.

Rebase and ping.

craig.topper added inline comments.Feb 21 2023, 12:13 AM
llvm/lib/CodeGen/ExpandPowi.cpp
115

old fixme?

133

Drop curly braces.

craig.topper added inline comments.Feb 21 2023, 12:14 AM
llvm/lib/CodeGen/ExpandPowi.cpp
59

why "forward"?

Address Craig's comment.

fakepaper56 marked 3 inline comments as done.Feb 21 2023, 12:28 AM
fakepaper56 added inline comments.
llvm/lib/CodeGen/ExpandPowi.cpp
59

Sorry, it didn't make sense. I changed it to powi-expansion-loop.

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM could not expanding scalable vector type powi now. So this pass is not only for vp.powi, but also expanding scalable vector type powi in the future.

Apologies, I was away on holiday.

Thanks - I missed that the plan was also to support llvm.powi. I guess I just find ExpandPowi and ExpandVectorPredicationPass to be doing two very similar things (in this patch) with regards to vp.powi: expanding it into an equivalent set of operations; that seems unfortunate.

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I'd basically like to know how this fits in with some longer-term strategy about what we want to support for illegal scalable-vector operations, rather than this specific powi use-case. If we start to open the door to specific intrinsics, I think it'd help to have a well-defined rationale and plan in mind.

fakepaper56 marked an inline comment as done.Feb 21 2023, 7:31 AM

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I agree with you that only expanding powi is too restrictive. I think at least we should expand all the math function in a pass. But I don't have no idea that whether we should expand scalable operations for target without scalable vectors?

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM could not expanding scalable vector type powi now. So this pass is not only for vp.powi, but also expanding scalable vector type powi in the future.

Apologies, I was away on holiday.

Thanks - I missed that the plan was also to support llvm.powi. I guess I just find ExpandPowi and ExpandVectorPredicationPass to be doing two very similar things (in this patch) with regards to vp.powi: expanding it into an equivalent set of operations; that seems unfortunate.

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I'd basically like to know how this fits in with some longer-term strategy about what we want to support for illegal scalable-vector operations, rather than this specific powi use-case. If we start to open the door to specific intrinsics, I think it'd help to have a well-defined rationale and plan in mind.

Note that this pass doesn't scalarize

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I agree with you that only expanding powi is too restrictive. I think at least we should expand all the math function in a pass. But I don't have no idea that whether we should expand scalable operations for target without scalable vectors?

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls. But that makes it very different than what we're doing for powi here.

How do envision sharing this code for llvm.powi. A lot of this code creates VP intrinsics. Do you have an abstraction plan?

How do envision sharing this code for llvm.powi. A lot of this code creates VP intrinsics. Do you have an abstraction plan?

My plan is use same expanding function but use true mask for its mask and the elementcount for its evl.

Also expanding llvm.powi.

No test for RISC-V?

llvm/test/CodeGen/Generic/expand-powi.ll
3

This needs a REQUIRES: x86-registered-target or it needs to be moved into the X86 directory.

craig.topper added inline comments.Mar 7 2023, 9:54 PM
llvm/lib/CodeGen/ExpandPowi.cpp
128

I think we should do a vp_icmp followed by a mask vp_reduce_or.

All the existing tests for llvm.powi use a scalar exponent even when the result is a vector. Should vp.powi only accept scalar exponent?

fakepaper56 added a comment.EditedMar 14 2023, 7:16 PM

I think we should follow rule of llvm.powi first.

fakepaper56 updated this revision to Diff 505490.EditedMar 15 2023, 7:50 AM

This update does,

  1. Make vp.powi follows llvm.powi to only accept scalar exponent.
  2. Add tests for RISC-V.
  3. Update test cases.

But it still a test fail for ir unit test. I don't know how to debug it. I even
can not use gdb to trace it.
The below command about the test fails.

$ LLVM_SYMBOLIZER_PATH=./build/bin/llvm-symbolizer ./build/unittests/IR/./IRTests
...
[ RUN      ] VPIntrinsicTest.VPIntrinsicDeclarationForParams
IRTests: /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:255: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.
 #0 0x00005620c5f909ee llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x00005620c5f90dc0 PrintStackTraceSignalHandler(void*) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:641:1
 #2 0x00005620c5f8e4fe llvm::sys::RunSignalHandlers() /home/yeting/x86-riscv-llvm/llvm/lib/Support/Signals.cpp:104:20
 #3 0x00005620c5f9033f SignalHandler(int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:412:1
 #4 0x00007f3c933e0980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f3c91d92e87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f3c91d947f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f3c91d843fa __assert_fail_base /build/glibc-uZu3wS/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f3c91d84472 (/lib/x86_64-linux-gnu/libc.so.6+0x30472)
 #9 0x00005620c5b9b1c6 llvm::ArrayRef<llvm::Type*>::operator[](unsigned long) const /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:256:14
#10 0x00005620c5cde47b DecodeFixedType(llvm::ArrayRef<llvm::Intrinsic::IITDescriptor>&, llvm::ArrayRef<llvm::Type*>, llvm::LLVMContext&) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1401:37
#11 0x00005620c5cdeae6 llvm::Intrinsic::getType(llvm::LLVMContext&, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1480:21
#12 0x00005620c5cf70c8 llvm::Intrinsic::getDeclaration(llvm::Module*, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1505:21
#13 0x00005620c5d49863 llvm::VPIntrinsic::getDeclarationForParams(llvm::Module*, unsigned int, llvm::Type*, llvm::ArrayRef<llvm::Value*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/IntrinsicInst.cpp:594:39
#14 0x00005620c56b2283 (anonymous namespace)::VPIntrinsicTest_VPIntrinsicDeclarationForParams_Test::TestBody() /home/yeting/x86-riscv-llvm/llvm/unittests/IR/VPIntrinsicTest.cpp:367:72
craig.topper added inline comments.Mar 15 2023, 8:31 AM
llvm/docs/LangRef.rst
20078

Predicated version of raising a vector of floating-point values to an integer power.

Fixed crash by adding special case in llvm::VPIntrinsic::getDeclarationForParams

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls.

I want to second this point. I think doing the fancy expansion here is a bad idea at this time. We can come back to that, but an initial implementation should scalarize via a loop. The lowering works for all of the lane-wise math routines. Only once we have correct lowering for the majority of the routines should we bother optimizing any of them.

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.

llvm/lib/CodeGen/ExpandPowi.cpp
36

This appears to correspond to the recently introduced IRBuilder::CreateElementCount.

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls.

I want to second this point. I think doing the fancy expansion here is a bad idea at this time. We can come back to that, but an initial implementation should scalarize via a loop. The lowering works for all of the lane-wise math routines. Only once we have correct lowering for the majority of the routines should we bother optimizing any of them.

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.

I want to mention that powi is weird and does not correspond to a real math routine. It's a fast math optimization for pow with an integer argument. The scalar version of powi is provided in libgcc/compiler-rt while pow itself is in libm. This almost makes it a compiler implementation detail. Should a vector math library provide this function?

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.

I want to mention that powi is weird and does not correspond to a real math routine. It's a fast math optimization for pow with an integer argument. The scalar version of powi is provided in libgcc/compiler-rt while pow itself is in libm. This almost makes it a compiler implementation detail. Should a vector math library provide this function?

One of the options which was mentioned in the recent compiler-rt thread on discourse was to have a weak definition defined in each object file so that the linker could pick one (including the runtime libs if available). I'd lean towards something like that.

Use CreateElementCount and fix typos in LangRef.rst.

One of the options which was mentioned in the recent compiler-rt thread on discourse was to have a weak definition defined in each object file so that the linker could pick one (including the runtime libs if available). I'd lean towards something like that.

Could you provide the link of the discourse you mentioned?