This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC).
ClosedPublic

Authored by fhahn on Mar 28 2020, 11:19 AM.

Details

Summary

This patch adds VPValue version of the instruction operands to
VPWidenRecipe and uses them during code-generation.

Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.

Diff Detail

Event Timeline

fhahn created this revision.Mar 28 2020, 11:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2020, 11:19 AM
Ayal added a comment.Mar 30 2020, 3:50 PM

Thanks for following-up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorInstrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I’s operands should all change to access User's operands instead, possibly with their underlying ingredient if needed.

This follows the changes D70865 made to ILV::vectorizeInterleaveGroup() and ILV::vectorizeMemoryInstruction(), focusing only on transforming the def/use relations between recipes, as a first (or in this case second ;-) separate step.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7410

Simpler to pass the User to GetVectorOps() and have it iterate over its Operands?

7573

Instead of accessing CI->getNumArgOperands(), get the number of User’s operands; or simply iterate over them?

fhahn marked an inline comment as done.Mar 31 2020, 5:10 AM

Thanks for following-up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorInstrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I’s operands should all change to access User's operands instead, possibly with their underlying ingredient if needed.

The main reason for moving it into the recipe is so we can use getUnderlyingValue when lowering calls. A cleaner solution to deal with call would be to move them to their own recipe/instruction and record the required info as suggested. Is that along the line you suggest? I.e. something like the following: Move calls to their own recipe (NFC) and then move VPWidenRecipe and VPCallRecipe to VPUser in parallel?

Without changes to the call handling, I am not sure how we would be able to use the existing logic (unless we expose getUnderlyingValue to ILV.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7573

CallInsts have their arguments first, followed by the called function, followed by operand bundles. Currently they will all be added to User. We could check the function type to see how many operands the call passes, but that might not work for variadic calls.

vkmr added a subscriber: vkmr.Mar 31 2020, 6:07 AM
Ayal added a comment.Apr 1 2020, 1:32 AM

Thanks for following-up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorInstrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I’s operands should all change to access User's operands instead, possibly with their underlying ingredient if needed.

The main reason for moving it into the recipe is so we can use getUnderlyingValue when lowering calls. A cleaner solution to deal with call would be to move them to their own recipe/instruction and record the required info as suggested. Is that along the line you suggest? I.e. something like the following: Move calls to their own recipe (NFC) and then move VPWidenRecipe and VPCallRecipe to VPUser in parallel?

Without changes to the call handling, I am not sure how we would be able to use the existing logic (unless we expose getUnderlyingValue to ILV.

Yes, let's go for this cleaner solution. The first phase of VPlan integration using recipes is focused on refactoring ILV logic so that decisions are made and recorded in VPlan rather than during codegen. The latter, taken care of by recipe::execute(), should be as straightforward as possible.

fhahn planned changes to this revision.Apr 1 2020, 2:50 AM
In D76992#1954253, @Ayal wrote:>

Yes, let's go for this cleaner solution. The first phase of VPlan integration using recipes is focused on refactoring ILV logic so that decisions are made and recorded in VPlan rather than during codegen. The latter, taken care of by recipe::execute(), should be as straightforward as possible.

Great. i'l update the patches accordingly in a bit.

fhahn updated this revision to Diff 255061.Apr 4 2020, 10:49 AM

The main reason for moving it into the recipe is so we can use getUnderlyingValue when lowering calls. A cleaner solution to deal with call would be to move them to their own recipe/instruction and record the required info as suggested. Is that along the line you suggest? I.e. something like the following: Move calls to their own recipe (NFC) and then move VPWidenRecipe and VPCallRecipe to VPUser in parallel?

Without changes to the call handling, I am not sure how we would be able to use the existing logic (unless we expose getUnderlyingValue to ILV.

Yes, let's go for this cleaner solution. The first phase of VPlan integration using recipes is focused on refactoring ILV logic so that decisions are made and recorded in VPlan rather than during codegen. The latter, taken care of by recipe::execute(), should be as straightforward as possible.

I've updated the code to just change how operands are accessed during ILV::widenInstruction. Instead of accessing the operands via the instruction, GetUnderlyingValue/GetScalarValue/GetVectorValue are provided, which use the VPUser & VPTransformState. Call handling was moved to a separate recipe D77467.

fhahn updated this revision to Diff 255352.Apr 6 2020, 8:13 AM

Rebased after landing VPWidenCallRecipe in 90be3c24a7162a488f68f7cce159017c10408133.

fhahn updated this revision to Diff 255691.Apr 7 2020, 8:48 AM

Pass VPUser and VPTransformState to widenInstruction, instead of accessor callbacks. A callback is still needed to access the underlying value for checking whether the condition is loop-invariant. But I do not think there's a way around that without additional refactoring.

fhahn edited the summary of this revision. (Show Details)Apr 7 2020, 8:50 AM
fhahn updated this revision to Diff 256227.Apr 9 2020, 2:56 AM

Updated to use iterator_range instead of ArrayRef, as in D77655.

gilr added inline comments.Apr 9 2020, 1:28 PM
llvm/lib/Transforms/Vectorize/VPlanValue.h
46

Make ILV friends with VPValue instead? Would facilitate removing that last lambda.

fhahn marked an inline comment as done.Apr 9 2020, 2:11 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanValue.h
46

I think granting access to ILV would mean that the scope where the underlying value can be accessed would be extended a lot and might encourage more uses (which I think is undesired). I think the lambda allows us to grant more targeted access and there is only a single use in widenInstruction, which will hopefully removed in the future. Does that make sense? I would be also happy to make ILV a friend, if you think that's preferable over the lambda in the long run.

Ayal added inline comments.Apr 9 2020, 4:27 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4287

Better for a VPlan recipe to record such information/decision what Select should eventually be generated, rather than for code-gen to check the SCEV of an underlying value of an operand.

Perhaps worth taking the Select case out of ILV::widenInstruction() and into a dedicated ILV::widenSelectInstruction() which will be given InvariantCond as a parameter, and have VPWidenRecipe (or a dedicated VPWidenSelectRecipe) record InvariantCond and feed it?

gilr added inline comments.Apr 10 2020, 1:50 AM
llvm/lib/Transforms/Vectorize/VPlanValue.h
46

Letting ILV access the underlying value seems to follow the intent of the design principle preceding getUnderlyingValue()'s declaration of limiting its use to the code-gen phase.
Anyway, I agree with Ayal that a better solution would be to take that decision during planning rather than during code-gen.

fhahn updated this revision to Diff 256561.Apr 10 2020, 6:22 AM

Rebased on top of D77869 to get rid of the last callback.

fhahn marked 2 inline comments as done.Apr 10 2020, 6:24 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4287

Better for a VPlan recipe to record such information/decision what Select should eventually be generated, rather than for code-gen to check the SCEV of an underlying value of an operand.

Perhaps worth taking the Select case out of ILV::widenInstruction() and into a dedicated ILV::widenSelectInstruction() which will be given InvariantCond as a parameter, and have VPWidenRecipe (or a dedicated VPWidenSelectRecipe) record InvariantCond and feed it?

Agreed. I originally planned on doing that as a follow-up, but it's probably easier to do up-front: D77869

llvm/lib/Transforms/Vectorize/VPlanValue.h
46

Anyway, I agree with Ayal that a better solution would be to take that decision during planning rather than during code-gen.

+1
I originally planned on doing that as a follow-up, but it's probably easier to do up-front: D77869

fhahn updated this revision to Diff 257668.Apr 15 2020, 4:16 AM

Rebased after recent refactoring patches and VPWidenSelectRecipe landed.

fhahn added a comment.Apr 22 2020, 2:50 AM

ping :) All suggested refactors should have landed now

gilr added inline comments.Apr 22 2020, 3:29 AM
llvm/lib/Transforms/Vectorize/VPlanValue.h
46

Redundant now.

fhahn updated this revision to Diff 259230.Apr 22 2020, 4:10 AM
fhahn marked an inline comment as done.

Remove unnecessary friend class,

gilr added inline comments.Apr 22 2020, 4:11 AM
llvm/lib/Transforms/Vectorize/VPlan.h
769

Missing member comment.

gilr added inline comments.Apr 22 2020, 4:29 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
76

Could the code creating the iterator range be made reusable by becoming a VPlan method taking a User?

fhahn updated this revision to Diff 259268.Apr 22 2020, 6:37 AM

Add VPlan::mapToVPValues helper

gilr accepted this revision.Apr 22 2020, 7:56 AM

LGTM (with that missing comment nit), thanks!

This revision is now accepted and ready to land.Apr 22 2020, 7:56 AM
fhahn added a comment.Apr 23 2020, 3:58 AM

LGTM (with that missing comment nit), thanks!

Thanks Gil! I missed the comment in the previous update (sorry!), but I'll add it in the commit.

This revision was automatically updated to reflect the committed changes.
mehdi_amini added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4318

FYI XLA generated IR triggers a crash here.

I have the IR right before the pass (I can share it, this is the public XLA test-suite) but I can't reproduce with opt -loop-vectorize (I suspect some TTI/TLI settings are affecting this).

I don't really understand this code so I couldn't debug it right now but in gdb:

(gdb) call User.getUnderlyingValue() 
$13 = (llvm::Value *) 0x0
(gdb) call User.getOperand(0)->getUnderlyingValue()->dump()
  %reduce-window.10.clone.invar_address.dim.3.016 = phi i64 [ %bc.resume.val, %scalar.ph ], [ %invar.inc3, %reduce-window.10.clone.inner.loop_exit.window.0 ]
(gdb) call User.getOperand(1)->getUnderlyingValue()->dump()
i64 13
(gdb) p Part
$14 = 0

I'll likely revert while we're trying to extract a minimum reproducer. Let me know if you have any advice for this?

mehdi_amini added inline comments.Apr 23 2020, 10:07 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4318

While I have a gdb session opened:

#0  llvm::Value::getType (this=0x0) at llvm/include/llvm/IR/Value.h:244
#1  0x0000555596796922 in llvm::InnerLoopVectorizer::getOrCreateVectorValue (this=0x7fffee60fae0, V=0x0, Part=0) at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2020
#2  0x00005555967b1a87 in llvm::LoopVectorizationPlanner::VPCallbackILV::getOrCreateVectorValues (this=0x7fffee60f8a8, V=0x0, Part=0)
    at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7376
#3  0x00005555967bd8fd in llvm::VPTransformState::get (this=0x7fffee60f770, Def=0x37e73a4572d0, Part=0) at llvm/lib/Transforms/Vectorize/VPlan.h:270
#4  0x00005555967a211e in llvm::InnerLoopVectorizer::widenInstruction (this=0x7fffee60fae0, I=..., User=..., State=...) at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4292
#5  0x00005555967b1d65 in llvm::VPWidenRecipe::execute (this=0x37e73a450b60, State=...) at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7412
#6  0x00005555968488a0 in llvm::VPBasicBlock::execute (this=0x37e73aa4c360, State=0x7fffee60f770) at llvm/lib/Transforms/Vectorize/VPlan.cpp:240
#7  0x0000555596849c71 in llvm::VPlan::execute (this=0x37e73cbcc260, State=0x7fffee60f770) at llvm/lib/Transforms/Vectorize/VPlan.cpp:476
#8  0x00005555967ae456 in llvm::LoopVectorizationPlanner::executePlan (this=0x7fffee610080, ILV=..., DT=0x37e73b2db0c0) at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6612
#9  0x00005555967b4181 in llvm::LoopVectorizePass::processLoop (this=0x37e73c142860, L=0x37e73cc9d2a0) at llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7919
bkramer added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4318

https://reviews.llvm.org/P8215 crashes when run through opt -O3. Sadly bugpoint doesn't seem to like it so it's a huge test case.

fhahn marked an inline comment as done.Apr 24 2020, 3:40 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4318

Thanks, I'll take a look.

@fhahn, we also had a crash downstream on our fuzzer run, if you still need a reproducer I can try reducing our failure.

fhahn added a comment.Apr 26 2020, 4:39 AM

@fhahn, we also had a crash downstream on our fuzzer run, if you still need a reproducer I can try reducing our failure.

Thanks, I've already have a very reduced reproducer, which I'll add when I recommit this patch. Ayal posted a fix for the actual issue (D78847)

But it would be great if you could share the unreduced reproducer, to verify it is the same issue.

Ayal added inline comments.Apr 26 2020, 10:47 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4318

This patch exposed a bug in how backedge-taken-count is recorded; uploaded a fix in D78847.
Would be good to confirm if the two patches combined resolve the original issue(s).

llvm/lib/Transforms/Vectorize/VPlan.h
270

Would be good to add an assert of VPValue2Value.count(Def), instead of creating a entry with null Value leading to a crash, as reported.

277

ditto.

it would be great if you could share the unreduced reproducer, to verify it is the same issue.

This is our original reproducer:

Fails on commit 9245c7ac13480ed48ae339ad0e68cbe680cd0642 like this:

]  bin/opt -loop-vectorize before-vectorize.ll 2>&1 | c++filt | head -n 12
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/opt -loop-vectorize before-vectorize.ll 
1.      Running pass 'Function Pass Manager' on module 'before-vectorize.ll'.
2.      Running pass 'Loop Vectorization' on function '@"virtual void Test.vMeth1(jint)527365"'
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(llvm::sys::PrintStackTrace(llvm::raw_ostream&)+0x1a)[0x7ff9600e0ada]
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(llvm::sys::RunSignalHandlers()+0x34)[0x7ff9600deb44]
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(+0x72ec83)[0x7ff9600dec83]
/lib64/libpthread.so.0(+0xf5d0)[0x7ff95f7a35d0]
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(llvm::InnerLoopVectorizer::getBroadcastInstrs(llvm::Value*)+0x10)[0x7ff9611cddb0]
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(llvm::InnerLoopVectorizer::getOrCreateVectorValue(llvm::Value*, unsigned int)+0xce)[0x7ff9611db3ee]
/ws/llvm-upstream/builds/release-assert/bin/../lib/libLLVM-11git.so(+0x1832361)[0x7ff9611e2361]
fhahn added a comment.Apr 29 2020, 3:47 AM

Recommitted with a reduced version of the reproducer. Both the reproducers don't crash after a3c964a278b4.