- User Since
- Nov 11 2020, 3:34 AM (30 w, 5 d)
Tue, May 25
Actually, this transformation doesn't take too much time. However the IR it generated is very long. And this will have a great effect on the further pass. More specifically, the expanding SCEV which replaces the variable %add.i.2.i contains thousands of operands.
So if you use opt -indvars -S, you can see the program keeps printing the IR all the time.
Besides, I use clang -O3 test.cpp and find that the program gets a segment fault with a stack dump like the followings:
Mon, May 24
Details are recorded in https://bugs.llvm.org/show_bug.cgi?id=50442.
May 7 2021
May 5 2021
Apr 28 2021
@nikic Thanks a lot so your suggesstion. A new patch updated, please review.
Apr 27 2021
Fix bugs in Lazy Value Info analysis rather in Jump Threading pass
Apr 26 2021
The bug is that we will be trapped into an infinite loop when we doing jump threading pass, and finally exhaust the stack.
The Jump Threading call function getValueFromCondition() of LazyValueInfo, and this function will recursively call itself in a way shown in the fowllowing:
BinaryOperator *BO = dyn_cast<BinaryOperator>(Cond); Value *BL = BO->getOperand(0); Value *BR = BO->getOperand(1);
Apr 25 2021
Right now this patch set the argument KeepOneInputPHIs of function DeleteDeadBlocks to be true, that is we only allow replacing the phi node with its incoming value if this incoming value is a constant
The bug is reported in https://bugs.llvm.org/show_bug.cgi?id=50119
This bug is generated in the following logic:
Firstly, we can see that this function has a loop that contains the two red basic blocks. And this loop's preheader is for.body.i.us.1.preheader. After we ProcessBlock, the preheader becomes dead, then we decide to DeleteDeadBlock.
And within the functions DeleteDeadBlock, we visit the dead block's successors and remove the successors from its predecessor. That is we just detach the dead block and the unreachable blocks still remain within the function.
What's more, for those successors, we also try to simplify their phi nodes. In this case, the loop that has two red blocks, its loop header for.body.i.us.1 has a PHI node, which has two incoming values, one from the preheader(predecessor),
the other one from the loop exit. Now that we have removed the header from its predecessor, we can remove the incoming value from its predecessor and replace the phi node use with another incoming value.
After the DeleteDeadBlock, the loop might become this:
for.body.i.us.1: ; preds = %lor.end.i.us.1 br i1 %spec.select44.i.us.1, label %lor.end.i.us.1, label %lor.rhs.i.us.1
Jan 19 2021
A new patch has been updated, could you please review it. Thanks a lot.
Jan 7 2021
Jan 6 2021
Jan 4 2021
Dec 29 2020
Dec 25 2020
Dec 24 2020
Dec 23 2020
bugzilla : https://bugs.llvm.org/show_bug.cgi?id=48583
Thanks for reviewing.
And I still get confused in some places. One is that the DemandedBits.cpp seems compute the demandedbits per-instruction, that is it "successfully" terminates walks at non-Instructions, too.
I guess the way computing demanded-bits per-instruction is not compatible with the way truncateToMinimalBitwidths() truncating the operands for every instruction in MinBWs.
So if we still want keep truncateToMinimalBitwidths(), then we may need modify function computeMinimumValueSizes() to take Arguments into consideration, Or if we still want to compute demanded bits in per-instruction way, then is it a possible solution that we be more conservative in function truncateToMinimalBitwidths(), that is we add some patterns, if match then avoid truncating?
Dec 21 2020
To avoid wrongly truncating operands, I add a whitelist for computing demanded bits of Instruction::Trunc's operands, that is we only reduce the demanded bits for those instructions that are always truncation friendly, such as ADD or SUB.
So as the avoid-truncate-shift-operands.ll, before loop vectorize, we do a shift operation and then truncate its result into 8 bits. That is
%0 = lshr i32 %f, 18 %conv7 = trunc i32 %0 to i8
And the demanded bits says that we only need 8 bits of %0, so function truncateToMinimalBitwidths() firstly truncate lshr's operands into 8 bits first, and then shift, which will cause a wrong result.
%20 = trunc <2 x i32> %broadcast.splat7 to <2 x i8> %21 = lshr <2 x i8> %20, <i8 18, i8 18> %22 = zext <2 x i8> %21 to <2 x i32> %23 = trunc <2 x i32> %22 to <2 x i8>
I guess it is a profitability heuristic.
Firstly, let's take avoid-truncate-icmp-operands.ll as an example to explain this patch.<br>
Before running loop vectorize pass, we can see that in the basic block if.then,
%cmp.i = icmp ult i64 %e, %d %.sroa.speculated = select i1 %cmp.i, i64 %d, i64 %e %conv32 = trunc i64 %.sroa.speculated to i16
we compare value d and value e, and select one of them based on the comparison. And finally truncate the result into 16-bits. <br>
However after loop vectorize pass, we wrongly truncate instruction icmp's operands firstly,
%3 = trunc <2 x i64> %broadcast.splat to <2 x i16> %4 = trunc <2 x i64> %broadcast.splat4 to <2 x i16> %5 = icmp ult <2 x i16> %3, %4 %6 = trunc <2 x i64> %broadcast.splat4 to <2 x i16> %7 = trunc <2 x i64> %broadcast.splat to <2 x i16> %8 = select <2 x i1> %5, <2 x i16> %6, <2 x i16> %7 %9 = zext <2 x i16> %8 to <2 x i64> %10 = trunc <2 x i64> %9 to <2 x i16>
That is we only compare the lower 16 bits of value d and value e, which will cause the wrong answer because lower bits cannot represent a comparison of the whole bits.<br>
And the mechanism that how llvm truncate and zext an instruction's operands is introduced in this patch https://reviews.llvm.org/rL250032. Within this patch, it will firstly compute an instruction's MinBWs by function computeMinimumValueSizes() and then`truncateToMinimalBitwidths()`. And MinBWs comes from llvm::computeMinimumValueSizes() which uses DemandedBits analysis as one of the info source. <br>
I guess the root reason is that, during the demanded-bits computation, because the final result will be truncate into 16-bits, that is %conv32 = trunc i64 %.sroa.speculated to i16, and the llvm blindly says that the operands of instruction Trunc have the same live bits as the trunc instruction itself. So the demanded-bits of raw instruction %.sroa.speculated = select i1 %cmp.i, i64 %d, i64 %e is 16-bits too. <br>
And during truncateToMinimalBitwidths, llvm will truncate operands first.
Dec 5 2020
Nov 26 2020
Nov 25 2020
Nov 24 2020
Nov 20 2020
Nov 19 2020
Nov 18 2020
Nov 17 2020
Nov 16 2020
Yse, So I use forgetTopmostLoop() after optimizeLoopExits directly, I think it's more concise.
use forgetTopmostLoop() directly instead of visiting exitingBB among nested loops