Wed, Apr 28
@nikic Thanks a lot for your suggestion. A new patch has been uploaded, please review.
Tue, Apr 27
Fix bugs in the Lazy Value Info analysis rather than in the Jump Threading pass
Mon, Apr 26
The bug is that we get trapped in an infinite loop during the Jump Threading pass and eventually exhaust the stack.
Jump Threading calls the function getValueFromCondition() of LazyValueInfo, and this function recursively calls itself as shown in the following:
BinaryOperator *BO = dyn_cast<BinaryOperator>(Cond);
Value *BL = BO->getOperand(0);
Value *BR = BO->getOperand(1);
Sun, Apr 25
Right now this patch sets the argument KeepOneInputPHIs of the function DeleteDeadBlocks to true; that is, we only allow replacing a phi node with its incoming value if that incoming value is a constant.
Jan 19 2021
A new patch has been uploaded; could you please review it? Thanks a lot.
Dec 23 2020
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=48583
Thanks for reviewing.
I still get confused in some places. One is that DemandedBits.cpp seems to compute the demanded bits per instruction; that is, it "successfully" terminates its walks at non-Instructions, too.
I guess computing demanded bits per instruction is not compatible with the way truncateToMinimalBitwidths() truncates the operands of every instruction in MinBWs.
So if we still want to keep truncateToMinimalBitwidths(), we may need to modify the function computeMinimumValueSizes() to take Arguments into consideration. Or, if we still want to compute demanded bits per instruction, would it be a possible solution to be more conservative in truncateToMinimalBitwidths(), i.e., add some patterns and avoid truncating when they match?
Dec 21 2020
To avoid wrongly truncating operands, I added a whitelist for computing the demanded bits of Instruction::Trunc's operands; that is, we only reduce the demanded bits for those instructions that are always truncation friendly, such as ADD or SUB.
So in avoid-truncate-shift-operands.ll, before loop vectorization we do a shift operation and then truncate its result to 8 bits. That is,
%0 = lshr i32 %f, 18
%conv7 = trunc i32 %0 to i8
The demanded-bits analysis says that we only need 8 bits of %0, so the function truncateToMinimalBitwidths() first truncates the lshr's operands to 8 bits and then shifts, which produces a wrong result:
%20 = trunc <2 x i32> %broadcast.splat7 to <2 x i8>
%21 = lshr <2 x i8> %20, <i8 18, i8 18>
%22 = zext <2 x i8> %21 to <2 x i32>
%23 = trunc <2 x i32> %22 to <2 x i8>
I guess it is a profitability heuristic.
First, let's take avoid-truncate-icmp-operands.ll as an example to explain this patch.<br>
Before running the loop vectorize pass, we can see that in the basic block if.then,
%cmp.i = icmp ult i64 %e, %d
%.sroa.speculated = select i1 %cmp.i, i64 %d, i64 %e
%conv32 = trunc i64 %.sroa.speculated to i16
we compare value d and value e, select one of them based on the comparison, and finally truncate the result to 16 bits.<br>
However, after the loop vectorize pass, we wrongly truncate the icmp instruction's operands first:
%3 = trunc <2 x i64> %broadcast.splat to <2 x i16>
%4 = trunc <2 x i64> %broadcast.splat4 to <2 x i16>
%5 = icmp ult <2 x i16> %3, %4
%6 = trunc <2 x i64> %broadcast.splat4 to <2 x i16>
%7 = trunc <2 x i64> %broadcast.splat to <2 x i16>
%8 = select <2 x i1> %5, <2 x i16> %6, <2 x i16> %7
%9 = zext <2 x i16> %8 to <2 x i64>
%10 = trunc <2 x i64> %9 to <2 x i16>
That is, we only compare the lower 16 bits of value d and value e, which gives the wrong answer because the lower bits cannot represent a comparison of the full-width values.<br>
The mechanism by which LLVM truncates and zero-extends an instruction's operands was introduced in https://reviews.llvm.org/rL250032. That patch first computes an instruction's MinBWs via computeMinimumValueSizes() and then calls `truncateToMinimalBitwidths()`; MinBWs comes from llvm::computeMinimumValueSizes(), which uses the DemandedBits analysis as one of its information sources.<br>
I guess the root cause is that, during the demanded-bits computation, because the final result will be truncated to 16 bits (%conv32 = trunc i64 %.sroa.speculated to i16), LLVM blindly says that the operands of a Trunc instruction have the same live bits as the trunc instruction itself. So the demanded bits of the original instruction %.sroa.speculated = select i1 %cmp.i, i64 %d, i64 %e are 16 bits, too.<br>
And during truncateToMinimalBitwidths(), LLVM truncates the operands first.
Nov 16 2020
Yes. So I use forgetTopmostLoop() directly after optimizeLoopExits; I think it's more concise.
Use forgetTopmostLoop() directly instead of visiting each exiting BB among nested loops