User Details
- User Since
- Nov 11 2020, 3:34 AM (150 w, 6 d)
May 26 2022
Mar 24 2022
Feb 15 2022
Feb 13 2022
Feb 6 2022
Jan 11 2022
Jan 4 2022
LGTM! Thanks for the bug fix.
Dec 31 2021
Dec 29 2021
Sorry for the late reply. The SCEVExpander (Rewriter) is used to create the widened IV and its uses (in widenIVUse()). For the IV increment, we simply use the following code:
WideInc = cast<Instruction>(WidePhi->getIncomingValueForBlock(LatchBlock));
Done this way, the nowrap flag cannot be preserved, because neither WidePhi nor its incoming value from the latch block carries the flag.
I think the reason we did not need to preserve this flag before is that the AddRec is computed from OrigPhi, which looks like this:
{((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)),+,(sext i32 %arg2 to i64)}<nsw><%body>
So at that point it does not matter whether the increment instruction carries the flag, because the SCEV is already correct.
However, during the optimization pipeline we may call SE->forgetLoop() to drop the cached values and recompute them from scratch. Since the nsw flag has been lost by then, the BackedgeTakenCount becomes CouldNotCompute, which prevents vectorization.
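To show why losing nsw matters for the trip count, here is a deliberately simplified model. It is not ScalarEvolution's actual algorithm, and `tripCount` and `incrementIsNSW` are hypothetical names. When the increment is known not to wrap, the body count of an up-counting loop has a closed form. Without that fact, the model gives up and returns `std::nullopt`, which plays the role of CouldNotCompute.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model (NOT ScalarEvolution's real algorithm) of
// counting how many times the body of
//   for (int32_t i = start; i < limit; i += step) { ... }
// executes. std::nullopt stands in for SCEV's CouldNotCompute.
std::optional<int64_t> tripCount(int32_t start, int32_t limit, int32_t step,
                                 bool incrementIsNSW) {
  // Without the nsw fact, i += step could wrap past INT32_MAX to a negative
  // value that is again < limit, so no closed form is justified.
  if (!incrementIsNSW || step <= 0)
    return std::nullopt;
  if (start >= limit)
    return 0;
  // Compute in 64 bits so the subtraction itself cannot overflow.
  int64_t distance = static_cast<int64_t>(limit) - start;  // > 0 here
  return (distance - 1) / step + 1;                         // ceil(distance / step)
}
```

For example, `tripCount(0, 10, 3, true)` yields 4 (i = 0, 3, 6, 9). The same call with `false` yields no value.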
With this patch, functions s122 and s172 in TSVC can now be vectorized. The source code of s122 is as follows:
real_t s122(struct args_t * func_args)
{
    // induction variable recognition
    // variable lower and upper bound, and stride
    // reverse data access and jump in data access
    struct{int a;int b;} * x = func_args->arg_info;
    int n1 = x->a;
    int n3 = x->b;
    initialise_arrays(__func__);
    int j, k;
    #pragma clang loop vectorize(assume_safety)
    for (int nl = 0; nl < iterations; nl++) {
        j = 1;
        k = 0;
        for (int i = n1-1; i < LEN_1D; i += n3) {
            k += j;
            a[i] += b[LEN_1D - k];
        }
    }
}
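The inner loop of s122 has a runtime start (n1-1) and a runtime stride (n3), so the vectorizer needs a computable trip count. The sketch below replays the loop's index pattern and compares it with a closed-form count. `s122Accesses`, `s122TripCount` and the `lenD` parameter (standing in for LEN_1D) are hypothetical names. The closed form assumes n3 > 0 and that `i += n3` never overflows, which is the fact the nsw flag records.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper that replays the index pattern of the s122 inner loop
//   for (int i = n1-1; i < LEN_1D; i += n3) { k += j; a[i] += b[LEN_1D - k]; }
// with j == 1 (as set before the loop). lenD stands in for TSVC's LEN_1D.
std::vector<std::pair<int, int>> s122Accesses(int n1, int n3, int lenD) {
  std::vector<std::pair<int, int>> accesses;  // (index into a, index into b)
  int j = 1, k = 0;
  for (int i = n1 - 1; i < lenD; i += n3) {
    k += j;
    accesses.emplace_back(i, lenD - k);
  }
  return accesses;
}

// Closed-form trip count of the same loop, assuming n3 > 0 and that
// i += n3 never overflows (the fact nsw encodes).
int64_t s122TripCount(int n1, int n3, int lenD) {
  int64_t start = static_cast<int64_t>(n1) - 1;
  if (start >= lenD)
    return 0;
  return (lenD - start - 1) / n3 + 1;  // ceil((lenD - start) / n3)
}
```

With n1 = 1, n3 = 2 and lenD = 10, a is accessed at 0, 2, 4, 6, 8 while b is accessed at 9, 8, 7, 6, 5. This is the "jump" and "reverse" access pattern named in the comments.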
Fix the affected tests.
Dec 24 2021
With the nsw flag preserved, the loop in keep-nsw-nuw-flag.ll, which previously could not be vectorized because its BackedgeTakenCount was CouldNotCompute, can now be vectorized.
We can use opt -indvars keep-nsw-nuw-flag.ll -S | opt -loop-vectorize -S to verify this.
Also, if we simply run -indvars -loop-vectorize back to back in a single opt invocation, this loop is vectorized even without this patch. I found that this is because the nsw flag is still present when the loop is vectorized: some cached values are reused by the scalar-evolution analysis for the loop-vectorize pass.
forgetLoop is called in many places in the pipeline, so the cached values may be cleared. I therefore think it is better to keep the nsw/nuw flags explicitly after widening the IV.
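The reason the flag can legitimately be carried over to the widened IV can be checked with plain integers. If the narrow i32 add does not signed-overflow, sign-extending its result equals adding the sign-extended operands in i64, so the wide add is exact and cannot overflow either. In the sketch below, C++ integers stand in for IR values. `addIsNSW32`, `sextOfNarrowAdd` and `wideAdd` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Plain C++ integers stand in for LLVM IR values. If the narrow i32 add
// a + b has no signed overflow (nsw), then sext(a + b) == sext(a) + sext(b),
// so a widened i64 IV increment produces exactly the sign-extended narrow
// value and cannot overflow either.
bool addIsNSW32(int32_t a, int32_t b) {
  int64_t exact = static_cast<int64_t>(a) + b;
  return exact >= std::numeric_limits<int32_t>::min() &&
         exact <= std::numeric_limits<int32_t>::max();
}

// sext of the 32-bit add when it is allowed to wrap (two's complement).
int64_t sextOfNarrowAdd(int32_t a, int32_t b) {
  uint32_t wrapped = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  // Converting an out-of-range uint32_t to int32_t wraps modulo 2^32 in C++20
  // and on mainstream compilers before that.
  return static_cast<int32_t>(wrapped);
}

// The add performed by the widened IV: sext both operands, add in 64 bits.
int64_t wideAdd(int32_t a, int32_t b) {
  return static_cast<int64_t>(a) + static_cast<int64_t>(b);
}
```

When the narrow add does wrap (e.g. INT32_MAX + 1), the two results differ. In that situation the narrow add could not have carried nsw in the first place.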
Dec 10 2021
Nov 4 2021
Nov 3 2021
Sorry for the missing description.
It is actually a runtime error. The range-check simplification transforms (icmp slt x, 0) | (icmp sgt x, n) into icmp ugt x, n.
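At the value level, the rewrite relies on a simple fact. When n is non-negative, a negative x reinterpreted as unsigned is at least 2^31 and therefore exceeds n. The small model below checks this on sample values. `outOfRangeSigned`, `outOfRangeUnsigned` and `formsAgreeForNonNegativeN` are hypothetical names. The model only covers ordinary integer values. It does not capture undefined results, which is what the problem in this case is about.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the range-check simplification
//   (icmp slt x, 0) | (icmp sgt x, n)  -->  icmp ugt x, n
// The two forms agree when n is non-negative: a negative x viewed as
// unsigned is >= 2^31, which exceeds any non-negative n.
bool outOfRangeSigned(int32_t x, int32_t n) { return x < 0 || x > n; }

bool outOfRangeUnsigned(int32_t x, int32_t n) {
  return static_cast<uint32_t>(x) > static_cast<uint32_t>(n);
}

// Compare both forms over a sample of (x, n) pairs with n >= 0.
bool formsAgreeForNonNegativeN() {
  const int32_t samples[] = {INT32_MIN, -1000, -1, 0, 1, 7, 1000, INT32_MAX};
  for (int32_t n : samples) {
    if (n < 0)
      continue;
    for (int32_t x : samples)
      if (outOfRangeSigned(x, n) != outOfRangeUnsigned(x, n))
        return false;
  }
  return true;
}
```

With a negative n the forms can disagree. For example, with x = 0 and n = -1 the signed form is true and the unsigned form is false.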
Now suppose %0 is negative. Before the optimization, the argument is compared with 0, and since it is less than 0, the function eventually returns shl i32 %0, 0.
After the optimization, this comparison is gone, and the function eventually returns shl i32 %0, %and instead. The problem is that %and is also derived from the argument, so the shift amount depends on the argument and the shift can produce an undefined result. As a result, the optimized code returns an undefined value.
So I think that if both x and n in (icmp slt x, 0) | (icmp sgt x, n) are derived from the argument, we should be conservative and not apply the transformation.
Nov 2 2021
Oct 15 2021
Oct 14 2021
Update the test file.
Oct 12 2021
Oct 11 2021
pinging reviewers...
Oct 2 2021
Sep 30 2021
Sep 29 2021
Sep 28 2021
@david-arm I tried to simplify the test case but did not succeed.
Sep 27 2021
Sep 11 2021
Sep 8 2021
Sep 7 2021
Pinging reviewers ...
Sep 6 2021
Sep 5 2021
Sep 1 2021
Aug 31 2021
Aug 27 2021
Aug 26 2021
Pinging reviewers...
Aug 25 2021
Aug 24 2021
Hi @sammccall, thanks for the patch.
The pre-commit checks show that some test cases are failing. Could you please fix them? Thanks.
Aug 23 2021
Hi all!
Added a new test case.
Function test2 fails with an assertion about an invalid offset for EXTRACT_SUBVECTOR.
I also modified the AMD test case.
Pinging reviewers....
Aug 22 2021
Aug 19 2021
Hi @hokein, I encountered a bug when clang parses an enum, and I have recorded it in https://bugs.llvm.org/show_bug.cgi?id=51554.
The source code is as follows:
enum E { e = E() }; int main() { return 0; }
An error message like the following is expected:
test.cpp:1:14: error: invalid use of incomplete type 'E'
enum E { e = E() };
             ^~~
test.cpp:1:6: note: definition of 'E' is not complete until the closing '}'
enum E { e = E() };
I have also done some analysis:
In ParseDecl.cpp:
1. llvm-10: AssignedVal.get() is NULL
Aug 18 2021
Aug 17 2021
Hi all, I have uploaded a new patch with a much simpler test case. Please review. Thanks a lot!
Aug 16 2021
Aug 11 2021
Aug 9 2021
In CodeGenPrepare::tryToSinkFreeOperands, operands that are already in TargetBB (or are PHI nodes) are skipped:
for (Use *U : reverse(OpsToSink)) {
  auto *UI = cast<Instruction>(U->get());
  if (UI->getParent() == TargetBB || isa<PHINode>(UI))
    continue;
  ToReplace.push_back(U);
}
Thus, for the shuffle and insertelement operands of the Mul collected by shouldSinkOperands, if the shuffle is already in the same BB as the Mul, the shuffle is not sunk. However, the insertelement is still sunk to just above the Mul, which places it after the shuffle that uses it. This is illegal because the def no longer dominates its use.
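The ordering problem can be reproduced with a toy model of a single basic block. Within one block, a def dominates a use only if it comes earlier. The sketch does not use LLVM's API: `Inst`, `defsPrecedeUses` and `insertBefore` are hypothetical names, and instruction names are plain strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not LLVM's API: a basic block is an ordered list of
// instructions, each listing the names of the values it uses. Inside a
// single block, a def dominates a use only if it appears earlier.
struct Inst {
  std::string name;
  std::vector<std::string> operands;
};

// True if no instruction uses a value defined at or after its own position.
// Operands defined outside the block are ignored.
bool defsPrecedeUses(const std::vector<Inst> &block) {
  for (std::size_t u = 0; u < block.size(); ++u)
    for (const std::string &op : block[u].operands)
      for (std::size_t d = u; d < block.size(); ++d)
        if (block[d].name == op)
          return false;
  return true;
}

// Insert `inst` immediately before the instruction named `before`
// (assumed to exist), mimicking sinking an operand right above a user.
std::vector<Inst> insertBefore(std::vector<Inst> block, const Inst &inst,
                               const std::string &before) {
  auto pos = std::find_if(block.begin(), block.end(),
                          [&](const Inst &I) { return I.name == before; });
  block.insert(pos, inst);
  return block;
}
```

Start with TargetBB holding `shuf` (which uses `ins`) followed by `mul`. Inserting `ins` right above `mul` gives the illegal order shuf, ins, mul. Placing it before `shuf` keeps defs ahead of uses.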
Jul 19 2021
Jun 28 2021
Sometimes CurrList contains two identical entries (the same instruction), and both will be coalesced later. Once the first one is coalesced, it is removed from its parent. This also makes the second entry invalid, because its getParent() now returns nullptr. In this case, we should avoid coalescing the erased instruction.
The test case triggers an assertion like the following:
llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
Also recorded in Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50919
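A toy version of the guard described above follows. It assumes, as the description suggests, that the duplicate entries point to the same instruction object. `Block`, `Instr` and `coalesceAll` are hypothetical stand-ins, not LLVM's MachineInstr API. "Erasing from the parent" is modelled by clearing a `parent` pointer, and entries whose parent is already null are skipped.

```cpp
#include <cassert>
#include <vector>

// Toy model (hypothetical types, not LLVM's MachineInstr API): coalescing
// erases an instruction from its block, modelled here by clearing `parent`.
struct Block {};
struct Instr {
  Block *parent = nullptr;
  bool coalesced = false;
};

// Coalesce every entry of currList. The list may contain the same
// instruction twice; after the first visit erases it, its parent is null,
// so later visits skip it instead of touching an erased instruction.
int coalesceAll(std::vector<Instr *> &currList) {
  int count = 0;
  for (Instr *MI : currList) {
    if (MI->parent == nullptr)
      continue;  // already erased by an earlier coalescing step
    MI->coalesced = true;
    MI->parent = nullptr;  // "erase from parent"
    ++count;
  }
  return count;
}
```

For a list {&a, &b, &a}, only two coalescing steps run, and the repeated `a` is skipped.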
Jun 13 2021
May 25 2021
Actually, the transformation itself does not take much time. However, the IR it generates is very long, which has a large impact on subsequent passes. More specifically, the expanded SCEV that replaces the variable %add.i.2.i contains thousands of operands.
So if you run opt -indvars -S, you can see that the program keeps printing IR for a long time.
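The following hypothetical illustration is not a diagnosis of this particular test case. It shows how an expansion can reach thousands of operands quickly. Suppose each value v(k) is a binary operation whose two operands are both v(k-1). Expanded as a tree, where every use is re-expanded, the node count roughly doubles per level. Computed as a DAG, where each v(k) is built once and reused, it grows linearly. `treeSize` and `dagSize` are made-up helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: v(0) is a leaf, and v(k) = op(v(k-1), v(k-1)).
// Tree expansion re-expands both operands, so size(k) = 1 + 2 * size(k-1),
// i.e. 2^(k+1) - 1 nodes.
uint64_t treeSize(unsigned depth) {
  return depth == 0 ? 1 : 1 + 2 * treeSize(depth - 1);
}

// DAG expansion computes each v(k) once and reuses it: one node per level.
uint64_t dagSize(unsigned depth) { return depth + 1; }
```

Ten levels of such nesting already give 2047 nodes as a tree versus 11 as a DAG.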
In addition, when I compile it with clang -O3 test.cpp, the compilation crashes with a segmentation fault and a stack dump like the following:
May 24 2021
Details are recorded in https://bugs.llvm.org/show_bug.cgi?id=50442.