User Details
- User Since
- Jun 18 2014, 2:14 AM (457 w, 4 d)
Sep 29 2022
Thanks for fixing it!
LGTM
Jun 8 2022
Thanks for this patch @dmgreen!
As you mentioned, ToBeRemoved was there to keep the iterator safe. I think make_early_inc_range is better, and it should be okay.
LGTM.
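As a side note, a minimal sketch of the idiom discussed above, assuming a dead-instruction cleanup loop; removeDeadInstructions and isDead are hypothetical names, not from the patch:

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"

static void removeDeadInstructions(llvm::BasicBlock &BB,
                                   bool (*isDead)(const llvm::Instruction &)) {
  // make_early_inc_range advances the iterator before yielding each
  // element, so erasing the current instruction does not invalidate the
  // loop; this replaces collecting candidates into a ToBeRemoved vector
  // and erasing them after the loop.
  for (llvm::Instruction &I : llvm::make_early_inc_range(BB))
    if (isDead(I))
      I.eraseFromParent();
}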
May 23 2022
I have checked that the commit from https://reviews.llvm.org/D125918 fixes the stage 2 build error with this patch.
Thanks for fixing it, @paulwalker-arm and @peterwaller-arm!
Let me push this patch again.
Apr 29 2022
I am also seeing a performance degradation from this patch. A backedge-taken count that contains the max is considered high cost, which blocks rewriting the exit value in the IndVarSimplify pass. That in turn affects LSR's decision to rewrite the IV and causes additional instructions in the loop...
As far as I understand, the backedge-taken count expression is normally (max(End,Start) - Start) / Stride. If max(RHS,Start) > Start - Stride can be proved, the expression is simplified.
In order to prove max(RHS,Start) > Start - Stride, SCEV uses the code below.
// Can we prove (max(RHS,Start) > Start - Stride?
if (isLoopEntryGuardedByCond(L, Cond, OrigStartMinusStride, OrigStart) &&
    isLoopEntryGuardedByCond(L, Cond, OrigStartMinusStride, OrigRHS)) {
Let's look at @nikic's reduced example from https://github.com/llvm/llvm-project/issues/54191 with the above code.
; RUN: opt -S -passes='print<scalar-evolution>' < %s
define void @test(i64 %n) mustprogress {
entry:
  %guard = icmp sgt i64 %n, 1
  br i1 %guard, label %loop, label %exit
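As a worked instance (assuming, hypothetically, that the elided loop counts up from 1 with stride 1): the backedge-taken count would be (max(%n, 1) - 1) / 1, and the refinement condition max(RHS,Start) > Start - Stride becomes max(%n, 1) > 0. The entry guard %n > 1 lets isLoopEntryGuardedByCond prove both %n > 0 and 1 > 0, so SCEV can drop the max and use %n - 1 directly.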
Apr 19 2022
It looks like the SVE target's data layout is missing an alignment entry for 512-bit vectors. Since the data layout does not specify the alignment of 512-bit vectors, their alignment defaults to their size. That is bigger than the 128-bit stack alignment, and it causes stack re-alignment...
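To make the fallback concrete, a minimal sketch of querying the alignment a data layout assigns to a 512-bit vector; the layout string is illustrative, not the actual AArch64 one:

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

int main() {
  llvm::LLVMContext Ctx;
  // Illustrative layout string: 128-bit stack alignment (S128) and no
  // v512 entry.
  llvm::DataLayout DL("e-m:e-i64:64-i128:128-n32:64-S128");
  auto *V512 = llvm::FixedVectorType::get(llvm::Type::getInt32Ty(Ctx), 16);
  // With no explicit v512 entry, the ABI alignment falls back to the
  // vector's size (64 bytes), which exceeds the 16-byte stack alignment.
  llvm::Align A = DL.getABITypeAlign(V512);
  (void)A;
  return 0;
}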
Apr 13 2022
Um... I can reproduce the build error now... Maybe there was something wrong with my build...
Thanks for help @omjavaid
Um... I have tried to reproduce it on an A64FX machine. It looks like it fails with a large number of parallel ninja build tasks. When I run the failed command manually, it passes... If I reduce the number of parallel tasks, e.g. ninja -j16, the build also passes... It could be an out-of-memory error caused by the larger memory usage than before...
The stage 2 build and check-all pass on a Cortex-A72 machine...
To be safe, for now, I would like to disable shouldMaximizeVectorBandwidth for SVE...
Let me update this patch with it.
Apr 5 2022
Following comments from @fhahn, updated patch.
Apr 4 2022
Following the comment from @dmgreen, updated code.
Thanks for the patch. I am +1.
I hope other AArch64 folks are also happy with this cost.
Mar 22 2022
Thanks for the comment, @dmgreen!
Do you have any performance results for changing this? I've not had much luck with trying it in the past, and it obviously can change quite a lot. It can certainly help in places, but I've found that if you turn it up too high you just end up over-unrolling loops, not getting into the fast loop body as much. It can obviously depend on the input code and loop counts though. Perhaps it needs some better cost modelling?
I was able to see the overall performance numbers go up slightly on an internal benchmark on Neoverse N1, but we would need to tune things like the cost model per microarchitecture.
I was still hoping to get D118979 in because it should help quite a bit - on its own it increases the number of items processed per vector element, and this will increase it further. We have cleaned up quite a few of the places where it doesn't do as well; there are just a few that have been stuck in review for a while. Perhaps it makes sense to try and push that through, then re-evaluate this on top?
I agree with you. Let's revisit this patch later.
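For context, maximizing vector bandwidth means choosing the vectorization factor from the smallest element type in the loop rather than the widest. As an illustrative example (not from the patch): with 128-bit NEON registers, a loop mixing i8 and i32 gets VF = 128/32 = 4 by default, while maximizing bandwidth picks VF = 128/8 = 16, processing more elements per iteration at the cost of more unrolling and wider intermediate values.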
Mar 21 2022
Ping
Feb 10 2022
Following the comment from @paulwalker-arm, an option was added.
Feb 9 2022
@paulwalker-arm As you can see, there are regression tests which fail to generate a scalable vector type with shouldMaximizeVectorBandwidth.
Previously, I saw these regression tests fail because of the cost, so I returned false for SVE in shouldMaximizeVectorBandwidth...
I am not sure it is acceptable for LV to select VF 16 instead of a scalable vector type on SVE...
If it is not acceptable, we would need to tune the cost for SVE, and I would like to suggest using shouldMaximizeVectorBandwidth only for NEON until the cost tuning is finished...
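A minimal sketch of what restricting it to fixed-width (NEON) vectorization could look like, assuming the TargetTransformInfo hook variant that takes a register kind; illustrative, not the patch itself:

#include "llvm/Analysis/TargetTransformInfo.h"

// Hypothetical override: maximize vector bandwidth only for fixed-width
// (NEON) vectorization until the SVE cost model is tuned.
bool shouldMaximizeVectorBandwidth(
    llvm::TargetTransformInfo::RegisterKind K) {
  return K == llvm::TargetTransformInfo::RGK_FixedWidthVector;
}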
Nov 5 2021
I have created a new review: https://reviews.llvm.org/D113263.
@LemonBoy If possible, can you check whether the last update of this patch solves the error on your side, please?
[AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv
@LemonBoy I am sorry for the bug... It was my mistake... As @efriedma mentioned, it should be v8i16 rather than v4i16. Let me update it.
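For context on the types (my reading of the pattern, stated as an assumption): uaddlp on a v16i8 input produces a v8i16 of pairwise widened sums, and uaddv then reduces that v8i16 to a single value, which is exactly what a single uaddlv on the v16i8 computes; so the intermediate type in the pattern has to be v8i16, not v4i16.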
Oct 20 2021
Following the comment of @dmgreen, updated patch.
Following the comment of @dmgreen, updated patch.
Oct 19 2021
I have checked a bootstrap build and run check-all with it. It looks OK.
Following the comment of @dmgreen, updated patch.
Oct 18 2021
Following the comment of @efriedma, updated patch.
Sorry, my patch could have caused the problem. I have pushed a fix: https://reviews.llvm.org/D109963.
Please wait a couple of days for the fix to be verified, and then rebase your patch.
Oct 16 2021
check-all with a bootstrap build passes on an AArch64 machine. The reduced case also passes.
Oct 15 2021
Fixed a bug.
Oct 14 2021
Sorry for the ping.
In order to get more comments, I am reopening this review.
I would like to mention one thing.
Oct 11 2021
Rebased
Oct 7 2021
I am sorry for the delay. It took time to find the problem and check the stage 2 build again.
Fixed a bug
Oct 6 2021
Sorry, let me try to reproduce the errors.
Oct 5 2021
It would be good to wait for review from others.
Oct 4 2021
Following comments from @dmgreen, updated patch:
- Checked WZR
- Added MIR tests
Following comments from @mkazantsev, updated patch:
- Split the patch into two
- Found the PHI with the exiting condition from the pre-loop correctly