User Details
- User Since
- Jul 12 2021, 6:52 PM (115 w, 1 d)
Jul 12 2023
Fix clang-format issue.
Jul 11 2023
Jul 10 2023
Try to fix the buildbot failure on Debian x64.
@kiranchandramohan BTW, please note that ValueToValueMap in Loop Access Analysis was replaced by DenseMap<Value *, const SCEV *> in D147750, in case you commandeer this patch and rebase it.
Jul 9 2023
Thanks @nikic for the comments. Addressed them:
Update test cases.
Jul 8 2023
Jul 7 2023
Sorry for the continued ping. I hope to finish this patch by the end of next week due to work changes.
@kiranchandramohan Hi Kiran, could you please help commandeer this patch? I cannot continue working on it due to work changes.
Jul 6 2023
Thanks @goldstein.w.n and @nikic for the comments. Addressed them.
Jul 5 2023
@peixin your reproducer should be fixed by 7f5b15ad150e
Jul 4 2023
gentle ping
Jul 3 2023
Jun 28 2023
Jun 27 2023
Jun 26 2023
There may be one more case which this patch does not capture? Check the following input:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"
Jun 18 2023
Jun 14 2023
Jun 12 2023
Thanks @vdonaldson. That sounds fine to me. Multiple OpenMP terminators are allowed. This will fix the two tests I had previously put up.
@peixin Could you share a testcase that shows the MLIR DCE issue?
Jun 11 2023
May 29 2023
May 26 2023
May 24 2023
May 18 2023
ping for review @nikic
May 8 2023
Upload the right patch.
May 7 2023
It seems I uploaded the wrong patch.
Address the comments from @nikic.
May 3 2023
@awarzynski Hi Andrzej, sorry for the late reply. I have been distracted recently by several internal projects and other things in my life (I just came back from vacation). My boss has not yet decided whether I can continue working on LLVM Flang this year.
Apr 26 2023
Apr 25 2023
Apr 23 2023
gentle ping
Apr 17 2023
Addressed the comments.
Apr 16 2023
Address the comments.
- Move to instsimplify.
- Add support for vector type.
Apr 15 2023
- Address the comments.
- Revert to the previous implementation since it is more robust. The users of a gep instruction are not necessarily load or store instructions.
Address the comments.
Apr 11 2023
ping for review
Apr 10 2023
Apr 9 2023
Rebase and refine the implementation so that it is more readable.
Simplify the implementation.
Apr 7 2023
Found one more problem: the symbolic strides in vector.ph are not replaced either, although the impact is negligible.
D147378 changed 12 lines in strided-accesses.ll.
Thanks for the update. The whole design looks good to me now. Thanks for the work.
Why did you remove the StrideSet? It allows the cost model to check for a constant stride very quickly. Now you check whether the value is a constant one in PSE. The question is whether every constant one in PSE is actually a stride.
Apr 5 2023
Apr 4 2023
Strange. One of my inline comments is lost.
So, I will have a closer look at this, but IF the call was made with a contiguous array, it would benefit a fair amount, since it vectorizes the loop - I will try a "with and without" benchmark comparison.
Apr 2 2023
Fix failed tests with ARM and RISCV backends. Replace the strides with const values in vector region, which is expected with this patch.
Apr 1 2023
Mar 30 2023
Thanks @peixin. We were looking at some kernels from lapack. Some of the complex type kernels give an order-of-magnitude slowdown compared to classic-flang. We see plenty of memchecks generated by the llvm vectorizer for an inner loop. But these were not present for the real type kernels. We are not yet sure whether these are caused by a lack of alias info or something specific to the Complex vectorisation in llvm.
Mar 27 2023
@SBallantyne also ran into some cases where additional tbaa information is useful. It is not clear yet whether this involves tbaa for dummy arguments as well. If we can discuss this in discourse that would be great since we can also participate or stay informed.
Mar 25 2023
Hi @vzakhari , are you still working on box/type descriptor such as type_desc_4/5/6? Or do you plan to work on that later? I am studying some optimizations which may benefit from the complete tbaa.
Thanks @peixin for the comment. Mats is away and is only back in April.
Mar 24 2023
Loop versioning is inefficient in the following scenario, where not all arrays in a loop are contiguous. Maybe there should be a cost model that decides when to perform loop versioning, refined over time by cases from real workloads?
subroutine vadd(c, a, b, n)
  implicit none
  real :: c(:), a(:), b(:)
  integer :: i, n
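To make the concern concrete, here is a rough C++ sketch of an all-or-nothing versioning check for a routine like vadd. The body of the Fortran routine is not shown above, so the element-wise addition is an assumption based on its name, and `StridedView` is a hypothetical stand-in for an assumed-shape dummy argument, not anything from Flang. The return value exists only so the chosen version can be observed.

```cpp
#include <cstddef>

// Hypothetical stand-in for an assumed-shape array argument:
// a base pointer plus a stride measured in elements.
struct StridedView {
  float *base;
  std::ptrdiff_t stride;
};

// Assumed semantics: c(i) = a(i) + b(i) for n elements.
// Returns true when the unit-stride (fast) version ran.
bool vadd(StridedView c, StridedView a, StridedView b, std::size_t n) {
  // All-or-nothing check: every array must be contiguous.
  const bool allContiguous =
      c.stride == 1 && a.stride == 1 && b.stride == 1;
  if (allContiguous) {
    // Fast version: plain unit-stride accesses, easy to vectorize.
    for (std::size_t i = 0; i < n; ++i)
      c.base[i] = a.base[i] + b.base[i];
    return true;
  }
  // Fallback version: generic strided accesses for every array,
  // even the ones that happen to be contiguous.
  for (std::size_t i = 0; i < n; ++i) {
    const auto k = static_cast<std::ptrdiff_t>(i);
    c.base[k * c.stride] = a.base[k * a.stride] + b.base[k * b.stride];
  }
  return false;
}
```

With this shape of check, a call where only `b` is non-contiguous still pays for the runtime test and the duplicated loop, yet runs the fallback version. That is the kind of case a cost model would have to weigh.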
Mar 23 2023
LGTM except one small problem
Mar 20 2023
Mar 16 2023
I failed to apply this patch locally.
Mar 15 2023
FYI, another related scenario is https://github.com/llvm/llvm-project/issues/59388.
Yes, I've only spent SOME time trying to understand why vectorizer doesn't - it basically comes down to "can't figure out how the stride is calculated, and whether it may change over time". The loop versioning here is helping to solve that for the lower layers in the compiler - and in my experience (I've written quite a few different style loops and such), either some MLIR pass(es) manages to vectorize the code, or LLVM loop vectorizer can't do it either. [Why are there two layers of vectorizers? I don't know - presumably people like to write code that does similar things on multiple levels].
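For readers unfamiliar with the idea, the stride problem and the versioning workaround can be sketched in plain C++. This is only an illustration of the general technique, not the pass under review; `scale` and its parameters are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical helper: multiply n elements of x by factor, where
// consecutive elements are `stride` slots apart in memory.
void scale(float *x, std::size_t n, std::ptrdiff_t stride, float factor) {
  if (stride == 1) {
    // Versioned copy: inside this branch the stride is known to be 1,
    // so the accesses are contiguous and simple to vectorize.
    for (std::size_t i = 0; i < n; ++i)
      x[i] *= factor;
  } else {
    // Original loop: the stride is only known at run time, which is
    // what makes the access pattern hard to reason about.
    for (std::size_t i = 0; i < n; ++i)
      x[static_cast<std::ptrdiff_t>(i) * stride] *= factor;
  }
}
```

The cost is one runtime comparison plus a duplicated loop body, which is why duplicating only the innermost loop keeps the code growth small.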
Mar 14 2023
I have not noticed any measurable increase in compile time - the SPEC 2017 wrf_r build time stays within 1-2 seconds, and that's the one that takes about 15 minutes to build in total. I'm not saying it's impossible to come up with something that compiles slowly with this code, but I've made as good an attempt as I can to "exit early" if there's nothing that needs doing, and only duplicate the innermost loop [which potentially is not the most optimal case].
I am wondering whether we should do the loop versioning in MLIR, since this may increase compilation time in the optimization passes.
Mar 9 2023
Thanks @kiranchandramohan for the provided test cases. It looks like this method is not a good one.
Thanks @kiranchandramohan for the notice. Rebase to fix the patch application failure.