peixin (Peixin Qiao)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 12 2021, 6:52 PM (115 w, 1 d)

Recent Activity

Jul 12 2023

peixin committed rGab73bd3897b5: [InstCombine] Enhance select icmp and folding (authored by peixin).
[InstCombine] Enhance select icmp and folding
Jul 12 2023, 7:41 AM · Restricted Project, Restricted Project
peixin closed D148420: [InstCombine] Enhance select icmp and folding.
Jul 12 2023, 7:40 AM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Fix clang-format issue.

Jul 12 2023, 3:47 AM · Restricted Project, Restricted Project
peixin committed rG31dda3913f9d: [InstCombine] Precommit a test (authored by peixin).
[InstCombine] Precommit a test
Jul 12 2023, 3:46 AM · Restricted Project, Restricted Project
peixin closed D150069: [InstCombine] Precommit a test.
Jul 12 2023, 3:46 AM · Restricted Project, Restricted Project

Jul 11 2023

peixin updated the summary of D148420: [InstCombine] Enhance select icmp and folding.
Jul 11 2023, 7:06 PM · Restricted Project, Restricted Project
peixin added inline comments to D150069: [InstCombine] Precommit a test.
Jul 11 2023, 7:24 AM · Restricted Project, Restricted Project
peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

Fix the implementation by generating a new shl instruction instead of unsetting flags. Unsetting the nuw/nsw flags was wrong when the shl instruction has other uses.

Generating a new shl isn't wrong, but I think your previous approach was better. The reason is that the shl with and without nowrap flags are going to be GVN/CSEd anyway, and the nowrap flags will be dropped in the process. So it makes more sense to me to directly go to the final form. Losing some nowrap flags is just the cost of this transform.

Jul 11 2023, 6:35 AM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.
Jul 11 2023, 6:34 AM · Restricted Project, Restricted Project

Jul 10 2023

peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Try to fix the buildbot failure on Debian x64.

Jul 10 2023, 3:59 AM · Restricted Project, Restricted Project
peixin added a comment to D147539: [LV] Enable stride versioning to support Fortran IR.

@kiranchandramohan BTW, please notice that ValueToValueMap in Loop Access Analysis is replaced by DenseMap<Value *, const SCEV *> in D147750 if you commandeer this patch and rebase it.

Jul 10 2023, 2:09 AM · Restricted Project, Restricted Project
peixin added inline comments to D147539: [LV] Enable stride versioning to support Fortran IR.
Jul 10 2023, 1:59 AM · Restricted Project, Restricted Project

Jul 9 2023

peixin abandoned D147951: [VPlan][OuterLoop] Relax the canonical loop to stride-one loop.
Jul 9 2023, 6:00 PM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Thanks @nikic for the comments. Addressed them:

Jul 9 2023, 2:20 AM · Restricted Project, Restricted Project
peixin updated the diff for D150069: [InstCombine] Precommit a test.

Update test cases.

Jul 9 2023, 2:16 AM · Restricted Project, Restricted Project

Jul 8 2023

peixin added a comment to D127215: [flang][OpenMP] Support common block in OpenMP private clause.

@kiranchandramohan @peixin If it is okay, I can rebase and merge this patch

Jul 8 2023, 6:03 AM · Restricted Project, Restricted Project

Jul 7 2023

peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

Sorry for the continued ping. I hope to finish this patch by the end of next week due to work changes.

Jul 7 2023, 9:29 AM · Restricted Project, Restricted Project
peixin added a comment to D127215: [flang][OpenMP] Support common block in OpenMP private clause.

@kiranchandramohan Hi Kiran, could you please help commandeer this patch? I cannot continue working on it due to work changes.

Jul 7 2023, 8:51 AM · Restricted Project, Restricted Project
peixin added a comment to D145684: [flang] Fix host associated vars in OpenMP/OpenACC region.

@kiranchandramohan Hi Kiran, could you please help commandeer this patch? I cannot continue working on it due to work changes.

Jul 7 2023, 8:51 AM · Restricted Project, Restricted Project
peixin added a comment to D147539: [LV] Enable stride versioning to support Fortran IR.

@kiranchandramohan Hi Kiran, could you please help commandeer this patch? I cannot continue working on it due to work changes.

Jul 7 2023, 8:50 AM · Restricted Project, Restricted Project

Jul 6 2023

peixin abandoned D151394: [LSR] Treat URem as uninteresting.
Jul 6 2023, 6:17 PM · Restricted Project, Restricted Project
peixin abandoned D131391: [flang] Initial support of lowering derived type passed by value.
Jul 6 2023, 6:16 PM · Restricted Project, Restricted Project
peixin abandoned D127954: [NFC][flang][OpenMP] Refactor the privatization positions.
Jul 6 2023, 6:16 PM · Restricted Project, Restricted Project
peixin abandoned D125891: [flang] Initial support for storage conflict check.
Jul 6 2023, 6:16 PM · Restricted Project, Restricted Project
peixin updated subscribers of D148420: [InstCombine] Enhance select icmp and folding.

Thanks @goldstein.w.n and @nikic for the comments. Addressed them.

Jul 6 2023, 5:38 AM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.
Jul 6 2023, 5:35 AM · Restricted Project, Restricted Project

Jul 5 2023

peixin added a comment to D153004: [LSR] Consider post-inc form when creating extends/truncates..

@peixin your reproducer should be fixed by 7f5b15ad150e

Jul 5 2023, 6:08 PM · Restricted Project, Restricted Project

Jul 4 2023

peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

gentle ping

Jul 4 2023, 1:52 AM · Restricted Project, Restricted Project

Jul 3 2023

peixin added a comment to D153004: [LSR] Consider post-inc form when creating extends/truncates..

@peixin thanks for the additional test case! I precommitted a slightly reduced version of your test and should have a fix ready shortly

Jul 3 2023, 11:19 PM · Restricted Project, Restricted Project

Jun 28 2023

peixin added a comment to D153004: [LSR] Consider post-inc form when creating extends/truncates..

There may be one more case which this patch does not capture? Check the following input:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

@c = internal global i32 0, align 4
@g = dso_local local_unnamed_addr global ptr @c, align 8
@h = dso_local global i8 0, align 4
@j = dso_local local_unnamed_addr global ptr @h, align 8
@b = internal unnamed_addr global i32 0, align 4
@e = dso_local local_unnamed_addr global i16 0, align 4
@f = dso_local local_unnamed_addr global i64 0, align 8
@l = dso_local local_unnamed_addr global i64 0, align 8
@i = dso_local local_unnamed_addr global i64 0, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lX\0A\00", align 1
@a = dso_local local_unnamed_addr global i8 0, align 1

; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable
define dso_local void @n(ptr nocapture noundef %0) {
  ret void
}

; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() {
  %1 = load ptr, ptr @g, align 8
  %2 = load i64, ptr @f, align 8
  br label %3

3:                                                ; preds = %0, %18
  %4 = phi i32 [ 0, %0 ], [ %19, %18 ]
  %5 = phi i64 [ %2, %0 ], [ %15, %18 ]
  br label %6

6:                                                ; preds = %3, %6
  %7 = phi i32 [ 1, %3 ], [ %10, %6 ]
  %8 = phi i64 [ %5, %3 ], [ %9, %6 ]
  %9 = add nsw i64 %8, 1
  %10 = add nsw i32 %7, -1
  %11 = icmp sgt i32 %7, 0
  br i1 %11, label %6, label %12

12:                                               ; preds = %6, %12
  %13 = phi i32 [ %16, %12 ], [ 1, %6 ]
  %14 = phi i64 [ %15, %12 ], [ %9, %6 ]
  %15 = add nsw i64 %14, 1
  %16 = add nsw i32 %13, -1
  %17 = icmp sgt i32 %13, 0
  br i1 %17, label %12, label %18

18:                                               ; preds = %12
  %19 = add nuw nsw i32 %4, 1
  %20 = icmp eq i32 %19, 8
  br i1 %20, label %21, label %3

21:                                               ; preds = %18
  store i32 %16, ptr @b, align 4
  store i32 0, ptr %1, align 4
  %22 = zext i32 %13 to i64
  store i16 -1, ptr @e, align 4
  store i64 %15, ptr @f, align 8
  store i64 %14, ptr @l, align 8
  store i64 %22, ptr @i, align 8
  %23 = urem i32 %16, 53
  %24 = trunc i32 %23 to i8
  %25 = load ptr, ptr @j, align 8
  store i8 %24, ptr %25, align 1
  %26 = load i8, ptr @h, align 4
  %27 = xor i8 %26, 5
  %28 = zext i8 %27 to i64
  %29 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i64 noundef %28)
  ret i32 0
}

declare noundef i32 @printf(ptr nocapture noundef readonly, ...)

$ opt -loop-reduce reduced.ll -S -o out.ll && clang out.ll -o a.out && ./a.out
B
$ clang reduced.ll -o a.out && ./a.out
2C
Jun 28 2023, 6:22 PM · Restricted Project, Restricted Project
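
The two outputs in the reproducer above can be checked by hand from the IR. The following is a minimal Python sketch, not part of the original comment: it replays the last countdown loop (block 12) with 32-bit wraparound and then applies the urem, trunc, and xor from block 21. The helper names are made up for illustration. The last line shows that the miscompiled output B is what the same arithmetic yields if the urem operand is all-ones in 64 bits rather than 32; that is an arithmetic observation only, not a diagnosis of what LSR actually does.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def countdown_exit_value():
    """Replay block 12 and return the value of %16 when the loop exits.

    The outer loop restarts %13 at 1 on every trip, so the exit value
    is the same regardless of the outer trip count.
    """
    cur = 1  # %13 starts at 1 on entry from block 6
    while True:
        nxt = (cur - 1) & MASK32      # %16 = add nsw i32 %13, -1
        if to_signed32(cur) > 0:      # %17 = icmp sgt i32 %13, 0
            cur = nxt                 # back edge: %13 takes %16
        else:
            return nxt

def printed_hex(urem_operand):
    """Apply urem 53, trunc to i8, xor 5, then format like printf("%lX")."""
    low_byte = (urem_operand % 53) & 0xFF
    return format(low_byte ^ 5, "X")

v16 = countdown_exit_value()
print(hex(v16))                           # 0xffffffff
print(printed_hex(v16))                   # 2C
print(printed_hex(0xFFFFFFFFFFFFFFFF))    # B
```

So 2C is the value the unoptimized program is expected to print, which matches the run without -loop-reduce.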

Jun 27 2023

peixin added reviewers for D150527: [GlobalISel] Fix the error transformation of BRCOND to BCC.: dmgreen, dzhidzhoev.
Jun 27 2023, 5:46 AM · Restricted Project, Restricted Project

Jun 26 2023

peixin added a comment to D153004: [LSR] Consider post-inc form when creating extends/truncates..

There may be one more case which this patch does not capture? Check the following input:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"
Jun 26 2023, 8:46 PM · Restricted Project, Restricted Project

Jun 18 2023

peixin abandoned D147378: [LV] Replace symbolic strides with constants in LV.
Jun 18 2023, 6:42 PM · Restricted Project, Restricted Project

Jun 14 2023

peixin accepted D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

Adding Peixin and Val as Co-authors and commandeering this patch. Authorship will revert to Peixin on submission.

Jun 14 2023, 6:10 PM · Restricted Project, Restricted Project

Jun 12 2023

peixin added a comment to D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

Thanks @vdonaldson. That sounds fine to me. Multiple OpenMP terminators are allowed. This will fix the two tests I had previously put up.

Jun 12 2023, 7:19 PM · Restricted Project, Restricted Project
peixin added a comment to D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

@peixin Could you share a testcase that shows the MLIR DCE issue

Jun 12 2023, 2:17 AM · Restricted Project, Restricted Project

Jun 11 2023

peixin added a comment to D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

If the problem is that an existing required omp.terminator or omp.yield op is in danger of being deleted, would it suffice to check for that in genStopStatement code? If blockIsUnterminated from Bridge.cpp were available for use in Runtime.cpp that could be done like this:

@@ -106,6 +106,7 @@ void Fortran::lower::genStopStatement(
   }

   builder.create<fir::CallOp>(loc, callee, operands);
+  if (blockIsUnterminated())
     genUnreachable(builder, loc);
 }

If that is the right idea, then blockIsUnterminated or an alternative way of getting that information must be available in Runtime.cpp.

@peixin Can you implement what Val is suggesting here? This will not fix the two examples that I have provided but will fix a lot of usages of STOP. We can handle the other cases separately.

Jun 11 2023, 6:14 PM · Restricted Project, Restricted Project

May 29 2023

peixin added a comment to D151394: [LSR] Treat URem as uninteresting.

I think what I still don't really understand is why LSR thinks it is safe to replace the IV here when it isn't. It seems like marking the urem as interesting here may be non-profitable, but also shouldn't be incorrect. I think there has to be some bug on the LSR side to enable this replacement.

Note that urem is not the only non-add instruction that can end up producing an add SCEV node, so I'm not sure why we need to treat urem in particular specially.

May 29 2023, 7:56 PM · Restricted Project, Restricted Project

May 26 2023

peixin added a comment to D151394: [LSR] Treat URem as uninteresting.

Can you please explain in more detail what causes the miscompile in the first place?

May 26 2023, 7:45 AM · Restricted Project, Restricted Project

May 24 2023

peixin requested review of D151394: [LSR] Treat URem as uninteresting.
May 24 2023, 6:40 PM · Restricted Project, Restricted Project

May 18 2023

peixin added inline comments to D148420: [InstCombine] Enhance select icmp and folding.
May 18 2023, 11:39 PM · Restricted Project, Restricted Project
peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

ping for review @nikic

May 18 2023, 2:22 AM · Restricted Project, Restricted Project

May 8 2023

peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Upload the right patch.

May 8 2023, 6:18 AM · Restricted Project, Restricted Project

May 7 2023

peixin planned changes to D148420: [InstCombine] Enhance select icmp and folding.

It seems I uploaded the wrong patch.

May 7 2023, 6:03 PM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Address the comments from @nikic.

May 7 2023, 8:05 AM · Restricted Project, Restricted Project
peixin requested review of D150069: [InstCombine] Precommit a test.
May 7 2023, 8:02 AM · Restricted Project, Restricted Project

May 3 2023

peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

Just took a closer look at this. This is one of those transforms where we need to be very careful about derefinement, because we're replacing a constant 0 with an instruction that (under the given condition) may refine to zero but is not necessarily equivalent to zero.

In particular, if the shl has nowrap flags, this transform is incorrect: https://alive2.llvm.org/ce/z/EjBRVW

This means we need to drop nowrap flags as part of the transform. I'm afraid this means that my original recommendation to move this to InstSimplify was wrong and this should be in InstCombine after all, as we can only drop nowrap flags there.
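
The derefinement concern can be illustrated with a small exhaustive check. This is a sketch under an assumption: the exact pattern is not spelled out in this feed, so the code assumes the fold has the shape `select ((X & C1) == 0), 0, (X << C2)` to `X << C2`, with C1 a low-bit mask covering the BITS - C2 bits that survive the shift. It uses 8-bit values and models nuw only (a shl nuw is poison when a set bit is shifted out). All function names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1

def shl(x, c):
    """shl without nowrap flags: the result wraps modulo 2**BITS."""
    return (x << c) & MASK

def shl_nuw_is_poison(x, c):
    """With the nuw flag, shl is poison if any set bit is shifted out."""
    return (x << c) > MASK

def select_form(x, c1, c2):
    """The original expression: select ((x & c1) == 0), 0, (x << c2)."""
    return 0 if (x & c1) == 0 else shl(x, c2)

def check_fold():
    """Check the flag-free fold and collect inputs where nuw would break it."""
    poison_inputs = []
    for c2 in range(1, BITS):
        c1 = (1 << (BITS - c2)) - 1  # assumed mask: bits that survive the shift
        for x in range(1 << BITS):
            # Without flags, the plain shl always agrees with the select.
            assert select_form(x, c1, c2) == shl(x, c2)
            # The select yields a well-defined 0 here, but shl nuw is poison.
            if (x & c1) == 0 and shl_nuw_is_poison(x, c2):
                poison_inputs.append((x, c2))
    return poison_inputs

bad = check_fold()
print((128, 1) in bad)  # True
```

Under these assumptions, every nonzero X that takes the "0" arm of the select would make a nuw-flagged shl poison, which is why keeping the flag on the replacement is not a valid refinement.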

May 3 2023, 6:03 PM · Restricted Project, Restricted Project
peixin added a comment to D134821: [flang][driver] Allow main program to be in an archive.

@awarzynski Hi Andrzej, sorry for the late reply. I have been distracted recently by several internal projects and other things in my life (I just came back from vacation). My boss has not yet decided whether I can continue working on LLVM Flang this year.

May 3 2023, 5:56 PM · Restricted Project, Restricted Project, Restricted Project

Apr 26 2023

peixin planned changes to D148420: [InstCombine] Enhance select icmp and folding.

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

It seems not: the CondVal in simplifySelectWithICmpCond is an ICmp of a value and zero, whereas for the cases in simplifySelectBitTest the CondVal is an ICmp of a value and a value.

Not sure I follow. Isn't it matching the (X & C) == 0 pattern here? https://github.com/llvm/llvm-project/blob/f6ec56ac5be26ad862e22d0eebb2645d1ad78218/llvm/lib/Analysis/InstructionSimplify.cpp#L4494-L4500

Apr 26 2023, 8:18 AM · Restricted Project, Restricted Project

Apr 25 2023

peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

Apr 25 2023, 7:57 AM · Restricted Project, Restricted Project

Apr 23 2023

peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

gentle ping

Apr 23 2023, 7:00 PM · Restricted Project, Restricted Project

Apr 17 2023

peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Addressed the comments.

Apr 17 2023, 6:58 AM · Restricted Project, Restricted Project
peixin committed rGb7e000d8c926: [InstSimplify] Precommit a test (authored by peixin).
[InstSimplify] Precommit a test
Apr 17 2023, 6:56 AM · Restricted Project, Restricted Project

Apr 16 2023

peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.

Address the comments.

  1. Move to instsimplify.
  2. Add support for vector type.
Apr 16 2023, 7:04 AM · Restricted Project, Restricted Project

Apr 15 2023

peixin updated the diff for D147539: [LV] Enable stride versioning to support Fortran IR.
  1. Address the comments.
  2. Revert to the previous implementation since it is more robust. A use of the gep instruction may not be a load or store instruction.
Apr 15 2023, 8:44 AM · Restricted Project, Restricted Project
peixin added a comment to D148420: [InstCombine] Enhance select icmp and folding.

Address the comments.

Apr 15 2023, 7:55 AM · Restricted Project, Restricted Project
peixin updated the diff for D148420: [InstCombine] Enhance select icmp and folding.
Apr 15 2023, 7:54 AM · Restricted Project, Restricted Project
peixin added reviewers for D148420: [InstCombine] Enhance select icmp and folding: RKSimon, spatel, nikic.
Apr 15 2023, 2:18 AM · Restricted Project, Restricted Project
peixin requested review of D148420: [InstCombine] Enhance select icmp and folding.
Apr 15 2023, 2:17 AM · Restricted Project, Restricted Project

Apr 11 2023

peixin added a comment to D147539: [LV] Enable stride versioning to support Fortran IR.

ping for review

Apr 11 2023, 8:56 AM · Restricted Project, Restricted Project

Apr 10 2023

peixin retitled D147951: [VPlan][OuterLoop] Relax the canonical loop to stride-one loop from [VPlan][OuterLoop] Release the canonical loop to stride-one loop to [VPlan][OuterLoop] Relax the canonical loop to stride-one loop.
Apr 10 2023, 9:54 AM · Restricted Project, Restricted Project
peixin requested review of D147951: [VPlan][OuterLoop] Relax the canonical loop to stride-one loop.
Apr 10 2023, 9:54 AM · Restricted Project, Restricted Project

Apr 9 2023

peixin added inline comments to D147783: [VPlan] Add stride->constant VPlan mapping at construction..
Apr 9 2023, 6:21 PM · Restricted Project, Restricted Project
peixin added a comment to D147539: [LV] Enable stride versioning to support Fortran IR.

Rebase and refine the implementation so that it is more readable.

Apr 9 2023, 6:10 AM · Restricted Project, Restricted Project
peixin updated the diff for D147539: [LV] Enable stride versioning to support Fortran IR.
Apr 9 2023, 6:09 AM · Restricted Project, Restricted Project
peixin updated the diff for D147378: [LV] Replace symbolic strides with constants in LV.

Simplify the implementation.

Apr 9 2023, 4:19 AM · Restricted Project, Restricted Project
peixin added inline comments to D147783: [VPlan] Add stride->constant VPlan mapping at construction..
Apr 9 2023, 2:11 AM · Restricted Project, Restricted Project

Apr 7 2023

peixin added a comment to D147783: [VPlan] Add stride->constant VPlan mapping at construction..

Found one more problem, the symbolic strides in vector.ph are not replaced, either, although it's negligible.

Apr 7 2023, 6:11 AM · Restricted Project, Restricted Project
peixin added a comment to D147783: [VPlan] Add stride->constant VPlan mapping at construction..

D147378 made changes for 12 lines for strided-accesses.ll.

Apr 7 2023, 6:05 AM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

Thanks for the update. The whole design looks good to me now. Thanks for the work.

Apr 7 2023, 2:20 AM · Restricted Project, Restricted Project
peixin added a comment to D141820: [flang] Generate TBAA information..

Hi @vzakhari , are you still working on box/type descriptor such as type_desc_4/5/6? Or do you plan to work on that later? I am studying some optimizations which may benefit from the complete tbaa.

Hi @peixin, I am not working on this. Can you please share the cases where you think it might be profitable to distinguish boxes with different number of dimensions? Is it from real applications? We can discuss it in email - it seems more convenient.

Yes, it's from one real application, and the hot loop is from one specific input. The loop is like the following, where a is an assumed-shape array and b is an explicit-shape array. To perform the loop interchange optimization, one problem is that it is hard to understand the loop structure from the SCEV of the a(i, k) access. I am wondering whether tbaa for the lower bound, upper bound and stride would help.

DO j=Jstart, Jend
  DO i=Istart, Iend
    DO k=1, N
      a(i,k)=0.5_8*(b(i-1,j,k) + b(i  ,j,k))
    END DO
  END DO
END DO

Hi @peixin, sorry, I do not understand how more specific tbaa for the members of the descriptor can help here. This loop has a single "data" store for a(i,k). The loads of the descriptor members are also inside the loop, but the current tbaa should allow disambiguating the data store with the member loads. So LLVM should be able to hoist the member loads from the loop nest with the current tbaa, since they are invariants.

I suppose the load from N might be a problem for the loop interchange, since the current tbaa does not help disambiguating load from N with the store to a(i,k) inside the outer loop. But having more precise tbaa for the descriptor members won't help here as well.

Apr 7 2023, 2:08 AM · Restricted Project, Restricted Project
peixin added a comment to D147750: [LAA/LV] Simplify stride speculation logic [nfc].

Why did you remove the StrideSet? It lets the cost model check for a constant stride very quickly. Now you check whether the value is a constant one in PSE. The question is whether every constant one in PSE is a stride.

Apr 7 2023, 1:54 AM · Restricted Project, Restricted Project

Apr 5 2023

peixin added a comment to D147539: [LV] Enable stride versioning to support Fortran IR.

JFYI, I find the description on this patch confusing. I think I've managed to understand it, but let me confirm.

Given an induction variable with a loop invariant (but not constant) stride, LAA currently speculates the stride is 1. In the case where the access type and the index type differ, this results in a meaningless speculation. Instead of unconditionally speculating 1, we can speculate that the stride is the constant required to stride sizeof(element-type) on each iteration.

Is that a correct understanding?
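
The reading above is posed as a question in the thread, so the following is only a sketch of that interpretation, not of the actual LAA/LV code. It assumes strides are carried in bytes and the element type is an 8-byte float; the function name and the `ELEM_SIZE` constant are illustrative. The versioned loop speculates that the runtime stride equals the element size and falls back to a general loop otherwise, whereas speculating a stride of 1 byte would never match.

```python
import struct

ELEM_SIZE = 8  # bytes per element; assumes a 64-bit float element type

def sum_elements(buf, n, byte_stride):
    """Sum n float64 values read from buf at the given byte stride.

    Returns (total, path) so the caller can see which version ran.
    """
    if byte_stride == ELEM_SIZE:
        # Speculated version: the stride matches the element size, so the
        # accesses are contiguous and can be handled as one block.
        return sum(struct.unpack_from(f"<{n}d", buf, 0)), "fast"
    # General version: honor the runtime stride one element at a time.
    total = 0.0
    for i in range(n):
        (value,) = struct.unpack_from("<d", buf, i * byte_stride)
        total += value
    return total, "general"

data = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
print(sum_elements(data, 4, 8))   # contiguous: every element
print(sum_elements(data, 2, 16))  # strided: every other element
```

The runtime check `byte_stride == ELEM_SIZE` plays the role of the stride predicate; a check against 1 would send every real call down the general path.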

Apr 5 2023, 9:05 AM · Restricted Project, Restricted Project
peixin updated the diff for D147539: [LV] Enable stride versioning to support Fortran IR.
Apr 5 2023, 8:56 AM · Restricted Project, Restricted Project

Apr 4 2023

peixin added inline comments to D141306: Add loop-versioning pass to improve unit-stride.
Apr 4 2023, 9:16 AM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

Strange. One of my inline comments was lost.

Apr 4 2023, 9:06 AM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

So, I will have a closer look at this, but IF the call was made with a contiguous array, it would benefit a fair amount, since it vectorizes the loop - I will try to do a compare "with and without" benchmark thing.

Apr 4 2023, 9:00 AM · Restricted Project, Restricted Project
peixin added reviewers for D147378: [LV] Replace symbolic strides with constants in LV: paulwalker-arm, kiranchandramohan, Leporacanthicus.
Apr 4 2023, 8:34 AM · Restricted Project, Restricted Project
peixin requested review of D147539: [LV] Enable stride versioning to support Fortran IR.
Apr 4 2023, 8:33 AM · Restricted Project, Restricted Project

Apr 2 2023

peixin updated the diff for D147378: [LV] Replace symbolic strides with constants in LV.
Apr 2 2023, 4:33 AM · Restricted Project, Restricted Project
peixin updated the diff for D147378: [LV] Replace symbolic strides with constants in LV.

Fix failed tests with ARM and RISCV backends. Replace the strides with const values in vector region, which is expected with this patch.

Apr 2 2023, 1:47 AM · Restricted Project, Restricted Project

Apr 1 2023

peixin requested review of D147378: [LV] Replace symbolic strides with constants in LV.
Apr 1 2023, 7:25 AM · Restricted Project, Restricted Project

Mar 30 2023

peixin added a comment to D141820: [flang] Generate TBAA information..

Thanks @peixin. We were looking at some kernels from lapack. Some of the complex type kernels give an order-of-magnitude slowdown compared to classic-flang. We see plenty of memchecks generated by the llvm vectorizer for an inner loop. But these were not present for the real type kernels. We are not yet sure whether these are caused by a lack of alias info or something specific to the Complex vectorisation in llvm.

Mar 30 2023, 4:28 AM · Restricted Project, Restricted Project

Mar 27 2023

peixin added a comment to D141820: [flang] Generate TBAA information..

@SBallantyne also ran into some cases where additional tbaa information is useful. It is not clear yet whether this involves tbaa for dummy arguments as well. If we can discuss this in discourse that would be great since we can also participate or stay informed.

Mar 27 2023, 7:10 PM · Restricted Project, Restricted Project
peixin added a comment to D141820: [flang] Generate TBAA information..

Hi @vzakhari , are you still working on box/type descriptor such as type_desc_4/5/6? Or do you plan to work on that later? I am studying some optimizations which may benefit from the complete tbaa.

Hi @peixin, I am not working on this. Can you please share the cases where you think it might be profitable to distinguish boxes with different number of dimensions? Is it from real applications? We can discuss it in email - it seems more convenient.

Mar 27 2023, 6:04 PM · Restricted Project, Restricted Project
peixin added a comment to D145684: [flang] Fix host associated vars in OpenMP/OpenACC region.

Thanks @peixin for the patch.

I was wondering where is the right place to fix this issue. I see that this works correctly when a block construct is present with handling in Lowering. A modified version of one of your tests would be,

module tpmod
  real :: val = 2.0
end module

subroutine tptest
  use tpmod
  call tps()
contains
  subroutine tps()
    !$omp parallel
    block
      val = 1.0
    end block
    !$omp end parallel
  end subroutine
end subroutine

Would calling instantiateVar for the scope variables as is done for block (https://github.com/llvm/llvm-project/blob/2d68a42f084a460007b368eab191cf0ff1b976d7/flang/lib/Lower/Bridge.cpp#L2282) help fix the issue? Or did you try this and face any issues?

Mar 27 2023, 5:52 PM · Restricted Project, Restricted Project

Mar 25 2023

peixin added a comment to D141820: [flang] Generate TBAA information..

Hi @vzakhari , are you still working on box/type descriptor such as type_desc_4/5/6? Or do you plan to work on that later? I am studying some optimizations which may benefit from the complete tbaa.

Mar 25 2023, 1:38 AM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

Thanks @peixin for the comment. Mats is away and is only back in April.

Mar 25 2023, 1:36 AM · Restricted Project, Restricted Project

Mar 24 2023

peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

The loop versioning is inefficient for the following scenario, in which not all arrays in a loop are contiguous. Maybe there should be a cost model that decides when to perform the loop versioning, refined by cases from real workloads?

subroutine vadd(c, a, b, n)
  implicit none
  real :: c(:), a(:), b(:)
  integer :: i, n
Mar 24 2023, 1:20 AM · Restricted Project, Restricted Project

Mar 23 2023

peixin accepted D146768: [Flang][OpenMP] Add TODO message for common block privatisation.

LGTM except one small problem

Mar 23 2023, 6:07 PM · Restricted Project, Restricted Project

Mar 20 2023

peixin added inline comments to D141306: Add loop-versioning pass to improve unit-stride.
Mar 20 2023, 1:44 AM · Restricted Project, Restricted Project

Mar 16 2023

peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

I failed to apply this patch locally.

Mar 16 2023, 7:50 PM · Restricted Project, Restricted Project

Mar 15 2023

peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

FYI, another related scenario is https://github.com/llvm/llvm-project/issues/59388.

Mar 15 2023, 6:28 PM · Restricted Project, Restricted Project
peixin added reviewers for D141306: Add loop-versioning pass to improve unit-stride: jeanPerier, clementval, klausler.
Mar 15 2023, 6:19 PM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

Yes, I've only spent SOME time trying to understand why the vectorizer doesn't - it basically comes down to "can't figure out how the stride is calculated, and whether it may change over time". The loop versioning here is helping to solve that for the lower layers in the compiler - and in my experience (I've written quite a few different style loops and such), either some MLIR pass(es) manage to vectorize the code, or the LLVM loop vectorizer can't do it either. [Why are there two layers of vectorizers? I don't know - presumably people like to write code that does similar things on multiple levels].

Mar 15 2023, 6:17 PM · Restricted Project, Restricted Project

Mar 14 2023

peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

I have not noticed any measurable increase in compile time - Spec 2017 wrf_r takes within 1-2 seconds, and that's the one that takes about 15 minutes to build in total. I'm not saying it's impossible to come up with something that compiles slowly with this code, but I've made as good an attempt as I can to "exit early" if there's nothing that needs doing, and only duplicate the innermost loop [which potentially is not the most optimal case].

Mar 14 2023, 6:36 PM · Restricted Project, Restricted Project
peixin added a comment to D141306: Add loop-versioning pass to improve unit-stride.

I am thinking if we should do the loop versioning in MLIR since this may increase compilation time in optimization passes.

Mar 14 2023, 5:47 AM · Restricted Project, Restricted Project

Mar 9 2023

peixin added a comment to D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

Thanks @kiranchandramohan for the provided test cases. It looks like this method is not a good one.

Mar 9 2023, 5:46 PM · Restricted Project, Restricted Project
peixin updated the diff for D145684: [flang] Fix host associated vars in OpenMP/OpenACC region.

Thanks @kiranchandramohan for the notice. Rebase to fix the patch application failure.

Mar 9 2023, 5:08 PM · Restricted Project, Restricted Project
peixin added a comment to D129969: [flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region.

We have such a condition enforced in flang/lib/Semantics/check-directive-structure.h by NoBranchingEnforce. I think STOP should follow this. Sorry, I didn't think about this restriction before.

I removed the semantic checks in https://reviews.llvm.org/D126471. At that time, I tested with gfortran to confirm they should be removed. Not your fault :)

OK, fine. nvfortran also accepts this.

Mar 9 2023, 4:21 AM · Restricted Project, Restricted Project