Page MenuHomePhabricator

TiehuZhang (Tiehu Zhang)
User

Projects

User does not belong to any projects.

User Details

User Since
May 18 2021, 8:18 PM (106 w, 4 d)

Recent Activity

Thu, May 11

TiehuZhang added a comment to D105889: [AArch64][SVE] Break false dependencies for inactive lanes of unary operations.

Hi, @bsmith, sorry to bother you, could you tell me the background of the patch?
I have a question about the patch, as I notice that the patch adds movprfx before some instructions, and the optimization may affect the performance(degradation).
; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: movprfx z0, z1
; CHECK-NEXT: fsqrt z0.d, p0/m, z1.d

The rational for the patch is to ensure the inactive lanes of unary operations don't have a dependence on previous operations when the results for inactive lanes is not important (i.e. undef). Taking the above example, the problem with not breaking the dependency is that if z0.d is used as the result for a preceding long latency instruction then it can negatively impact the issue latency of this instructions despite there being no "real" dependency.

For this reason it is preferred to always break such dependencies via a movprfx instruction. The intent here is that movprfx is suppose to be free because the architecture prefers implementations to combine movprfx'd destructive instructions into constructive instructions. However, we are aware of issues where register allocation is making some poor decisions that is leading to movprfx being over used because dedicated destination registers are being allocated even though it would be advantageous to reuse an operand register.

Thu, May 11, 7:42 PM · Restricted Project, Restricted Project

Sat, May 6

Herald added a project to D105889: [AArch64][SVE] Break false dependencies for inactive lanes of unary operations: Restricted Project.

Hi, @bsmith, sorry to bother you, could you tell me the background of the patch?
I have a question about the patch, as I notice that the patch adds movprfx before some instructions, and the optimization may affect the performance(degradation).
; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: movprfx z0, z1
; CHECK-NEXT: fsqrt z0.d, p0/m, z1.d

Sat, May 6, 12:17 AM · Restricted Project, Restricted Project

Feb 27 2023

TiehuZhang added inline comments to D144718: [ScalarEvolution] Enhance Predicate implication from condition.
Feb 27 2023, 9:09 PM · Restricted Project, Restricted Project

Feb 24 2023

TiehuZhang added reviewers for D144718: [ScalarEvolution] Enhance Predicate implication from condition: fhahn, dmgreen, lebedev.ri, mkazantsev.
Feb 24 2023, 3:59 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D144719: [ScalarEvolution] Precommit a test.
Feb 24 2023, 3:56 AM · Restricted Project, Restricted Project
TiehuZhang retitled D144718: [ScalarEvolution] Enhance Predicate implication from condition from [ScalarEvolution] Precommit a test to [ScalarEvolution] Enhance Predicate implication from condition.
Feb 24 2023, 3:55 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D144718: [ScalarEvolution] Enhance Predicate implication from condition.
Feb 24 2023, 3:53 AM · Restricted Project, Restricted Project

Feb 16 2023

TiehuZhang added a reviewer for D144089: [IndVarSimplify] Transform the trip count into a simpler form: mkazantsev.
Feb 16 2023, 4:57 PM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D144089: [IndVarSimplify] Transform the trip count into a simpler form.
Feb 16 2023, 5:19 AM · Restricted Project, Restricted Project

Feb 15 2023

TiehuZhang retitled D144089: [IndVarSimplify] Transform the trip count into a simpler form from [IndVarSimplify] Transform the trip count to a simpler form to [IndVarSimplify] Transform the trip count into a simpler form.
Feb 15 2023, 3:48 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D144089: [IndVarSimplify] Transform the trip count into a simpler form.
Feb 15 2023, 3:45 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D144088: [IndVarSimplify] Precommit a test.
Feb 15 2023, 3:42 AM · Restricted Project, Restricted Project

Jan 15 2023

TiehuZhang added a comment to D141590: [PassManager] Add some passes to the sequence of extra vector passes.

Yes, of course running more passes helps with optimizations. That is obvious and was not my question.
My question is why we need to do that in the first place, and why we can not catch those cases during the existing pass runs.

Jan 15 2023, 6:06 PM · Restricted Project, Restricted Project

Jan 12 2023

TiehuZhang added a comment to D141590: [PassManager] Add some passes to the sequence of extra vector passes.

I'm also a bit skeptical about this change.
I would really recommend to fix the LoopIdiom pass itself,
or at least explaining why that is impossible/unfeasible.

Jan 12 2023, 7:14 PM · Restricted Project, Restricted Project
TiehuZhang abandoned D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.
Jan 12 2023, 5:10 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D141590: [PassManager] Add some passes to the sequence of extra vector passes.

Could you share a full IR test showing the improvements? The main goal of the extra vectorizer passes is to optimize the vector code, not necessarily other scalar loops.

Jan 12 2023, 4:14 AM · Restricted Project, Restricted Project
TiehuZhang added a comment to D141590: [PassManager] Add some passes to the sequence of extra vector passes.

The patch may be helpful for generating better IR in some cases, e.g.,

void testNestLoop(const int nCells, const double *x, double *b, const double *values, const int max_row_length){
Jan 12 2023, 3:30 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D141590: [PassManager] Add some passes to the sequence of extra vector passes.
Jan 12 2023, 3:16 AM · Restricted Project, Restricted Project

Nov 30 2022

TiehuZhang added a comment to D138793: [AArch64][SVE2] Add patterns for eor3.

Update the test:

  1. Supplement other data types (nxv2i64, nxv4i32 and nxv8i16).
  2. Add functions with different operand orders.
Nov 30 2022, 4:25 AM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D138793: [AArch64][SVE2] Add patterns for eor3.
Nov 30 2022, 4:12 AM · Restricted Project, Restricted Project

Nov 28 2022

TiehuZhang requested review of D138793: [AArch64][SVE2] Add patterns for eor3.
Nov 28 2022, 4:56 AM · Restricted Project, Restricted Project

Aug 23 2022

TiehuZhang added a comment to D127720: [ORC-RT] Add integration tests for AArch64 Linux.

Hi, @housel, the following error occurs when I run the test suite. I am not familiar with compiler-rt. Do you know how this problem occurs? Thank you very much if you can give me some advice!
llvm-jitlink error: Symbols not found: [ _ZTIi ]

These tests passed for me when I ran them on Debian Buster aarch64 a couple months ago. The _ZTIi symbol is the typeinfo for the int type; at least on my system this is available as a weak symbol in libstdc++. What distribution did you test? I'll rebuild and test the current main branch to verify that it still works for me.

I just ran llvm-lit and ran into this problem. I'm based on the main branch. I suspect that my running mode or running environment is faulty. Do these cases have any environment dependency? I'm currently based on the aarch64 platform

Which aarch64 Linux distribution are you using? How did you configure LLVM_ENABLE_PROJECTS and LLVM_ENABLE_RUNTIMES when you configured your build?

I updated to the tip of the main branch and re-built everything, and all supported ORC tests pass for me on Debian Buster using llvm-lit.

Aug 23 2022, 7:24 PM · Restricted Project, Restricted Project

Aug 22 2022

TiehuZhang added a comment to D127720: [ORC-RT] Add integration tests for AArch64 Linux.

Hi, @housel, the following error occurs when I run the test suite. I am not familiar with compiler-rt. Do you know how this problem occurs? Thank you very much if you can give me some advice!
llvm-jitlink error: Symbols not found: [ _ZTIi ]

These tests passed for me when I ran them on Debian Buster aarch64 a couple months ago. The _ZTIi symbol is the typeinfo for the int type; at least on my system this is available as a weak symbol in libstdc++. What distribution did you test? I'll rebuild and test the current main branch to verify that it still works for me.

Aug 22 2022, 11:25 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D127720: [ORC-RT] Add integration tests for AArch64 Linux.

Hi, @housel, the following error occurs when I run the test suite. I am not familiar with compiler-rt. Do you know how this problem occurs? Thank you very much if you can give me some advice!
llvm-jitlink error: Symbols not found: [ _ZTIi ]

Failed Tests (3):
  ORC-aarch64-linux :: TestCases/Linux/aarch64/ehframe-default.cpp
  ORC-aarch64-linux :: TestCases/Linux/aarch64/ehframe-libunwind.cpp
  ORC-aarch64-linux :: TestCases/Linux/aarch64/lljit-ehframe.cpp
Aug 22 2022, 8:19 PM · Restricted Project, Restricted Project

Jul 1 2022

TiehuZhang requested review of D128988: [GlobalStatus] Handle masked load intrinsics as special loads.
Jul 1 2022, 6:13 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D128987: [GlobalOpt] Precommit a test case.
Jul 1 2022, 6:09 AM · Restricted Project, Restricted Project

Jun 23 2022

TiehuZhang added a comment to D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops.

Hello.

Hmm. I remember that this was important for performance in some cases on AArch64 - but we don't enable AggressivelyInterleaveReductions. If you have examples of scalar interleaving causing performance problems with AggressivelyInterleaveReductions then it sounds sensible to me to disable it for the if you pointed to as well.

Jun 23 2022, 6:52 PM · Restricted Project, Restricted Project

Jun 22 2022

Herald added a project to D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops: Restricted Project.

Hi, @dmgreen, is it necessary to add ScalarInterleavingRequiresPredication to the guard ? I met some cases that the current patch wants to prevent from interleaving but still be interleaved, as AggressivelyInterleaveReductions is enabled on our target.

if (AggressivelyInterleaveReductions && !ScalarInterleavingRequiresPredication) {
  LLVM_DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n");
  return IC;
}
Jun 22 2022, 6:19 PM · Restricted Project, Restricted Project

Jun 14 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Thanks. If you can update the test case, then this looks sensible to me.

Jun 14 2022, 12:06 AM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 14 2022, 12:02 AM · Restricted Project, Restricted Project

Jun 13 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 13 2022, 11:49 PM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 13 2022, 11:44 PM · Restricted Project, Restricted Project

May 30 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Thanks for getting the numbers.

May 30 2022, 2:31 AM · Restricted Project, Restricted Project

May 24 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Did you do any performance measurements to get an impression of the performance impact of this change?

May 24 2022, 9:39 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Yes - do you have benchmarking results for this patch? This option makes sense, but I'm not sure what it's doing is always optimal. There's something going on with how it alters interleaving group costs, that doesn't look like it should be related to vector addresses. One such case was cleaned up (or maybe hidden) by D124786, but more problems might be present.

May 24 2022, 9:34 PM · Restricted Project, Restricted Project

May 16 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

May 16 2022, 1:30 AM · Restricted Project, Restricted Project, Restricted Project

May 12 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
May 12 2022, 6:29 PM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

The code has been updated since accept. Please review it again. Thank you very much! @fhahn @dmgreen

May 12 2022, 6:54 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

(Updated)
Difference with accepted version: Move memory runtime checks to processLoop to control both VF and IC

May 12 2022, 6:46 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang removed a reviewer for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold: Weiwei-2021.
May 12 2022, 6:02 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang edited reviewers for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold, added: Weiwei-2021; removed: weiwei.
May 12 2022, 6:00 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang added a reviewer for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold: weiwei.
May 12 2022, 6:00 AM · Restricted Project, Restricted Project, Restricted Project

May 5 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Fix the failed case (optimization-remark-options.c), because the remark info should be updated

May 5 2022, 6:31 AM · Restricted Project, Restricted Project, Restricted Project

Apr 29 2022

TiehuZhang added a reviewer for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing: mdchen.
Apr 29 2022, 2:12 AM · Restricted Project, Restricted Project

Apr 28 2022

TiehuZhang requested review of D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Apr 28 2022, 6:38 AM · Restricted Project, Restricted Project

Apr 6 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Apr 6 2022, 5:04 AM · Restricted Project, Restricted Project, Restricted Project

Apr 1 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Apr 1 2022, 4:41 AM · Restricted Project, Restricted Project, Restricted Project

Mar 31 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 31 2022, 6:09 AM · Restricted Project, Restricted Project, Restricted Project

Mar 30 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

Mar 30 2022, 8:46 PM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 30 2022, 8:31 PM · Restricted Project, Restricted Project, Restricted Project

Mar 28 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 28 2022, 6:17 AM · Restricted Project, Restricted Project, Restricted Project

Mar 24 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 24 2022, 6:06 AM · Restricted Project, Restricted Project, Restricted Project

Mar 23 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

ping

Mar 23 2022, 1:53 AM · Restricted Project, Restricted Project, Restricted Project

Mar 21 2022

TiehuZhang requested review of D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 21 2022, 4:41 AM · Restricted Project, Restricted Project, Restricted Project

Oct 14 2021

TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

If an input to an or instruction is poison, the output of the or is poison regardless of whether the other operand is true when the poison result occurs.

If an incorrect optimization is occurring it must be where the or is created.

Thanks for your review! @craig.topper. Do you mean that we should prevent this pattern from being generated, rather than remedying it after the pattern is generated? Actually, the issue I met started from D95959. That optimization does not seem to apply to all scenarios, such as those where the shift operand is negative. In the case I present, this optimization results in a miscalculation of the KnownBit, which in turn results in a false transformation.

The transform is correct based on the semantics of llvm's or instruction. The or instruction in llvm does not have short circuit evaluation. It can't block the spread of poison. The IR that has the semantics you want is

define i32 @cmp_lshr_cmp(i32 %0) {
entry:
  %cmp = icmp slt i32 %0, 0
  %shr = lshr i32 7, %0
  %cmp1 = icmp sgt i32 %0, %shr
  %or.cond = select i1 %cmp, i1 true, i1 %cmp1
  br i1 %or.cond, label %cond.end, label %cond.false

cond.false:                                       ; preds = %entry
  ret i32 1

cond.end:                                         ; preds = %entry, %cond.false
  ret i32 0
}

Using a select instruction instead of an or prevents the %cmp1 value from being used if %cmp is true. There was previously an incorrect transform that would turn such a select into an or, but it was disabled in https://reviews.llvm.org/D101191

Oct 14 2021, 9:16 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

If an input to an or instruction is poison, the output of the or is poison regardless of whether the other operand is true when the poison result occurs.

If an incorrect optimization is occurring it must be where the or is created.

Oct 14 2021, 8:10 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

Sorry, but i can not parse the description,

The current transformation is *not* a miscompile:
https://godbolt.org/z/WeTeasK53
https://alive2.llvm.org/ce/z/LVDqSX

Oct 14 2021, 7:20 PM · Restricted Project, Restricted Project
TiehuZhang updated the summary of D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.
Oct 14 2021, 7:09 PM · Restricted Project, Restricted Project
TiehuZhang requested review of D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.
Oct 14 2021, 7:10 AM · Restricted Project, Restricted Project

Aug 12 2021

TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

Hi, @dmgreen, the previous implementation doesn't take the order of extends into account, so Instruction does not dominate all uses error will still appear in one of the testcases you mentioned. I have updated the patch to fix the failed case. Could you please check whether this modification is appropriate? Thanks very much.

Aug 12 2021, 12:25 AM · Restricted Project

Aug 11 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 11 2021, 9:46 PM · Restricted Project

Aug 10 2021

TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

I've committed 013030a0b213a75e0403fcdb5a070d21831ee561. Do you want to rebase and fix those extra testcases here? Or do you think that's best left for a separate patch?

Aug 10 2021, 6:34 PM · Restricted Project

Aug 9 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 9 2021, 4:01 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

Thanks, looking good. But I do still worry about the order of instructions sunk.

I was trying it out, seeing if it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly though! There is https://reviews.llvm.org/D107623 to improve that and getting the shuffles to sink.

With that in, can you add these two test to show partially sinking two values at the same time:

define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

The order of extends in the target block become important.

Aug 9 2021, 1:14 AM · Restricted Project

Aug 5 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:55 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:52 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Hi, dmgreen, thanks for your review! But I don't quite make sense of your comment. The condition UI->getParent() == TargetBB is used to filter out some instructions already in targetBB to avoid the meaningless sinking, why can it be removed?

shouldSinkOperands will return a chain of instructions, in this case starting at the mul it will return the shuffle and the insert, as those are the instruction it is profitable to sink. They are visited in reverse order in order to sink the last instruction first. The shuffle is already in the TargetBB, so doesn't need to be sunk (or cloned), but we still need to update the InsertPoint when sinking the insert, or else it will fail dominance checks.

It looks like there are some Arm tests failing with the current attempt to fix that. The instruction being sunk may have multiple uses, and moving before the first one we happen to find might not always work, it might not be the instruction that was originally returned by shouldSinkOperands.

Instead, I think it would be better to make sure we are updating InsertPoint as we go in the ToReplace loop. So we add instruction to ToReplace even if they are already in TargetDB, and in the ToReplace loop we check if the instruction is already in the correct BB and just update the InsertPoint if so.

Aug 5 2021, 7:47 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:35 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:30 AM · Restricted Project

Aug 3 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 3 2021, 3:27 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Aug 3 2021, 2:04 AM · Restricted Project

Aug 2 2021

TiehuZhang added a reviewer for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block: fhahn.
Aug 2 2021, 7:57 PM · Restricted Project
TiehuZhang updated the summary of D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 2 2021, 4:13 AM · Restricted Project
TiehuZhang requested review of D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 2 2021, 4:09 AM · Restricted Project

Jun 28 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

Jun 28 2021, 10:38 PM · Restricted Project

Jun 6 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

Jun 6 2021, 6:15 PM · Restricted Project

Jun 3 2021

TiehuZhang updated the diff for D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
Jun 3 2021, 7:32 AM · Restricted Project

May 23 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

May 23 2021, 6:21 PM · Restricted Project

May 19 2021

TiehuZhang updated the diff for D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

update the testcase

May 19 2021, 6:35 PM · Restricted Project
TiehuZhang updated the summary of D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
May 19 2021, 4:01 AM · Restricted Project
TiehuZhang requested review of D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
May 19 2021, 4:00 AM · Restricted Project