Page MenuHomePhabricator

TiehuZhang (Tiehu Zhang)
User

Projects

User does not belong to any projects.

User Details

User Since
May 18 2021, 8:18 PM (64 w, 1 d)

Recent Activity

Jul 1 2022

TiehuZhang requested review of D128988: [GlobalStatus] Handle masked load intrinsics as special loads.
Jul 1 2022, 6:13 AM · Restricted Project, Restricted Project
TiehuZhang requested review of D128987: [GlobalOpt] Precommit a test case.
Jul 1 2022, 6:09 AM · Restricted Project, Restricted Project

Jun 23 2022

TiehuZhang added a comment to D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops.

Hello.

Hmm. I remember that this was important for performance in some cases on AArch64 - but we don't enable AggressivelyInterleaveReductions. If you have examples of scalar interleaving causing performance problems with AggressivelyInterleaveReductions then it sounds sensible to me to disable it for the if you pointed to as well.

Jun 23 2022, 6:52 PM · Restricted Project, Restricted Project

Jun 22 2022

Herald added a project to D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops: Restricted Project.

Hi, @dmgreen, is it necessary to add ScalarInterleavingRequiresPredication to the guard ? I met some cases that the current patch wants to prevent from interleaving but still be interleaved, as AggressivelyInterleaveReductions is enabled on our target.

if (AggressivelyInterleaveReductions && !ScalarInterleavingRequiresPredication) {
  LLVM_DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n");
  return IC;
}
Jun 22 2022, 6:19 PM · Restricted Project, Restricted Project

Jun 14 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Thanks. If you can update the test case, then this looks sensible to me.

Jun 14 2022, 12:06 AM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 14 2022, 12:02 AM · Restricted Project, Restricted Project

Jun 13 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 13 2022, 11:49 PM · Restricted Project, Restricted Project
TiehuZhang updated the diff for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Jun 13 2022, 11:44 PM · Restricted Project, Restricted Project

May 30 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Thanks for getting the numbers.

May 30 2022, 2:31 AM · Restricted Project, Restricted Project

May 24 2022

TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Did you do any performance measurements to get an impression of the performance impact of this change?

May 24 2022, 9:39 PM · Restricted Project, Restricted Project
TiehuZhang added a comment to D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.

Yes - do you have benchmarking results for this patch? This option makes sense, but I'm not sure what it's doing is always optimal. There's something going on with how it alters interleaving group costs, that doesn't look like it should be related to vector addresses. One such case was cleaned up (or maybe hidden) by D124786, but more problems might be present.

May 24 2022, 9:34 PM · Restricted Project, Restricted Project

May 16 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

May 16 2022, 1:30 AM · Restricted Project, Restricted Project, Restricted Project

May 12 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
May 12 2022, 6:29 PM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

The code has been updated since accept. Please review it again. Thank you very much! @fhahn @dmgreen

May 12 2022, 6:54 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

(Updated)
Difference with accepted version: Move memory runtime checks to processLoop to control both VF and IC

May 12 2022, 6:46 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang removed a reviewer for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold: Weiwei-2021.
May 12 2022, 6:02 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang edited reviewers for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold, added: Weiwei-2021; removed: weiwei.
May 12 2022, 6:00 AM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang added a reviewer for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold: weiwei.
May 12 2022, 6:00 AM · Restricted Project, Restricted Project, Restricted Project

May 5 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Fix the failed case (optimization-remark-options.c), because the remark info should be updated

May 5 2022, 6:31 AM · Restricted Project, Restricted Project, Restricted Project

Apr 29 2022

TiehuZhang added a reviewer for D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing: mdchen.
Apr 29 2022, 2:12 AM · Restricted Project, Restricted Project

Apr 28 2022

TiehuZhang requested review of D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing.
Apr 28 2022, 6:38 AM · Restricted Project, Restricted Project

Apr 6 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Apr 6 2022, 5:04 AM · Restricted Project, Restricted Project, Restricted Project

Apr 1 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Apr 1 2022, 4:41 AM · Restricted Project, Restricted Project, Restricted Project

Mar 31 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 31 2022, 6:09 AM · Restricted Project, Restricted Project, Restricted Project

Mar 30 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

Mar 30 2022, 8:46 PM · Restricted Project, Restricted Project, Restricted Project
TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 30 2022, 8:31 PM · Restricted Project, Restricted Project, Restricted Project

Mar 28 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 28 2022, 6:17 AM · Restricted Project, Restricted Project, Restricted Project

Mar 24 2022

TiehuZhang updated the diff for D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 24 2022, 6:06 AM · Restricted Project, Restricted Project, Restricted Project

Mar 23 2022

TiehuZhang added a comment to D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.

ping

Mar 23 2022, 1:53 AM · Restricted Project, Restricted Project, Restricted Project

Mar 21 2022

TiehuZhang requested review of D122126: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold.
Mar 21 2022, 4:41 AM · Restricted Project, Restricted Project, Restricted Project

Oct 14 2021

TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

If an input to an or instruction is poison, the output of the or is poison regardless of whether the other operand is true when the poison result occurs.

If an incorrect optimization is occurring it must be where the or is created.

Thanks for your review! @craig.topper. Do you mean that we should prevent this pattern from being generated, rather than remedying it after the pattern is generated? Actually, the issue I met started from D95959. That optimization does not seem to apply to all scenarios, such as those where the shift operand is negative. In the case I present, this optimization results in a miscalculation of the KnownBit, which in turn results in a false transformation.

The transform is correct based on the semantics of llvm's or instruction. The or instruction in llvm does not have short circuit evaluation. It can't block the spread of poison. The IR that has the semantics you want is

define i32 @cmp_lshr_cmp(i32 %0) {
entry:
  %cmp = icmp slt i32 %0, 0
  %shr = lshr i32 7, %0
  %cmp1 = icmp sgt i32 %0, %shr
  %or.cond = select i1 %cmp, i1 true, i1 %cmp1
  br i1 %or.cond, label %cond.end, label %cond.false

cond.false:                                       ; preds = %entry
  ret i32 1

cond.end:                                         ; preds = %entry, %cond.false
  ret i32 0
}

Using a select instruction instead of an or prevents the %cmp1 value from being used if %cmp is true. There was previously an incorrect transform that would turn such a select into an or, but it was disabled in https://reviews.llvm.org/D101191

Oct 14 2021, 9:16 PM · Restricted Project
TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

If an input to an or instruction is poison, the output of the or is poison regardless of whether the other operand is true when the poison result occurs.

If an incorrect optimization is occurring it must be where the or is created.

Oct 14 2021, 8:10 PM · Restricted Project
TiehuZhang added a comment to D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.

Sorry, but i can not parse the description,

The current transformation is *not* a miscompile:
https://godbolt.org/z/WeTeasK53
https://alive2.llvm.org/ce/z/LVDqSX

Oct 14 2021, 7:20 PM · Restricted Project
TiehuZhang updated the summary of D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.
Oct 14 2021, 7:09 PM · Restricted Project
TiehuZhang requested review of D111804: [InstCombine] Don't combine CmpInst that used for verify the legality of the lshr.
Oct 14 2021, 7:10 AM · Restricted Project

Aug 12 2021

TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

Hi, @dmgreen, the previous implementation doesn't take the order of extends into account, so Instruction does not dominate all uses error will still appear in one of the testcases you mentioned. I have updated the patch to fix the failed case. Could you please check whether this modification is appropriate? Thanks very much.

Aug 12 2021, 12:25 AM · Restricted Project

Aug 11 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 11 2021, 9:46 PM · Restricted Project

Aug 10 2021

TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

I've committed 013030a0b213a75e0403fcdb5a070d21831ee561. Do you want to rebase and fix those extra testcases here? Or do you think that's best left for a separate patch?

Aug 10 2021, 6:34 PM · Restricted Project

Aug 9 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 9 2021, 4:01 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

Thanks, looking good. But I do still worry about the order of instructions sunk.

I was trying it out, seeing if it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly though! There is https://reviews.llvm.org/D107623 to improve that and getting the shuffles to sink.

With that in, can you add these two test to show partially sinking two values at the same time:

define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

The order of extends in the target block become important.

Aug 9 2021, 1:14 AM · Restricted Project

Aug 5 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:55 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:52 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Hi, dmgreen, thanks for your review! But I don't quite make sense of your comment. The condition UI->getParent() == TargetBB is used to filter out some instructions already in targetBB to avoid the meaningless sinking, why can it be removed?

shouldSinkOperands will return a chain of instructions, in this case starting at the mul it will return the shuffle and the insert, as those are the instruction it is profitable to sink. They are visited in reverse order in order to sink the last instruction first. The shuffle is already in the TargetBB, so doesn't need to be sunk (or cloned), but we still need to update the InsertPoint when sinking the insert, or else it will fail dominance checks.

It looks like there are some Arm tests failing with the current attempt to fix that. The instruction being sunk may have multiple uses, and moving before the first one we happen to find might not always work, it might not be the instruction that was originally returned by shouldSinkOperands.

Instead, I think it would be better to make sure we are updating InsertPoint as we go in the ToReplace loop. So we add instruction to ToReplace even if they are already in TargetDB, and in the ToReplace loop we check if the instruction is already in the correct BB and just update the InsertPoint if so.

Aug 5 2021, 7:47 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:35 AM · Restricted Project
TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 5 2021, 7:30 AM · Restricted Project

Aug 3 2021

TiehuZhang updated the diff for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 3 2021, 3:27 AM · Restricted Project
TiehuZhang added a comment to D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Aug 3 2021, 2:04 AM · Restricted Project

Aug 2 2021

TiehuZhang added a reviewer for D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block: fhahn.
Aug 2 2021, 7:57 PM · Restricted Project
TiehuZhang updated the summary of D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 2 2021, 4:13 AM · Restricted Project
TiehuZhang requested review of D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block.
Aug 2 2021, 4:09 AM · Restricted Project

Jun 28 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

Jun 28 2021, 10:38 PM · Restricted Project

Jun 6 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

Jun 6 2021, 6:15 PM · Restricted Project

Jun 3 2021

TiehuZhang updated the diff for D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
Jun 3 2021, 7:32 AM · Restricted Project

May 23 2021

TiehuZhang added a comment to D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

ping

May 23 2021, 6:21 PM · Restricted Project

May 19 2021

TiehuZhang updated the diff for D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.

update the testcase

May 19 2021, 6:35 PM · Restricted Project
TiehuZhang updated the summary of D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
May 19 2021, 4:01 AM · Restricted Project
TiehuZhang requested review of D102759: [AArch64ISelDAGToDAG] Supplement cases for ORRWrs/ORRXrs when calculating usefulbits.
May 19 2021, 4:00 AM · Restricted Project