Download Raw Diff

Details

Reviewers

sdesmalen
paulwalker-arm
mdchen
fhahn
dmgreen

Commits

rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user…

Summary

In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree, which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert
according to the use chain

Diff Detail

Unit TestsFailed

	Time	Test
	300 ms	x64 debian > LLVM.CodeGen/Thumb2::mve-gather-scatter-optimisation.ll
	310 ms	x64 debian > LLVM.CodeGen/Thumb2::mve-postinc-lsr.ll
	280 ms	x64 debian > LLVM.Transforms/CodeGenPrepare/ARM::sink-add-mul-shufflevector-inseltpoison.ll
	280 ms	x64 debian > LLVM.Transforms/CodeGenPrepare/ARM::sink-add-mul-shufflevector.ll
	1,260 ms	x64 debian > MemProfiler-x86_64-linux-dynamic.TestCases::test_malloc_load_store.c
		View Full Test Results (9 Failed)

Event Timeline

TiehuZhang created this revision.Aug 2 2021, 4:09 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptAug 2 2021, 4:09 AM

TiehuZhang requested review of this revision.Aug 2 2021, 4:09 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 4:09 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

TiehuZhang edited the summary of this revision. (Show Details)Aug 2 2021, 4:13 AM

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
5–6	This can be removed.
26	Bugpoint has a habit of over-reducing test cases, introducing undef where they only make things worse and less maintainable in the longrun. Can you make the test not use any undef?

Harbormaster completed remote builds in B117419: Diff 363430.Aug 2 2021, 4:55 AM

TiehuZhang added a reviewer: fhahn.Aug 2 2021, 7:57 PM

In D107262#2919594, @dmgreen wrote:

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Hi, dmgreen, thanks for your review! But I don't quite make sense of your comment. The condition UI->getParent() == TargetBB is used to filter out some instructions already in targetBB to avoid the meaningless sinking, why can it be removed?

TiehuZhang updated this revision to Diff 363675.Aug 3 2021, 3:27 AM

Harbormaster completed remote builds in B117588: Diff 363675.Aug 3 2021, 4:14 AM

In D107262#2921656, @TiehuZhang wrote:

In D107262#2919594, @dmgreen wrote:

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Hi, dmgreen, thanks for your review! But I don't quite make sense of your comment. The condition UI->getParent() == TargetBB is used to filter out some instructions already in targetBB to avoid the meaningless sinking, why can it be removed?

shouldSinkOperands will return a chain of instructions, in this case starting at the mul it will return the shuffle and the insert, as those are the instruction it is profitable to sink. They are visited in reverse order in order to sink the last instruction first. The shuffle is already in the TargetBB, so doesn't need to be sunk (or cloned), but we still need to update the InsertPoint when sinking the insert, or else it will fail dominance checks.

It looks like there are some Arm tests failing with the current attempt to fix that. The instruction being sunk may have multiple uses, and moving before the first one we happen to find might not always work, it might not be the instruction that was originally returned by shouldSinkOperands.

Instead, I think it would be better to make sure we are updating InsertPoint as we go in the ToReplace loop. So we add instruction to ToReplace even if they are already in TargetDB, and in the ToReplace loop we check if the instruction is already in the correct BB and just update the InsertPoint if so.

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
7	Can this test be added to one of the existing files? There is already a sink-free-instructions.ll test. Otherwise the test should preferably have a better name than sink-free-instructions-1.ll.
29	This value is dead? Can it be returned instead?

TiehuZhang updated this revision to Diff 364461.Aug 5 2021, 7:30 AM

TiehuZhang edited the summary of this revision. (Show Details)

TiehuZhang updated this revision to Diff 364464.Aug 5 2021, 7:35 AM

In D107262#2922154, @dmgreen wrote:

In D107262#2921656, @TiehuZhang wrote:

In D107262#2919594, @dmgreen wrote:

This happens because out of the chain of insert; shuffle only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. It won't need to clone the instruction, but it can update InsertPoint and continue.

Hi, dmgreen, thanks for your review! But I don't quite make sense of your comment. The condition UI->getParent() == TargetBB is used to filter out some instructions already in targetBB to avoid the meaningless sinking, why can it be removed?

shouldSinkOperands will return a chain of instructions, in this case starting at the mul it will return the shuffle and the insert, as those are the instruction it is profitable to sink. They are visited in reverse order in order to sink the last instruction first. The shuffle is already in the TargetBB, so doesn't need to be sunk (or cloned), but we still need to update the InsertPoint when sinking the insert, or else it will fail dominance checks.

It looks like there are some Arm tests failing with the current attempt to fix that. The instruction being sunk may have multiple uses, and moving before the first one we happen to find might not always work, it might not be the instruction that was originally returned by shouldSinkOperands.

Instead, I think it would be better to make sure we are updating InsertPoint as we go in the ToReplace loop. So we add instruction to ToReplace even if they are already in TargetDB, and in the ToReplace loop we check if the instruction is already in the correct BB and just update the InsertPoint if so.

@dmgreen, thanks very much! We can make use of condition UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) to update the insertPoint. Actually, the order of users obtained from users() may not follow instructions order in a basic block. So the previous patch was not correct.

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
26	Thanks for you comment. The case is reduced from a csmith testcase, so there are some strange instructions. I've made some changes to remove the `undef`.

TiehuZhang updated this revision to Diff 364468.Aug 5 2021, 7:52 AM

TiehuZhang updated this revision to Diff 364473.Aug 5 2021, 7:55 AM

Harbormaster completed remote builds in B118164: Diff 364473.Aug 5 2021, 8:23 AM

Thanks, looking good. But I do still worry about the order of instructions sunk.

I was trying it out, seeing if it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly though! There is https://reviews.llvm.org/D107623 to improve that and getting the shuffles to sink.

With that in, can you add these two test to show partially sinking two values at the same time:

define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

The order of extends in the target block become important.

In D107262#2930517, @dmgreen wrote:

Thanks, looking good. But I do still worry about the order of instructions sunk.

With that in, can you add these two test to show partially sinking two values at the same time:

define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

The order of extends in the target block become important.

OK, I will, thanks!

TiehuZhang updated this revision to Diff 365144.Aug 9 2021, 4:00 AM

Harbormaster completed remote builds in B118645: Diff 365144.Aug 9 2021, 4:35 AM

dmgreen mentioned this in D107794: [AArch64ISelLowering] Avoid sinking mul's ops in some cases.Aug 10 2021, 1:06 AM

I've committed 013030a0b213a75e0403fcdb5a070d21831ee561. Do you want to rebase and fix those extra testcases here? Or do you think that's best left for a separate patch?

In D107262#2936743, @dmgreen wrote:

I've committed 013030a0b213a75e0403fcdb5a070d21831ee561. Do you want to rebase and fix those extra testcases here? Or do you think that's best left for a separate patch?

Oh, I'll rebase to the latest version and update my patch

TiehuZhang updated this revision to Diff 365907.Aug 11 2021, 9:46 PM

Harbormaster completed remote builds in B119188: Diff 365907.Aug 11 2021, 11:16 PM

Hi, @dmgreen, the previous implementation doesn't take the order of extends into account, so Instruction does not dominate all uses error will still appear in one of the testcases you mentioned. I have updated the patch to fix the failed case. Could you please check whether this modification is appropriate? Thanks very much.

Yeah, this sounds good to me. I might have used std::distance(TargetBB->begin(), UI) < std::distance(TargetBB->begin(), InsertPt) myself, as there will only even be a few sinking nodes and they may not be in the TargetDB, but your way with collecting the block sounds good too.

LGTM

llvm/lib/CodeGen/CodeGenPrepare.cpp
6954	I would either use uint32_t or uint64_t to be specific. I don't think there could be more than 2^32 instructions in a block.

This revision is now accepted and ready to land.Aug 12 2021, 9:17 AM

Closed by commit rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user… (authored by TiehuZhang, committed by Peilin Guo <guopeilin1@huawei.com>). · Explain WhyAug 17 2021, 3:58 AM

This revision was automatically updated to reflect the committed changes.

guopeilin added a commit: rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user….

In D107262#2941680, @dmgreen wrote:

Yeah, this sounds good to me. I might have used std::distance(TargetBB->begin(), UI) < std::distance(TargetBB->begin(), InsertPt) myself, as there will only even be a few sinking nodes and they may not be in the TargetDB, but your way with collecting the block sounds good too.

LGTM

Why not ‘comesBefore’ ?

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
ClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 363675

llvm/lib/CodeGen/CodeGenPrepare.cpp

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a blockClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 363675

llvm/lib/CodeGen/CodeGenPrepare.cpp

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll

[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
ClosedPublic