This is an archive of the discontinued LLVM Phabricator instance.

Differential D122913

[InstCombine] Simplify PHI node whose type and type of its cond differ
Changes Planned · Public

Authored by hkmatsumoto on Apr 1 2022, 8:43 AM.


Details

Reviewers

nikic

Summary

Currently LLVM optimizes the code below:

define i8 @foo(i8 %cond) {
entry:
  switch i8 %cond, label %default [
  i8 -1, label %sw.-1
  i8 0, label %sw.0
  i8 1, label %sw.1
  ]

default:
  unreachable

sw.-1:
  br label %merge

sw.0:
  br label %merge

sw.1:
  br label %merge

merge:
  %ret = phi i8 [ 1, %sw.1 ], [ 0, %sw.0 ], [ -1, %sw.-1 ]
  ret i8 %ret
}

to:

define i8 @foo(i8 %cond) {
  ret i8 %cond
}

However, once @foo returns a wider type, say i32, while still switching
on the i8 %cond, LLVM no longer performs this simplification. This patch
covers such cases.

Fixes https://github.com/llvm/llvm-project/issues/54561
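
As a rough illustration of why the fold described in the summary is sound when the return type is widened, the following standalone C++ sketch models the i32 phi and a sign extension of the i8 condition. It is not LLVM code; phiValue, sextI8ToI32 and checkFold are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Models the i32 phi in %merge: each reachable switch case picks one constant.
int32_t phiValue(int8_t Cond) {
  switch (Cond) {
  case -1: return -1;
  case 0:  return 0;
  case 1:  return 1;
  default: return 0; // corresponds to the unreachable default block
  }
}

// Models `sext i8 %cond to i32`.
int32_t sextI8ToI32(int8_t Cond) { return static_cast<int32_t>(Cond); }

// For every reachable case value, the phi equals the sign-extended condition.
void checkFold() {
  const int8_t Cases[] = {-1, 0, 1};
  for (int8_t C : Cases)
    assert(phiValue(C) == sextI8ToI32(C));
}
```

Calling checkFold() exercises all three reachable cases. Values outside {-1, 0, 1} are not checked because the default block is unreachable in the IR.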

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
320 ms	x64 debian > BOLT.X86::asm-dump.c
340 ms	x64 debian > BOLT.X86::exceptions-args.test
330 ms	x64 debian > BOLT.X86::inline-debug-info.test
60 ms	x64 debian > BOLT.X86::plt-sec-8-byte.test
90 ms	x64 debian > BOLT.X86::plt-sec.test

Full test results: 12 failed

Event Timeline

hkmatsumoto created this revision. · Apr 1 2022, 8:43 AM

Herald added a project: Restricted Project. · Apr 1 2022, 8:43 AM

Herald added a subscriber: hiraditya.

hkmatsumoto requested review of this revision. · Apr 1 2022, 8:43 AM

Herald added a project: Restricted Project. · Apr 1 2022, 8:43 AM

Herald added a subscriber: llvm-commits.

hkmatsumoto added inline comments. · Apr 1 2022, 8:53 AM

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
Line 1292: Since we now have to store integers of heterogeneous types (e.g. i8 1 vs. i32 1), we have to extend them to the widest type when accessing the SmallDenseMap. APInt has an API for extending integers (sextOrSelf), but ConstantInt does not, AFAIK; that is why the key type had to be changed.
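
The width problem described in this comment can be sketched outside LLVM. In the standalone C++ model below, FixedInt, makeInt, signExtendTo and keyOf are hypothetical stand-ins (not APInt or ConstantInt), and std::map stands in for SmallDenseMap. A key stored as i8 -1 is not found when looked up as i32 -1 unless both are first extended to a common width.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical fixed-width integer: a bit width (1..64) plus the raw low bits.
struct FixedInt {
  unsigned Bits;
  uint64_t Raw;
};

uint64_t maskFor(unsigned Bits) {
  return Bits >= 64 ? ~uint64_t(0) : ((uint64_t(1) << Bits) - 1);
}

FixedInt makeInt(unsigned Bits, int64_t Value) {
  return {Bits, static_cast<uint64_t>(Value) & maskFor(Bits)};
}

// Sign-extend to NewBits; values already at least that wide are unchanged.
FixedInt signExtendTo(FixedInt V, unsigned NewBits) {
  if (NewBits <= V.Bits)
    return V;
  uint64_t Raw = V.Raw;
  if ((V.Raw >> (V.Bits - 1)) & 1) // sign bit set: fill the new high bits
    Raw |= maskFor(NewBits) & ~maskFor(V.Bits);
  return {NewBits, Raw};
}

// Map key that distinguishes values by width as well as by bits.
using Key = std::pair<unsigned, uint64_t>;
Key keyOf(FixedInt V) { return {V.Bits, V.Raw}; }

// Store the i8 case value -1, then look it up using the i32 phi input -1.
bool foundWithoutNormalizing() {
  std::map<Key, int> SuccForValue;
  SuccForValue[keyOf(makeInt(8, -1))] = 0;
  return SuccForValue.count(keyOf(makeInt(32, -1))) != 0;
}

bool foundAfterNormalizing() {
  std::map<Key, int> SuccForValue;
  SuccForValue[keyOf(signExtendTo(makeInt(8, -1), 32))] = 0;
  return SuccForValue.count(keyOf(signExtendTo(makeInt(32, -1), 32))) != 0;
}

void checkLookups() {
  assert(!foundWithoutNormalizing());
  assert(foundAfterNormalizing());
}
```

The sketch always sign-extends, mirroring the patch as posted. It does not model zero extension or a condition wider than the phi.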

hkmatsumoto edited the summary of this revision. · Apr 1 2022, 8:55 AM

Harbormaster completed remote builds in B157432: Diff 419773. · Apr 1 2022, 9:27 AM

nikic added inline comments. · Apr 5 2022, 8:28 AM

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
Line 1277: All inputs of a phi will have the same size (which is the same as the phi result type width), so computing a max in a loop doesn't make sense to me.
Line 1295: I don't like the fact that this privileges sext. The same optimization is also possible with zext (depending on the values used), and in some cases both are possible, in which case our policy is to prefer zext over sext. I think we should be adding zext and sext support at the same time. Also, we need a test with the switch width wider than the phi width rather than the other way around. I think you'll currently assert in that case.
llvm/test/Transforms/InstCombine/pr54561.ll
Line 2: Please drop the sroa run here and only use the phi representation. If you want to do an end-to-end test, add it to PhaseOrdering (e.g. llvm/test/Transforms/PhaseOrdering/simplifycfg-switch-lowering-vs-correlatedpropagation.ll is related).
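
The zext/sext point in the review can be illustrated with a small standalone C++ sketch. It is a model under stated assumptions, not the InstCombine implementation: ExtKind, CasePairs and chooseExtension are hypothetical names, the condition is fixed at i8 and the phi at i32, and the wider-switch case the reviewer asks about is not modelled. Given each case value and the phi input reached through that case, it reports which extension reproduces the phi, preferring zext when both work.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

enum class ExtKind { None, ZExt, SExt };

// Each pair is (i8 switch case value, i32 phi input reached via that case).
using CasePairs = std::vector<std::pair<int8_t, int32_t>>;

ExtKind chooseExtension(const CasePairs &Pairs) {
  bool ZExtOk = true, SExtOk = true;
  for (const auto &P : Pairs) {
    // zext: reinterpret the i8 as unsigned before widening.
    int32_t Z = static_cast<int32_t>(static_cast<uint8_t>(P.first));
    // sext: widen the signed i8 directly.
    int32_t S = static_cast<int32_t>(P.first);
    ZExtOk = ZExtOk && (Z == P.second);
    SExtOk = SExtOk && (S == P.second);
  }
  if (ZExtOk)
    return ExtKind::ZExt; // preferred when both extensions are valid
  if (SExtOk)
    return ExtKind::SExt;
  return ExtKind::None;
}

void checkExamples() {
  CasePairs Signed = {{-1, -1}, {0, 0}, {1, 1}};   // only sext matches
  CasePairs Either = {{0, 0}, {1, 1}, {2, 2}};     // both match
  CasePairs Mismatch = {{-1, 42}, {0, 0}, {1, 1}}; // neither matches
  assert(chooseExtension(Signed) == ExtKind::SExt);
  assert(chooseExtension(Either) == ExtKind::ZExt);
  assert(chooseExtension(Mismatch) == ExtKind::None);
}
```

The Signed and Mismatch inputs correspond to the switch_to_sext and switch_to_sext_wrong_value tests in pr54561.ll. The Either input is an invented example of the case where both extensions are valid.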

Sorry, currently I don't have the bandwidth to push this forward due to the university assignments. If someone's interested in taking this over, you can do so.

Opened https://reviews.llvm.org/D143720

Herald added a subscriber: StephenFan. · Mar 7 2023, 12:36 AM

dtcxzyw added a subscriber: dtcxzyw. · Jul 25 2023, 6:38 PM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp	25 lines
llvm/test/Transforms/InstCombine/pr54561.ll	233 lines

Diff 419773

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp

static Value *simplifyUsingControlFlow(InstCombiner &Self, PHINode &PN,
const DominatorTree &DT) {		const DominatorTree &DT) {
// Simplify the following patterns:		// Simplify the following patterns:
// if (cond)		// if (cond)
// / \		// / \
// ... ...		// ... ...
// \ /		// \ /
// phi [true] [false]		// phi [true] [false]
// Make sure all inputs are constants.		// Make sure all inputs are constants.
if (!all_of(PN.operands(), [](Value *V) { return isa<ConstantInt>(V); }))
		unsigned BiggestSize = 0;
		for (Use &Op : PN.operands()) {
		if (const auto *CI = dyn_cast_or_null<ConstantInt>(Op)) {
		BiggestSize = std::max(BiggestSize, CI->getBitWidth());
		} else {
return nullptr;		return nullptr;
		}
		}

BasicBlock *BB = PN.getParent();		BasicBlock *BB = PN.getParent();
// Do not bother with unreachable instructions.		// Do not bother with unreachable instructions.
if (!DT.isReachableFromEntry(BB))		if (!DT.isReachableFromEntry(BB))
return nullptr;		return nullptr;

// Determine which value the condition of the idom has for which successor.		// Determine which value the condition of the idom has for which successor.
LLVMContext &Context = PN.getContext();		LLVMContext &Context = PN.getContext();
auto *IDom = DT.getNode(BB)->getIDom()->getBlock();		auto *IDom = DT.getNode(BB)->getIDom()->getBlock();
Value *Cond;		Value *Cond;
SmallDenseMap<ConstantInt *, BasicBlock *, 8> SuccForValue;		SmallDenseMap<APInt, BasicBlock *, 8> SuccForValue;
SmallDenseMap<BasicBlock *, unsigned, 8> SuccCount;		SmallDenseMap<BasicBlock *, unsigned, 8> SuccCount;
auto AddSucc = [&](ConstantInt *C, BasicBlock *Succ) {		auto AddSucc = [&](ConstantInt *C, BasicBlock *Succ) {
SuccForValue[C] = Succ;		SuccForValue[C->getValue().sextOrSelf(BiggestSize)] = Succ;
++SuccCount[Succ];		++SuccCount[Succ];
};		};
if (auto *BI = dyn_cast<BranchInst>(IDom->getTerminator())) {		if (auto *BI = dyn_cast<BranchInst>(IDom->getTerminator())) {
if (BI->isUnconditional())		if (BI->isUnconditional())
return nullptr;		return nullptr;

Cond = BI->getCondition();		Cond = BI->getCondition();
AddSucc(ConstantInt::getTrue(Context), BI->getSuccessor(0));		AddSucc(ConstantInt::getTrue(Context), BI->getSuccessor(0));
AddSucc(ConstantInt::getFalse(Context), BI->getSuccessor(1));		AddSucc(ConstantInt::getFalse(Context), BI->getSuccessor(1));
} else if (auto *SI = dyn_cast<SwitchInst>(IDom->getTerminator())) {		} else if (auto *SI = dyn_cast<SwitchInst>(IDom->getTerminator())) {
Cond = SI->getCondition();		Cond = SI->getCondition();
for (auto Case : SI->cases())		for (auto Case : SI->cases())
AddSucc(Case.getCaseValue(), Case.getCaseSuccessor());		AddSucc(Case.getCaseValue(), Case.getCaseSuccessor());
} else {		} else {
return nullptr;		return nullptr;
}		}

if (Cond->getType() != PN.getType())
return nullptr;

// Check that edges outgoing from the idom's terminators dominate respective		// Check that edges outgoing from the idom's terminators dominate respective
// inputs of the Phi.		// inputs of the Phi.
Optional<bool> Invert;		Optional<bool> Invert;
for (auto Pair : zip(PN.incoming_values(), PN.blocks())) {		for (auto Pair : zip(PN.incoming_values(), PN.blocks())) {
auto *Input = cast<ConstantInt>(std::get<0>(Pair));		auto *Input = cast<ConstantInt>(std::get<0>(Pair));
BasicBlock *Pred = std::get<1>(Pair);		BasicBlock *Pred = std::get<1>(Pair);
auto IsCorrectInput = [&](ConstantInt *Input) {		auto IsCorrectInput = [&](ConstantInt *Input) {
// The input needs to be dominated by the corresponding edge of the idom.		// The input needs to be dominated by the corresponding edge of the idom.
// This edge cannot be a multi-edge, as that would imply that multiple		// This edge cannot be a multi-edge, as that would imply that multiple
// different condition values follow the same edge.		// different condition values follow the same edge.
auto It = SuccForValue.find(Input);		auto It = SuccForValue.find(Input->getValue().sextOrSelf(BiggestSize));
return It != SuccForValue.end() && SuccCount[It->second] == 1 &&		return It != SuccForValue.end() && SuccCount[It->second] == 1 &&
DT.dominates(BasicBlockEdge(IDom, It->second),		DT.dominates(BasicBlockEdge(IDom, It->second),
BasicBlockEdge(Pred, BB));		BasicBlockEdge(Pred, BB));
};		};

// Depending on the constant, the condition may need to be inverted.		// Depending on the constant, the condition may need to be inverted.
bool NeedsInvert;		bool NeedsInvert;
if (IsCorrectInput(Input))		if (IsCorrectInput(Input))
NeedsInvert = false;		NeedsInvert = false;
else if (IsCorrectInput(cast<ConstantInt>(ConstantExpr::getNot(Input))))		else if (IsCorrectInput(cast<ConstantInt>(ConstantExpr::getNot(Input))))
NeedsInvert = true;		NeedsInvert = true;
else		else
return nullptr;		return nullptr;

// Make sure the inversion requirement is always the same.		// Make sure the inversion requirement is always the same.
if (Invert && *Invert != NeedsInvert)		if (Invert && *Invert != NeedsInvert)
return nullptr;		return nullptr;

Invert = NeedsInvert;		Invert = NeedsInvert;
}		}

		unsigned PNTypeSize = PN.getType()->getPrimitiveSizeInBits(),
		CondTypeSize = Cond->getType()->getPrimitiveSizeInBits();
		if (PNTypeSize > CondTypeSize)
		Cond = Self.Builder.CreateSExt(Cond, PN.getType());

if (!*Invert)		if (!*Invert)
return Cond;		return Cond;

// This Phi is actually opposite to branching condition of IDom. We invert		// This Phi is actually opposite to branching condition of IDom. We invert
// the condition that will potentially open up some opportunities for		// the condition that will potentially open up some opportunities for
// sinking.		// sinking.
auto InsertPt = BB->getFirstInsertionPt();		auto InsertPt = BB->getFirstInsertionPt();
if (InsertPt != BB->end()) {		if (InsertPt != BB->end()) {
(remaining lines of the file not shown)

llvm/test/Transforms/InstCombine/pr54561.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S < %s -passes=sroa,instcombine | FileCheck %s

				define i32 @switch_to_sext(i8 %cond) {
				; CHECK-LABEL: @switch_to_sext(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: switch i8 [[COND:%.*]], label [[DEFAULT:%.*]] [
				; CHECK-NEXT: i8 -1, label [[SW__1:%.*]]
				; CHECK-NEXT: i8 0, label [[SW_0:%.*]]
				; CHECK-NEXT: i8 1, label [[SW_1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: default:
				; CHECK-NEXT: unreachable
				; CHECK: sw.-1:
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: sw.0:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: sw.1:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: merge:
				; CHECK-NEXT: [[TMP0:%.*]] = sext i8 [[COND]] to i32
				; CHECK-NEXT: ret i32 [[TMP0]]
				;
				entry:
				switch i8 %cond, label %default [
				i8 -1, label %sw.-1
				i8 0, label %sw.0
				i8 1, label %sw.1
				]

				default:
				unreachable

				sw.-1:
				br label %merge

				sw.0:
				br label %merge

				sw.1:
				br label %merge

				merge:
				%ret = phi i32 [ 1, %sw.1 ], [ 0, %sw.0 ], [ -1, %sw.-1 ]
				ret i32 %ret
				}

				; Copied from <https://github.com/llvm/llvm-project/issues/54561#issue-1181113814>.
				define i32 @switch_to_sext_plane(i8 noundef %0) {
				; CHECK-LABEL: @switch_to_sext_plane(
				; CHECK-NEXT: start:
				; CHECK-NEXT: switch i8 [[TMP0:%.*]], label [[BB2:%.*]] [
				; CHECK-NEXT: i8 -1, label [[BB3:%.*]]
				; CHECK-NEXT: i8 0, label [[BB4:%.*]]
				; CHECK-NEXT: i8 1, label [[BB1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb2:
				; CHECK-NEXT: unreachable
				; CHECK: bb3:
				; CHECK-NEXT: br label [[BB5:%.*]]
				; CHECK: bb4:
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb5:
				; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[TMP0]] to i32
				; CHECK-NEXT: ret i32 [[TMP1]]
				;
				start:
				%1 = alloca i32, align 4
				%order = alloca i8, align 1
				store i8 %0, i8* %order, align 1
				%_2 = load i8, i8* %order, align 1, !range !2, !noundef !3
				switch i8 %_2, label %bb2 [
				i8 -1, label %bb3
				i8 0, label %bb4
				i8 1, label %bb1
				]

				bb2: ; preds = %start
				unreachable

				bb3: ; preds = %start
				store i32 -1, i32* %1, align 4
				br label %bb5

				bb4: ; preds = %start
				store i32 0, i32* %1, align 4
				br label %bb5

				bb1: ; preds = %start
				store i32 1, i32* %1, align 4
				br label %bb5

				bb5: ; preds = %bb3, %bb4, %bb1
				%2 = load i32, i32* %1, align 4
				ret i32 %2
				}

				!2 = !{i8 -1, i8 2}
				!3 = !{}

				; SROA'd variant of switch_to_sext_plane.
				define i32 @switch_to_sext_plane_sroad(i8 noundef %0) {
				; CHECK-LABEL: @switch_to_sext_plane_sroad(
				; CHECK-NEXT: switch i8 [[TMP0:%.*]], label [[BB2:%.*]] [
				; CHECK-NEXT: i8 -1, label [[BB3:%.*]]
				; CHECK-NEXT: i8 0, label [[BB4:%.*]]
				; CHECK-NEXT: i8 1, label [[BB1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb2:
				; CHECK-NEXT: unreachable
				; CHECK: bb3:
				; CHECK-NEXT: br label [[BB5:%.*]]
				; CHECK: bb4:
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb5:
				; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[TMP0]] to i32
				; CHECK-NEXT: ret i32 [[TMP2]]
				;
				switch i8 %0, label %bb2 [
				i8 -1, label %bb3
				i8 0, label %bb4
				i8 1, label %bb1
				]

				bb2: ; preds = %start
				unreachable

				bb3: ; preds = %start
				br label %bb5

				bb4: ; preds = %start
				br label %bb5

				bb1: ; preds = %start
				br label %bb5

				bb5: ; preds = %bb1, %bb4, %bb3
				%.0 = phi i32 [ 1, %bb1 ], [ 0, %bb4 ], [ -1, %bb3 ]
				ret i32 %.0
				}

				define i32 @switch_to_sext_wrong_value(i8 %cond) {
				; CHECK-LABEL: @switch_to_sext_wrong_value(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: switch i8 [[COND:%.*]], label [[DEFAULT:%.*]] [
				; CHECK-NEXT: i8 -1, label [[SW__1:%.*]]
				; CHECK-NEXT: i8 0, label [[SW_0:%.*]]
				; CHECK-NEXT: i8 1, label [[SW_1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: default:
				; CHECK-NEXT: unreachable
				; CHECK: sw.-1:
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: sw.0:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: sw.1:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: merge:
				; CHECK-NEXT: [[RET:%.*]] = phi i32 [ 1, [[SW_1]] ], [ 0, [[SW_0]] ], [ 42, [[SW__1]] ]
				; CHECK-NEXT: ret i32 [[RET]]
				;
				entry:
				switch i8 %cond, label %default [
				i8 -1, label %sw.-1
				i8 0, label %sw.0
				i8 1, label %sw.1
				]

				default:
				unreachable

				sw.-1:
				br label %merge

				sw.0:
				br label %merge

				sw.1:
				br label %merge

				merge:
				%ret = phi i32 [ 1, %sw.1 ], [ 0, %sw.0 ], [ 42, %sw.-1 ]
				ret i32 %ret
				}

				define i32 @switch_to_sext_inverted(i8 %cond) {
				; CHECK-LABEL: @switch_to_sext_inverted(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: switch i8 [[COND:%.*]], label [[DEFAULT:%.*]] [
				; CHECK-NEXT: i8 -1, label [[SW__1:%.*]]
				; CHECK-NEXT: i8 0, label [[SW_0:%.*]]
				; CHECK-NEXT: i8 1, label [[SW_1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: default:
				; CHECK-NEXT: unreachable
				; CHECK: sw.-1:
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: sw.0:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: sw.1:
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: merge:
				; CHECK-NEXT: [[TMP0:%.*]] = xor i8 [[COND]], -1
				; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[TMP0]] to i32
				; CHECK-NEXT: ret i32 [[TMP1]]
				;
				entry:
				switch i8 %cond, label %default [
				i8 -1, label %sw.-1
				i8 0, label %sw.0
				i8 1, label %sw.1
				]

				default:
				unreachable

				sw.-1:
				br label %merge

				sw.0:
				br label %merge

				sw.1:
				br label %merge

				merge:
				%ret = phi i32 [ -2, %sw.1 ], [ -1, %sw.0 ], [ 0, %sw.-1 ]
				ret i32 %ret
				}
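
The expected output of @switch_to_sext_inverted above (an i8 xor with -1 followed by a sext) can be sanity-checked with a standalone C++ model. This is only a sketch: invertedPhi, invertThenExtend and checkInverted are hypothetical names, and the model covers just the three reachable case values from the test.

```cpp
#include <cassert>
#include <cstdint>

// Models the phi in @switch_to_sext_inverted: [ -2, sw.1 ], [ -1, sw.0 ], [ 0, sw.-1 ].
int32_t invertedPhi(int8_t Cond) {
  switch (Cond) {
  case -1: return 0;
  case 0:  return -1;
  case 1:  return -2;
  default: return 0; // corresponds to the unreachable default block
  }
}

// Models `xor i8 %cond, -1` followed by `sext i8 ... to i32`.
int32_t invertThenExtend(int8_t Cond) {
  int8_t Inverted = static_cast<int8_t>(~Cond);
  return static_cast<int32_t>(Inverted);
}

// The inverted, sign-extended condition matches the phi on every reachable case.
void checkInverted() {
  const int8_t Cases[] = {-1, 0, 1};
  for (int8_t C : Cases)
    assert(invertedPhi(C) == invertThenExtend(C));
}
```

For these inputs, inverting before or after the sign extension gives the same i32 value, since sext(~x) equals ~sext(x).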