This is an archive of the discontinued LLVM Phabricator instance.

[CGP] widen switch condition and case constants to target's register width
ClosedPublic

Authored by spatel on Oct 7 2015, 2:08 PM.

Details

Reviewers

reames
mehdi_amini
hfinkel

Commits

rG0ed9aeaa5f62: [CGP] widen switch condition and case constants to target's register width (2nd…
rGb90a078de988: [CGP] widen switch condition and case constants to target's register width
rL251857: [CGP] widen switch condition and case constants to target's register width…
rL251849: [CGP] widen switch condition and case constants to target's register width

Summary

This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297.

Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine will already be widening cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.

There are regression tests for CGP in both test/Transforms/CodeGenPrepare and test/CodeGen/*. I opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for the i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473

Before:

BB#0: mr 4, 3 extsh. 3, 4 ble 0, .LBB0_5
BB#1: cmpwi 3, 99 bgt 0, .LBB0_9
BB#2: rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend li 3, 0 cmplwi 4, 1 beqlr 0
BB#3: cmplwi 4, 10 bne 0, .LBB0_12
BB#4: li 3, 1 blr .LBB0_5: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 65436 beq 0, .LBB0_13
BB#6: cmplwi 3, 65526 beq 0, .LBB0_15
BB#7: cmplwi 3, 65535 bne 0, .LBB0_12
BB#8: li 3, 4 blr .LBB0_9: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 100 beq 0, .LBB0_14 ...

After:

BB#0: rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons cmpwi 4, 999 ble 0, .LBB0_5
BB#1: lis 3, 0 ori 3, 3, 65525 cmpw 4, 3 bgt 0, .LBB0_9
BB#2: cmplwi 4, 1000 beq 0, .LBB0_14
BB#3: cmplwi 4, 65436 bne 0, .LBB0_13
BB#4: li 3, 6 blr .LBB0_5: li 3, 0 cmplwi 4, 1 beqlr 0
BB#6: cmplwi 4, 10 beq 0, .LBB0_12
BB#7: cmplwi 4, 100 bne 0, .LBB0_13
BB#8: li 3, 2 blr .LBB0_9: cmplwi 4, 65526 beq 0, .LBB0_15
BB#10: cmplwi 4, 65535 bne 0, .LBB0_13 ...

Diff Detail

Event Timeline

spatel updated this revision to Diff 36790. Oct 7 2015, 2:08 PM

spatel retitled this revision to [CGP] widen switch condition and case constants to target's register width.

spatel updated this object.

spatel added reviewers: hfinkel, reames, mehdi_amini.

spatel added a subscriber: llvm-commits.

Phab formatting was thrown off by '#' on the BB's. Removed below:

Before:

 BB#0:
  mr 4, 3
  extsh. 3, 4
  ble 0, .LBB0_5
 BB#1: 
  cmpwi	 3, 99
  bgt	 0, .LBB0_9
 BB#2:            
  rlwinm 4, 4, 0, 16, 31      <--- 32-bit mask/extend
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#3:            
  cmplwi	 4, 10
  bne	 0, .LBB0_12
 BB#4:                      
  li 3, 1
  blr
.LBB0_5:                             
  rlwinm 3, 4, 0, 16, 31         <--- 32-bit mask/extend
  cmplwi	 3, 65436
  beq	 0, .LBB0_13
 BB#6:                            
  cmplwi	 3, 65526
  beq	 0, .LBB0_15
 BB#7:                       
  cmplwi	 3, 65535
  bne	 0, .LBB0_12
 BB#8:                       
  li 3, 4
  blr
.LBB0_9:                       
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi	 3, 100
  beq	 0, .LBB0_14
...

After:

 BB#0:        
  rlwinm 4, 3, 0, 16, 31   <--- mask/extend to 32-bit and then use that for comparisons
  cmpwi	 4, 999
  ble 0, .LBB0_5
 BB#1:          
  lis 3, 0
  ori 3, 3, 65525
  cmpw	 4, 3
  bgt	 0, .LBB0_9
 BB#2:         
  cmplwi	 4, 1000
  beq	 0, .LBB0_14
 BB#3:    
  cmplwi	 4, 65436
  bne	 0, .LBB0_13
 BB#4:       
  li 3, 6
  blr
.LBB0_5:   
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#6: 
  cmplwi	 4, 10
  beq	 0, .LBB0_12
 BB#7:             
  cmplwi	 4, 100
  bne	 0, .LBB0_13
 BB#8:             
  li 3, 2
  blr
.LBB0_9:       
  cmplwi	 4, 65526
  beq	 0, .LBB0_15
 BB#10:      
  cmplwi	 4, 65535
  bne	 0, .LBB0_13
...

Ping.

This looks entirely reasonable to me, but I'm not an expert in this area. The code LGTM, but I'm not sure about the implications of the change. Can someone else comment on that?

lib/CodeGen/CodeGenPrepare.cpp
4015	auto

Ping * 2.

spatel added a child revision: D12965: [InstCombine] shrink switch conditions better (PR24766). Oct 22 2015, 9:35 AM

In D13532#262174, @spatel wrote:

After:

BB#0:        
 rlwinm 4, 3, 0, 16, 31   <--- mask/extend to 32-bit and then use that for comparisons
 cmpwi	 4, 999

...

Somewhat amusingly, even though you can't tell from this test case, this is somewhat suboptimal. Consider this:

$ cat /tmp/foo.c 
short foo(short a) { return a; }

$ clang -O3 -S -emit-llvm -o - /tmp/foo.c 
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define signext i16 @foo(i16 signext %a) #0 {
entry:
  ret i16 %a
}

and, notice here that the argument has the 'signext' attribute. This argument will be carried in an i32 register, but sign extended from i16. Thus, at least in theory, if we sign extended in this transformation, instead of zero-extended, we'd not actually need any extension instruction at all.

Can you think of any way to have it pick sext instead of zext when we know the input is really sign extended anyway? This might only apply to function arguments?
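To make the zext/sext point concrete, here is a minimal sketch in plain C++ (not LLVM code). It models an i16 switch condition widened to 32 bits and suggests that either extension preserves case matching, provided the condition and the case constants are widened the same way. The helper names (zext16, sext16, matchesZext, matchesSext, matchesMixed, checkPair) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the widening of an i16 value to 32 bits.
std::uint32_t zext16(std::int16_t v) {
  return static_cast<std::uint16_t>(v); // high 16 bits become zero
}
std::uint32_t sext16(std::int16_t v) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(v)); // copies the sign bit
}

// Condition and case constant widened the same way.
bool matchesZext(std::int16_t cond, std::int16_t caseVal) {
  return zext16(cond) == zext16(caseVal);
}
bool matchesSext(std::int16_t cond, std::int16_t caseVal) {
  return sext16(cond) == sext16(caseVal);
}

// Condition sign-extended but constant zero-extended: an inconsistent widening.
bool matchesMixed(std::int16_t cond, std::int16_t caseVal) {
  return sext16(cond) == zext16(caseVal);
}

// A consistent widening must agree with the original 16-bit comparison.
void checkPair(std::int16_t cond, std::int16_t caseVal) {
  bool narrow = (cond == caseVal);
  assert(matchesZext(cond, caseVal) == narrow);
  assert(matchesSext(cond, caseVal) == narrow);
}
```

In this model, matchesMixed(-1, -1) is false even though the 16-bit values are equal, which is why a switch to sext for a sign-extended argument would also have to sign-extend every case constant.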

lib/CodeGen/CodeGenPrepare.cpp
4016	We don't need the 'OrBitCast' here, do we?

spatel marked 2 inline comments as done. Nov 2 2015, 11:28 AM

spatel added inline comments.

lib/CodeGen/CodeGenPrepare.cpp
4016	Correct - the check above guarantees that RegWidth is greater than the original type.

In D13532#276839, @hfinkel wrote:

Can you think of any way to have it pick sext instead of zext when we know the input is really sign extended anyway? This might only apply to function arguments?

Nice catch. I agree that we should sext instead of zext in that case. If it's only applicable to function args, it seems straightforward to add that check. New patch coming soon.

Patch updated:

1. Add check for a sign-extended function argument - in that case sext everything instead of zext.
2. Minimized tests - we don't need so many switch cases to show the differences in IR (the larger examples were better for visualizing the asm differences).
3. Added test case for #1.

LGTM, thanks!

This revision is now accepted and ready to land. Nov 2 2015, 11:51 AM

Closed by commit rL251849: [CGP] widen switch condition and case constants to target's register width (authored by spatel). Nov 2 2015, 2:48 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D12965: [InstCombine] shrink switch conditions better (PR24766). May 13 2016, 11:32 AM

Revision Contents

Path                                              Size
lib/CodeGen/CodeGenPrepare.cpp                    35 lines
test/Transforms/CodeGenPrepare/widen_switch.ll    125 lines

Diff 36790

lib/CodeGen/CodeGenPrepare.cpp

... private:
   bool optimizeMemoryInst(Instruction *I, Value *Addr,
                           Type *AccessTy, unsigned AS);
   bool optimizeInlineAsmInst(CallInst *CS);
   bool optimizeCallInst(CallInst *CI, bool& ModifiedDT);
   bool moveExtToFormExtLoad(Instruction *&I);
   bool optimizeExtUses(Instruction *I);
   bool optimizeSelectInst(SelectInst *SI);
   bool optimizeShuffleVectorInst(ShuffleVectorInst *SI);
+  bool optimizeSwitchInst(SwitchInst *CI);
   bool optimizeExtractElementInst(Instruction *Inst);
   bool dupRetToEnableTailCallOpts(BasicBlock *BB);
   bool placeDbgValues(Function &F);
   bool sinkAndCmp(Function &F);
   bool extLdPromotion(TypePromotionTransaction &TPT, LoadInst *&LI,
                       Instruction *&Inst,
                       const SmallVectorImpl<Instruction *> &Exts,
                       unsigned CreatedInstCost);
... bool CodeGenPrepare::optimizeShuffleVectorInst(ShuffleVectorInst *SVI) {
   if (SVI->use_empty()) {
     SVI->eraseFromParent();
     MadeChange = true;
   }

   return MadeChange;
 }

+bool CodeGenPrepare::optimizeSwitchInst(SwitchInst *SI) {
+  if (!TLI || !DL)
+    return false;
+
+  Value *Cond = SI->getCondition();
+  Type *OldType = Cond->getType();
+  LLVMContext &Context = Cond->getContext();
+  MVT RegType = TLI->getRegisterType(Context, TLI->getValueType(*DL, OldType));
+  unsigned RegWidth = RegType.getSizeInBits();
+
+  if (RegWidth <= cast<IntegerType>(OldType)->getBitWidth())
+    return false;
+
+  // If the register width is greater than the type width, expand the condition
+  // of the switch instruction and each case constant to the width of the
+  // register. By widening the type of the switch condition, subsequent
+  // comparisons (for case comparisons) will not need to be extended to the
+  // preferred register width, so we will potentially eliminate N-1 extends,
+  // where N is the number of cases in the switch.
+  IntegerType *NewType = Type::getIntNTy(Context, RegWidth);
[inline comment] reames: auto
+  CastInst *Zext = CastInst::CreateZExtOrBitCast(Cond, NewType);
[inline comment] hfinkel: We don't need the 'OrBitCast' here, do we?
[inline comment] spatel: Correct - the check above guarantees that RegWidth is greater than the original type.
+  Zext->insertBefore(SI);
+  SI->setCondition(Zext);
+  for (SwitchInst::CaseIt Case : SI->cases()) {
+    APInt WiderCaseConst = Case.getCaseValue()->getValue().zext(RegWidth);
+    Case.setValue(ConstantInt::get(Context, WiderCaseConst));
+  }
+
+  return true;
+}

 namespace {
 /// \brief Helper class to promote a scalar operation to a vector one.
 /// This class is used to move downward extractelement transition.
 /// E.g.,
 /// a = vector_op <2 x i32>
 /// b = extractelement <2 x i32> a, i32 0
 /// c = scalar_op b
 /// store c
... if (CallInst *CI = dyn_cast<CallInst>(I))
     return optimizeCallInst(CI, ModifiedDT);

   if (SelectInst *SI = dyn_cast<SelectInst>(I))
     return optimizeSelectInst(SI);

   if (ShuffleVectorInst *SVI = dyn_cast<ShuffleVectorInst>(I))
     return optimizeShuffleVectorInst(SVI);

+  if (auto *Switch = dyn_cast<SwitchInst>(I))
+    return optimizeSwitchInst(Switch);
+
   if (isa<ExtractElementInst>(I))
     return optimizeExtractElementInst(I);

   return false;
 }

 // In this pass we look for GEP and cast instructions that are used
 // across basic blocks and rewrite them to improve basic-block-at-a-time
...

test/Transforms/CodeGenPrepare/widen_switch.ll

;; PowerPC is arbitrarily chosen as a 32/64-bit RISC representative to show the transform in both tests.
;; x86 is chosen to show that there is no transform in the 16-bit test if 16-bit registers are available.

; RUN: opt < %s -codegenprepare -S -mtriple=powerpc64-unknown-unknown | FileCheck %s --check-prefix=PPC --check-prefix=ALL
; RUN: opt < %s -codegenprepare -S -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X86 --check-prefix=ALL


define i32 @widen_switch_i16(i32 %a) {
entry:
  %trunc = trunc i32 %a to i16
  switch i16 %trunc, label %sw.default [
    i16 1, label %sw.bb0
    i16 10, label %sw.bb1
    i16 100, label %sw.bb2
    i16 1000, label %sw.bb3
    i16 -1, label %sw.bb4
    i16 -10, label %sw.bb5
    i16 -100, label %sw.bb6
  ]

sw.bb0:
  br label %return

sw.bb1:
  br label %return

sw.bb2:
  br label %return

sw.bb3:
  br label %return

sw.bb4:
  br label %return

sw.bb5:
  br label %return

sw.bb6:
  br label %return

sw.default:
  br label %return

return:
  %retval = phi i32 [ -1, %sw.default ], [ 0, %sw.bb0 ], [ 1, %sw.bb1 ], [ 2, %sw.bb2 ], [ 3, %sw.bb3 ], [ 4, %sw.bb4 ], [ 5, %sw.bb5 ], [ 6, %sw.bb6 ]
  ret i32 %retval

; ALL-LABEL: @widen_switch_i16(
; PPC: %0 = zext i16 %trunc to i32
; PPC-NEXT: switch i32 %0, label %sw.default [
; PPC-NEXT: i32 1, label %return
; PPC-NEXT: i32 10, label %sw.bb1
; PPC-NEXT: i32 100, label %sw.bb2
; PPC-NEXT: i32 1000, label %sw.bb3
; PPC-NEXT: i32 65535, label %sw.bb4
; PPC-NEXT: i32 65526, label %sw.bb5
; PPC-NEXT: i32 65436, label %sw.bb6
;
; X86: %trunc = trunc i32 %a to i16
; X86-NEXT: switch i16 %trunc, label %sw.default [
; X86-NEXT: i16 1, label %return
; X86-NEXT: i16 10, label %sw.bb1
; X86-NEXT: i16 100, label %sw.bb2
; X86-NEXT: i16 1000, label %sw.bb3
; X86-NEXT: i16 -1, label %sw.bb4
; X86-NEXT: i16 -10, label %sw.bb5
; X86-NEXT: i16 -100, label %sw.bb6
}

define i32 @widen_switch_i17(i32 %a) {
entry:
  %trunc = trunc i32 %a to i17
  switch i17 %trunc, label %sw.default [
    i17 10, label %sw.bb0
    i17 100, label %sw.bb1
    i17 1000, label %sw.bb2
    i17 10000, label %sw.bb3
    i17 -1, label %sw.bb4
    i17 -2, label %sw.bb5
    i17 -3, label %sw.bb6
  ]

sw.bb0:
  br label %return

sw.bb1:
  br label %return

sw.bb2:
  br label %return

sw.bb3:
  br label %return

sw.bb4:
  br label %return

sw.bb5:
  br label %return

sw.bb6:
  br label %return

sw.default:
  br label %return

return:
  %retval = phi i32 [ -1, %sw.default ], [ 0, %sw.bb0 ], [ 1, %sw.bb1 ], [ 2, %sw.bb2 ], [ 3, %sw.bb3 ], [ 4, %sw.bb4 ], [ 5, %sw.bb5 ], [ 6, %sw.bb6 ]
  ret i32 %retval

; ALL-LABEL: @widen_switch_i17(
; ALL: %0 = zext i17 %trunc to i32
; ALL-NEXT: switch i32 %0, label %sw.default [
; ALL-NEXT: i32 10, label %return
; ALL-NEXT: i32 100, label %sw.bb1
; ALL-NEXT: i32 1000, label %sw.bb2
; ALL-NEXT: i32 10000, label %sw.bb3
; ALL-NEXT: i32 131071, label %sw.bb4
; ALL-NEXT: i32 131070, label %sw.bb5
; ALL-NEXT: i32 131069, label %sw.bb6
}
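
The widened case constants expected by the CHECK lines in widen_switch.ll can be reproduced by hand. The sketch below is plain C++ rather than LLVM's APInt, and zextConst is a hypothetical helper introduced only for this illustration. It keeps the low fromBits bits of a negative case value and clears everything above them, which is what a zero-extension to a wider type amounts to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): zero-extend the low `fromBits` bits
// of `value` to a wider `toBits`-bit constant.
std::uint64_t zextConst(std::int64_t value, unsigned fromBits, unsigned toBits) {
  // Mirrors the patch's early exit: only a strict widening is meaningful.
  assert(fromBits > 0 && fromBits < toBits && toBits <= 64);
  // fromBits is at most 63 here, so the shift is well defined.
  std::uint64_t mask = (std::uint64_t{1} << fromBits) - 1;
  return static_cast<std::uint64_t>(value) & mask;
}
```

For example, zextConst(-100, 16, 32) gives 65436 and zextConst(-3, 17, 32) gives 131069, matching the i32 constants in the PPC and ALL check lines.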
