This is an archive of the discontinued LLVM Phabricator instance.

Differential D57386

[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
Closed · Public

Authored by SjoerdMeijer on Jan 29 2019, 7:27 AM.

Details

Reviewers

samparker
efriedma
craig.topper
RKSimon
t.p.northover

Commits

rGf7cc34cae890: [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
rL352736: [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS

Summary

When optimising for minimum code size, generate a libcall instead of expanding the shift to SHIFT_PARTS. My motivating example on ARM was a simple:

  
shl i64 %A, %B

for which the code bloat is quite significant. For other targets that also
accept __int128/i128, such as AArch64 and X86, it also seems beneficial to
generate a libcall for these cases when optimising for minsize. On these 64-bit
targets, 64-bit shifts are of course unaffected, because the lowering action for
SHIFT/SHIFT_PARTS is not set to custom/expand there.

Event Timeline

SjoerdMeijer created this revision. Jan 29 2019, 7:27 AM

Herald added subscribers: kristof.beyls, javed.absar. Jan 29 2019, 7:27 AM

efriedma added inline comments. Jan 29 2019, 12:46 PM

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
Line 2775: The optForMinSize check should probably be in target-specific code; yes, the call is smaller on all the popular targets I can think of, but that's a function of the specific opcodes available, not a general rule.

Thanks for reviewing!

> The optForMinSize check should probably be in target-specific code

Agreed. I have created TLI.expandShift() to allow target-specific decision making.

lebedev.ri added a subscriber: lebedev.ri. Jan 30 2019, 5:19 AM

lebedev.ri added inline comments.

include/llvm/CodeGen/TargetLowering.h
Line 648 (On Diff #184278): It probably should be `shouldExpandShift` or something? Just `expandShift` reads as "call this function to expand the shift operation".

> It probably should be shouldExpandShift or something?

Yep, thanks, done.
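As a rough illustration of the outcome of this exchange, the sketch below models a target-overridable hook that gates the SHIFT_PARTS expansion. Only the name `shouldExpandShift` comes from the review; the parameter list, `LegalizeAction`, `FunctionInfo`, `TargetLoweringSketch`, `MinSizeAwareTarget` and `usesShiftParts` are hypothetical stand-ins rather than LLVM's real classes or signatures.

```cpp
#include <cassert>

// Hypothetical stand-in for the legalization action of the *_PARTS opcode.
enum class LegalizeAction { Legal, Custom, Expand };

// Hypothetical stand-in for the function attributes the hook may inspect.
struct FunctionInfo {
  bool OptForMinSize;
};

// Hypothetical base class: by default every target keeps the inline expansion.
struct TargetLoweringSketch {
  virtual ~TargetLoweringSketch() = default;
  virtual bool shouldExpandShift(const FunctionInfo &) const { return true; }
};

// A target that prefers the libcall when the function is marked minsize.
struct MinSizeAwareTarget : TargetLoweringSketch {
  bool shouldExpandShift(const FunctionInfo &F) const override {
    return !F.OptForMinSize;
  }
};

// Mirrors the condition in the diff, with the generic minsize check replaced
// by the target hook. Returns true when SHIFT_PARTS would be emitted and
// false when legalization would move on to other strategies such as a libcall.
bool usesShiftParts(const TargetLoweringSketch &TLI, const FunctionInfo &F,
                    LegalizeAction Action, bool TypeLegal) {
  const bool LegalOrCustom =
      (Action == LegalizeAction::Legal && TypeLegal) ||
      Action == LegalizeAction::Custom;
  return LegalOrCustom && TLI.shouldExpandShift(F);
}
```

Putting the decision behind a virtual hook keeps the generic legalizer free of any assumption that a call is always smaller, which is the concern raised in the inline comment.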

LGTM

This revision is now accepted and ready to land. Jan 30 2019, 11:28 AM

Closed by commit rL352736: [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS (authored by SjoerdMeijer). Jan 31 2019, 12:09 AM

This revision was automatically updated to reflect the committed changes.

Hi, we recently found that this revision breaks the Linux kernel (https://bugs.chromium.org/p/chromium/issues/detail?id=938985). Please advise us on how to solve it. Thanks!

Herald added a project: Restricted Project. Mar 8 2019, 10:57 AM

We generally expect that code built using clang will link against compiler-rt or libgcc, even when targeting a freestanding environment. We aren't going to restrict that to only use the subset of compiler-rt functions Linux 4.4 built with some other compiler would use.

Revision Contents

Path                                                Size
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp   10 lines
test/CodeGen/AArch64/shift_minsize.ll               122 lines
test/CodeGen/ARM/shift_minsize.ll                   32 lines
test/CodeGen/X86/shift_minsize.ll                   134 lines

Diff 184085

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

void DAGTypeLegalizer::ExpandIntRes_Shift(SDNode *N,
   } else if (N->getOpcode() == ISD::SRL) {
     PartsOpc = ISD::SRL_PARTS;
   } else {
     assert(N->getOpcode() == ISD::SRA && "Unknown shift!");
     PartsOpc = ISD::SRA_PARTS;
   }

   // Next check to see if the target supports this SHL_PARTS operation or if it
-  // will custom expand it.
+  // will custom expand it. Don't lower this to SHL_PARTS when we optimise for
+  // size, but create a libcall instead.
   EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
   TargetLowering::LegalizeAction Action = TLI.getOperationAction(PartsOpc, NVT);
-  if ((Action == TargetLowering::Legal && TLI.isTypeLegal(NVT)) ||
-      Action == TargetLowering::Custom) {
+  const bool LegalOrCustom =
+    (Action == TargetLowering::Legal && TLI.isTypeLegal(NVT)) ||
+    Action == TargetLowering::Custom;
+  const bool MinSize = DAG.getMachineFunction().getFunction().optForMinSize();
+  if (!MinSize && LegalOrCustom) {
     // Expand the subcomponents.
     SDValue LHSL, LHSH;
     GetExpandedInteger(N->getOperand(0), LHSL, LHSH);
     EVT VT = LHSL.getValueType();

     // If the shift amount operand is coming from a vector legalization it may
     // have an illegal type. Fix that first by casting the operand, otherwise
     // the new SHL_PARTS operation would need further legalization.

test/CodeGen/AArch64/shift_minsize.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-unknown-unknown | FileCheck %s

				define i64 @f0(i64 %val, i64 %amt) minsize optsize {
				; CHECK-LABEL: f0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: lsl x0, x0, x1
				; CHECK-NEXT: ret
				%res = shl i64 %val, %amt
				ret i64 %res
				}

				define i32 @f1(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: lsl x0, x0, x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%a = shl i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f2(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: asr x0, x0, x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%a = ashr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f3(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f3:
				; CHECK: // %bb.0:
				; CHECK-NEXT: lsr x0, x0, x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%a = lshr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define dso_local { i64, i64 } @shl128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: shl128:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w30, -16
				; CHECK-NEXT: // kill: def $w2 killed $w2 def $x2
				; CHECK-NEXT: bl __ashlti3
				; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shl = shl i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shl to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shl, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}

				define dso_local { i64, i64 } @ashr128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: ashr128:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w30, -16
				; CHECK-NEXT: // kill: def $w2 killed $w2 def $x2
				; CHECK-NEXT: bl __ashrti3
				; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shr = ashr i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shr to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shr, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}

				define dso_local { i64, i64 } @lshr128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: lshr128:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w30, -16
				; CHECK-NEXT: // kill: def $w2 killed $w2 def $x2
				; CHECK-NEXT: bl __lshrti3
				; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shr = lshr i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shr to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shr, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}

test/CodeGen/ARM/shift_minsize.ll

This file was added.

				; RUN: llc -mtriple=arm-eabi %s -o - | FileCheck %s

				define i64 @f0(i64 %val, i64 %amt) minsize optsize {
				; CHECK-LABEL: f0:
				; CHECK: bl __aeabi_llsl
				%res = shl i64 %val, %amt
				ret i64 %res
				}

				define i32 @f1(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f1:
				; CHECK: bl __aeabi_llsl
				%a = shl i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f2(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f2:
				; CHECK: bl __aeabi_lasr
				%a = ashr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f3(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f3:
				; CHECK: bl __aeabi_llsr
				%a = lshr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

test/CodeGen/X86/shift_minsize.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s

				define i64 @f0(i64 %val, i64 %amt) minsize optsize {
				; CHECK-LABEL: f0:
				; CHECK: # %bb.0:
				; CHECK-NEXT: movq %rsi, %rcx
				; CHECK-NEXT: movq %rdi, %rax
				; CHECK-NEXT: # kill: def $cl killed $cl killed $rcx
				; CHECK-NEXT: shlq %cl, %rax
				; CHECK-NEXT: retq
				%res = shl i64 %val, %amt
				ret i64 %res
				}

				define i32 @f1(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: movq %rsi, %rcx
				; CHECK-NEXT: movq %rdi, %rax
				; CHECK-NEXT: # kill: def $cl killed $cl killed $rcx
				; CHECK-NEXT: shlq %cl, %rax
				; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
				; CHECK-NEXT: retq
				%a = shl i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f2(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: movq %rsi, %rcx
				; CHECK-NEXT: movq %rdi, %rax
				; CHECK-NEXT: # kill: def $cl killed $cl killed $rcx
				; CHECK-NEXT: sarq %cl, %rax
				; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
				; CHECK-NEXT: retq
				%a = ashr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define i32 @f3(i64 %x, i64 %y) minsize optsize {
				; CHECK-LABEL: f3:
				; CHECK: # %bb.0:
				; CHECK-NEXT: movq %rsi, %rcx
				; CHECK-NEXT: movq %rdi, %rax
				; CHECK-NEXT: # kill: def $cl killed $cl killed $rcx
				; CHECK-NEXT: shrq %cl, %rax
				; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
				; CHECK-NEXT: retq
				%a = lshr i64 %x, %y
				%b = trunc i64 %a to i32
				ret i32 %b
				}

				define dso_local { i64, i64 } @shl128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: shl128:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: movzbl %dl, %edx
				; CHECK-NEXT: callq __ashlti3
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: retq
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shl = shl i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shl to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shl, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}

				define dso_local { i64, i64 } @ashr128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: ashr128:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __ashrti3
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: retq
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shr = ashr i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shr to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shr, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}

				define dso_local { i64, i64 } @lshr128(i64 %x.coerce0, i64 %x.coerce1, i8 signext %y) minsize optsize {
				; CHECK-LABEL: lshr128:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: movzbl %dl, %edx
				; CHECK-NEXT: callq __lshrti3
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: retq
				entry:
				%x.sroa.2.0.insert.ext = zext i64 %x.coerce1 to i128
				%x.sroa.2.0.insert.shift = shl nuw i128 %x.sroa.2.0.insert.ext, 64
				%x.sroa.0.0.insert.ext = zext i64 %x.coerce0 to i128
				%x.sroa.0.0.insert.insert = or i128 %x.sroa.2.0.insert.shift, %x.sroa.0.0.insert.ext
				%conv = sext i8 %y to i32
				%sh_prom = zext i32 %conv to i128
				%shr = lshr i128 %x.sroa.0.0.insert.insert, %sh_prom
				%retval.sroa.0.0.extract.trunc = trunc i128 %shr to i64
				%retval.sroa.2.0.extract.shift = lshr i128 %shr, 64
				%retval.sroa.2.0.extract.trunc = trunc i128 %retval.sroa.2.0.extract.shift to i64
				%.fca.0.insert = insertvalue { i64, i64 } undef, i64 %retval.sroa.0.0.extract.trunc, 0
				%.fca.1.insert = insertvalue { i64, i64 } %.fca.0.insert, i64 %retval.sroa.2.0.extract.trunc, 1
				ret { i64, i64 } %.fca.1.insert
				}
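For orientation, the 128-bit test functions above look like what a front end produces for plain `__int128` shifts. The following is a guess at equivalent C++ source, not the input actually used to write the tests; it assumes a GCC- or Clang-compatible compiler on a 64-bit host (for the `__int128` extension) and relies on signed right shift being arithmetic, which both compilers provide.

```cpp
#include <cassert>
#include <cstdint>

// Assumed source shapes for the shl128/ashr128/lshr128 tests. To see the
// libcalls, these would be built for minimum size (for example with -Oz in
// Clang); compiled natively they are ordinary functions.
__int128 shl128(__int128 x, char y) { return x << y; }

// Arithmetic shift: the sign bit is replicated into the vacated positions.
__int128 ashr128(__int128 x, char y) { return x >> y; }

// Logical shift: the operand is unsigned, so zeros are shifted in.
unsigned __int128 lshr128(unsigned __int128 x, char y) { return x >> y; }
```

In the X86 and AArch64 checks above, each of these shapes becomes a single call to `__ashlti3`, `__ashrti3` or `__lshrti3` respectively.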
