This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Ensure we do not attempt to create lsll #0
ClosedPublic

Authored by dmgreen on Sep 17 2019, 9:20 AM.


Details

Reviewers

t.p.northover
samparker
SjoerdMeijer
simon_tatham
ostannard
efriedma

Commits

rG10d10102a443: [ARM] Ensure we do not attempt to create lsll #0
rL372839: [ARM] Ensure we do not attempt to create lsll #0

Summary

During legalisation we can end up with some pretty odd nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid nodes. A long shift with a zero immediate actually encodes a shift by 32.
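To make the last point concrete, here is a minimal C++ model of how the 5-bit immediate of an MVE long shift is interpreted. The field stores the amount modulo 32, so a zero field decodes to 32. `decodeLongShiftImm` and `lsllImm` are hypothetical names chosen for illustration. This is a simplified model of the architectural behaviour described above, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model, not LLVM code: decode the 5-bit immediate field of an
// MVE long shift. The field holds the shift amount modulo 32, so a field
// value of 0 means "shift by 32", not "shift by 0".
unsigned decodeLongShiftImm(unsigned Field) {
  Field &= 31u;  // only five bits are encoded
  return Field == 0 ? 32u : Field;
}

// Hypothetical model of "lsll RdaLo, RdaHi, #imm": shift the 64-bit value held
// in the {Lo, Hi} register pair left by the decoded amount.
std::pair<std::uint32_t, std::uint32_t> lsllImm(std::uint32_t Lo, std::uint32_t Hi,
                                                unsigned Field) {
  unsigned Amt = decodeLongShiftImm(Field);
  assert(Amt >= 1 && Amt <= 32);  // the only amounts the field can express
  std::uint64_t Value = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  Value <<= Amt;
  return {static_cast<std::uint32_t>(Value),
          static_cast<std::uint32_t>(Value >> 32)};
}
```

With a zero field, `lsllImm(0x12345678, 0, 0)` moves the low word into the high word and clears the low word. That is the wrong result for a node that was meant to be a shift by 0.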

Diff Detail

Event Timeline

dmgreen created this revision.Sep 17 2019, 9:20 AM

Herald added a project: Restricted Project. Sep 17 2019, 9:20 AM

Herald added subscribers: hiraditya, kristof.beyls.

efriedma added a subscriber: efriedma.Sep 17 2019, 9:37 AM

efriedma added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
Line 6013: What happens if we discover the shift amount is zero after legalization? If MVE_LSLLi doesn't accept arbitrary immediates, the isel pattern should reflect that. (With only that fix, I think we still end up with an MVE_LSLLr, but that's not a correctness issue, just a missed optimization, I think.)

Now using long_shift as an ImmLeaf.
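In TableGen, an ImmLeaf carries a C++ predicate over the constant, and the pattern only matches when that predicate returns true. As a hedged sketch, assuming the immediate form accepts shift amounts 1 to 32 (the values the 5-bit field can express, with 32 stored as 0), the check attached to `long_shift` amounts to something like the functions below. `isLegalLongShiftImm` and `encodeLongShiftImm` are hypothetical names; this is not the committed `.td` code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the predicate body of an ImmLeaf such as
// long_shift: only amounts the immediate form can express (1..32) may match,
// so a constant 0 falls through to the register form instead.
bool isLegalLongShiftImm(std::int64_t Imm) {
  return Imm > 0 && Imm <= 32;
}

// Hypothetical encoder: the 5-bit field holds the amount modulo 32, so 32
// becomes 0. Encoding a shift of 0 would produce the same bits as a shift of
// 32, which is why 0 must be rejected before this point.
unsigned encodeLongShiftImm(std::int64_t Imm) {
  assert(isLegalLongShiftImm(Imm) && "shift amount not encodable");
  return static_cast<unsigned>(Imm) & 31u;
}
```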

I'm not sure how to test what will happen after legalisation. Any suggestions?

Just to verify the patch is doing what you think it is, you could hack the code.

We don't have any way to write isel tests where the input is a DAG. You could probably come up with some sequence which currently isn't folded until after type legalization, but it wouldn't really be reliable against future changes to DAG optimizations. I guess we could introduce an intrinsic that specifically turns into a constant after legalization for testing? But that's maybe overkill...

Maybe you could actually construct a test using GlobalISel? GlobalISel is at least partially implemented on ARM, but you'd need to introduce some way to express ARMlsll with GlobalISel, and I'm not sure how to do that, off the top of my head.

And as a fix for the problem here, and what I believe is a sensible fix for the long_shift pattern, does this patch look OK?

If you can't figure out how to write a test for the patterns once the optimization in Expand64BitShift is implemented, that's okay. LGTM

On a related note, why are we using a target-specific node here, as opposed to ISD::SHL_PARTS?

This revision is now accepted and ready to land. Sep 24 2019, 12:22 PM

Thanks.

We did originally try to make this work with the SHL_PARTS nodes, and got quite far if my memory is correct. A certain amount of target-independent code had to be changed to keep them legal and to stop optimisations we didn't want from firing. My memory is fuzzy as to what the final showstopper was (if there really was one). Maybe something about treating LSRL as an LSLL with a negated operand, which meant SRL_PARTS was not really legal?
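For context on "treating LSRL as an LSLL with a negated operand": as we understand the Armv8.1-M description, the register form of LSLL takes a signed amount, and a negative amount reverses the direction into a logical right shift. Since there is no register form of LSRL, a variable 64-bit logical right shift can be emitted as an LSLL by the negated amount. The sketch below models that in plain C++, assuming amounts with magnitude below 64 (the real instruction's behaviour for larger amounts is not modelled). `lsllReg` and `srlParts` are hypothetical names; `srlParts` only mirrors the (Lo, Hi, Amt) to (Lo, Hi) shape of ISD::SRL_PARTS.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using RegPair = std::pair<std::uint32_t, std::uint32_t>;  // {Lo, Hi}

// Hypothetical model of "lsll RdaLo, RdaHi, Rm": a signed shift amount, where a
// negative value reverses the direction (logical shift right).
RegPair lsllReg(std::uint32_t Lo, std::uint32_t Hi, int Amt) {
  assert(Amt > -64 && Amt < 64 && "only |Amt| < 64 is modelled");
  std::uint64_t Value = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  Value = Amt >= 0 ? Value << Amt : Value >> -Amt;
  return {static_cast<std::uint32_t>(Value),
          static_cast<std::uint32_t>(Value >> 32)};
}

// Hypothetical helper with the shape of ISD::SRL_PARTS: a 64-bit logical right
// shift of a {Lo, Hi} pair, expressed as an LSLL by the negated amount.
RegPair srlParts(std::uint32_t Lo, std::uint32_t Hi, unsigned Amt) {
  return lsllReg(Lo, Hi, -static_cast<int>(Amt));
}
```

The design question in the thread is visible here: SRL_PARTS only exists as a rewrite onto LSLL, so keeping it "legal" in target-independent code needs special handling.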

Closed by commit rL372839: [ARM] Ensure we do not attempt to create lsll #0 (authored by dmgreen). Sep 25 2019, 3:16 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path | Size
llvm/lib/Target/ARM/ARMISelLowering.cpp | 2 lines
llvm/test/CodeGen/Thumb2/lsll0.ll | 10 lines

Diff 220517

llvm/lib/Target/ARM/ARMISelLowering.cpp


... (6,004 unchanged lines not shown; context: static SDValue Expand64BitShift(SDNode *N, SelectionDAG &DAG, ...)
   if (ST->hasMVEIntegerOps()) {
     SDValue ShAmt = N->getOperand(1);
     unsigned ShPartsOpc = ARMISD::LSLL;
     ConstantSDNode *Con = dyn_cast<ConstantSDNode>(ShAmt);

     // If the shift amount is greater than 32 or has a greater bitwidth than 64
     // then do the default optimisation
     if (ShAmt->getValueType(0).getSizeInBits() > 64 ||
-        (Con && Con->getZExtValue() >= 32))
+        (Con && (Con->getZExtValue() == 0 || Con->getZExtValue() >= 32)))
       return SDValue();

     // Extract the lower 32 bits of the shift amount if it's not an i32
     if (ShAmt->getValueType(0) != MVT::i32)
       ShAmt = DAG.getZExtOrTrunc(ShAmt, dl, MVT::i32);

     if (ShOpc == ISD::SRL) {
       if (!Con)
... (11,109 unchanged lines not shown)
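Stripped of the SelectionDAG types, the changed condition can be read as the predicate below. This is a hedged sketch: `shouldUseMVELongShift` and its parameters are hypothetical and only restate the visible check; the real function goes on to build the ARMISD node. Note that a variable amount always passes this guard, so a constant 0 that only appears after legalization has to be caught by the isel pattern instead, which is what the `long_shift` ImmLeaf discussed above is for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, LLVM-free restatement of the guard in Expand64BitShift.
// AmtBits: bit width of the shift-amount type.
// IsConst/ConstAmt: whether the amount is a known constant, and its value
// (the ConstantSDNode case in the real code).
// Returns true when the MVE long-shift path may be taken.
bool shouldUseMVELongShift(unsigned AmtBits, bool IsConst, std::uint64_t ConstAmt) {
  assert(AmtBits != 0 && "shift amount type must have a width");
  if (AmtBits > 64)
    return false;  // default expansion
  if (IsConst && (ConstAmt == 0 || ConstAmt >= 32))
    return false;  // 0 cannot be encoded; 32 and above use the default expansion
  return true;     // variable amount, or a constant in 1..31
}
```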

llvm/test/CodeGen/Thumb2/lsll0.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s

 define void @_Z4loopPxS_iS_i(i64* %d) {
 ; CHECK-LABEL: _Z4loopPxS_iS_i:
 ; CHECK: @ %bb.0: @ %entry
 ; CHECK-NEXT: vldrw.u32 q0, [r0]
 ; CHECK-NEXT: vmov r1, s2
-; CHECK-NEXT: sxth r2, r1
-; CHECK-NEXT: asrs r1, r2, #31
-; CHECK-NEXT: lsll r2, r1, #0
-; CHECK-NEXT: rsbs r1, r2, #0
 ; CHECK-NEXT: vmov r2, s0
 ; CHECK-NEXT: sxth r1, r1
-; CHECK-NEXT: asr.w r12, r1, #31
 ; CHECK-NEXT: sxth r2, r2
-; CHECK-NEXT: asrs r3, r2, #31
-; CHECK-NEXT: lsll r2, r3, #0
+; CHECK-NEXT: rsbs r1, r1, #0
 ; CHECK-NEXT: rsbs r2, r2, #0
+; CHECK-NEXT: sxth r1, r1
 ; CHECK-NEXT: sxth r2, r2
+; CHECK-NEXT: asr.w r12, r1, #31
 ; CHECK-NEXT: asrs r3, r2, #31
 ; CHECK-NEXT: strd r2, r3, [r0]
 ; CHECK-NEXT: strd r1, r12, [r0, #8]
 ; CHECK-NEXT: bx lr
 entry:
   %wide.load = load <2 x i64>, <2 x i64>* undef, align 8
   %0 = trunc <2 x i64> %wide.load to <2 x i32>
   %1 = shl <2 x i32> %0, <i32 16, i32 16>
... (remaining 24 lines not shown)