This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Ensure we do not attempt to create lsll #0
Closed, Public

Authored by dmgreen on Sep 17 2019, 9:20 AM.

Details

Reviewers

t.p.northover
samparker
SjoerdMeijer
simon_tatham
ostannard
efriedma

Commits

rG10d10102a443: [ARM] Ensure we do not attempt to create lsll #0
rL372839: [ARM] Ensure we do not attempt to create lsll #0

Summary

During legalisation we can end up with some pretty odd nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid nodes. A long shift with a zero immediate actually encodes a shift by 32.
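
To make the encoding point in the summary concrete, here is a minimal sketch of how a long-shift amount might map to and from its immediate field. The 5-bit field width and the helper names `encodeLongShiftImm` and `decodeLongShiftImm` are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the immediate lives in a 5-bit field, so only 0..31 fit.
constexpr unsigned kLongShiftFieldBits = 5;

// Hypothetical helper: map a shift amount in [1,32] to its field value.
inline uint32_t encodeLongShiftImm(uint32_t amount) {
  assert(amount >= 1 && amount <= 32 && "long shifts cover [1,32] only");
  return amount & ((1u << kLongShiftFieldBits) - 1); // 32 wraps around to 0
}

// Hypothetical helper: recover the shift amount from the field value.
inline uint32_t decodeLongShiftImm(uint32_t field) {
  return field == 0 ? 32 : field; // a zero field means "shift by 32"
}
```

Under this model there is no field value that means "shift by 0", which is why an `lsll #0` node cannot be selected as an immediate long shift.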

Diff Detail

Repository: rL LLVM

Event Timeline

dmgreen created this revision. Sep 17 2019, 9:20 AM

Herald added a project: Restricted Project. Sep 17 2019, 9:20 AM

Herald added subscribers: hiraditya, kristof.beyls.

efriedma added a subscriber: efriedma. Sep 17 2019, 9:37 AM

efriedma added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
Line 6013 (on Diff #220517): What happens if we discover the shift amount is zero after legalization? If MVE_LSLLi doesn't accept arbitrary immediates, the isel pattern should reflect that. (With only that fix, I think we still end up with an MVE_LSLLr, but that's not a correctness issue, just a missed optimization, I think.)

Now using long_shift as an ImmLeaf.
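
As a rough illustration of what the ImmLeaf restriction means for pattern selection, the sketch below restates the predicate in plain C++ and models which form would be chosen for a constant shift amount. The names `isLongShiftImm`, `LongShiftForm`, and `pickLongShiftForm` are hypothetical, and the fallback to the register form reflects the reviewer's expectation above rather than verified behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Same condition as the ImmLeaf predicate: only immediates in [1,32] match.
constexpr bool isLongShiftImm(int64_t imm) { return imm > 0 && imm <= 32; }

// Hypothetical labels for the immediate and register instruction forms.
enum class LongShiftForm { Immediate, Register };

// Hypothetical model: a constant outside [1,32] cannot match the immediate
// pattern, so it is assumed to fall back to the register-operand pattern.
constexpr LongShiftForm pickLongShiftForm(int64_t imm) {
  return isLongShiftImm(imm) ? LongShiftForm::Immediate
                             : LongShiftForm::Register;
}

static_assert(!isLongShiftImm(0), "zero must not match the immediate form");
static_assert(isLongShiftImm(32), "32 is the largest accepted immediate");
```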

I'm not sure how to test what will happen after legalisation. Any suggestions?

Just to verify the patch is doing what you think it is, you could hack the code.

We don't have any way to write isel tests where the input is a DAG. You could probably come up with some sequence which currently isn't folded until after type legalization, but it wouldn't really be reliable against future changes to DAG optimizations. I guess we could introduce an intrinsic that specifically turns into a constant after legalization for testing? But that's maybe overkill...

Maybe you could actually construct a test using GlobalISel? GlobalISel is at least partially implemented on ARM, but you'd need to introduce some way to express ARMlsll with GlobalISel, and I'm not sure how to do that, off the top of my head.

And as a fix for the problem here, and what I believe is a sensible fix for the long_shift pattern, does this patch look OK?

If you can't figure out how to write a test for the patterns once the optimization in Expand64BitShift is implemented, that's okay. LGTM

On a related note, why are we using a target-specific node here, as opposed to ISD::SHL_PARTS?

This revision is now accepted and ready to land. Sep 24 2019, 12:22 PM

Thanks.

We did originally try to make this work with the SHL_PARTS nodes, and got quite far if my memory is correct. A certain amount of target-independent code was changed to keep them legal and to prevent optimisations that we didn't want from kicking in. My memory is fuzzy as to what the final showstopper was (if there really was one). Maybe something about treating LSRL as an LSLL with a negated operand, with SRL_PARTS not really being legal there?

Closed by commit rL372839: [ARM] Ensure we do not attempt to create lsll #0 (authored by dmgreen). Sep 25 2019, 3:16 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp	2 lines
llvm/trunk/lib/Target/ARM/ARMInstrMVE.td	6 lines
llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td	3 lines
llvm/trunk/test/CodeGen/Thumb2/lsll0.ll	48 lines

Diff 221700

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

@@ static SDValue Expand64BitShift(SDNode *N, SelectionDAG &DAG,
   if (ST->hasMVEIntegerOps()) {
     SDValue ShAmt = N->getOperand(1);
     unsigned ShPartsOpc = ARMISD::LSLL;
     ConstantSDNode *Con = dyn_cast<ConstantSDNode>(ShAmt);

     // If the shift amount is greater than 32 or has a greater bitwidth than 64
     // then do the default optimisation
     if (ShAmt->getValueType(0).getSizeInBits() > 64 ||
-        (Con && Con->getZExtValue() >= 32))
+        (Con && (Con->getZExtValue() == 0 || Con->getZExtValue() >= 32)))
       return SDValue();

     // Extract the lower 32 bits of the shift amount if it's not an i32
     if (ShAmt->getValueType(0) != MVT::i32)
       ShAmt = DAG.getZExtOrTrunc(ShAmt, dl, MVT::i32);

     if (ShOpc == ISD::SRL) {
       if (!Con)
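
The early-out condition in Expand64BitShift can be modelled outside of LLVM as a small predicate. This is only a sketch: `useDefaultExpansion` is a hypothetical name, and a nullable pointer stands in for the `ConstantSDNode *Con` result of `dyn_cast` (null when the shift amount is not a constant).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the guard in Expand64BitShift.
//   amtBits:  bit width of the shift amount's value type
//   constAmt: pointer to the constant shift amount, or nullptr if the
//             amount is not a compile-time constant
// Returns true when the MVE long-shift path should be skipped.
inline bool useDefaultExpansion(unsigned amtBits, const uint64_t *constAmt) {
  return amtBits > 64 ||
         (constAmt && (*constAmt == 0 || *constAmt >= 32));
}
```

With the patch applied, a constant amount of 0 now takes the default path together with constants of 32 or more, while non-constant amounts of at most 64 bits still use the long-shift lowering.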

llvm/trunk/lib/Target/ARM/ARMInstrMVE.td

@@ class MVE_ScalarShiftDRegRegWithSat<string iname, bit op5, list<dag> pattern=[]>
   let Inst{7} = sat;
 }

 def MVE_ASRLr : MVE_ScalarShiftDRegReg<"asrl", 0b1, [(set tGPREven:$RdaLo, tGPROdd:$RdaHi,
                                         (ARMasrl tGPREven:$RdaLo_src,
                                         tGPROdd:$RdaHi_src, rGPR:$Rm))]>;
 def MVE_ASRLi : MVE_ScalarShiftDRegImm<"asrl", 0b10, ?, [(set tGPREven:$RdaLo, tGPROdd:$RdaHi,
                                         (ARMasrl tGPREven:$RdaLo_src,
-                                        tGPROdd:$RdaHi_src, (i32 imm:$imm)))]>;
+                                        tGPROdd:$RdaHi_src, (i32 long_shift:$imm)))]>;
 def MVE_LSLLr : MVE_ScalarShiftDRegReg<"lsll", 0b0, [(set tGPREven:$RdaLo, tGPROdd:$RdaHi,
                                         (ARMlsll tGPREven:$RdaLo_src,
                                         tGPROdd:$RdaHi_src, rGPR:$Rm))]>;
 def MVE_LSLLi : MVE_ScalarShiftDRegImm<"lsll", 0b00, ?, [(set tGPREven:$RdaLo, tGPROdd:$RdaHi,
                                         (ARMlsll tGPREven:$RdaLo_src,
-                                        tGPROdd:$RdaHi_src, (i32 imm:$imm)))]>;
+                                        tGPROdd:$RdaHi_src, (i32 long_shift:$imm)))]>;
 def MVE_LSRL : MVE_ScalarShiftDRegImm<"lsrl", 0b01, ?, [(set tGPREven:$RdaLo, tGPROdd:$RdaHi,
                                         (ARMlsrl tGPREven:$RdaLo_src,
-                                        tGPROdd:$RdaHi_src, (i32 imm:$imm)))]>;
+                                        tGPROdd:$RdaHi_src, (i32 long_shift:$imm)))]>;

 def MVE_SQRSHRL : MVE_ScalarShiftDRegRegWithSat<"sqrshrl", 0b1>;
 def MVE_SQSHLL : MVE_ScalarShiftDRegImm<"sqshll", 0b11, 0b1>;
 def MVE_SRSHRL : MVE_ScalarShiftDRegImm<"srshrl", 0b10, 0b1>;

 def MVE_UQRSHLL : MVE_ScalarShiftDRegRegWithSat<"uqrshll", 0b0>;
 def MVE_UQSHLL : MVE_ScalarShiftDRegImm<"uqshll", 0b00, 0b1>;
 def MVE_URSHRL : MVE_ScalarShiftDRegImm<"urshrl", 0b01, 0b1>;

llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td

@@ def t2_shift_imm : Operand<i32> {
   let DecoderMethod = "DecodeT2ShifterImmOperand";
 }

 def mve_shift_imm : AsmOperandClass {
   let Name = "MVELongShift";
   let RenderMethod = "addImmOperands";
   let DiagnosticString = "operand must be an immediate in the range [1,32]";
 }
-def long_shift : Operand<i32> {
+def long_shift : Operand<i32>,
+                 ImmLeaf<i32, [{ return Imm > 0 && Imm <= 32; }]> {
   let ParserMatchClass = mve_shift_imm;
   let DecoderMethod = "DecodeLongShiftOperand";
 }

 // Shifted operands. No register controlled shifts for Thumb2.
 // Note: We do not support rrx shifted operands yet.
 def t2_so_reg : Operand<i32>, // reg imm
                 ComplexPattern<i32, 2, "SelectShiftImmShifterOperand",

llvm/trunk/test/CodeGen/Thumb2/lsll0.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s

				define void @_Z4loopPxS_iS_i(i64* %d) {
				; CHECK-LABEL: _Z4loopPxS_iS_i:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: vmov r1, s2
				; CHECK-NEXT: vmov r2, s0
				; CHECK-NEXT: sxth r1, r1
				; CHECK-NEXT: sxth r2, r2
				; CHECK-NEXT: rsbs r1, r1, #0
				; CHECK-NEXT: rsbs r2, r2, #0
				; CHECK-NEXT: sxth r1, r1
				; CHECK-NEXT: sxth r2, r2
				; CHECK-NEXT: asr.w r12, r1, #31
				; CHECK-NEXT: asrs r3, r2, #31
				; CHECK-NEXT: strd r2, r3, [r0]
				; CHECK-NEXT: strd r1, r12, [r0, #8]
				; CHECK-NEXT: bx lr
				entry:
				%wide.load = load <2 x i64>, <2 x i64>* undef, align 8
				%0 = trunc <2 x i64> %wide.load to <2 x i32>
				%1 = shl <2 x i32> %0, <i32 16, i32 16>
				%2 = ashr exact <2 x i32> %1, <i32 16, i32 16>
				%3 = sub <2 x i32> %2, %0
				%4 = and <2 x i32> %3, <i32 7, i32 7>
				%5 = shl <2 x i32> %2, %4
				%6 = extractelement <2 x i32> %5, i32 0
				%7 = zext i32 %6 to i64
				%8 = select i1 false, i64 %7, i64 undef
				%9 = trunc i64 %8 to i16
				%10 = sub i16 0, %9
				%11 = sext i16 %10 to i64
				%12 = getelementptr inbounds i64, i64* %d, i64 undef
				store i64 %11, i64* %12, align 8
				%13 = extractelement <2 x i32> %5, i32 1
				%14 = zext i32 %13 to i64
				%15 = select i1 false, i64 %14, i64 undef
				%16 = trunc i64 %15 to i16
				%17 = sub i16 0, %16
				%18 = sext i16 %17 to i64
				%19 = or i32 0, 1
				%20 = sext i32 %19 to i64
				%21 = getelementptr inbounds i64, i64* %d, i64 %20
				store i64 %18, i64* %21, align 8
				ret void
				}