This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][DAG] vslideup/vslidedown with zero offset and undef pass through is a nop
Abandoned · Public

Authored by reames on Oct 24 2022, 2:33 PM.


Details

Reviewers

craig.topper
asb
kito-cheng
frasercrmck

Summary

If we have a vslideup or vslidedown with a zero offset, this is equivalent to copying the active elements into the destination. If the destination is undefined, then we can ignore masking and VL predication and simply return the source operand. This may cause additional lanes to be defined, but is otherwise a nop.

Not thought to be important; noticed while glancing at a test diff for something else, and figured I'd knock it out.
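The equivalence described in this summary can be illustrated with a small scalar model. This is only a sketch, not code from the patch: the names `Lane`, `Vec`, `slideUp`, `slideDown`, and `refines` are hypothetical, an undefined lane is modelled as an empty `std::optional` (C++17), and the tail and masked-off lanes are assumed to simply keep the pass-through value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model types: std::nullopt stands for an undefined lane.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Scalar model of vslideup: active lanes i in [offset, vl) receive
// src[i - offset]; every other lane keeps the pass-through value.
Vec slideUp(const Vec &passthru, const Vec &src, std::size_t offset,
            const std::vector<bool> &mask, std::size_t vl) {
  assert(src.size() == passthru.size() && mask.size() == passthru.size());
  assert(vl <= passthru.size());
  Vec dst = passthru;
  for (std::size_t i = offset; i < vl; ++i)
    if (mask[i])
      dst[i] = src[i - offset];
  return dst;
}

// Scalar model of vslidedown: active lanes i in [0, vl) receive
// src[i + offset], or zero when that index is past the end of src.
Vec slideDown(const Vec &passthru, const Vec &src, std::size_t offset,
              const std::vector<bool> &mask, std::size_t vl) {
  assert(src.size() == passthru.size() && mask.size() == passthru.size());
  assert(vl <= passthru.size());
  Vec dst = passthru;
  for (std::size_t i = 0; i < vl; ++i)
    if (mask[i])
      dst[i] = (i + offset < src.size()) ? src[i + offset] : Lane(0);
  return dst;
}

// True if `replacement` may stand in for `original`: every defined lane of
// `original` has the same value in `replacement`; undefined lanes may
// become anything.
bool refines(const Vec &original, const Vec &replacement) {
  if (original.size() != replacement.size())
    return false;
  for (std::size_t i = 0; i < original.size(); ++i)
    if (original[i] && original[i] != replacement[i])
      return false;
  return true;
}
```

In this model, a slide with offset 0 and an all-undefined pass-through is always refined by the source vector, whatever the mask and VL are; returning the source only defines extra lanes. A defined pass-through or a nonzero offset breaks the property, which is why the combine checks both.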


Event Timeline

reames created this revision. Oct 24 2022, 2:33 PM

Herald added a project: Restricted Project. Oct 24 2022, 2:33 PM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 28 others.

reames requested review of this revision. Oct 24 2022, 2:33 PM

Herald added a project: Restricted Project. Oct 24 2022, 2:33 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B194034: Diff 470292. Oct 24 2022, 5:14 PM

craig.topper added inline comments. Oct 25 2022, 2:03 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9888	Is this something we should squash in lowering of insert_vector_elt instead?

craig.topper added inline comments. Oct 25 2022, 2:09 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9888	These all seem to be coming from RV32 insert_vector_elt of i64. If the index is 0, we can use two tail undisturbed slide1ups with VL=2 if the vector to insert into isn't undef. If it is undef, we just need the slide1ups.
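As an illustration of the two-slide1up sequence discussed in this comment, here is a minimal scalar sketch of the undef, index 0 case. It is an assumption-laden model rather than anything from the patch or from D136738: `V2`, `slide1Up`, and `insertI64ViaSlide1Up` are hypothetical names, the vector is reduced to two e32 lanes (VL=2), and the register comments follow the RV32 check lines in the test diff, where a0 and a1 carry the low and high halves of the i64.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: two 32-bit lanes, i.e. e32 with VL=2.
using V2 = std::array<std::uint32_t, 2>;

// Scalar model of vslide1up.vx with VL=2: lane 0 takes the scalar,
// lane 1 takes the old lane 0.
V2 slide1Up(const V2 &src, std::uint32_t x) { return {x, src[0]}; }

// Assemble a 64-bit element from its two 32-bit halves using two slide1ups.
std::uint64_t insertI64ViaSlide1Up(std::uint32_t lo, std::uint32_t hi) {
  V2 v = {0, 0};       // vmv.v.i v8, 0
  v = slide1Up(v, hi); // vslide1up.vx v9, v8, a1  -> {hi, 0}
  v = slide1Up(v, lo); // vslide1up.vx v8, v9, a0  -> {lo, hi}
  // Reinterpret the two little-endian e32 lanes as a single e64 lane.
  return (static_cast<std::uint64_t>(v[1]) << 32) | v[0];
}
```

After the second slide, the low half sits in lane 0 and the high half in lane 1, so the pair already reads as the desired i64 in element 0. That is why a trailing zero-offset vslideup into an undefined destination adds nothing in this case.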

craig.topper mentioned this in D136738: [RISCV] Optimize i64 insertelt on RV32. Oct 25 2022, 10:52 PM

reames added inline comments. Oct 26 2022, 8:56 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9888	So, I agree we could do this via lowering. Your D136738 does exactly that. What I'm not clear on is why you prefer special casing this in lowering, as opposed to having a generic combine which could catch other - currently unknown - cases as well. This code actually looks simpler to me than the version you had, and more general. What's your concern with this approach?

craig.topper added inline comments. Oct 26 2022, 9:31 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9888	D136738 is mostly about improving the non-undef case. I had to special case undef between the two slide1downs to keep the second vslide1down from being TU in the undef case. I could check for 0 index and non-undef instead and let the old code run for the 0 index undef case. Then use this DAGCombine to fix the vslideup.

Defer to craig on approach.

craig.topper mentioned this in rG6a794419cddb: [RISCV] Optimize i64 insertelt on RV32. Oct 28 2022, 10:23 AM

Revision Contents

Path                                                       Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                7 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitcast.ll       12 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-bitcast.ll    12 lines

Diff 470292

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[9,877 lines not shown; enclosing context: if ((SrcVT == MVT::v1i1 || SrcVT == MVT::v2i1 || SrcVT == MVT::v4i1) &&]
       SDLoc DL(N);
       N0 = DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8i1, Ops);
       N0 = DAG.getBitcast(MVT::i8, N0);
       return DAG.getNode(ISD::TRUNCATE, DL, VT, N0);
     }

     return SDValue();
   }
+  case RISCVISD::VSLIDEDOWN_VL:
+  case RISCVISD::VSLIDEUP_VL:
+    // vslidedown.vi undef, src, 0 -> src
+    // vslideup.vi undef, src, 0 -> src
+    if (N->getOperand(0).isUndef() && isa<ConstantSDNode>(N->getOperand(2)) &&
+        cast<ConstantSDNode>(N->getOperand(2))->isZero())
+      return N->getOperand(1);
   }

   return SDValue();
 }

 bool RISCVTargetLowering::isDesirableToCommuteWithShift(
     const SDNode *N, CombineLevel Level) const {
   assert((N->getOpcode() == ISD::SHL || N->getOpcode() == ISD::SRA ||
[3,223 lines not shown]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitcast.ll

[504 lines not shown]
 }

 define <4 x i16> @bitcast_i64_v4i16(i64 %a) {
 ; RV32-LABEL: bitcast_i64_v4i16:
 ; RV32: # %bb.0:
 ; RV32-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-NEXT: vmv.v.i v8, 0
 ; RV32-NEXT: vslide1up.vx v9, v8, a1
-; RV32-NEXT: vslide1up.vx v10, v9, a0
-; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-NEXT: vslideup.vi v8, v10, 0
+; RV32-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: bitcast_i64_v4i16:
 ; RV64: # %bb.0:
 ; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-NEXT: vmv.s.x v8, a0
 ; RV64-NEXT: ret
 ;
[20 lines not shown]
 }

 define <2 x i32> @bitcast_i64_v2i32(i64 %a) {
 ; RV32-LABEL: bitcast_i64_v2i32:
 ; RV32: # %bb.0:
 ; RV32-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-NEXT: vmv.v.i v8, 0
 ; RV32-NEXT: vslide1up.vx v9, v8, a1
-; RV32-NEXT: vslide1up.vx v10, v9, a0
-; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-NEXT: vslideup.vi v8, v10, 0
+; RV32-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: bitcast_i64_v2i32:
 ; RV64: # %bb.0:
 ; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-NEXT: vmv.s.x v8, a0
 ; RV64-NEXT: ret
 ;
[20 lines not shown]
 }

 define <1 x i64> @bitcast_i64_v1i64(i64 %a) {
 ; RV32-LABEL: bitcast_i64_v1i64:
 ; RV32: # %bb.0:
 ; RV32-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-NEXT: vmv.v.i v8, 0
 ; RV32-NEXT: vslide1up.vx v9, v8, a1
-; RV32-NEXT: vslide1up.vx v10, v9, a0
-; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-NEXT: vslideup.vi v8, v10, 0
+; RV32-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: bitcast_i64_v1i64:
 ; RV64: # %bb.0:
 ; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-NEXT: vmv.s.x v8, a0
 ; RV64-NEXT: ret
 ;
 ; ELEN32-LABEL: bitcast_i64_v1i64:
 ; ELEN32: # %bb.0:
 ; ELEN32-NEXT: ret
 %b = bitcast i64 %a to <1 x i64>
 ret <1 x i64> %b
 }

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-bitcast.ll

[195 lines not shown]
 }

 define <4 x half> @bitcast_i64_v4f16(i64 %a) {
 ; RV32-FP-LABEL: bitcast_i64_v4f16:
 ; RV32-FP: # %bb.0:
 ; RV32-FP-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-FP-NEXT: vmv.v.i v8, 0
 ; RV32-FP-NEXT: vslide1up.vx v9, v8, a1
-; RV32-FP-NEXT: vslide1up.vx v10, v9, a0
-; RV32-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-FP-NEXT: vslideup.vi v8, v10, 0
+; RV32-FP-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-FP-NEXT: ret
 ;
 ; RV64-FP-LABEL: bitcast_i64_v4f16:
 ; RV64-FP: # %bb.0:
 ; RV64-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-FP-NEXT: vmv.s.x v8, a0
 ; RV64-FP-NEXT: ret
 %b = bitcast i64 %a to <4 x half>
 ret <4 x half> %b
 }

 define <2 x float> @bitcast_i64_v2f32(i64 %a) {
 ; RV32-FP-LABEL: bitcast_i64_v2f32:
 ; RV32-FP: # %bb.0:
 ; RV32-FP-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-FP-NEXT: vmv.v.i v8, 0
 ; RV32-FP-NEXT: vslide1up.vx v9, v8, a1
-; RV32-FP-NEXT: vslide1up.vx v10, v9, a0
-; RV32-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-FP-NEXT: vslideup.vi v8, v10, 0
+; RV32-FP-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-FP-NEXT: ret
 ;
 ; RV64-FP-LABEL: bitcast_i64_v2f32:
 ; RV64-FP: # %bb.0:
 ; RV64-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-FP-NEXT: vmv.s.x v8, a0
 ; RV64-FP-NEXT: ret
 %b = bitcast i64 %a to <2 x float>
 ret <2 x float> %b
 }

 define <1 x double> @bitcast_i64_v1f64(i64 %a) {
 ; RV32-FP-LABEL: bitcast_i64_v1f64:
 ; RV32-FP: # %bb.0:
 ; RV32-FP-NEXT: vsetivli zero, 2, e32, m1, ta, ma
 ; RV32-FP-NEXT: vmv.v.i v8, 0
 ; RV32-FP-NEXT: vslide1up.vx v9, v8, a1
-; RV32-FP-NEXT: vslide1up.vx v10, v9, a0
-; RV32-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-FP-NEXT: vslideup.vi v8, v10, 0
+; RV32-FP-NEXT: vslide1up.vx v8, v9, a0
 ; RV32-FP-NEXT: ret
 ;
 ; RV64-FP-LABEL: bitcast_i64_v1f64:
 ; RV64-FP: # %bb.0:
 ; RV64-FP-NEXT: vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-FP-NEXT: vmv.s.x v8, a0
 ; RV64-FP-NEXT: ret
 %b = bitcast i64 %a to <1 x double>
[122 lines not shown]