This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Combine to scalarize the fixed-vector vadd of splats
Needs Review · Public

Authored by liaolucy on Aug 30 2023, 6:00 AM.

Details

Reviewers

craig.topper
luke

Summary

vadd (build_vector x), (build_vector c) -> build_vector (add x, c)
This tries to fix https://github.com/llvm/llvm-project/issues/65068
The combine can later be extended to more operations.
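
The identity behind this combine can be checked on a toy model. The following is a minimal sketch in plain C++ that does not use any LLVM API; splat, vadd, addOfSplats and splatOfAdd are hypothetical helpers introduced only for illustration. It shows that adding two splats lane by lane gives the same vector as performing one scalar add and splatting the result.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: broadcast one scalar to all N lanes
// (a stand-in for a build_vector whose operands are all the same value).
template <std::size_t N>
std::array<std::uint64_t, N> splat(std::uint64_t V) {
  std::array<std::uint64_t, N> R{};
  R.fill(V);
  return R;
}

// Hypothetical helper: lane-wise wrapping add (a stand-in for a vector add).
template <std::size_t N>
std::array<std::uint64_t, N> vadd(const std::array<std::uint64_t, N> &A,
                                  const std::array<std::uint64_t, N> &B) {
  std::array<std::uint64_t, N> R{};
  for (std::size_t I = 0; I != N; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Before the combine: splat both operands, then do N lane adds.
template <std::size_t N>
std::array<std::uint64_t, N> addOfSplats(std::uint64_t X, std::uint64_t C) {
  return vadd(splat<N>(X), splat<N>(C));
}

// After the combine: one scalar add, then a single splat.
template <std::size_t N>
std::array<std::uint64_t, N> splatOfAdd(std::uint64_t X, std::uint64_t C) {
  return splat<N>(X + C);
}
```

Both forms agree for every input, including wrap-around, because unsigned addition wraps identically in each lane and in the scalar.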

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

liaolucy created this revision. Aug 30 2023, 6:00 AM

Herald added a project: Restricted Project. Aug 30 2023, 6:00 AM

Herald added subscribers: jobnoorman, sunshaoce, VincentWu and 28 others.

liaolucy requested review of this revision. Aug 30 2023, 6:00 AM

Herald added a project: Restricted Project. Aug 30 2023, 6:00 AM

Herald added subscribers: llvm-commits, wangpc, eopXD, MaskRay.

Thanks for the patch. Did you take a look at scalarizeBinOpOfSplats in DAGCombiner? I think that might be more generic, it's the combine that kicks in for the scalable vector test case in that issue. I was wondering if it would be possible to get it to work on fixed vectors too.

In D159190#4628179, @luke wrote:

Thanks for the patch. Did you take a look at scalarizeBinOpOfSplats in DAGCombiner? I think that might be more generic, it's the combine that kicks in for the scalable vector test case in that issue. I was wondering if it would be possible to get it to work on fixed vectors too.

I have a question: this IR is already optimized in the opt phase (https://godbolt.org/z/MPvvTG5dT). Maybe we don't need another combine in codegen?

In D159190#4628284, @liaolucy wrote:

I have a question, this ir has been optimized in the opt phase. https://godbolt.org/z/MPvvTG5dT Maybe we don't need another combine in codegen?

Good point, I didn't realise opt already took care of this. I guess it depends if this pattern is introduced during codegen at all

In D159190#4628357, @luke wrote:

Good point, I didn't realise opt already took care of this. I guess it depends if this pattern is introduced during codegen at all

I originally noticed this when looking at the code generated whilst expanding ISD::ROT{L,R}, e.g. https://godbolt.org/z/Wh99EPbja
So it can be introduced after the middle-end (but it might not be that common)
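
As a rough illustration of why a rotate expansion can produce bin ops of splats, here is a self-contained toy model in plain C++. It is only a sketch: it does not reproduce the exact nodes that LLVM emits, and Vec4, rotlPerLane and rotlSplatAmount are hypothetical names. When the rotate amount is a splat, the masked amount and the masked negated amount are the same in every lane, so they could be computed once as scalars instead of once per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint64_t, 4>;

// Lane-wise expansion of a 64-bit rotate-left: every lane recomputes
// (amt & 63) and (-amt & 63), even if all lanes hold the same amount.
Vec4 rotlPerLane(const Vec4 &X, const Vec4 &Amt) {
  Vec4 R{};
  for (std::size_t I = 0; I != X.size(); ++I) {
    const std::uint64_t L = Amt[I] & 63;
    const std::uint64_t Rt = (std::uint64_t{0} - Amt[I]) & 63;
    R[I] = (X[I] << L) | (X[I] >> Rt);
  }
  return R;
}

// Same rotate when the amount is known to be a splat: the two derived
// shift amounts are computed once as scalars and reused for every lane.
Vec4 rotlSplatAmount(const Vec4 &X, std::uint64_t Amt) {
  const std::uint64_t L = Amt & 63;
  const std::uint64_t Rt = (std::uint64_t{0} - Amt) & 63;
  Vec4 R{};
  for (std::size_t I = 0; I != X.size(); ++I)
    R[I] = (X[I] << L) | (X[I] >> Rt);
  return R;
}
```

Masking both shift amounts with 63 keeps every shift in range, so an amount of 0 or 64 leaves the input unchanged in both versions.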

In D159190#4628397, @luke wrote:

I originally noticed this when looking at the code generated whilst expanding ISD::ROT{L,R}, e.g. https://godbolt.org/z/Wh99EPbja
So it can be introduced after the middle-end (but it might not be that common)

Okay, since the pattern can be introduced by the back end, it looks like it is worth optimizing there. Thanks.

Another solution: enable isExtractVecEltCheap when XLenVT == vector_element_type. It works for pr65068.ll.

But I'm worried about the side effects of this approach. How can I test fixed-length vectors? Any suggestions?

I tried the options -march=rv64gcv -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128, but the resulting LLVM IR uses scalable vectors, not fixed-length vectors.

In D159190#4634146, @liaolucy wrote:

Another solution, enabling the isExtractVecEltCheap when XlenVT == vector_element_type, It's working for pr65068.ll.

It looks like isExtractVecEltCheap might be meant for "free" extracts:

For example, if scalar operations occur on the same register file as vector operations, then an extract element may be a sub-register rename rather than an actual instruction.

But on RISC-V we'll need a vmv.x.s, so I don't think that qualifies as "free". But this is only if it's coming from a vector_shuffle if I understand correctly: build_vector splats and splat_vector splats should indeed be free because the extract_vector_elt will be combined away.

But I'm worried about the side effects of this approach. How to test fixed length vectors? Any suggestions?

Try to use the command: -march=rv64gcv -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 , but the llvm ir scalable(not fixed length vectors.)

Is pr65068.ll not already testing this on fixed length vectors?

In D159190#4634302, @luke wrote:

But on RISC-V we'll need a vmv.x.s, so I don't think that qualifies as "free". But this is only if it's coming from a vector_shuffle if I understand correctly: build_vector splats and splat_vector splats should indeed be free because the extract_vector_elt will be combined away.

I gave this a try, and it sadly results in a lot of regressions: https://github.com/llvm/llvm-project/commit/0dee41b1ea16a022495a06680410090a25b5bb66
I haven't looked into it yet, but I'm definitely missing something

GitHub <noreply@github.com> mentioned this in rG450dfab8c363: [RISCV] Add tests where bin ops of splats could be scalarized. NFC (#65747). Sep 20 2023, 5:24 AM

Revision Contents

Path: llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Size: 46 lines

Path: llvm/test/CodeGen/RISCV/rvv/pr65068.ll
Size: 16 lines

Diff 554682

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

 static SDValue combineAddOfBooleanXor(SDNode *N, SelectionDAG &DAG) {
   ...
   if (!DAG.MaskedValueIsZero(N0.getOperand(0), Mask))
     return SDValue();

   // Emit a negate of the setcc.
   return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT),
                      N0.getOperand(0));
 }

+// vadd (build_vector x), (build_vector c)
+// -> build_vector (add x, c)
+static SDValue transformSplatVectorToScalar(SDNode *N, SelectionDAG &DAG,
+                                            const RISCVSubtarget &Subtarget) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  APInt ShAmt;
+  if (N->getOpcode() == ISD::ADD && N0.getOpcode() == ISD::BUILD_VECTOR &&
+      ISD::isConstantSplatVector(N1.getNode(), ShAmt) && N0.hasOneUse() &&
+      N1.hasOneUse()) {
+    EVT VT0 = N0.getValueType();
+    EVT VT1 = N1.getValueType();
+    unsigned NumElts0 = VT0.getVectorNumElements();
+    unsigned NumElts1 = VT1.getVectorNumElements();
+    SDValue Base;
+    SDLoc DL(N);
+    bool AllSame = true;
+    for (unsigned i = 0; i != NumElts0; ++i) {
+      if (!N0.getOperand(i).isUndef()) {
+        Base = N0.getOperand(i);
+        break;
+      }
+    }
+    for (unsigned i = 0; i != NumElts0; ++i) {
+      if (N0.getOperand(i) != Base) {
+        AllSame = false;
+        break;
+      }
+    }
+    MVT XLenVT = Subtarget.getXLenVT();
+    if (AllSame && NumElts0 == NumElts1 && Base.getValueType() == XLenVT) {
+      SDValue NAdd = DAG.getNode(ISD::ADD, DL, XLenVT, Base, N1.getOperand(0));
+      SmallVector<SDValue, 6> NOps;
+      for (unsigned i = 0; i != NumElts0; ++i) {
+        NOps.push_back(NAdd);
+      }
+      return DAG.getNode(ISD::BUILD_VECTOR, DL, VT0, NOps);
+    }
+  }
+  return SDValue();
+}
+
 static SDValue performADDCombine(SDNode *N, SelectionDAG &DAG,
                                  const RISCVSubtarget &Subtarget) {
+  EVT VT = N->getValueType(0);
+  if (VT.isFixedLengthVector())
+    return transformSplatVectorToScalar(N, DAG, Subtarget);
+
   if (SDValue V = combineAddOfBooleanXor(N, DAG))
     return V;
   if (SDValue V = transformAddImmMulImm(N, DAG, Subtarget))
     return V;
   if (SDValue V = transformAddShlImm(N, DAG, Subtarget))
     return V;
   if (SDValue V = combineBinOpToReduce(N, DAG, Subtarget))
     return V;
   ...

llvm/test/CodeGen/RISCV/rvv/pr65068.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
; RUN: llc -mtriple=riscv64 -mattr=+v -riscv-v-vector-bits-min=128 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK-RV64

define <4 x i64> @f_v4i64(<4 x i64> %x, i64 %y) {
; CHECK-RV64-LABEL: f_v4i64:
; CHECK-RV64:       # %bb.0:
; CHECK-RV64-NEXT:    addi a0, a0, 3
; CHECK-RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
; CHECK-RV64-NEXT:    vmul.vx v8, v8, a0
; CHECK-RV64-NEXT:    ret
  %1 = insertelement <4 x i64> poison, i64 %y, i32 0
  %2 = shufflevector <4 x i64> %1, <4 x i64> poison, <4 x i32> zeroinitializer
  %3 = add <4 x i64> %2, <i64 3, i64 3, i64 3, i64 3>
  %4 = mul <4 x i64> %x, %3
  ret <4 x i64> %4
}