This is an archive of the discontinued LLVM Phabricator instance.


Differential D141079

[DAGCombine] Fold constants across (extract_vector_elt (bitcast (splat_vector)))
Needs Review · Public

Authored by luke on Jan 5 2023, 10:54 AM.


Details

Reviewers

aheejin
tlively
reames
hokein
frasercrmck
paulwalker-arm
RKSimon

Summary

To fix a regression in WebAssembly introduced by D139871, manually fold constants through a bitcast of a splat when visiting extract_vector_elt.

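Ideally we would do the constant folding on the bitcast itself (i.e. (bitcast (splat_vector x)) -> (splat_vector y)), but this is not always possible: when the bitcast narrows the element type, the result is only a splat if the narrower pieces of x happen to be equal.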

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	110 ms	x64 debian > LLVM.CodeGen/SystemZ::memset-08.ll
	90 ms	x64 debian > LLVM.CodeGen/SystemZ::store-replicated-vals.ll

Event Timeline

luke created this revision. Jan 5 2023, 10:54 AM

Herald added a project: Restricted Project. Jan 5 2023, 10:54 AM

Herald added subscribers: pmatos, asb, ecnelises and 5 others.

luke requested review of this revision. Jan 5 2023, 10:54 AM

Herald added a project: Restricted Project. Jan 5 2023, 10:54 AM

Herald added a subscriber: llvm-commits.

luke added a parent revision: D141075: [SelectionDAG] Implicitly truncate known bits in SPLAT_VECTOR. Jan 5 2023, 10:55 AM

tlively added a comment.

Could you add a test showing the regression in a preliminary patch so that we can see the improvement in this patch?

Harbormaster completed remote builds in B205956: Diff 486633. Jan 5 2023, 1:02 PM

reames added inline comments. Jan 5 2023, 2:50 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14066: This looks to be assuming fixed-width splat_vectors. The primary use of splat_vector is for scalable vectors.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Line 3033 (on Diff #486633): You should be able to separate this into its own patch with test coverage. Note that this code is currently restricted to fixed-length splat_vectors, which only Hexagon currently uses. You could choose to generalize the routine to scalable vectors if that was helpful.

luke mentioned this in D141120: [WebAssembly][NFC] Add test case for PR59626. Jan 6 2023, 3:33 AM

luke added a parent revision: D141120: [WebAssembly][NFC] Add test case for PR59626. Jan 6 2023, 3:34 AM

luke removed a parent revision: D141120: [WebAssembly][NFC] Add test case for PR59626. Jan 6 2023, 3:43 AM

Rebase on top of D141075

Update test

In D141079#4029594, @tlively wrote:

Could you add a test showing the regression in a preliminary patch so that we can see the improvement in this patch?

Of course, I've separated out my journey here into D141120 and D141075.
To give some context, this is to prevent a regression from D139871 in the code generated for pr59626.ll, which is taken from https://github.com/llvm/llvm-project/issues/59626. (The regression is on wasm32; on wasm64 the test case simply crashed, which D139871 fixed.)

luke added inline comments. Jan 6 2023, 4:13 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14066: That makes sense, I was wondering what the difference was between a splat_vector and a splatted build_vector. In that case, is it still possible to fold here?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Line 3033 (on Diff #486633): WebAssembly now uses fixed-length splat_vectors too, to aid in selecting splatted loads (D139871). Will take a look at generalising this.

luke marked an inline comment as not done. Jan 6 2023, 4:14 AM

Harbormaster completed remote builds in B206071: Diff 486805. Jan 6 2023, 4:49 AM

reames added inline comments. Jan 6 2023, 7:23 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14066: To my knowledge, we're a bit inconsistent about this. RISC-V uses SPLAT_VECTOR only for scalable vectors. Hexagon (and, per your other comment, WebAssembly) use them for both fixed and scalable vectors. I'm also unclear on when they use SPLAT_VECTOR vs BUILD_VECTOR. Longer term, I do think that having one canonical representation for a splat vector makes sense, and that it'll probably be SPLAT_VECTOR. We're just not there yet. In particular, DAGCombine has various weaknesses for SPLAT_VECTOR that need to be worked through.

luke mentioned this in rGb599a30e931e: [WebAssembly][NFC] Add test case for PR59626. Jan 6 2023, 7:44 AM

luke added inline comments. Jan 6 2023, 9:27 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14066: This is a RISC-V test case I was able to throw together that shows the optimisation opportunity for scalable vectors:

    define i32 @f(<vscale x 2 x i64> %a) {
      %v = insertelement <vscale x 2 x i64> %a, i64 0, i32 0
      %w = shufflevector <vscale x 2 x i64> %v, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
      %x = bitcast <vscale x 2 x i64> %w to <vscale x 4 x i32>
      %y = extractelement <vscale x 4 x i32> %x, i32 0
      ret i32 %y
    }

After the first DAG combine it looks like this:

    Optimized lowered selection DAG: %bb.0 'f:'
    SelectionDAG has 9 nodes:
      t0: ch,glue = EntryToken
      t7: nxv2i64 = splat_vector Constant:i64<0>
      t8: nxv4i32 = bitcast t7
      t9: i32 = extract_vector_elt t8, Constant:i32<0>
      t11: ch,glue = CopyToReg t0, Register:i32 $x10, t9
      t12: ch = RISCVISD::RET_FLAG t11, Register:i32 $x10, t11:1

If I'm not mistaken, it should be possible to constant fold the constant in `t7` into `t9`, but the lack of constant folding for `splat_vector`s in `bitcast`s prevents this. I guess this is what I was trying to achieve with WebAssembly, except it was with fixed-size vectors, so, as you pointed out, just making a splatted `build_vector` doesn't work.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Line 3033 (on Diff #486633): Writing this down here before I forget: I needed to provide this information in simplifyDemandedVecElts, because it was used by `SimplifyDemandedBits`, which is in turn used in `DAGCombiner::visitSTORE`.

luke added inline comments. Jan 6 2023, 12:34 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14066: After some digging, I've found it's not always possible to constant fold `(bitcast (splat_vector x)) -> (splat_vector y)`, at least not into another `splat_vector`. It's only guaranteed to be possible when the bitcast type has a larger scalar element type than the original element type. For now, I'm trying to see if combining `(extract_vector_elt (bitcast (splat_vector x)) n) -> y` yields similar results.

Rework the approach

Herald added subscribers: pcwang-thead, luismarques, apazos and 18 others. Jan 6 2023, 12:57 PM

luke retitled this revision from [SelectionDAG] Improve constant folding in the presence of SPLAT_VECTOR to [DAGCombine] Fold constants across (extract_vector_elt (bitcast (splat_vector))). Jan 6 2023, 12:58 PM

luke edited the summary of this revision.

Harbormaster completed remote builds in B206184: Diff 486966. Jan 6 2023, 2:00 PM

How come SelectionDAG::computeKnownBits isn't handling this?

Also, we don't catch this at the IR level either (InstCombine / ValueTracking).

RKSimon added a reviewer: RKSimon. Jan 8 2023, 9:55 AM

luke added a comment. Jan 9 2023, 3:00 AM

This comment was removed by luke.

Revision Contents

Path                                                          Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                 20 lines
llvm/test/CodeGen/RISCV/vector-extract-elt-bitcast-splat.ll   17 lines
llvm/test/CodeGen/WebAssembly/pr59626.ll                      10 lines

Diff 486966

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


… (14,057 unchanged lines not shown) …
     return ConstantFoldBITCASTofBUILD_VECTOR(N0.getNode(),
                                              VT.getVectorElementType());

   // If the input is a constant, let getNode fold it.
   if (isIntOrFPConstant(N0)) {
     // If we can't allow illegal operations, we need to check that this is just
     // a fp -> int or int -> conversion and that the resulting operation will
     // be legal.
     if (!LegalOperations ||
         (isa<ConstantSDNode>(N0) && VT.isFloatingPoint() && !VT.isVector() &&
          TLI.isOperationLegal(ISD::ConstantFP, VT)) ||
         (isa<ConstantFPSDNode>(N0) && VT.isInteger() && !VT.isVector() &&
          TLI.isOperationLegal(ISD::Constant, VT))) {
       SDValue C = DAG.getBitcast(VT, N0);
       if (C.getNode() != N)
         return C;
     }
   }
… (6,458 unchanged lines not shown) …
   if (((IndexC && VecOp.getOpcode() == ISD::BUILD_VECTOR) ||

     // TODO: It may be useful to truncate if free if the build_vector implicitly
     // converts.
   }

   if (SDValue BO = scalarizeExtractedBinop(N, DAG, LegalOperations))
     return BO;

+  // extract_vector_elt (bitcast (splat_vector x)), n -> y
+  if (IndexC && ScalarVT.isInteger() && VecOp.getOpcode() == ISD::BITCAST &&
+      VecOp.getOperand(0).getOpcode() == ISD::SPLAT_VECTOR) {
+    SDNode *Splat = VecOp.getOperand(0).getNode();
+    SDValue SplatVal = Splat->getOperand(0);
+    if (SplatVal.isUndef())
+      return DAG.getUNDEF(ScalarVT);
+
+    if (auto *CInt = dyn_cast<ConstantSDNode>(SplatVal.getNode())) {
+      APInt Val = APInt::getSplat(
+          Splat->getValueType(0).getSizeInBits().getKnownMinSize(),
+          CInt->getAPIntValue());
+
+      APInt BitcastExtEl = Val.extractBits(ScalarVT.getFixedSizeInBits(),
+                                           IndexC->getZExtValue() *
+                                               ScalarVT.getFixedSizeInBits());
+      return DAG.getConstant(BitcastExtEl, DL, ScalarVT);
+    }
+  }
+
   if (VecVT.isScalableVector())
     return SDValue();

   // All the code from this point onwards assumes fixed width vectors, but it's
   // possible that some of the combinations could be made to work for scalable
   // vectors too.
   unsigned NumElts = VecVT.getVectorNumElements();
   unsigned VecEltBitWidth = VecVT.getScalarSizeInBits();
… (5,662 unchanged lines not shown) …

llvm/test/CodeGen/RISCV/vector-extract-elt-bitcast-splat.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -mattr=+v < %s | FileCheck %s

; This ensures that a bitcast of a splat_vector can still be constant folded
; from extract_vector_elt

define i32 @f(<vscale x 2 x i64> %a) {
; CHECK-LABEL: f:
; CHECK: # %bb.0:
; CHECK-NEXT: li a0, 0
; CHECK-NEXT: ret
  %v = insertelement <vscale x 2 x i64> %a, i64 0, i32 0
  %w = shufflevector <vscale x 2 x i64> %v, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
  %x = bitcast <vscale x 2 x i64> %w to <vscale x 4 x i32>
  %y = extractelement <vscale x 4 x i32> %x, i32 0
  ret i32 %y
}

llvm/test/CodeGen/WebAssembly/pr59626.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=wasm32-- -mattr=+simd128 | FileCheck --check-prefix=CHECK-32 %s
 ; RUN: llc < %s -mtriple=wasm64-- -mattr=+simd128 | FileCheck --check-prefix=CHECK-64 %s

 define i8 @f(ptr %0, ptr %1) {
 ; CHECK-32-LABEL: f:
 ; CHECK-32: .functype f (i32, i32) -> (i32)
 ; CHECK-32-NEXT: # %bb.0: # %BB
 ; CHECK-32-NEXT: local.get 0
 ; CHECK-32-NEXT: i32.const 0
 ; CHECK-32-NEXT: i32.store8 2
 ; CHECK-32-NEXT: local.get 0
 ; CHECK-32-NEXT: i32.const 0
 ; CHECK-32-NEXT: i32.store16 0
-; CHECK-32-NEXT: local.get 1
-; CHECK-32-NEXT: local.get 0
-; CHECK-32-NEXT: i8x16.splat
-; CHECK-32-NEXT: v128.store16_lane 0, 0
-; CHECK-32-NEXT: v128.const 0, 0
-; CHECK-32-NEXT: i32x4.extract_lane 0
+; CHECK-32-NEXT: i32.const 0
 ; CHECK-32-NEXT: # fallthrough-return
 ;
 ; CHECK-64-LABEL: f:
 ; CHECK-64: .functype f (i64, i64) -> (i32)
 ; CHECK-64-NEXT: .local i32
 ; CHECK-64-NEXT: # %bb.0: # %BB
 ; CHECK-64-NEXT: local.get 0
 ; CHECK-64-NEXT: i32.const 0
 ; CHECK-64-NEXT: i32.store8 2
 ; CHECK-64-NEXT: local.get 0
 ; CHECK-64-NEXT: i32.const 0
 ; CHECK-64-NEXT: i32.store16 0
 ; CHECK-64-NEXT: local.get 1
 ; CHECK-64-NEXT: local.get 2
 ; CHECK-64-NEXT: i8x16.splat
 ; CHECK-64-NEXT: v128.store16_lane 0, 0
 ; CHECK-64-NEXT: drop
-; CHECK-64-NEXT: v128.const 0, 0
-; CHECK-64-NEXT: i32x4.extract_lane 0
+; CHECK-64-NEXT: i32.const 0
 ; CHECK-64-NEXT: # fallthrough-return
 BB:
   store <3 x i8> zeroinitializer, ptr %0
   %S = shufflevector <3 x i128> zeroinitializer, <3 x i128> <i128 0, i128 1, i128 2>, <3 x i32> undef
   %C = icmp ule <3 x i128> %S, zeroinitializer
   %C1 = zext <3 x i1> %C to <3 x i8>
   %E = extractelement <3 x i8> %C1, i32 0
   %B = sdiv <3 x i8> <i8 1, i8 3, i8 5>, %C1
   store <3 x i8> %B, ptr %1
   ret i8 %E
 }