This is an archive of the discontinued LLVM Phabricator instance.

Differential D9094

Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]
Abandoned (Public)

Authored by mehdi_amini on Apr 18 2015, 1:06 AM.

Summary

Add a new DAG combine to fold EXTRACT_VECTOR_ELT through BUILD_VECTOR.
It breaks one ARM test, but I believe that is because the combine
canonicalizes the test differently, and the ARM lowering should be updated
to handle the new form. I also added another variant of the test that is
conceptually identical but was already failing before this patch.

CC:
Tim, to have a look at the AArch64 test change (I hope it is legal).
Tom/Matt for the R600 test changes.

Event Timeline

mehdi_amini updated this revision to Diff 23987. Apr 18 2015, 1:06 AM

mehdi_amini retitled this revision to Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX].

mehdi_amini updated this object.

mehdi_amini edited the test plan for this revision.

mehdi_amini added subscribers: t.p.northover, tstellarAMD, arsenm, Unknown Object (MLST).

Herald added a subscriber: aemerson. Apr 18 2015, 1:06 AM

Fix ARM/vmov.ll test (was unintentionally truncated)

mehdi_amini added a reviewer: ab. Apr 20 2015, 10:16 AM

Ping.

The combine seems sound, but ARM definitely needs to learn about special constants before this can go in.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  Line 11004: Would it make sense to check that the BUILD_VECTOR .hasOneUse()?
  Line 11010: space before (
test/CodeGen/AArch64/fold-constants.ll
  Lines 6–8: This looks fine to me, but we can do better, I think. The code looks like it's just shuffling zeros around, so I'd expect:
      mov x0, xzr
  Lines 18–19: This is the part where the combiners get confused. We could catch this with:
      (v1i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef)) -> (v1i64 bitcast (i64 zext/sext (i16 C)))
    It's not clear to me that's better though. This would be more legitimate:
      (i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef)) -> (i64 zext/sext (i16 C))
    But for this testcase, we'd also need:
      (i64 extract_element (v1i64 bitcast N), i32 0) -> (i64 bitcast N)
    And that might be a whole 'nother mess on AArch64 (it might be very useful as well, I'm not sure). I'll have a closer look if you don't beat me to it ;)
test/CodeGen/ARM/vmov.ll
  Line 3: These failures are caused by the logic in LowerBUILD_VECTOR (recognizing splats) not firing anymore. With the combine, the vector doesn't survive until lowering, so we're stuck with a magic f64 constant that happens to have the same bitpattern as a vector splat. As you say, ARM probably should learn about recognizing special patterns in LowerConstantFP and whatnot. That would let it materialize stuff like:
      double 0x0808080808080808
    using MOVI instead of constant pools (judging by the failure, I don't think it does). Feel free to file a PR and/or give it a shot; I'll try to have a look myself.
test/CodeGen/R600/ds_read2.ll
  Lines 220–221: A single -NOT is enough

Thanks for the review. I'll wait for you to have time to fix the ARM64 failure ;)

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  Line 11004: Good question. I can imagine cases where I wouldn't want this check, but I'm not sure I see many cases where it would be beneficial.
  Line 11010: grahh clang-format :(
test/CodeGen/R600/ds_read2.ll
  Lines 220–221: Of course! Thanks.

ping!

D13655 implements this transformation, but the example I added here (v_movQi8_double) probably still exposes a missed optimization in the ARM backend.

mehdi_amini abandoned this revision. Dec 11 2015, 2:02 PM

Revision Contents

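Path                                            Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp        14 lines
test/CodeGen/AArch64/fold-constants.ll          3 lines
test/CodeGen/ARM/big-endian-vector-callee.ll    12 lines
test/CodeGen/ARM/big-endian-vector-caller.ll    4 lines
test/CodeGen/ARM/vmov.ll                        18 lines
test/CodeGen/R600/ds_read2.ll                   7 lines
test/CodeGen/R600/fceil64.ll                    4 lines
test/CodeGen/R600/ftrunc.f64.ll                 2 lines
test/CodeGen/R600/gep-address-space.ll          18 lines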

Diff 23988

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ ... @@ if (InOp.getValueType() != NVT) {
       return DAG.getSExtOrTrunc(InOp, SDLoc(InVec), NVT);
     }
     return InOp;
   }

   SDValue EltNo = N->getOperand(1);
   bool ConstEltNo = isa<ConstantSDNode>(EltNo);

+  // Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]
+  if (InVec.getOpcode() == ISD::BUILD_VECTOR && ConstEltNo) {
       >> ab: Would it make sense to check that the BUILD_VECTOR .hasOneUse()?
       >> mehdi_amini: Good question. I can imagine cases where I wouldn't want this check, but I'm not sure I see many cases where it would be beneficial.
+    auto Elt = InVec.getOperand(N->getConstantOperandVal(1));
+    // Take care of potential implicit truncation in ISD::BUILD_VECTOR
+    // Because for instance ARM has legal v4i16 but not legal i16, BUILD_VECTOR
+    // can build such vector out of i32. We need to insert an explicit truncate
+    // when folding this case.
+    if(Elt.getValueType() == NVT)
       >> ab: space before (
       >> mehdi_amini: grahh clang-format :(
+      return Elt;
+    assert(Elt.getValueType().isInteger() && "BUILD_VECTOR can implicitly "
+           "truncate integer exclusively");
+    return DAG.getNode(ISD::TRUNCATE, SDLoc(N), NVT, Elt);
+  }
+
   // Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT.
   // We only perform this optimization before the op legalization phase because
   // we may introduce new vector instructions which are not backed by TD
   // patterns. For example on AVX, extracting elements from a wide vector
   // without using extract_subvector. However, if we can find an underlying
   // scalar value, then we can always use that.
   if (InVec.getOpcode() == ISD::VECTOR_SHUFFLE
       && ConstEltNo) {
...
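The combine in the hunk above reduces to an operand lookup followed by an optional truncate. Below is a toy model of that logic, not SelectionDAG code. `ToyValue`, `truncateTo` and `foldExtractOfBuildVector` are hypothetical names. Operands are restricted to integer constants so that the result can be checked. Declining to fold an out-of-range index is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an SDValue operand: an integer constant of a
// given bit width. Real BUILD_VECTOR operands are arbitrary nodes; constants
// are used here only so the result of the fold can be checked.
struct ToyValue {
  unsigned Bits;     // width of the value type, e.g. 16 for i16, 32 for i32
  uint64_t Payload;  // the value, held in the low `Bits` bits
};

// Model of ISD::TRUNCATE on a constant: keep only the low NewBits bits.
ToyValue truncateTo(ToyValue V, unsigned NewBits) {
  assert(NewBits < V.Bits && NewBits < 64 && "TRUNCATE must narrow");
  return {NewBits, V.Payload & ((uint64_t{1} << NewBits) - 1)};
}

// Model of the combine:
//   (extract_vector_elt (build_vector Elt0, Elt1, ...), CstX) -> EltCstX
// EltBits plays the role of NVT, the extract's result type. An operand may
// be wider than the vector element (e.g. i32 operands building a v4i16), so
// the folded value is truncated when the widths differ.
std::optional<ToyValue> foldExtractOfBuildVector(
    const std::vector<ToyValue> &BuildVectorOps, uint64_t CstX,
    unsigned EltBits) {
  if (CstX >= BuildVectorOps.size())
    return std::nullopt;  // this sketch simply declines out-of-range indices
  const ToyValue &Elt = BuildVectorOps[CstX];
  if (Elt.Bits == EltBits)
    return Elt;  // types match: the operand itself is the result
  return truncateTo(Elt, EltBits);  // implicit truncation made explicit
}
```

For example, extracting lane 0 with a 16-bit result from a build_vector of i32 constants {0x12345678, ...} yields 0x5678. This mirrors the ARM v4i16-from-i32 case described in the patch comment.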

test/CodeGen/AArch64/fold-constants.ll

 ; RUN: llc -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s

 define i64 @dotests_616() {
 ; CHECK-LABEL: dotests_616
 ; CHECK: movi d0, #0000000000000000
-; CHECK-NEXT: umov w8, v0.b[2]
-; CHECK-NEXT: sbfx w8, w8, #0, #1
-; CHECK-NEXT: fmov s0, w8
     >> ab: This looks fine to me, but we can do better, I think. The code looks like it's just shuffling zeros around, so I'd expect: mov x0, xzr
 ; CHECK-NEXT: fmov x0, d0
 ; CHECK-NEXT: ret
 entry:
   %0 = bitcast <2 x i64> zeroinitializer to <8 x i16>
   %1 = and <8 x i16> zeroinitializer, %0
   %2 = icmp ne <8 x i16> %1, zeroinitializer
   %3 = extractelement <8 x i1> %2, i32 2
   %vgetq_lane285 = sext i1 %3 to i16
   %vset_lane = insertelement <4 x i16> undef, i16 %vgetq_lane285, i32 0
   %4 = bitcast <4 x i16> %vset_lane to <1 x i64>
   %vget_lane = extractelement <1 x i64> %4, i32 0
     >> ab: This is the part where the combiners get confused. We could catch this with: (v1i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef)) -> (v1i64 bitcast (i64 zext/sext (i16 C))) It's not clear to me that's better though. This would be more legitimate: (i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef)) -> (i64 zext/sext (i16 C)) But for this testcase, we'd also need: (i64 extract_element (v1i64 bitcast N), i32 0) -> (i64 bitcast N) And that might be a whole 'nother mess on AArch64 (it might be very useful as well, I'm not sure). I'll have a closer look if you don't beat me to it ;)
   ret i64 %vget_lane
 }
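Both of ab's observations about @dotests_616 can be checked with a few lines of scalar arithmetic. The sketch below is a hand-written re-evaluation of the test, not LLVM code. It assumes little-endian lane order, with lane 0 in the low bits as on aarch64-linux-gnu. It also models undef lanes with an explicit fill value; that is an assumption of the sketch, since any value is a legal choice for undef. The helper names are hypothetical.

Every defined bit of the result is zero, so `mov x0, xzr` is a valid lowering. When the three undef lanes are taken as zero, the bitcast of `build_vector (i16 C), undef, undef, undef` equals `zext` of C. When they are taken as copies of the sign bit, it equals `sext` of C, which is the fold ab proposes.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian model of `bitcast <4 x i16> to i64` (equivalently, of the
// <1 x i64> bitcast followed by extractelement at index 0): lane I occupies
// bits [16*I, 16*I + 16) of the result.
uint64_t bitcastV4I16ToI64(const uint16_t Lanes[4]) {
  uint64_t Result = 0;
  for (int I = 0; I < 4; ++I)
    Result |= static_cast<uint64_t>(Lanes[I]) << (16 * I);
  return Result;
}

uint64_t zext16To64(uint16_t C) { return C; }

uint64_t sext16To64(uint16_t C) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int16_t>(C)));
}

// Scalar re-evaluation of @dotests_616. Undef lanes are filled with
// UndefFill (an assumption: any value is a legal choice for undef).
uint64_t evalDotests616(uint16_t UndefFill) {
  uint16_t V[8] = {};  // bitcast <2 x i64> zeroinitializer to <8 x i16>
  bool Ne[8];
  for (int I = 0; I < 8; ++I)
    Ne[I] = (V[I] & 0) != 0;  // and with zeroinitializer, then icmp ne 0
  uint16_t Lane0 = Ne[2] ? 0xFFFF : 0;  // extractelement 2, sext i1 to i16
  const uint16_t VSetLane[4] = {Lane0, UndefFill, UndefFill, UndefFill};
  return bitcastV4I16ToI64(VSetLane);
}
```

With UndefFill = 0, evalDotests616 returns 0. A nonzero fill changes only the high 48 bits, which is exactly the freedom that undef gives the backend.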

test/CodeGen/ARM/big-endian-vector-callee.ll

@@ ... @@
 ; CHECK: vmov.32 [[REG1:d[0-9]+]][0], r0
 ; CHECK: vmov.32 [[REG1]][1], r1
 ; CHECK: vmov.32 [[REG2:d[0-9]+]][0], r2
 ; CHECK: vmov.32 [[REG2]][1], r3
   %1 = fadd fp128 %p, %p
   %2 = bitcast fp128 %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2f64_v2i64:
 define <2 x double> @test_v2f64_v2i64(<2 x i64> %p) {
 ; SOFT: vmov [[REG1:d[0-9]+]], r3, r2
 ; SOFT: vmov [[REG2:d[0-9]+]], r1, r0
 ; HARD: vadd.i64 q{{[0-9]+}}, q0
   %1 = add <2 x i64> %p, %p
   %2 = bitcast <2 x i64> %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2f64_v4f32:
 define <2 x double> @test_v2f64_v4f32(<4 x float> %p) {
 ; HARD: vrev64.32 q{{[0-9]+}}, q0
   %1 = fadd <4 x float> %p, %p
   %2 = bitcast <4 x float> %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2f64_v4i32:
 define <2 x double> @test_v2f64_v4i32(<4 x i32> %p) {
 ; HARD: vrev64.32 q{{[0-9]+}}, q0
   %1 = add <4 x i32> %p, %p
   %2 = bitcast <4 x i32> %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2f64_v8i16:
 define <2 x double> @test_v2f64_v8i16(<8 x i16> %p) {
 ; HARD: vrev64.16 q{{[0-9]+}}, q0
   %1 = add <8 x i16> %p, %p
   %2 = bitcast <8 x i16> %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2f64_v16i8:
 define <2 x double> @test_v2f64_v16i8(<16 x i8> %p) {
 ; HARD: vrev64.8 q{{[0-9]+}}, q0
   %1 = add <16 x i8> %p, %p
   %2 = bitcast <16 x i8> %1 to <2 x double>
   %3 = fadd <2 x double> %2, %2
   ret <2 x double> %3
-; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG2:d[0-9]+]]
+; SOFT: vadd.f64 [[REG1:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG2]]
 ; SOFT: vmov r3, r2, [[REG1]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
 }

 ; CHECK-LABEL: test_v2i64_f128:
 define <2 x i64> @test_v2i64_f128(fp128 %p) {
...

test/CodeGen/ARM/big-endian-vector-caller.ll

@@ ... @@
 ; SOFT: vmov [[REG:d[0-9]+]], r1, r0
 ; SOFT: vrev64.8 [[REG]]
 ; HARD: vrev64.8 {{d[0-9]+}}, d0
 }

 ; CHECK-LABEL: test_f128_v2f64:
 declare fp128 @test_f128_v2f64_helper(<2 x double> %p)
 define void @test_f128_v2f64(<2 x double>* %p, fp128* %q) {
-; SOFT: vadd.f64 [[REG2:d[0-9]+]]
 ; SOFT: vadd.f64 [[REG1:d[0-9]+]]
+; SOFT: vadd.f64 [[REG2:d[0-9]+]]
 ; SOFT: vmov r1, r0, [[REG1]]
 ; SOFT: vmov r3, r2, [[REG2]]
 ; HARD: vadd.f64 d1
 ; HARD: vadd.f64 d0
   %1 = load <2 x double>, <2 x double>* %p
   %2 = fadd <2 x double> %1, %1
   %3 = call fp128 @test_f128_v2f64_helper(<2 x double> %2)
   %4 = fadd fp128 %3, %3
@@ ... @@ ; HARD: vadd.f64 d0
   %4 = add <2 x i64> %3, %3
   store <2 x i64> %4, <2 x i64>* %q
   ret void
 ; SOFT: vmov {{d[0-9]+}}, r3, r2
 ; SOFT: vmov {{d[0-9]+}}, r1, r0
 }

 ; CHECK-LABEL: test_v2i64_v4f32:
 declare <2 x i64> @test_v2i64_v4f32_helper(<4 x float> %p)
 define void @test_v2i64_v4f32(<4 x float>* %p, <2 x i64>* %q) {
 ; SOFT: vmov r1, r0
 ; SOFT: vmov r3, r2
 ; HARD: vrev64.32 q0
   %1 = load <4 x float>, <4 x float>* %p
   %2 = fadd <4 x float> %1, %1
   %3 = call <2 x i64> @test_v2i64_v4f32_helper(<4 x float> %2)
   %4 = add <2 x i64> %3, %3
...

test/CodeGen/ARM/vmov.ll

 ; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

+; XFAIL: *
     >> ab: These failures are caused by the logic in LowerBUILD_VECTOR (recognizing splats) not firing anymore. With the combine, the vector doesn't survive until lowering, so we're stuck with a magic f64 constant that happens to have the same bitpattern as a vector splat. As you say, ARM probably should learn about recognizing special patterns in LowerConstantFP and whatnot. That would let it materialize stuff like: double 0x0808080808080808 using MOVI instead of constant pools (judging by the failure, I don't think it does). Feel free to file a PR and/or give it a shot; I'll try to have a look myself.

 define <8 x i8> @v_movi8() nounwind {
 ;CHECK-LABEL: v_movi8:
 ;CHECK: vmov.i8 d{{.*}}, #0x8
   ret <8 x i8> < i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8 >
 }

 define <4 x i16> @v_movi16a() nounwind {
 ;CHECK-LABEL: v_movi16a:
@@ ... @@
 }

 define <1 x i64> @v_movi64() nounwind {
 ;CHECK-LABEL: v_movi64:
 ;CHECK: vmov.i64 d{{.*}}, #0xff0000ff0000ffff
   ret <1 x i64> < i64 18374687574888349695 >
 }


+; FIXME: the following two tests should generate:
+; vmov.i8 q8, #0x8
+; vmov r0, r1, d16
+; vmov r2, r3, d17
+; mov pc, lr

 define <16 x i8> @v_movQi8() nounwind {
 ;CHECK-LABEL: v_movQi8:
 ;CHECK: vmov.i8 q{{.*}}, #0x8
   ret <16 x i8> < i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8 >
 }

+define <2 x double> @v_movQi8_double() nounwind {
+;CHECK-LABEL: v_movQi8_double:
+;CHECK: vmov.i8 q{{.*}}, #0x8
+  %f = bitcast <8 x i8> < i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8 > to double
+  %vec.tmp = insertelement <2 x double> undef, double %f, i32 0
+  %vec = insertelement <2 x double> %vec.tmp, double %f, i32 1
+  ret <2 x double> %vec
+}

 define <8 x i16> @v_movQi16a() nounwind {
 ;CHECK-LABEL: v_movQi16a:
 ;CHECK: vmov.i16 q{{.*}}, #0x10
   ret <8 x i16> < i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16 >
 }

 define <8 x i16> @v_movQi16b() nounwind {
 ;CHECK-LABEL: v_movQi16b:
...
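ab's review suggests that ARM could recognize, in LowerConstantFP, f64 constants whose bits look like a vector splat, such as double 0x0808080808080808. It could then materialize them with a NEON move immediate rather than a constant-pool load; the FIXME above expects vmov.i8 q8, #0x8. Below is a minimal sketch of only the detection step, assuming the constant's raw 64-bit pattern is available. `getByteSplat` and `doubleBits` are hypothetical helpers, not ARM backend APIs. A real implementation would also need to consider the other NEON immediate forms (i16, i32, i64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: if all eight bytes of Bits are equal, return that
// byte. Such a pattern fits a single vmov.i8 immediate.
std::optional<uint8_t> getByteSplat(uint64_t Bits) {
  uint8_t B = static_cast<uint8_t>(Bits & 0xFF);
  for (int I = 1; I < 8; ++I)
    if (static_cast<uint8_t>(Bits >> (8 * I)) != B)
      return std::nullopt;  // some byte differs: not a byte splat
  return B;
}

// Reinterpret a double's bit pattern, as `bitcast double to i64` would.
uint64_t doubleBits(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}
```

The f64 constant from v_movQi8_double, 0x0808080808080808, is detected as a splat of byte 0x08. An ordinary constant such as 1.0 (0x3FF0000000000000) is not.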

test/CodeGen/R600/ds_read2.ll

@@ ... @@ define void @read2_ptr_is_subreg_arg_offset_f32(float addrspace(1)* %out, <2 x float addrspace(3)*> %lds.ptr) #0 {
   %val1 = load float, float addrspace(3)* %gep.1.offset, align 4
   %add.x = add nsw i32 %x.i, 8
   %sum = fadd float %val0, %val1
   %out.gep = getelementptr inbounds float, float addrspace(1)* %out, i32 %x.i
   store float %sum, float addrspace(1)* %out.gep, align 4
   ret void
 }

-; We should be able to merge in this case, but probably not worth the effort.
-; SI-NOT: ds_read2_b32
-; SI: ds_read_b32
-; SI: ds_read_b32
+; SI: ds_read2_b32
+; SI-NOT: ds_read_b32
+; SI-NOT: ds_read_b32
     >> ab: A single -NOT is enough
     >> mehdi_amini: Of course! Thanks.
 ; SI: s_endpgm
 define void @read2_ptr_is_subreg_f32(float addrspace(1)* %out) #0 {
   %x.i = tail call i32 @llvm.r600.read.tidig.x() #1
   %ptr.0 = insertelement <2 x [512 x float] addrspace(3)> undef, [512 x float] addrspace(3) @lds, i32 0
   %ptr.1 = insertelement <2 x [512 x float] addrspace(3)> %ptr.0, [512 x float] addrspace(3) @lds, i32 1
   %x.i.v.0 = insertelement <2 x i32> undef, i32 %x.i, i32 0
   %x.i.v.1 = insertelement <2 x i32> %x.i.v.0, i32 %x.i, i32 1
   %idx = add <2 x i32> %x.i.v.1, <i32 0, i32 8>
...

test/CodeGen/R600/fceil64.ll

@@ ... @@
 ; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
 ; SI: s_add_i32 s{{[0-9]+}}, [[SEXP]], 0xfffffc01
 ; SI: s_lshr_b64
 ; SI: s_not_b64
 ; SI: s_and_b64
 ; SI: cmp_gt_i32
 ; SI: cndmask_b32
 ; SI: cndmask_b32
-; SI: cmp_lt_i32
+; SI: v_cmp_lt_f64
+; SI: v_cmp_lt_i32
 ; SI: cndmask_b32
 ; SI: cndmask_b32
-; SI-DAG: v_cmp_lt_f64
 ; SI-DAG: v_cmp_lg_f64
 ; SI: s_and_b64
 ; SI: v_cndmask_b32
 ; SI: v_cndmask_b32
 ; SI: v_add_f64
 ; SI: s_endpgm
 define void @fceil_f64(double addrspace(1)* %out, double %x) {
   %y = call double @llvm.ceil.f64(double %x) nounwind readnone
...

test/CodeGen/R600/ftrunc.f64.ll

@@ ... @@

 ; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014
 ; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
 ; SI: s_add_i32 s{{[0-9]+}}, [[SEXP]], 0xfffffc01
 ; SI: s_lshr_b64
 ; SI: s_not_b64
 ; SI: s_and_b64
 ; SI: cmp_gt_i32
+; SI: cmp_lt_i32
 ; SI: cndmask_b32
 ; SI: cndmask_b32
-; SI: cmp_lt_i32
 ; SI: cndmask_b32
 ; SI: cndmask_b32
 ; SI: s_endpgm
 define void @ftrunc_f64(double addrspace(1)* %out, double %x) {
   %y = call double @llvm.trunc.f64(double %x) nounwind readnone
   store double %y, double addrspace(1)* %out
   ret void
 }
...

test/CodeGen/R600/gep-address-space.ll

@@ ... @@
 ; CHECK: ds_write_b32
   %p = getelementptr [1024 x i32], [1024 x i32] addrspace(3)* %array, i16 0, i16 16384
   store i32 99, i32 addrspace(3)* %p
   ret void
 }

 define void @gep_as_vector_v4(<4 x [1024 x i32] addrspace(3)*> %array) nounwind {
 ; CHECK-LABEL: {{^}}gep_as_vector_v4:
-; CHECK: s_add_i32
-; CHECK: s_add_i32
-; CHECK: s_add_i32
-; CHECK: s_add_i32
+; SI: s_add_i32
+; SI: s_add_i32
+; SI: s_add_i32
+; SI: s_add_i32
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
   %p = getelementptr [1024 x i32], <4 x [1024 x i32] addrspace(3)*> %array, <4 x i16> zeroinitializer, <4 x i16> <i16 16, i16 16, i16 16, i16 16>
   %p0 = extractelement <4 x i32 addrspace(3)*> %p, i32 0
   %p1 = extractelement <4 x i32 addrspace(3)*> %p, i32 1
   %p2 = extractelement <4 x i32 addrspace(3)*> %p, i32 2
   %p3 = extractelement <4 x i32 addrspace(3)*> %p, i32 3
   store i32 99, i32 addrspace(3)* %p0
   store i32 99, i32 addrspace(3)* %p1
   store i32 99, i32 addrspace(3)* %p2
   store i32 99, i32 addrspace(3)* %p3
   ret void
 }

 define void @gep_as_vector_v2(<2 x [1024 x i32] addrspace(3)*> %array) nounwind {
 ; CHECK-LABEL: {{^}}gep_as_vector_v2:
-; CHECK: s_add_i32
-; CHECK: s_add_i32
+; SI: s_add_i32
+; SI: s_add_i32
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
+; CI: ds_write_b32 v{{.}}, v{{.}} offset:64
   %p = getelementptr [1024 x i32], <2 x [1024 x i32] addrspace(3)*> %array, <2 x i16> zeroinitializer, <2 x i16> <i16 16, i16 16>
   %p0 = extractelement <2 x i32 addrspace(3)*> %p, i32 0
   %p1 = extractelement <2 x i32 addrspace(3)*> %p, i32 1
   store i32 99, i32 addrspace(3)* %p0
   store i32 99, i32 addrspace(3)* %p1
   ret void
 }