This is an archive of the discontinued LLVM Phabricator instance.

[IR] Avoid creation of GEPs into vectors (in one place)
ClosedPublic

Authored by jsilvanus on Jan 19 2023, 11:04 AM.

Details

Summary

The method DataLayout::getGEPIndexForOffset(Type *&ElemTy, APInt &Offset)
generates GEP indices for a given byte-based offset.
This makes it possible to generate "natural" GEPs using the given type
structure if the byte offset happens to match a nested element object.
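
As a purely illustrative sketch (the struct type and function name below are hypothetical, not taken from the patch): for the type %s and a byte offset of 6, the method can produce indices that select the nested element, yielding a "natural" GEP instead of a byte-based one.

%s = type { i32, [4 x i16] }

define ptr @natural_gep(ptr %p) {
  ; With the default datalayout, byte offset 6 lands on the second i16 of the
  ; nested array, so the "natural" indices for %s are 0, 1, 1.
  %q = getelementptr inbounds %s, ptr %p, i64 0, i32 1, i64 1
  ; The equivalent byte-based form would be:
  ;   %q = getelementptr inbounds i8, ptr %p, i64 6
  ret ptr %q
}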

With opaque pointers and a general move towards byte-based GEPs [1],
this function may become questionable in the future.

This patch avoids the creation of GEPs into vectors in routines that use
DataLayout::getGEPIndexForOffset by making the method not return indices in that case.

The reason is that A) GEPs into vectors have been discouraged for a long
time [2], and B) GEPs into vectors are currently broken if the element
type is overaligned [1] (see the sketch below). The latter is demonstrated
by a lit test where InstCombine previously replaced valid loads with poison.
Note that the result of InstCombine on that test is *still* invalid, because
padding bytes are assumed.
Moreover, GEPs into vectors may be outright forbidden in the future [1].

[1]: https://discourse.llvm.org/t/67497
[2]: https://llvm.org/docs/GetElementPtr.html
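
To make the overalignment issue concrete, here is a hedged LLVM IR sketch (the datalayout string and function are illustrative, not taken from the patch's test): under a datalayout that overaligns i16, e.g. "e-i16:32", the vector's lanes are stored consecutively, but a GEP into the vector steps by the element's 4-byte alloc size, so the two disagree.

define i16 @second_lane(ptr %p) {
  ; Under "e-i16:32", allocsize(i16) is 4, so this GEP computes byte offset 4.
  %q = getelementptr <4 x i16>, ptr %p, i64 0, i64 1
  ; The in-memory layout of <4 x i16> keeps the lanes tightly packed, so
  ; lane 1 actually lives at byte offset 2; the load below reads the wrong bytes.
  %v = load i16, ptr %q
  ret i16 %v
}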

The test case is new. It will be precommitted if this patch is accepted.

Diff Detail

Event Timeline

jsilvanus created this revision. Jan 19 2023, 11:04 AM
Herald added a project: Restricted Project. Jan 19 2023, 11:04 AM
Herald added a subscriber: hiraditya.
jsilvanus requested review of this revision. Jan 19 2023, 11:04 AM
nikic added a subscriber: nikic.
nikic added inline comments.
llvm/test/Transforms/InstCombine/load-gep-overalign.ll
1–2

Pass the datalayout to opt instead. You should be able to use the default DL in one case, and a minimal DL (omitting irrelevant other type specifications) in the other.
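
For reference, a hedged sketch of what the RUN lines could look like with opt's -data-layout option (the check prefixes and the exact layout strings are illustrative, not necessarily what was committed):

; RUN: opt -passes=instcombine -S -data-layout=e < %s | FileCheck %s --check-prefixes=CHECK,NATURAL
; RUN: opt -passes=instcombine -S -data-layout=e-i16:32 < %s | FileCheck %s --check-prefixes=CHECK,OVERALIGNED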

Review feedback: Simplify testcase by passing DL to opt

Thanks for the suggestion, that is much cleaner indeed. I updated the patch accordingly.

nikic accepted this revision. Jan 20 2023, 2:43 AM

This change has two effects: First, we will no longer generate GEPs indexing into vectors during transforms. With opaque pointers, the main remaining transform of this kind is the one that canonicalizes constant expression GEPs, which is also the one leading to the miscompile here.

Second, this API is also used to index into constant aggregates based on offsets. For example, I expect that this change will regress operations like "load ptr at offset 4 of <2 x ptr> <ptr @g, ptr @g>", because we will now fail to index into the vector constant. Integer/float load folding should still work via the reinterpret load fold (which is, incidentally, the second fold involved in the miscompile here, introducing those "padding" bytes). It can probably also negatively affect the global ctor evaluator.
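
For reference, a hedged sketch of that kind of pattern (the globals and the offset are illustrative, assume 8-byte pointers, and are not taken from an actual test): a load of a pointer at a constant offset into a <2 x ptr> constant, which previously could be folded by indexing into the vector constant.

@g = global i32 0
@v = constant <2 x ptr> <ptr @g, ptr @g>

define ptr @load_second_lane() {
  ; Previously the byte offset could be turned into vector indices and the
  ; load folded to "ret ptr @g"; with this change the offset-to-index step
  ; refuses to enter the vector, so the fold may no longer fire.
  %r = load ptr, ptr getelementptr (i8, ptr @v, i64 8)
  ret ptr %r
}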

I think overall, those optimization regressions are unlikely to matter (vector-of-pointer global initializers sound pretty exotic), and this does mitigate an active miscompile, so I'm okay with doing the change. We might want to undo it later if we change our canonical form for constant GEPs.

So LGTM.

This revision is now accepted and ready to land. Jan 20 2023, 2:43 AM

Thanks for the review.
Your example of accessing a known pointer in a vector is interesting.

For the sake of completeness: it would be possible to restrict this patch to vectors of overaligned elements, which would still fix the first issue in the test case but avoid the possible regressions you mentioned.
However, based on the Discourse discussion I did not do that, because it would introduce a poorly tested special case.

Moreover, none of the existing test cases is affected by the change, which I also took as an indication that performance regressions would be unlikely.

jsilvanus updated this revision to Diff 491272. Jan 23 2023, 2:48 AM
Update changed testcases.
This revision was landed with ongoing or failed builds. Jan 23 2023, 4:26 AM
This revision was automatically updated to reflect the committed changes.