This is an archive of the discontinued LLVM Phabricator instance.

Differential D107349

[Matrix] Overload stride arg in matrix.columnwise.load/store.
ClosedPublic

Authored by fhahn on Aug 3 2021, 6:48 AM.

Details

Reviewers

rjmccall
anemet
thegameg
erichkeane

Commits

rGa1ef81de35a4: [Matrix] Overload stride arg in matrix.columnwise.load/store.

Summary

This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.
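
As a rough illustration of what "the bitwidth of the stride is used to perform the offset computation" means, the sketch below computes the element offset of a column start in 32-bit and in 64-bit arithmetic. It is a minimal sketch in plain C, not the code in LowerMatrixIntrinsics.cpp; the helper names column_offset_i32 and column_offset_i64 are hypothetical.

```c
#include <assert.h> /* for callers that want to check the results */
#include <stdint.h>

/* Hypothetical helper: offset (in elements) of the first element of column
   `col`, computed in 32-bit arithmetic as it would be for an i32 stride.
   The product wraps modulo 2^32. */
static uint32_t column_offset_i32(uint32_t col, uint32_t stride) {
    return col * stride;
}

/* Hypothetical helper: the same computation in 64-bit arithmetic, as it
   would be for an i64 stride. */
static uint64_t column_offset_i64(uint64_t col, uint64_t stride) {
    return col * stride;
}
```

For small values both helpers agree, for example column 3 with stride 5 gives offset 15 in either width. They differ only once the product no longer fits in 32 bits.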

This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32-bit platforms. The stride argument
of the builtins is defined as size_t, which is 32 bits wide on 32-bit
platforms.
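
The updated tests in this revision check for intrinsic names that carry a stride-type suffix, such as llvm.matrix.column.major.load.v25f64.i32 on i386 and llvm.matrix.column.major.load.v25f64.i64 on x86_64. The following sketch shows how the width of size_t on the compilation target maps to that suffix. The helpers size_t_bits and format_load_name are hypothetical and only format a string; they are not part of Clang or LLVM.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Width of size_t in bits on the target this file is compiled for. */
static unsigned size_t_bits(void) {
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Hypothetical helper: writes the overloaded intrinsic name for a load of
   `num_elts` doubles with a stride that is `stride_bits` wide. Returns the
   value of snprintf (the length the full name needs, excluding the NUL). */
static int format_load_name(char *buf, size_t len, unsigned num_elts,
                            unsigned stride_bits) {
    return snprintf(buf, len, "llvm.matrix.column.major.load.v%uf64.i%u",
                    num_elts, stride_bits);
}
```

Calling format_load_name(buf, sizeof buf, 25, size_t_bits()) would produce the .i32 variant when built for a target with a 32-bit size_t and the .i64 variant when built for one with a 64-bit size_t.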

Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.
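
For readers unfamiliar with the stride argument discussed in this summary, the sketch below emulates a column-major strided load in portable C: the stride is the number of elements between the starts of consecutive columns. This is an illustration only, assuming a size_t stride as in the builtins; the function column_major_load is hypothetical and is not how Clang or LLVM implement the operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of a column-major strided load.
   Reads a rows x cols matrix from `src`, where column c starts at
   src[c * stride], and writes it densely (column-major) into `dst`. */
static void column_major_load(const double *src, size_t rows, size_t cols,
                              size_t stride, double *dst) {
    /* Columns must not overlap, so the stride is at least the row count. */
    assert(stride >= rows);
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            dst[c * rows + r] = src[c * stride + r];
}
```

With a 2 x 3 matrix and a stride of 4, the load picks elements 0, 1, 4, 5, 8 and 9 of the source buffer and skips the two padding elements after each column.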

Fixes PR51304.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Aug 3 2021, 6:48 AM

Herald added subscribers: dexonsmith, jdoerfert, tschuett, hiraditya. Aug 3 2021, 6:48 AM

fhahn requested review of this revision. Aug 3 2021, 6:48 AM

Herald added projects: Restricted Project, Restricted Project. Aug 3 2021, 6:48 AM

This looks fine to me, and seems to fix my problem, thanks! I didn't spot anything obvious, and proof-read the LangRef and think it is all fine, but am not really the expert here, so please don't commit without the others having a day or two to comment.

This revision is now accepted and ready to land. Aug 3 2021, 6:55 AM

Harbormaster completed remote builds in B117626: Diff 363723. Aug 3 2021, 7:38 AM

This revision was landed with ongoing or failed builds. Aug 12 2021, 3:07 AM

Closed by commit rGa1ef81de35a4: [Matrix] Overload stride arg in matrix.columnwise.load/store. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGa1ef81de35a4: [Matrix] Overload stride arg in matrix.columnwise.load/store.

In D107349#2922208, @erichkeane wrote:

This looks fine to me, and seems to fix my problem, thanks! I didn't spot anything obvious, and proof-read the LangRef and think it is all fine, but am not really the expert here, so please don't commit without the others having a day or two to comment.

Thanks! I landed the change as there have not been any further comments in a while.

It seems this patch caused a test failure in MLIR:
test/Target/LLVMIR/llvmir-intrinsics.mlir

mehdi_amini added a reverting change: rG28c04794df74: Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store.". Aug 12 2021, 4:57 AM

Revert to unbreak bots (like this one : https://lab.llvm.org/buildbot/#/builders/13/builds/10930 )

In D107349#2941257, @mehdi_amini wrote:

Revert to unbreak bots (like this one : https://lab.llvm.org/buildbot/#/builders/13/builds/10930 )

Looks like this should be a pretty trivial test update at least.

In D107349#2941385, @erichkeane wrote:

In D107349#2941257, @mehdi_amini wrote:

Revert to unbreak bots (like this one : https://lab.llvm.org/buildbot/#/builders/13/builds/10930 )

Looks like this should be a pretty trivial test update at least.

Thanks for the heads-up! Recommitted with an updated MLIR test: f999312872b8

Revision Contents

Path                                                                Size
clang/test/CodeGen/matrix-type-builtins.c                           300 lines
clang/test/CodeGenCXX/matrix-type-builtins.cpp                      22 lines
clang/test/CodeGenObjC/matrix-type-builtins.m                       4 lines
llvm/docs/LangRef.rst                                               14 lines
llvm/include/llvm/IR/Intrinsics.td                                  4 lines
llvm/include/llvm/IR/MatrixBuilder.h                                4 lines
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp                13 lines
llvm/test/Transforms/LowerMatrixIntrinsics/strided-load-double.ll   33 lines
llvm/test/Transforms/LowerMatrixIntrinsics/strided-store-double.ll  33 lines
llvm/test/Verifier/matrix-intrinsics.ll                             44 lines

Diff 365949

clang/test/CodeGen/matrix-type-builtins.c

	// RUN: %clang_cc1 -fenable-matrix -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s			// RUN: %clang_cc1 -fenable-matrix -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck --check-prefixes=COMMON,CHECK64 %s
				// RUN: %clang_cc1 -fenable-matrix -triple i386-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck --check-prefixes=COMMON,CHECK32 %s

	// Also check we do not crash when running some middle-end passes. Most			// Also check we do not crash when running some middle-end passes. Most
	// importantly this includes the IR verifier, to ensure we emit valid IR.			// importantly this includes the IR verifier, to ensure we emit valid IR.
	// RUN: %clang_cc1 -fenable-matrix -emit-llvm -triple x86_64-apple-darwin %s -o %t			// RUN: %clang_cc1 -fenable-matrix -emit-llvm -triple x86_64-apple-darwin %s -o %t

	// Tests for the matrix type builtins.			// Tests for the matrix type builtins.

	typedef double dx5x5_t __attribute__((matrix_type(5, 5)));			typedef double dx5x5_t __attribute__((matrix_type(5, 5)));
	typedef float fx2x3_t __attribute__((matrix_type(2, 3)));			typedef float fx2x3_t __attribute__((matrix_type(2, 3)));
	typedef float fx3x2_t __attribute__((matrix_type(3, 2)));			typedef float fx3x2_t __attribute__((matrix_type(3, 2)));
	typedef int ix20x4_t __attribute__((matrix_type(20, 4)));			typedef int ix20x4_t __attribute__((matrix_type(20, 4)));
	typedef int ix4x20_t __attribute__((matrix_type(4, 20)));			typedef int ix4x20_t __attribute__((matrix_type(4, 20)));
	typedef unsigned ux1x6_t __attribute__((matrix_type(1, 6)));			typedef unsigned ux1x6_t __attribute__((matrix_type(1, 6)));
	typedef unsigned ux6x1_t __attribute__((matrix_type(6, 1)));			typedef unsigned ux6x1_t __attribute__((matrix_type(6, 1)));

	void transpose_double_5x5(dx5x5_t *a) {			void transpose_double_5x5(dx5x5_t *a) {
	// CHECK-LABEL: define{{.*}} void @transpose_double_5x5(			// COMMON-LABEL: define{{.*}} void @transpose_double_5x5(
	// CHECK: [[A:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8			// CHECK32: [[A:%.]] = load <25 x double>, <25 x double> {{.*}}, align 4
	// CHECK-NEXT: [[TRANS:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[A]], i32 5, i32 5)			// CHECK64: [[A:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8
	// CHECK-NEXT: [[AT_ADDR:%.]] = bitcast [25 x double] %a_t to <25 x double>*			// COMMON-NEXT: [[TRANS:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[A]], i32 5, i32 5)
	// CHECK-NEXT: store <25 x double> [[TRANS]], <25 x double>* [[AT_ADDR]], align 8			// COMMON-NEXT: [[AT_ADDR:%.]] = bitcast [25 x double] %a_t to <25 x double>*
				// CHECK32-NEXT: store <25 x double> [[TRANS]], <25 x double>* [[AT_ADDR]], align 4
				// CHECK64-NEXT: store <25 x double> [[TRANS]], <25 x double>* [[AT_ADDR]], align 8

	dx5x5_t a_t = __builtin_matrix_transpose(*a);			dx5x5_t a_t = __builtin_matrix_transpose(*a);
	}			}

	void transpose_float_3x2(fx3x2_t *a) {			void transpose_float_3x2(fx3x2_t *a) {
	// CHECK-LABEL: define{{.*}} void @transpose_float_3x2(			// COMMON-LABEL: define{{.*}} void @transpose_float_3x2(
	// CHECK: [[A:%.]] = load <6 x float>, <6 x float> {{.*}}, align 4			// COMMON: [[A:%.]] = load <6 x float>, <6 x float> {{.*}}, align 4
	// CHECK-NEXT: [[TRANS:%.*]] = call <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> [[A]], i32 3, i32 2)			// COMMON-NEXT: [[TRANS:%.*]] = call <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> [[A]], i32 3, i32 2)
	// CHECK-NEXT: [[AT_ADDR:%.]] = bitcast [6 x float] %a_t to <6 x float>*			// COMMON-NEXT: [[AT_ADDR:%.]] = bitcast [6 x float] %a_t to <6 x float>*
	// CHECK-NEXT: store <6 x float> [[TRANS]], <6 x float>* [[AT_ADDR]], align 4			// COMMON-NEXT: store <6 x float> [[TRANS]], <6 x float>* [[AT_ADDR]], align 4

	fx2x3_t a_t = __builtin_matrix_transpose(*a);			fx2x3_t a_t = __builtin_matrix_transpose(*a);
	}			}

	void transpose_int_20x4(ix20x4_t *a) {			void transpose_int_20x4(ix20x4_t *a) {
	// CHECK-LABEL: define{{.*}} void @transpose_int_20x4(			// COMMON-LABEL: define{{.*}} void @transpose_int_20x4(
	// CHECK: [[A:%.]] = load <80 x i32>, <80 x i32> {{.*}}, align 4			// COMMON: [[A:%.]] = load <80 x i32>, <80 x i32> {{.*}}, align 4
	// CHECK-NEXT: [[TRANS:%.*]] = call <80 x i32> @llvm.matrix.transpose.v80i32(<80 x i32> [[A]], i32 20, i32 4)			// COMMON-NEXT: [[TRANS:%.*]] = call <80 x i32> @llvm.matrix.transpose.v80i32(<80 x i32> [[A]], i32 20, i32 4)
	// CHECK-NEXT: [[AT_ADDR:%.]] = bitcast [80 x i32] %a_t to <80 x i32>*			// COMMON-NEXT: [[AT_ADDR:%.]] = bitcast [80 x i32] %a_t to <80 x i32>*
	// CHECK-NEXT: store <80 x i32> [[TRANS]], <80 x i32>* [[AT_ADDR]], align 4			// COMMON-NEXT: store <80 x i32> [[TRANS]], <80 x i32>* [[AT_ADDR]], align 4

	ix4x20_t a_t = __builtin_matrix_transpose(*a);			ix4x20_t a_t = __builtin_matrix_transpose(*a);
	}			}

	struct Foo {			struct Foo {
	ux1x6_t in;			ux1x6_t in;
	ux6x1_t out;			ux6x1_t out;
	};			};

	void transpose_struct_member(struct Foo *F) {			void transpose_struct_member(struct Foo *F) {
	// CHECK-LABEL: define{{.*}} void @transpose_struct_member(			// COMMON-LABEL: define{{.*}} void @transpose_struct_member(
	// CHECK: [[M:%.]] = load <6 x i32>, <6 x i32> {{.*}}, align 4			// COMMON: [[M:%.]] = load <6 x i32>, <6 x i32> {{.*}}, align 4
	// CHECK-NEXT: [[M_T:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M]], i32 1, i32 6)			// COMMON-NEXT: [[M_T:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M]], i32 1, i32 6)
	// CHECK-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 8			// CHECK32-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 4
	// CHECK-NEXT: [[OUT_PTR:%.]] = getelementptr inbounds %struct.Foo, %struct.Foo [[F_ADDR]], i32 0, i32 1			// CHECK64-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 8
	// CHECK-NEXT: [[OUT_PTR_C:%.]] = bitcast [6 x i32] [[OUT_PTR]] to <6 x i32>*			// COMMON-NEXT: [[OUT_PTR:%.]] = getelementptr inbounds %struct.Foo, %struct.Foo [[F_ADDR]], i32 0, i32 1
	// CHECK-NEXT: store <6 x i32> [[M_T]], <6 x i32>* [[OUT_PTR_C]], align 4			// COMMON-NEXT: [[OUT_PTR_C:%.]] = bitcast [6 x i32] [[OUT_PTR]] to <6 x i32>*
				// COMMON-NEXT: store <6 x i32> [[M_T]], <6 x i32>* [[OUT_PTR_C]], align 4

	F->out = __builtin_matrix_transpose(F->in);			F->out = __builtin_matrix_transpose(F->in);
	}			}

	void transpose_transpose_struct_member(struct Foo *F) {			void transpose_transpose_struct_member(struct Foo *F) {
	// CHECK-LABEL: define{{.*}} void @transpose_transpose_struct_member(			// COMMON-LABEL: define{{.*}} void @transpose_transpose_struct_member(
	// CHECK: [[M:%.]] = load <6 x i32>, <6 x i32> {{.*}}, align 4			// COMMON: [[M:%.]] = load <6 x i32>, <6 x i32> {{.*}}, align 4
	// CHECK-NEXT: [[M_T:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M]], i32 1, i32 6)			// COMMON-NEXT: [[M_T:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M]], i32 1, i32 6)
	// CHECK-NEXT: [[M_T2:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M_T]], i32 6, i32 1)			// COMMON-NEXT: [[M_T2:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[M_T]], i32 6, i32 1)
	// CHECK-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 8			// CHECK32-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 4
	// CHECK-NEXT: [[IN_PTR:%.]] = getelementptr inbounds %struct.Foo, %struct.Foo [[F_ADDR]], i32 0, i32 0			// CHECK64-NEXT: [[F_ADDR:%.]] = load %struct.Foo, %struct.Foo** %F.addr, align 8
	// CHECK-NEXT: [[IN_PTR_C:%.]] = bitcast [6 x i32] [[IN_PTR]] to <6 x i32>*			// COMMON-NEXT: [[IN_PTR:%.]] = getelementptr inbounds %struct.Foo, %struct.Foo [[F_ADDR]], i32 0, i32 0
	// CHECK-NEXT: store <6 x i32> [[M_T2]], <6 x i32>* [[IN_PTR_C]], align 4			// COMMON-NEXT: [[IN_PTR_C:%.]] = bitcast [6 x i32] [[IN_PTR]] to <6 x i32>*
				// COMMON-NEXT: store <6 x i32> [[M_T2]], <6 x i32>* [[IN_PTR_C]], align 4

	F->in = __builtin_matrix_transpose(__builtin_matrix_transpose(F->in));			F->in = __builtin_matrix_transpose(__builtin_matrix_transpose(F->in));
	}			}

	dx5x5_t get_matrix();			dx5x5_t get_matrix();

	void transpose_rvalue() {			void transpose_rvalue() {
	// CHECK-LABEL: define{{.*}} void @transpose_rvalue()			// COMMON-LABEL: define{{.*}} void @transpose_rvalue()
	// CHECK-NEXT: entry:			// COMMON-NEXT: entry:
	// CHECK-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 8			// CHECK32-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 4
	// CHECK-NEXT: [[CALL:%.*]] = call <25 x double> (...) @get_matrix()			// CHECK64-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 8
	// CHECK-NEXT: [[M_T:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[CALL]], i32 5, i32 5)			// CHECK32-NEXT: [[CALL:%.]] = call <25 x double> bitcast (<25 x double> (...) @get_matrix to <25 x double> ()*)()
	// CHECK-NEXT: [[M_T_ADDR_C:%.]] = bitcast [25 x double] [[M_T_ADDR]] to <25 x double>*			// CHECK64-NEXT: [[CALL:%.*]] = call <25 x double> (...) @get_matrix()
	// CHECK-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 8			// COMMON-NEXT: [[M_T:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[CALL]], i32 5, i32 5)
				// COMMON-NEXT: [[M_T_ADDR_C:%.]] = bitcast [25 x double] [[M_T_ADDR]] to <25 x double>*
				// CHECK32-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 4
				// CHECK64-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 8

	dx5x5_t m_t = __builtin_matrix_transpose(get_matrix());			dx5x5_t m_t = __builtin_matrix_transpose(get_matrix());
	}			}

	const dx5x5_t global_matrix;			const dx5x5_t global_matrix;

	void transpose_global() {			void transpose_global() {
	// CHECK-LABEL: define{{.*}} void @transpose_global()			// COMMON-LABEL: define{{.*}} void @transpose_global()
	// CHECK-NEXT: entry:			// COMMON-NEXT: entry:
	// CHECK-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 8			// CHECK32-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 4
	// CHECK-NEXT: [[GLOBAL_MATRIX:%.]] = load <25 x double>, <25 x double> bitcast ([25 x double]* @global_matrix to <25 x double>*), align 8			// CHECK32-NEXT: [[GLOBAL_MATRIX:%.]] = load <25 x double>, <25 x double> bitcast ([25 x double]* @global_matrix to <25 x double>*), align 4
	// CHECK-NEXT: [[M_T:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[GLOBAL_MATRIX]], i32 5, i32 5)			// CHECK64-NEXT: [[M_T_ADDR:%.*]] = alloca [25 x double], align 8
	// CHECK-NEXT: [[M_T_ADDR_C:%.]] = bitcast [25 x double] [[M_T_ADDR]] to <25 x double>*			// CHECK64-NEXT: [[GLOBAL_MATRIX:%.]] = load <25 x double>, <25 x double> bitcast ([25 x double]* @global_matrix to <25 x double>*), align 8
	// CHECK-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 8			// COMMON-NEXT: [[M_T:%.*]] = call <25 x double> @llvm.matrix.transpose.v25f64(<25 x double> [[GLOBAL_MATRIX]], i32 5, i32 5)
				// COMMON-NEXT: [[M_T_ADDR_C:%.]] = bitcast [25 x double] [[M_T_ADDR]] to <25 x double>*
				// CHECK32-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 4
				// CHECK64-NEXT: store <25 x double> [[M_T]], <25 x double>* [[M_T_ADDR_C]], align 8

	dx5x5_t m_t = __builtin_matrix_transpose(global_matrix);			dx5x5_t m_t = __builtin_matrix_transpose(global_matrix);
	}			}

	void column_major_load_with_const_stride_double(double *Ptr) {			void column_major_load_with_const_stride_double(double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_load_with_const_stride_double(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_load_with_const_stride_double(double %Ptr)
	// CHECK: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 4 [[PTR]], i32 5, i1 false, i32 5, i32 5)
				// CHECK64: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)

	dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);			dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);
	}			}

	void column_major_load_with_const_stride2_double(double *Ptr) {			void column_major_load_with_const_stride2_double(double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_load_with_const_stride2_double(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_load_with_const_stride2_double(double %Ptr)
	// CHECK: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 8 [[PTR]], i64 15, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 4 [[PTR]], i32 15, i1 false, i32 5, i32 5)
				// CHECK64: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 8 [[PTR]], i64 15, i1 false, i32 5, i32 5)

	dx5x5_t m_a2 = __builtin_matrix_column_major_load(Ptr, 5, 5, 2 * 3 + 9);			dx5x5_t m_a2 = __builtin_matrix_column_major_load(Ptr, 5, 5, 2 * 3 + 9);
	}			}

	void column_major_load_with_variable_stride_ull_float(float *Ptr, unsigned long long S) {			void column_major_load_with_variable_stride_ull_float(float *Ptr, unsigned long long S) {
	// CHECK-LABEL: define{{.}} void @column_major_load_with_variable_stride_ull_float(float %Ptr, i64 %S)			// COMMON-LABEL: define{{.}} void @column_major_load_with_variable_stride_ull_float(float %Ptr, i64 %S)
	// CHECK: [[S:%.]] = load i64, i64 %S.addr, align 8			// CHECK32: [[S:%.]] = load i64, i64 %S.addr, align 8
	// CHECK-NEXT: [[PTR:%.]] = load float, float** %Ptr.addr, align 8			// CHECK32-NEXT: [[STRIDE_TRUNC:%.*]] = trunc i64 [[S]] to i32
	// CHECK-NEXT: call <6 x float> @llvm.matrix.column.major.load.v6f32(float* align 4 [[PTR]], i64 [[S]], i1 false, i32 2, i32 3)			// CHECK32-NEXT: [[PTR:%.]] = load float, float** %Ptr.addr, align 4
				// CHECK32-NEXT: call <6 x float> @llvm.matrix.column.major.load.v6f32.i32(float* align 4 [[PTR]], i32 [[STRIDE_TRUNC]], i1 false, i32 2, i32 3)

				// CHECK64: [[S:%.]] = load i64, i64 %S.addr, align 8
				// CHECK64-NEXT: [[PTR:%.]] = load float, float** %Ptr.addr, align 8
				// CHECK64-NEXT: call <6 x float> @llvm.matrix.column.major.load.v6f32.i64(float* align 4 [[PTR]], i64 [[S]], i1 false, i32 2, i32 3)

	fx2x3_t m_b = __builtin_matrix_column_major_load(Ptr, 2, 3, S);			fx2x3_t m_b = __builtin_matrix_column_major_load(Ptr, 2, 3, S);
	}			}

	void column_major_load_with_stride_math_int(int *Ptr, int S) {			void column_major_load_with_stride_math_int(int *Ptr, int S) {
	// CHECK-LABEL: define{{.}} void @column_major_load_with_stride_math_int(i32 %Ptr, i32 %S)			// COMMON-LABEL: define{{.}} void @column_major_load_with_stride_math_int(i32 %Ptr, i32 %S)
	// CHECK: [[S:%.]] = load i32, i32 %S.addr, align 4			// COMMON: [[S:%.]] = load i32, i32 %S.addr, align 4
	// CHECK-NEXT: [[STRIDE:%.*]] = add nsw i32 [[S]], 32			// COMMON-NEXT: [[STRIDE:%.*]] = add nsw i32 [[S]], 32
	// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64			// CHECK32-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 4
	// CHECK-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8			// CHECK32-NEXT: call <80 x i32> @llvm.matrix.column.major.load.v80i32.i32(i32* align 4 [[PTR]], i32 [[STRIDE]], i1 false, i32 4, i32 20)
	// CHECK-NEXT: call <80 x i32> @llvm.matrix.column.major.load.v80i32(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 4, i32 20)			//
				// CHECK64-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64
				// CHECK64-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8
				// CHECK64-NEXT: call <80 x i32> @llvm.matrix.column.major.load.v80i32.i64(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 4, i32 20)

	ix4x20_t m_c = __builtin_matrix_column_major_load(Ptr, 4, 20, S + 32);			ix4x20_t m_c = __builtin_matrix_column_major_load(Ptr, 4, 20, S + 32);
	}			}

	void column_major_load_with_stride_math_s_int(int *Ptr, short S) {			void column_major_load_with_stride_math_s_int(int *Ptr, short S) {
	// CHECK-LABEL: define{{.}} void @column_major_load_with_stride_math_s_int(i32 %Ptr, i16 signext %S)			// COMMON-LABEL: define{{.}} void @column_major_load_with_stride_math_s_int(i32 %Ptr, i16 signext %S)
	// CHECK: [[S:%.]] = load i16, i16 %S.addr, align 2			// COMMON: [[S:%.]] = load i16, i16 %S.addr, align 2
	// CHECK-NEXT: [[S_EXT:%.*]] = sext i16 [[S]] to i32			// COMMON-NEXT: [[S_EXT:%.*]] = sext i16 [[S]] to i32
	// CHECK-NEXT: [[STRIDE:%.*]] = add nsw i32 [[S_EXT]], 32			// COMMON-NEXT: [[STRIDE:%.*]] = add nsw i32 [[S_EXT]], 32
	// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64			// CHECK32-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 4
	// CHECK-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8			// CHECK32-NEXT: %matrix = call <80 x i32> @llvm.matrix.column.major.load.v80i32.i32(i32* align 4 [[PTR]], i32 [[STRIDE]], i1 false, i32 4, i32 20)
	// CHECK-NEXT: %matrix = call <80 x i32> @llvm.matrix.column.major.load.v80i32(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 4, i32 20)			//
				// CHECK64-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64
				// CHECK64-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8
				// CHECK64-NEXT: %matrix = call <80 x i32> @llvm.matrix.column.major.load.v80i32.i64(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 4, i32 20)

	ix4x20_t m_c = __builtin_matrix_column_major_load(Ptr, 4, 20, S + 32);			ix4x20_t m_c = __builtin_matrix_column_major_load(Ptr, 4, 20, S + 32);
	}			}

	void column_major_load_array1(double Ptr[25]) {			void column_major_load_array1(double Ptr[25]) {
	// CHECK-LABEL: define{{.}} void @column_major_load_array1(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_load_array1(double %Ptr)
	// CHECK: [[ADDR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32: [[ADDR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 8 [[ADDR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 4 [[ADDR]], i32 5, i1 false, i32 5, i32 5)

				// CHECK64: [[ADDR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 8 [[ADDR]], i64 5, i1 false, i32 5, i32 5)

	dx5x5_t m = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);			dx5x5_t m = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);
	}			}

	void column_major_load_array2() {			void column_major_load_array2() {
	// CHECK-LABEL: define{{.*}} void @column_major_load_array2() #0 {			// COMMON-LABEL: define{{.*}} void @column_major_load_array2() #0 {
	// CHECK-NEXT: entry:			// COMMON-NEXT: entry:
	// CHECK-NEXT: [[PTR:%.*]] = alloca [25 x double], align 16			// CHECK32-NEXT: [[PTR:%.*]] = alloca [25 x double], align 8
	// CHECK: [[ARRAY_DEC:%.]] = getelementptr inbounds [25 x double], [25 x double] [[PTR]], i64 0, i64 0			// CHECK32: [[ARRAY_DEC:%.]] = getelementptr inbounds [25 x double], [25 x double] [[PTR]], i32 0, i32 0
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 16 [[ARRAY_DEC]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 8 [[ARRAY_DEC]], i32 5, i1 false, i32 5, i32 5)

				// CHECK64-NEXT: [[PTR:%.*]] = alloca [25 x double], align 16
				// CHECK64: [[ARRAY_DEC:%.]] = getelementptr inbounds [25 x double], [25 x double] [[PTR]], i64 0, i64 0
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 16 [[ARRAY_DEC]], i64 5, i1 false, i32 5, i32 5)

	double Ptr[25];			double Ptr[25];
	dx5x5_t m = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);			dx5x5_t m = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);
	}			}

	void column_major_load_const(const double *Ptr) {			void column_major_load_const(const double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_load_const(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_load_const(double %Ptr)
	// CHECK: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 4 [[PTR]], i32 5, i1 false, i32 5, i32 5)
				//
				// CHECK64: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)

	dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);			dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);
	}			}

	void column_major_load_volatile(volatile double *Ptr) {			void column_major_load_volatile(volatile double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_load_volatile(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_load_volatile(double %Ptr)
	// CHECK: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64(double* align 8 [[PTR]], i64 5, i1 true, i32 5, i32 5)			// CHECK32-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i32(double* align 4 [[PTR]], i32 5, i1 true, i32 5, i32 5)
				//
				// CHECK64: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call <25 x double> @llvm.matrix.column.major.load.v25f64.i64(double* align 8 [[PTR]], i64 5, i1 true, i32 5, i32 5)

	dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);			dx5x5_t m_a1 = __builtin_matrix_column_major_load(Ptr, 5, 5, 5);
	}			}

	void column_major_store_with_const_stride_double(double *Ptr) {			void column_major_store_with_const_stride_double(double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_store_with_const_stride_double(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_store_with_const_stride_double(double %Ptr)
	// CHECK: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8			// CHECK32: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v25f64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v25f64.i32(<25 x double> [[M]], double* align 4 [[PTR]], i32 5, i1 false, i32 5, i32 5)
				//
				// CHECK64: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8
				// CHECK64-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v25f64.i64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)

	dx5x5_t m;			dx5x5_t m;
	__builtin_matrix_column_major_store(m, Ptr, 5);			__builtin_matrix_column_major_store(m, Ptr, 5);
	}			}

	void column_major_store_with_const_stride2_double(double *Ptr) {			void column_major_store_with_const_stride2_double(double *Ptr) {
	// CHECK-LABEL: define{{.}} void @column_major_store_with_const_stride2_double(double %Ptr)			// COMMON-LABEL: define{{.}} void @column_major_store_with_const_stride2_double(double %Ptr)
	// CHECK: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8			// CHECK32: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 4
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v25f64(<25 x double> [[M]], double* align 8 [[PTR]], i64 15, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v25f64.i32(<25 x double> [[M]], double* align 4 [[PTR]], i32 15, i1 false, i32 5, i32 5)
				//
				// CHECK64: [[M:%.]] = load <25 x double>, <25 x double> {{.*}}, align 8
				// CHECK64-NEXT: [[PTR:%.]] = load double, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v25f64.i64(<25 x double> [[M]], double* align 8 [[PTR]], i64 15, i1 false, i32 5, i32 5)
	//			//
	dx5x5_t m;			dx5x5_t m;
	__builtin_matrix_column_major_store(m, Ptr, 2 * 3 + 9);			__builtin_matrix_column_major_store(m, Ptr, 2 * 3 + 9);
	}			}

	void column_major_store_with_stride_math_int(int *Ptr, int S) {			void column_major_store_with_stride_math_int(int *Ptr, int S) {
	// CHECK-LABEL: define{{.}} void @column_major_store_with_stride_math_int(i32 %Ptr, i32 %S)			// COMMON-LABEL: define{{.}} void @column_major_store_with_stride_math_int(i32 %Ptr, i32 %S)
	// CHECK: [[M:%.]] = load <80 x i32>, <80 x i32> {{.*}}, align 4			// COMMON: [[M:%.]] = load <80 x i32>, <80 x i32> {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 4
	// CHECK-NEXT: [[S:%.]] = load i32, i32 %S.addr, align 4			// CHECK64-NEXT: [[PTR:%.]] = load i32, i32** %Ptr.addr, align 8
	// CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[S]], 32			// COMMON-NEXT: [[S:%.]] = load i32, i32 %S.addr, align 4
	// CHECK-NEXT: [[IDX:%.*]] = sext i32 [[ADD]] to i64			// COMMON-NEXT: [[ADD:%.*]] = add nsw i32 [[S]], 32
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v80i32(<80 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX]], i1 false, i32 4, i32 20)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v80i32.i32(<80 x i32> [[M]], i32* align 4 [[PTR]], i32 [[ADD]], i1 false, i32 4, i32 20)
				//
				// CHECK64-NEXT: [[IDX:%.*]] = sext i32 [[ADD]] to i64
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v80i32.i64(<80 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX]], i1 false, i32 4, i32 20)

	ix4x20_t m;			ix4x20_t m;
	__builtin_matrix_column_major_store(m, Ptr, S + 32);			__builtin_matrix_column_major_store(m, Ptr, S + 32);
	}			}

	void column_major_store_with_stride_math_s_int(int *Ptr, short S) {			void column_major_store_with_stride_math_s_int(int *Ptr, short S) {
	// CHECK-LABEL: define{{.*}} void @column_major_store_with_stride_math_s_int(i32* %Ptr, i16 signext %S)			// COMMON-LABEL: define{{.*}} void @column_major_store_with_stride_math_s_int(i32* %Ptr, i16 signext %S)
	// CHECK: [[M:%.*]] = load <80 x i32>, <80 x i32>* {{.*}}, align 4			// COMMON: [[M:%.*]] = load <80 x i32>, <80 x i32>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 4
	// CHECK-NEXT: [[S:%.*]] = load i16, i16* %S.addr, align 2			// CHECK64-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: [[EXT:%.*]] = sext i16 [[S]] to i32			// COMMON-NEXT: [[S:%.*]] = load i16, i16* %S.addr, align 2
	// CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[EXT]], 2			// COMMON-NEXT: [[EXT:%.*]] = sext i16 [[S]] to i32
	// CHECK-NEXT: [[IDX:%.*]] = sext i32 [[ADD]] to i64			// COMMON-NEXT: [[ADD:%.*]] = add nsw i32 [[EXT]], 2
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v80i32(<80 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX]], i1 false, i32 4, i32 20)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v80i32.i32(<80 x i32> [[M]], i32* align 4 [[PTR]], i32 [[ADD]], i1 false, i32 4, i32 20)
				//
				// CHECK64-NEXT: [[IDX:%.*]] = sext i32 [[ADD]] to i64
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v80i32.i64(<80 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX]], i1 false, i32 4, i32 20)

	ix4x20_t m;			ix4x20_t m;
	__builtin_matrix_column_major_store(m, Ptr, S + 2);			__builtin_matrix_column_major_store(m, Ptr, S + 2);
	}			}

	void column_major_store_array1(double Ptr[25]) {			void column_major_store_array1(double Ptr[25]) {
	// CHECK-LABEL: define{{.*}} void @column_major_store_array1(double* %Ptr)			// COMMON-LABEL: define{{.*}} void @column_major_store_array1(double* %Ptr)
	// CHECK: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8			// CHECK32: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 4
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v25f64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v25f64.i32(<25 x double> [[M]], double* align 4 [[PTR]], i32 5, i1 false, i32 5, i32 5)
				//
				// CHECK64: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8
				// CHECK64-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v25f64.i64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 false, i32 5, i32 5)

	dx5x5_t m;			dx5x5_t m;
	__builtin_matrix_column_major_store(m, Ptr, 5);			__builtin_matrix_column_major_store(m, Ptr, 5);
	}			}

	void column_major_store_array2() {			void column_major_store_array2() {
	// CHECK-LABEL: define{{.*}} void @column_major_store_array2()			// COMMON-LABEL: define{{.*}} void @column_major_store_array2()
	// CHECK: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8			// CHECK32: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds [25 x double], [25 x double]* %Ptr, i64 0, i64 0			// CHECK32-NEXT: [[PTR:%.*]] = getelementptr inbounds [25 x double], [25 x double]* %Ptr, i32 0, i32 0
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v25f64(<25 x double> [[M]], double* align 16 [[PTR]], i64 5, i1 false, i32 5, i32 5)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v25f64.i32(<25 x double> [[M]], double* align 8 [[PTR]], i32 5, i1 false, i32 5, i32 5)
				//
				// CHECK64: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8
				// CHECK64-NEXT: [[PTR:%.*]] = getelementptr inbounds [25 x double], [25 x double]* %Ptr, i64 0, i64 0
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v25f64.i64(<25 x double> [[M]], double* align 16 [[PTR]], i64 5, i1 false, i32 5, i32 5)

	double Ptr[25];			double Ptr[25];
	dx5x5_t m;			dx5x5_t m;
	__builtin_matrix_column_major_store(m, Ptr, 5);			__builtin_matrix_column_major_store(m, Ptr, 5);
	}			}

	void column_major_store_volatile(volatile double *Ptr) {			void column_major_store_volatile(volatile double *Ptr) {
	// CHECK-LABEL: define{{.*}} void @column_major_store_volatile(double* %Ptr) #0 {			// COMMON-LABEL: define{{.*}} void @column_major_store_volatile(double* %Ptr) #0 {
	// CHECK: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8			// CHECK32: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK32-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 4
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v25f64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 true, i32 5, i32 5)			// CHECK32-NEXT: call void @llvm.matrix.column.major.store.v25f64.i32(<25 x double> [[M]], double* align 4 [[PTR]], i32 5, i1 true, i32 5, i32 5)
				//
				// CHECK64: [[M:%.*]] = load <25 x double>, <25 x double>* {{.*}}, align 8
				// CHECK64-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
				// CHECK64-NEXT: call void @llvm.matrix.column.major.store.v25f64.i64(<25 x double> [[M]], double* align 8 [[PTR]], i64 5, i1 true, i32 5, i32 5)

	dx5x5_t m;			dx5x5_t m;
	__builtin_matrix_column_major_store(m, Ptr, 5);			__builtin_matrix_column_major_store(m, Ptr, 5);
	}			}

clang/test/CodeGenCXX/matrix-type-builtins.cpp

	void test_column_major_load_with_stride_template_double(double *Ptr) {			void test_column_major_load_with_stride_template_double(double *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z50test_column_major_load_with_stride_template_doublePd(double* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z50test_column_major_load_with_stride_template_doublePd(double* %Ptr)
	// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
	// CHECK-NEXT: call <40 x double> @_Z29column_major_load_with_strideIdLj10ELj4ELj15EEu11matrix_typeIXT0_EXT1_ET_EPS0_(double* [[PTR]])			// CHECK-NEXT: call <40 x double> @_Z29column_major_load_with_strideIdLj10ELj4ELj15EEu11matrix_typeIXT0_EXT1_ET_EPS0_(double* [[PTR]])

	// CHECK-LABEL: define linkonce_odr <40 x double> @_Z29column_major_load_with_strideIdLj10ELj4ELj15EEu11matrix_typeIXT0_EXT1_ET_EPS0_(double* %Ptr)			// CHECK-LABEL: define linkonce_odr <40 x double> @_Z29column_major_load_with_strideIdLj10ELj4ELj15EEu11matrix_typeIXT0_EXT1_ET_EPS0_(double* %Ptr)
	// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
	// CHECK-NEXT: call <40 x double> @llvm.matrix.column.major.load.v40f64(double* align 8 [[PTR]], i64 15, i1 false, i32 10, i32 4)			// CHECK-NEXT: call <40 x double> @llvm.matrix.column.major.load.v40f64.i64(double* align 8 [[PTR]], i64 15, i1 false, i32 10, i32 4)

	matrix_t<double, 10, 4> M1 = column_major_load_with_stride<double, 10, 4, 15>(Ptr);			matrix_t<double, 10, 4> M1 = column_major_load_with_stride<double, 10, 4, 15>(Ptr);
	}			}

	void test_column_major_load_with_stride_template_int(int *Ptr) {			void test_column_major_load_with_stride_template_int(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z47test_column_major_load_with_stride_template_intPi(i32* %Ptr) #5 {			// CHECK-LABEL: define{{.*}} void @_Z47test_column_major_load_with_stride_template_intPi(i32* %Ptr) #5 {
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <6 x i32> @_Z29column_major_load_with_strideIiLj3ELj2ELj12EEu11matrix_typeIXT0_EXT1_ET_EPS0_(i32* [[PTR]])			// CHECK-NEXT: call <6 x i32> @_Z29column_major_load_with_strideIiLj3ELj2ELj12EEu11matrix_typeIXT0_EXT1_ET_EPS0_(i32* [[PTR]])

	// CHECK-LABEL: define linkonce_odr <6 x i32> @_Z29column_major_load_with_strideIiLj3ELj2ELj12EEu11matrix_typeIXT0_EXT1_ET_EPS0_(i32* %Ptr)			// CHECK-LABEL: define linkonce_odr <6 x i32> @_Z29column_major_load_with_strideIiLj3ELj2ELj12EEu11matrix_typeIXT0_EXT1_ET_EPS0_(i32* %Ptr)
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <6 x i32> @llvm.matrix.column.major.load.v6i32(i32* align 4 [[PTR]], i64 12, i1 false, i32 3, i32 2)			// CHECK-NEXT: call <6 x i32> @llvm.matrix.column.major.load.v6i32.i64(i32* align 4 [[PTR]], i64 12, i1 false, i32 3, i32 2)

	matrix_t<int, 3, 2> M1 = column_major_load_with_stride<int, 3, 2, 12>(Ptr);			matrix_t<int, 3, 2> M1 = column_major_load_with_stride<int, 3, 2, 12>(Ptr);
	}			}

	struct UnsignedWrapper {			struct UnsignedWrapper {
	char x;			char x;
	operator unsigned() {			operator unsigned() {
	return x;			return x;
	}			}
	};			};

	void test_column_major_load_stride_wrapper(int *Ptr, UnsignedWrapper &W) {			void test_column_major_load_stride_wrapper(int *Ptr, UnsignedWrapper &W) {
	// CHECK-LABEL: define{{.*}} void @_Z37test_column_major_load_stride_wrapperPiR15UnsignedWrapper(i32* %Ptr, %struct.UnsignedWrapper* nonnull align 1 dereferenceable(1) %W)			// CHECK-LABEL: define{{.*}} void @_Z37test_column_major_load_stride_wrapperPiR15UnsignedWrapper(i32* %Ptr, %struct.UnsignedWrapper* nonnull align 1 dereferenceable(1) %W)
	// CHECK: [[W:%.*]] = load %struct.UnsignedWrapper*, %struct.UnsignedWrapper** %W.addr, align 8			// CHECK: [[W:%.*]] = load %struct.UnsignedWrapper*, %struct.UnsignedWrapper** %W.addr, align 8
	// CHECK-NEXT: [[STRIDE:%.*]] = call i32 @_ZN15UnsignedWrappercvjEv(%struct.UnsignedWrapper* {{[^,]*}} [[W]])			// CHECK-NEXT: [[STRIDE:%.*]] = call i32 @_ZN15UnsignedWrappercvjEv(%struct.UnsignedWrapper* {{[^,]*}} [[W]])
	// CHECK-NEXT: [[STRIDE_EXT:%.*]] = zext i32 [[STRIDE]] to i64			// CHECK-NEXT: [[STRIDE_EXT:%.*]] = zext i32 [[STRIDE]] to i64
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <4 x i32> @llvm.matrix.column.major.load.v4i32(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 2, i32 2)			// CHECK-NEXT: call <4 x i32> @llvm.matrix.column.major.load.v4i32.i64(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 2, i32 2)
	matrix_t<int, 2, 2> M1 = __builtin_matrix_column_major_load(Ptr, 2, 2, W);			matrix_t<int, 2, 2> M1 = __builtin_matrix_column_major_load(Ptr, 2, 2, W);
	}			}

	constexpr int constexpr3() { return 3; }			constexpr int constexpr3() { return 3; }

	void test_column_major_load_constexpr_num_rows(int *Ptr) {			void test_column_major_load_constexpr_num_rows(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z41test_column_major_load_constexpr_num_rowsPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z41test_column_major_load_constexpr_num_rowsPi(i32* %Ptr)
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <6 x i32> @llvm.matrix.column.major.load.v6i32(i32* align 4 [[PTR]], i64 3, i1 false, i32 3, i32 2)			// CHECK-NEXT: call <6 x i32> @llvm.matrix.column.major.load.v6i32.i64(i32* align 4 [[PTR]], i64 3, i1 false, i32 3, i32 2)

	matrix_t<int, 3, 2> M1 = __builtin_matrix_column_major_load(Ptr, constexpr3(), 2, 3);			matrix_t<int, 3, 2> M1 = __builtin_matrix_column_major_load(Ptr, constexpr3(), 2, 3);
	}			}

	constexpr int constexpr1() { return 1; }			constexpr int constexpr1() { return 1; }

	void test_column_major_load_constexpr_num_columns(int *Ptr) {			void test_column_major_load_constexpr_num_columns(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z44test_column_major_load_constexpr_num_columnsPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z44test_column_major_load_constexpr_num_columnsPi(i32* %Ptr)
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <2 x i32> @llvm.matrix.column.major.load.v2i32(i32* align 4 [[PTR]], i64 3, i1 false, i32 2, i32 1)			// CHECK-NEXT: call <2 x i32> @llvm.matrix.column.major.load.v2i32.i64(i32* align 4 [[PTR]], i64 3, i1 false, i32 2, i32 1)
	matrix_t<int, 2, 1> M1 = __builtin_matrix_column_major_load(Ptr, 2, constexpr1(), 3);			matrix_t<int, 2, 1> M1 = __builtin_matrix_column_major_load(Ptr, 2, constexpr1(), 3);
	}			}

	template <unsigned N>			template <unsigned N>
	constexpr int constexpr_plus1() { return N + 1; }			constexpr int constexpr_plus1() { return N + 1; }

	void test_column_major_load_constexpr_num_columns_temp(int *Ptr) {			void test_column_major_load_constexpr_num_columns_temp(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z49test_column_major_load_constexpr_num_columns_tempPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z49test_column_major_load_constexpr_num_columns_tempPi(i32* %Ptr)
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <10 x i32> @llvm.matrix.column.major.load.v10i32(i32* align 4 [[PTR]], i64 3, i1 false, i32 2, i32 5)			// CHECK-NEXT: call <10 x i32> @llvm.matrix.column.major.load.v10i32.i64(i32* align 4 [[PTR]], i64 3, i1 false, i32 2, i32 5)
	matrix_t<int, 2, 5> M1 = __builtin_matrix_column_major_load(Ptr, 2, constexpr_plus1<4>(), 3);			matrix_t<int, 2, 5> M1 = __builtin_matrix_column_major_load(Ptr, 2, constexpr_plus1<4>(), 3);
	}			}

	void test_column_major_load_constexpr_stride_constexpr(int *Ptr) {			void test_column_major_load_constexpr_stride_constexpr(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z49test_column_major_load_constexpr_stride_constexprPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z49test_column_major_load_constexpr_stride_constexprPi(i32* %Ptr)
	// CHECK: [[STRIDE:%.*]] = call i32 @_Z10constexpr3v()			// CHECK: [[STRIDE:%.*]] = call i32 @_Z10constexpr3v()
	// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64			// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call <4 x i32> @llvm.matrix.column.major.load.v4i32(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 2, i32 2)			// CHECK-NEXT: call <4 x i32> @llvm.matrix.column.major.load.v4i32.i64(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 2, i32 2)

	matrix_t<int, 2, 2> M1 = __builtin_matrix_column_major_load(Ptr, 2, 2, constexpr3());			matrix_t<int, 2, 2> M1 = __builtin_matrix_column_major_load(Ptr, 2, 2, constexpr3());
	}			}

	template <typename T>			template <typename T>
	struct remove_pointer {			struct remove_pointer {
	typedef T type;			typedef T type;
	};			};
	[21 lines not shown]
	void test_column_major_store_with_stride_template_double(double *Ptr) {			void test_column_major_store_with_stride_template_double(double *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z51test_column_major_store_with_stride_template_doublePd(double* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z51test_column_major_store_with_stride_template_doublePd(double* %Ptr)
	// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
	// CHECK-NEXT: call void @_Z30column_major_store_with_strideIdLj10ELj4ELj15EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([40 x double]* nonnull align 8 dereferenceable(320) %M1, double* [[PTR]])			// CHECK-NEXT: call void @_Z30column_major_store_with_strideIdLj10ELj4ELj15EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([40 x double]* nonnull align 8 dereferenceable(320) %M1, double* [[PTR]])

	// CHECK-LABEL: define linkonce_odr void @_Z30column_major_store_with_strideIdLj10ELj4ELj15EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([40 x double]* nonnull align 8 dereferenceable(320) %m, double* %Ptr)			// CHECK-LABEL: define linkonce_odr void @_Z30column_major_store_with_strideIdLj10ELj4ELj15EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([40 x double]* nonnull align 8 dereferenceable(320) %m, double* %Ptr)
	// CHECK: [[M:%.*]] = load <40 x double>, <40 x double>* {{.*}}, align 8			// CHECK: [[M:%.*]] = load <40 x double>, <40 x double>* {{.*}}, align 8
	// CHECK-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load double*, double** %Ptr.addr, align 8
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v40f64(<40 x double> [[M]], double* align 8 [[PTR]], i64 15, i1 false, i32 10, i32 4)			// CHECK-NEXT: call void @llvm.matrix.column.major.store.v40f64.i64(<40 x double> [[M]], double* align 8 [[PTR]], i64 15, i1 false, i32 10, i32 4)

	matrix_t<double, 10, 4> M1;			matrix_t<double, 10, 4> M1;
	column_major_store_with_stride<double, 10, 4, 15>(M1, Ptr);			column_major_store_with_stride<double, 10, 4, 15>(M1, Ptr);
	}			}

	void test_column_major_store_with_stride_template_int(int *Ptr) {			void test_column_major_store_with_stride_template_int(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z48test_column_major_store_with_stride_template_intPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z48test_column_major_store_with_stride_template_intPi(i32* %Ptr)
	// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call void @_Z30column_major_store_with_strideIiLj3ELj2ELj3EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([6 x i32]* nonnull align 4 dereferenceable(24) %M1, i32* [[PTR]])			// CHECK-NEXT: call void @_Z30column_major_store_with_strideIiLj3ELj2ELj3EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([6 x i32]* nonnull align 4 dereferenceable(24) %M1, i32* [[PTR]])

	// CHECK-LABEL: define linkonce_odr void @_Z30column_major_store_with_strideIiLj3ELj2ELj3EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([6 x i32]* nonnull align 4 dereferenceable(24) %m, i32* %Ptr)			// CHECK-LABEL: define linkonce_odr void @_Z30column_major_store_with_strideIiLj3ELj2ELj3EEvRu11matrix_typeIXT0_EXT1_ET_EPS0_([6 x i32]* nonnull align 4 dereferenceable(24) %m, i32* %Ptr)
	// CHECK: [[M:%.*]] = load <6 x i32>, <6 x i32>* {{.*}}, align 4			// CHECK: [[M:%.*]] = load <6 x i32>, <6 x i32>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v6i32(<6 x i32> [[M]], i32* align 4 [[PTR]], i64 3, i1 false, i32 3, i32 2)			// CHECK-NEXT: call void @llvm.matrix.column.major.store.v6i32.i64(<6 x i32> [[M]], i32* align 4 [[PTR]], i64 3, i1 false, i32 3, i32 2)

	matrix_t<int, 3, 2> M1;			matrix_t<int, 3, 2> M1;
	column_major_store_with_stride<int, 3, 2, 3>(M1, Ptr);			column_major_store_with_stride<int, 3, 2, 3>(M1, Ptr);
	}			}

	void test_column_major_store_stride_wrapper(int *Ptr, UnsignedWrapper &W) {			void test_column_major_store_stride_wrapper(int *Ptr, UnsignedWrapper &W) {
	// CHECK-LABEL: define{{.*}} void @_Z38test_column_major_store_stride_wrapperPiR15UnsignedWrapper(i32* %Ptr, %struct.UnsignedWrapper* nonnull align 1 dereferenceable(1) %W)			// CHECK-LABEL: define{{.*}} void @_Z38test_column_major_store_stride_wrapperPiR15UnsignedWrapper(i32* %Ptr, %struct.UnsignedWrapper* nonnull align 1 dereferenceable(1) %W)
	// CHECK: [[M:%.*]] = load <4 x i32>, <4 x i32>* {{.*}}, align 4			// CHECK: [[M:%.*]] = load <4 x i32>, <4 x i32>* {{.*}}, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: [[W:%.*]] = load %struct.UnsignedWrapper*, %struct.UnsignedWrapper** %W.addr, align 8			// CHECK-NEXT: [[W:%.*]] = load %struct.UnsignedWrapper*, %struct.UnsignedWrapper** %W.addr, align 8
	// CHECK-NEXT: [[IDX:%.*]] = call i32 @_ZN15UnsignedWrappercvjEv(%struct.UnsignedWrapper* {{[^,]*}} [[W]])			// CHECK-NEXT: [[IDX:%.*]] = call i32 @_ZN15UnsignedWrappercvjEv(%struct.UnsignedWrapper* {{[^,]*}} [[W]])
	// CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[IDX]] to i64			// CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[IDX]] to i64
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v4i32(<4 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 2, i32 2)			// CHECK-NEXT: call void @llvm.matrix.column.major.store.v4i32.i64(<4 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 2, i32 2)

	matrix_t<int, 2, 2> M1;			matrix_t<int, 2, 2> M1;
	__builtin_matrix_column_major_store(M1, Ptr, W);			__builtin_matrix_column_major_store(M1, Ptr, W);
	}			}

	void test_column_major_store_constexpr_stride_constexpr(int *Ptr) {			void test_column_major_store_constexpr_stride_constexpr(int *Ptr) {
	// CHECK-LABEL: define{{.*}} void @_Z50test_column_major_store_constexpr_stride_constexprPi(i32* %Ptr)			// CHECK-LABEL: define{{.*}} void @_Z50test_column_major_store_constexpr_stride_constexprPi(i32* %Ptr)
	// CHECK: [[M:%.*]] = load <4 x i32>, <4 x i32>* %0, align 4			// CHECK: [[M:%.*]] = load <4 x i32>, <4 x i32>* %0, align 4
	// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8			// CHECK-NEXT: [[PTR:%.*]] = load i32*, i32** %Ptr.addr, align 8
	// CHECK-NEXT: [[IDX:%.*]] = call i32 @_Z10constexpr3v()			// CHECK-NEXT: [[IDX:%.*]] = call i32 @_Z10constexpr3v()
	// CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[IDX]] to i64			// CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[IDX]] to i64
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v4i32(<4 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 2, i32 2)			// CHECK-NEXT: call void @llvm.matrix.column.major.store.v4i32.i64(<4 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 2, i32 2)

	matrix_t<int, 2, 2> M;			matrix_t<int, 2, 2> M;
	__builtin_matrix_column_major_store(M, Ptr, constexpr3());			__builtin_matrix_column_major_store(M, Ptr, constexpr3());
	}			}

clang/test/CodeGenObjC/matrix-type-builtins.m

	@property int value;			@property int value;
	@end			@end

	void test_column_major_load(PtrValue *Ptr, IntValue *Stride) {			void test_column_major_load(PtrValue *Ptr, IntValue *Stride) {
	// CHECK-LABEL: define{{.*}} void @test_column_major_load(%2* %Ptr, %3* %Stride) #4 {			// CHECK-LABEL: define{{.*}} void @test_column_major_load(%2* %Ptr, %3* %Stride) #4 {
	// CHECK: [[STRIDE:%.*]] = call i32 bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32 (i8*, i8*)*)			// CHECK: [[STRIDE:%.*]] = call i32 bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32 (i8*, i8*)*)
	// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64			// CHECK-NEXT: [[STRIDE_EXT:%.*]] = sext i32 [[STRIDE]] to i64
	// CHECK: [[PTR:%.*]] = call i32* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32* (i8*, i8*)*)			// CHECK: [[PTR:%.*]] = call i32* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32* (i8*, i8*)*)
	// CHECK-NEXT: call <12 x i32> @llvm.matrix.column.major.load.v12i32(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 3, i32 4)			// CHECK-NEXT: call <12 x i32> @llvm.matrix.column.major.load.v12i32.i64(i32* align 4 [[PTR]], i64 [[STRIDE_EXT]], i1 false, i32 3, i32 4)

	u3x4 m = __builtin_matrix_column_major_load(Ptr.value, 3, 4, Stride.value);			u3x4 m = __builtin_matrix_column_major_load(Ptr.value, 3, 4, Stride.value);
	}			}

	void test_column_major_store(UnsignedMatrixValue *M, PtrValue *Ptr, IntValue *Stride) {			void test_column_major_store(UnsignedMatrixValue *M, PtrValue *Ptr, IntValue *Stride) {
	// CHECK-LABEL: define{{.*}} void @test_column_major_store(%1* %M, %2* %Ptr, %3* %Stride) #3 {			// CHECK-LABEL: define{{.*}} void @test_column_major_store(%1* %M, %2* %Ptr, %3* %Stride) #3 {
	// CHECK: [[M:%.*]] = call <12 x i32> bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to <12 x i32> (i8*, i8*)*)			// CHECK: [[M:%.*]] = call <12 x i32> bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to <12 x i32> (i8*, i8*)*)
	// CHECK: [[PTR:%.*]] = call i32* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32* (i8*, i8*)*)			// CHECK: [[PTR:%.*]] = call i32* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32* (i8*, i8*)*)
	// CHECK: [[IDX:%.*]] = call i32 bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32 (i8*, i8*)*)			// CHECK: [[IDX:%.*]] = call i32 bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i32 (i8*, i8*)*)
	// CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[IDX]] to i64			// CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[IDX]] to i64
	// CHECK-NEXT: call void @llvm.matrix.column.major.store.v12i32(<12 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 3, i32 4)			// CHECK-NEXT: call void @llvm.matrix.column.major.store.v12i32.i64(<12 x i32> [[M]], i32* align 4 [[PTR]], i64 [[IDX_EXT]], i1 false, i32 3, i32 4)

	__builtin_matrix_column_major_store(M.value, Ptr.value, Stride.value);			__builtin_matrix_column_major_store(M.value, Ptr.value, Stride.value);
	}			}

llvm/docs/LangRef.rst

::
declare vectorty @llvm.matrix.column.major.load.*(		declare vectorty @llvm.matrix.column.major.load.*(
ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)		ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.matrix.column.major.load.*``' intrinsics load a ``<Rows> x <Cols>``		The '``llvm.matrix.column.major.load.*``' intrinsics load a ``<Rows> x <Cols>``
matrix using a stride of ``%Stride`` to compute the start address of the		matrix using a stride of ``%Stride`` to compute the start address of the
different columns. This allows for convenient loading of sub matrixes. If		different columns. The offset is computed using ``%Stride``'s bitwidth. This
``<IsVolatile>`` is true, the intrinsic is considered a :ref:`volatile memory		allows for convenient loading of sub matrixes. If ``<IsVolatile>`` is true, the
access <volatile>`. The result matrix is returned in the result vector. If the		intrinsic is considered a :ref:`volatile memory access <volatile>`. The result
``%Ptr`` argument is known to be aligned to some boundary, this can be		matrix is returned in the result vector. If the ``%Ptr`` argument is known to
specified as an attribute on the argument.		be aligned to some boundary, this can be specified as an attribute on the
		argument.
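
The added sentence, that the offset is computed using ``%Stride``'s bitwidth, can be illustrated with a small model. The sketch below is not LLVM code: `columnStartIndex` is a hypothetical helper, and it assumes that the start index of a column is the column number multiplied by the stride, evaluated with wrapping arithmetic in the stride's own integer type. The LangRef text does not spell out the overflow behaviour, so the wrapping is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an LLVM API): element index at which column `Col`
// starts, computed entirely in the stride's integer type. Unsigned types are
// used so that overflow wraps instead of being undefined behaviour.
template <typename StrideT>
StrideT columnStartIndex(StrideT Stride, unsigned Col) {
  assert(Stride > 0 && "the stride is expected to be positive");
  return static_cast<StrideT>(static_cast<StrideT>(Col) * Stride);
}
```

Instantiated with `std::uint32_t` and `std::uint64_t`, the helper gives the same index for small strides; the two only differ once the product no longer fits in 32 bits.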

Arguments:		Arguments:
""""""""""		""""""""""

The first argument ``%Ptr`` is a pointer type to the returned vector type, and		The first argument ``%Ptr`` is a pointer type to the returned vector type, and
corresponds to the start address to load from. The second argument ``%Stride``		corresponds to the start address to load from. The second argument ``%Stride``
is a positive, constant integer with ``%Stride >= <Rows>``. ``%Stride`` is used		is a positive, constant integer with ``%Stride >= <Rows>``. ``%Stride`` is used
to compute the column memory addresses. I.e., for a column ``C``, its start		to compute the column memory addresses. I.e., for a column ``C``, its start
[18 lines not shown]
::
declare void @llvm.matrix.column.major.store.*(		declare void @llvm.matrix.column.major.store.*(
vectorty %In, ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)		vectorty %In, ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.matrix.column.major.store.*``' intrinsics store the ``<Rows> x		The '``llvm.matrix.column.major.store.*``' intrinsics store the ``<Rows> x
<Cols>`` matrix in ``%In`` to memory using a stride of ``%Stride`` between		<Cols>`` matrix in ``%In`` to memory using a stride of ``%Stride`` between
columns. If ``<IsVolatile>`` is true, the intrinsic is considered a		columns. The offset is computed using ``%Stride``'s bitwidth. If
		``<IsVolatile>`` is true, the intrinsic is considered a
:ref:`volatile memory access <volatile>`.		:ref:`volatile memory access <volatile>`.

If the ``%Ptr`` argument is known to be aligned to some boundary, this can be		If the ``%Ptr`` argument is known to be aligned to some boundary, this can be
specified as an attribute on the argument.		specified as an attribute on the argument.

Arguments:		Arguments:
""""""""""		""""""""""

llvm/include/llvm/IR/Intrinsics.td

def int_matrix_multiply
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],		: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyvector_ty, llvm_anyvector_ty, llvm_i32_ty, llvm_i32_ty,		[llvm_anyvector_ty, llvm_anyvector_ty, llvm_i32_ty, llvm_i32_ty,
llvm_i32_ty],		llvm_i32_ty],
[IntrNoSync, IntrWillReturn, IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>,		[IntrNoSync, IntrWillReturn, IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>,
ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>]>;		ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>]>;

def int_matrix_column_major_load		def int_matrix_column_major_load
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],		: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMPointerToElt<0>, llvm_i64_ty, llvm_i1_ty,		[LLVMPointerToElt<0>, llvm_anyint_ty, llvm_i1_ty,
llvm_i32_ty, llvm_i32_ty],		llvm_i32_ty, llvm_i32_ty],
[IntrNoSync, IntrWillReturn, IntrArgMemOnly, IntrReadMem,		[IntrNoSync, IntrWillReturn, IntrArgMemOnly, IntrReadMem,
NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>,		NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>,
ImmArg<ArgIndex<4>>]>;		ImmArg<ArgIndex<4>>]>;

def int_matrix_column_major_store		def int_matrix_column_major_store
: DefaultAttrsIntrinsic<[],		: DefaultAttrsIntrinsic<[],
[llvm_anyvector_ty, LLVMPointerToElt<0>,		[llvm_anyvector_ty, LLVMPointerToElt<0>,
llvm_i64_ty, llvm_i1_ty, llvm_i32_ty, llvm_i32_ty],		llvm_anyint_ty, llvm_i1_ty, llvm_i32_ty, llvm_i32_ty],
[IntrNoSync, IntrWillReturn, IntrArgMemOnly, IntrWriteMem,		[IntrNoSync, IntrWillReturn, IntrArgMemOnly, IntrWriteMem,
WriteOnly<ArgIndex<1>>, NoCapture<ArgIndex<1>>,		WriteOnly<ArgIndex<1>>, NoCapture<ArgIndex<1>>,
ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>]>;		ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>]>;
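
Switching the stride operand from `llvm_i64_ty` to `llvm_anyint_ty` makes it a second overloaded type, which is why the updated tests expect names such as `llvm.matrix.column.major.store.v80i32.i32` and `llvm.matrix.column.major.store.v80i32.i64`. The sketch below only reproduces that naming pattern for illustration; `columnMajorStoreName` is a hypothetical helper, not LLVM's actual name mangler, and it assumes the vector suffix (for example `v80i32`) is already known.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): builds the store intrinsic name seen
// in the updated CHECK lines from the vector suffix and the stride bit width.
std::string columnMajorStoreName(const std::string &VecSuffix,
                                 unsigned StrideBits) {
  assert(StrideBits > 0 && "an integer stride type has a non-zero width");
  return "llvm.matrix.column.major.store." + VecSuffix + ".i" +
         std::to_string(StrideBits);
}
```

For a `<80 x i32>` matrix, a 32-bit stride yields the `.v80i32.i32` variant and a 64-bit stride the `.v80i32.i64` variant, matching the CHECK32 and CHECK64 lines in the C test.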

//===---------- Intrinsics to control hardware supported loops ----------===//		//===---------- Intrinsics to control hardware supported loops ----------===//

// Specify that the value given is the number of iterations that the next loop		// Specify that the value given is the number of iterations that the next loop
// will execute.		// will execute.
llvm/include/llvm/IR/MatrixBuilder.h

CallInst *CreateColumnMajorLoad(Value *DataPtr, Align Alignment,
// Deal with the pointer		// Deal with the pointer
PointerType *PtrTy = cast<PointerType>(DataPtr->getType());		PointerType *PtrTy = cast<PointerType>(DataPtr->getType());
Type *EltTy = PtrTy->getElementType();		Type *EltTy = PtrTy->getElementType();

auto *RetType = FixedVectorType::get(EltTy, Rows * Columns);		auto *RetType = FixedVectorType::get(EltTy, Rows * Columns);

Value *Ops[] = {DataPtr, Stride, B.getInt1(IsVolatile), B.getInt32(Rows),		Value *Ops[] = {DataPtr, Stride, B.getInt1(IsVolatile), B.getInt32(Rows),
B.getInt32(Columns)};		B.getInt32(Columns)};
Type *OverloadedTypes[] = {RetType};		Type *OverloadedTypes[] = {RetType, Stride->getType()};

Function *TheFn = Intrinsic::getDeclaration(		Function *TheFn = Intrinsic::getDeclaration(
getModule(), Intrinsic::matrix_column_major_load, OverloadedTypes);		getModule(), Intrinsic::matrix_column_major_load, OverloadedTypes);

CallInst *Call = B.CreateCall(TheFn->getFunctionType(), TheFn, Ops, Name);		CallInst *Call = B.CreateCall(TheFn->getFunctionType(), TheFn, Ops, Name);
Attribute AlignAttr =		Attribute AlignAttr =
Attribute::getWithAlignment(Call->getContext(), Alignment);		Attribute::getWithAlignment(Call->getContext(), Alignment);
Call->addAttribute(1, AlignAttr);		Call->addAttribute(1, AlignAttr);
return Call;		return Call;
}		}

/// Create a column major, strided matrix store.		/// Create a column major, strided matrix store.
/// \p Matrix - Matrix to store		/// \p Matrix - Matrix to store
/// \p Ptr - Pointer to write back to		/// \p Ptr - Pointer to write back to
/// \p Stride - Space between columns		/// \p Stride - Space between columns
CallInst *CreateColumnMajorStore(Value *Matrix, Value *Ptr, Align Alignment,		CallInst *CreateColumnMajorStore(Value *Matrix, Value *Ptr, Align Alignment,
Value *Stride, bool IsVolatile,		Value *Stride, bool IsVolatile,
unsigned Rows, unsigned Columns,		unsigned Rows, unsigned Columns,
const Twine &Name = "") {		const Twine &Name = "") {
Value *Ops[] = {Matrix, Ptr,		Value *Ops[] = {Matrix, Ptr,
Stride, B.getInt1(IsVolatile),		Stride, B.getInt1(IsVolatile),
B.getInt32(Rows), B.getInt32(Columns)};		B.getInt32(Rows), B.getInt32(Columns)};
Type *OverloadedTypes[] = {Matrix->getType()};		Type *OverloadedTypes[] = {Matrix->getType(), Stride->getType()};

Function *TheFn = Intrinsic::getDeclaration(		Function *TheFn = Intrinsic::getDeclaration(
getModule(), Intrinsic::matrix_column_major_store, OverloadedTypes);		getModule(), Intrinsic::matrix_column_major_store, OverloadedTypes);

CallInst *Call = B.CreateCall(TheFn->getFunctionType(), TheFn, Ops, Name);		CallInst *Call = B.CreateCall(TheFn->getFunctionType(), TheFn, Ops, Name);
Attribute AlignAttr =		Attribute AlignAttr =
Attribute::getWithAlignment(Call->getContext(), Alignment);		Attribute::getWithAlignment(Call->getContext(), Alignment);
Call->addAttribute(2, AlignAttr);		Call->addAttribute(2, AlignAttr);
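The extra entry in `OverloadedTypes` is what appends the stride type to the intrinsic name, which is why the tests in this revision move from names such as `llvm.matrix.column.major.load.v8f64` to `llvm.matrix.column.major.load.v8f64.i64`. The following is a minimal, LLVM-free sketch of that naming scheme. `mangledColumnMajorName` is a hypothetical helper written only for illustration; it is not an LLVM API, and LLVM derives these names itself from the overloaded types.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of LLVM): builds the suffixed intrinsic name
// that results from overloading on the matrix vector type and the stride type.
// EltName is the element type's short name, e.g. "f64" or "f32".
std::string mangledColumnMajorName(const std::string &Op,      // "load" or "store"
                                   unsigned NumElts,           // Rows * Columns
                                   const std::string &EltName,
                                   unsigned StrideBits) {      // e.g. 32 or 64
  return "llvm.matrix.column.major." + Op + ".v" + std::to_string(NumElts) +
         EltName + ".i" + std::to_string(StrideBits);
}
```

With a 4x2 matrix of doubles and a 32-bit stride, the sketch yields the name used by the new `strided_load_4x2_stride_i32` test.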

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp

public:
MatrixTy loadMatrix(Type *Ty, Value *Ptr, MaybeAlign MAlign, Value *Stride,		MatrixTy loadMatrix(Type *Ty, Value *Ptr, MaybeAlign MAlign, Value *Stride,
bool IsVolatile, ShapeInfo Shape, IRBuilder<> &Builder) {		bool IsVolatile, ShapeInfo Shape, IRBuilder<> &Builder) {
auto *VType = cast<VectorType>(Ty);		auto *VType = cast<VectorType>(Ty);
Type *EltTy = VType->getElementType();		Type *EltTy = VType->getElementType();
Type *VecTy = FixedVectorType::get(EltTy, Shape.getStride());		Type *VecTy = FixedVectorType::get(EltTy, Shape.getStride());
Value *EltPtr = createElementPtr(Ptr, EltTy, Builder);		Value *EltPtr = createElementPtr(Ptr, EltTy, Builder);
MatrixTy Result;		MatrixTy Result;
for (unsigned I = 0, E = Shape.getNumVectors(); I < E; ++I) {		for (unsigned I = 0, E = Shape.getNumVectors(); I < E; ++I) {
Value *GEP = computeVectorAddr(EltPtr, Builder.getInt64(I), Stride,		Value *GEP = computeVectorAddr(
Shape.getStride(), EltTy, Builder);		EltPtr, Builder.getIntN(Stride->getType()->getScalarSizeInBits(), I),
		Stride, Shape.getStride(), EltTy, Builder);
Value *Vector = Builder.CreateAlignedLoad(		Value *Vector = Builder.CreateAlignedLoad(
VecTy, GEP, getAlignForIndex(I, Stride, EltTy, MAlign),		VecTy, GEP, getAlignForIndex(I, Stride, EltTy, MAlign),
IsVolatile, "col.load");		IsVolatile, "col.load");

Result.addVector(Vector);		Result.addVector(Vector);
}		}
return Result.addNumLoads(getNumOps(Result.getVectorTy()) *		return Result.addNumLoads(getNumOps(Result.getVectorTy()) *
Result.getNumVectors());		Result.getNumVectors());
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	public:
/// Store matrix \p StoreVal starting at \p Ptr and using \p Stride between		/// Store matrix \p StoreVal starting at \p Ptr and using \p Stride between
/// vectors.		/// vectors.
MatrixTy storeMatrix(Type *Ty, MatrixTy StoreVal, Value *Ptr,		MatrixTy storeMatrix(Type *Ty, MatrixTy StoreVal, Value *Ptr,
MaybeAlign MAlign, Value *Stride, bool IsVolatile,		MaybeAlign MAlign, Value *Stride, bool IsVolatile,
IRBuilder<> &Builder) {		IRBuilder<> &Builder) {
auto VType = cast<VectorType>(Ty);		auto VType = cast<VectorType>(Ty);
Value *EltPtr = createElementPtr(Ptr, VType->getElementType(), Builder);		Value *EltPtr = createElementPtr(Ptr, VType->getElementType(), Builder);
for (auto Vec : enumerate(StoreVal.vectors())) {		for (auto Vec : enumerate(StoreVal.vectors())) {
Value *GEP = computeVectorAddr(EltPtr, Builder.getInt64(Vec.index()),		Value *GEP = computeVectorAddr(
Stride, StoreVal.getStride(),		EltPtr,
VType->getElementType(), Builder);		Builder.getIntN(Stride->getType()->getScalarSizeInBits(),
		Vec.index()),
		Stride, StoreVal.getStride(), VType->getElementType(), Builder);
Builder.CreateAlignedStore(Vec.value(), GEP,		Builder.CreateAlignedStore(Vec.value(), GEP,
getAlignForIndex(Vec.index(), Stride,		getAlignForIndex(Vec.index(), Stride,
VType->getElementType(),		VType->getElementType(),
MAlign),		MAlign),
IsVolatile);		IsVolatile);
}		}
return MatrixTy().addNumStores(getNumOps(StoreVal.getVectorTy()) *		return MatrixTy().addNumStores(getNumOps(StoreVal.getVectorTy()) *
StoreVal.getNumVectors());		StoreVal.getNumVectors());
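In both `loadMatrix` and `storeMatrix`, the column index constant is now created at the stride's bit width instead of always being an `i64`, so the multiply that computes each column's start offset has operands of matching type. A minimal sketch of that computation in plain C++ follows. `columnStart` is a hypothetical name, and the unsigned stride type is an assumption of the sketch, not something the patch states.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Start offset (in elements) of column I. The column index is converted to
// the stride's own integer type first, so the multiply is performed at the
// stride's bit width (the role Builder.getIntN plays in the patch).
template <typename StrideT>
StrideT columnStart(unsigned I, StrideT Stride) {
  static_assert(std::is_unsigned<StrideT>::value,
                "sketch assumes an unsigned stride type");
  return static_cast<StrideT>(static_cast<StrideT>(I) * Stride);
}
```

For example, `columnStart<std::uint32_t>(1, 5u)` mirrors the `mul i32 1, [[STRIDE]]` pattern checked by the new i32 tests.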

llvm/test/Transforms/LowerMatrixIntrinsics/strided-load-double.ll

	Show All 17 Lines
	; CHECK-NEXT: [[VEC_CAST7:%.*]] = bitcast double* [[VEC_GEP6]] to <3 x double>*			; CHECK-NEXT: [[VEC_CAST7:%.*]] = bitcast double* [[VEC_GEP6]] to <3 x double>*
	; CHECK-NEXT: [[COL_LOAD8:%.*]] = load <3 x double>, <3 x double>* [[VEC_CAST7]], align 8			; CHECK-NEXT: [[COL_LOAD8:%.*]] = load <3 x double>, <3 x double>* [[VEC_CAST7]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <3 x double> [[COL_LOAD]], <3 x double> [[COL_LOAD4]], <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <3 x double> [[COL_LOAD]], <3 x double> [[COL_LOAD4]], <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <3 x double> [[COL_LOAD8]], <3 x double> poison, <6 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <3 x double> [[COL_LOAD8]], <3 x double> poison, <6 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <6 x double> [[TMP0]], <6 x double> [[TMP1]], <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <6 x double> [[TMP0]], <6 x double> [[TMP1]], <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: ret <9 x double> [[TMP2]]			; CHECK-NEXT: ret <9 x double> [[TMP2]]
	;			;
	entry:			entry:
	%load = call <9 x double> @llvm.matrix.column.major.load(double* %in, i64 %stride, i1 false, i32 3, i32 3)			%load = call <9 x double> @llvm.matrix.column.major.load.v9f64.i64(double* %in, i64 %stride, i1 false, i32 3, i32 3)
	ret <9 x double> %load			ret <9 x double> %load
	}			}

	declare <9 x double> @llvm.matrix.column.major.load(double*, i64, i1, i32, i32)			declare <9 x double> @llvm.matrix.column.major.load.v9f64.i64(double*, i64, i1, i32, i32)

	define <9 x double> @strided_load_9x1(double* %in, i64 %stride) {			define <9 x double> @strided_load_9x1(double* %in, i64 %stride) {
	; CHECK-LABEL: @strided_load_9x1(			; CHECK-LABEL: @strided_load_9x1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]			; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[IN:%.*]], i64 [[VEC_START]]			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[IN:%.*]], i64 [[VEC_START]]
	; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <9 x double>*			; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <9 x double>*
	; CHECK-NEXT: [[COL_LOAD:%.*]] = load <9 x double>, <9 x double>* [[VEC_CAST]], align 8			; CHECK-NEXT: [[COL_LOAD:%.*]] = load <9 x double>, <9 x double>* [[VEC_CAST]], align 8
	; CHECK-NEXT: ret <9 x double> [[COL_LOAD]]			; CHECK-NEXT: ret <9 x double> [[COL_LOAD]]
	;			;
	entry:			entry:
	%load = call <9 x double> @llvm.matrix.column.major.load(double* %in, i64 %stride, i1 false, i32 9, i32 1)			%load = call <9 x double> @llvm.matrix.column.major.load.v9f64.i64(double* %in, i64 %stride, i1 false, i32 9, i32 1)
	ret <9 x double> %load			ret <9 x double> %load
	}			}

	declare <8 x double> @llvm.matrix.column.major.load.v8f64(double*, i64, i1, i32, i32)			declare <8 x double> @llvm.matrix.column.major.load.v8f64.i64(double*, i64, i1, i32, i32)
	; CHECK: declare <8 x double> @llvm.matrix.column.major.load.v8f64(double* nocapture, i64, i1 immarg, i32 immarg, i32 immarg) [[READONLY:#[0-9]]]

	define <8 x double> @strided_load_4x2(double* %in, i64 %stride) {			define <8 x double> @strided_load_4x2(double* %in, i64 %stride) {
	; CHECK-LABEL: @strided_load_4x2(			; CHECK-LABEL: @strided_load_4x2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]			; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[IN:%.*]], i64 [[VEC_START]]			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[IN:%.*]], i64 [[VEC_START]]
	; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <4 x double>*			; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <4 x double>*
	; CHECK-NEXT: [[COL_LOAD:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST]], align 8			; CHECK-NEXT: [[COL_LOAD:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST]], align 8
	; CHECK-NEXT: [[VEC_START1:%.*]] = mul i64 1, [[STRIDE]]			; CHECK-NEXT: [[VEC_START1:%.*]] = mul i64 1, [[STRIDE]]
	; CHECK-NEXT: [[VEC_GEP2:%.*]] = getelementptr double, double* [[IN]], i64 [[VEC_START1]]			; CHECK-NEXT: [[VEC_GEP2:%.*]] = getelementptr double, double* [[IN]], i64 [[VEC_START1]]
	; CHECK-NEXT: [[VEC_CAST3:%.*]] = bitcast double* [[VEC_GEP2]] to <4 x double>*			; CHECK-NEXT: [[VEC_CAST3:%.*]] = bitcast double* [[VEC_GEP2]] to <4 x double>*
	; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST3]], align 8			; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST3]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[COL_LOAD]], <4 x double> [[COL_LOAD4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[COL_LOAD]], <4 x double> [[COL_LOAD4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: ret <8 x double> [[TMP0]]			; CHECK-NEXT: ret <8 x double> [[TMP0]]
	;			;
	entry:			entry:
	%load = call <8 x double> @llvm.matrix.column.major.load.v8f64(double* %in, i64 %stride, i1 false, i32 4, i32 2)			%load = call <8 x double> @llvm.matrix.column.major.load.v8f64.i64(double* %in, i64 %stride, i1 false, i32 4, i32 2)
	ret <8 x double> %load			ret <8 x double> %load
	}			}

	; CHECK: declare <9 x double> @llvm.matrix.column.major.load.v9f64(double* nocapture, i64, i1 immarg, i32 immarg, i32 immarg) [[READONLY]]			declare <8 x double> @llvm.matrix.column.major.load.v8f64.i32(double*, i32, i1, i32, i32)
	; CHECK: attributes [[READONLY]] = { argmemonly nofree nosync nounwind readonly willreturn }
				define <8 x double> @strided_load_4x2_stride_i32(double* %in, i32 %stride) {
				; CHECK-LABEL: @strided_load_4x2_stride_i32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VEC_START:%.*]] = mul i32 0, [[STRIDE:%.*]]
				; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[IN:%.*]], i32 [[VEC_START]]
				; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <4 x double>*
				; CHECK-NEXT: [[COL_LOAD:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST]], align 8
				; CHECK-NEXT: [[VEC_START1:%.*]] = mul i32 1, [[STRIDE]]
				; CHECK-NEXT: [[VEC_GEP2:%.*]] = getelementptr double, double* [[IN]], i32 [[VEC_START1]]
				; CHECK-NEXT: [[VEC_CAST3:%.*]] = bitcast double* [[VEC_GEP2]] to <4 x double>*
				; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <4 x double>, <4 x double>* [[VEC_CAST3]], align 8
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[COL_LOAD]], <4 x double> [[COL_LOAD4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				; CHECK-NEXT: ret <8 x double> [[TMP0]]
				;
				entry:
				%load = call <8 x double> @llvm.matrix.column.major.load.v8f64.i32(double* %in, i32 %stride, i1 false, i32 4, i32 2)
				ret <8 x double> %load
				}
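As a reading aid for the CHECK lines above, the semantics of the strided column-major load can be sketched in plain C++: column `C` starts at element offset `C * Stride`, and `Rows` consecutive elements are read from there. This is an illustrative model under stated assumptions, not the lowering itself; `columnMajorLoad` is a hypothetical name, and the stride check mirrors the verifier rule that the stride must be at least the number of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative model only: reads a Rows x Columns matrix stored column-major
// with Stride elements between the starts of consecutive columns, and returns
// it as one flat vector. Offsets are computed in the stride's own type.
template <typename StrideT>
std::vector<double> columnMajorLoad(const double *In, StrideT Stride,
                                    unsigned Rows, unsigned Columns) {
  assert(Stride >= Rows && "stride must be >= number of rows");
  std::vector<double> Result;
  Result.reserve(static_cast<std::size_t>(Rows) * Columns);
  for (unsigned C = 0; C < Columns; ++C) {
    // Element offset of column C, computed at the stride's bit width.
    StrideT Start = static_cast<StrideT>(static_cast<StrideT>(C) * Stride);
    for (unsigned R = 0; R < Rows; ++R)
      Result.push_back(In[Start + R]);
  }
  return Result;
}
```

Calling it with a 32-bit stride type corresponds to the `.i32` variant of the intrinsic, and with a 64-bit type to the `.i64` variant.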

llvm/test/Transforms/LowerMatrixIntrinsics/strided-store-double.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -lower-matrix-intrinsics -S < %s | FileCheck %s			; RUN: opt -lower-matrix-intrinsics -S < %s | FileCheck %s
	; RUN: opt -passes='lower-matrix-intrinsics' -S < %s | FileCheck %s			; RUN: opt -passes='lower-matrix-intrinsics' -S < %s | FileCheck %s

	define void @strided_store_3x2(<6 x double> %in, double* %out) {			define void @strided_store_3x2(<6 x double> %in, double* %out) {
	; CHECK-LABEL: @strided_store_3x2(			; CHECK-LABEL: @strided_store_3x2(
	; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x double> [[IN:%.*]], <6 x double> poison, <3 x i32> <i32 0, i32 1, i32 2>			; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x double> [[IN:%.*]], <6 x double> poison, <3 x i32> <i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x double> [[IN]], <6 x double> poison, <3 x i32> <i32 3, i32 4, i32 5>			; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x double> [[IN]], <6 x double> poison, <3 x i32> <i32 3, i32 4, i32 5>
	; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[OUT:%.*]] to <3 x double>*			; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[OUT:%.*]] to <3 x double>*
	; CHECK-NEXT: store <3 x double> [[SPLIT]], <3 x double>* [[VEC_CAST]], align 8			; CHECK-NEXT: store <3 x double> [[SPLIT]], <3 x double>* [[VEC_CAST]], align 8
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[OUT]], i64 5			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[OUT]], i64 5
	; CHECK-NEXT: [[VEC_CAST2:%.*]] = bitcast double* [[VEC_GEP]] to <3 x double>*			; CHECK-NEXT: [[VEC_CAST2:%.*]] = bitcast double* [[VEC_GEP]] to <3 x double>*
	; CHECK-NEXT: store <3 x double> [[SPLIT1]], <3 x double>* [[VEC_CAST2]], align 8			; CHECK-NEXT: store <3 x double> [[SPLIT1]], <3 x double>* [[VEC_CAST2]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.matrix.column.major.store.v6f64(<6 x double> %in, double* %out, i64 5, i1 false, i32 3, i32 2)			call void @llvm.matrix.column.major.store.v6f64.i64(<6 x double> %in, double* %out, i64 5, i1 false, i32 3, i32 2)
	ret void			ret void
	}			}

	define void @strided_store_3x2_nonconst_stride(<6 x double> %in, i64 %stride, double* %out) {			define void @strided_store_3x2_nonconst_stride(<6 x double> %in, i64 %stride, double* %out) {
	; CHECK-LABEL: @strided_store_3x2_nonconst_stride(			; CHECK-LABEL: @strided_store_3x2_nonconst_stride(
	; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x double> [[IN:%.*]], <6 x double> poison, <3 x i32> <i32 0, i32 1, i32 2>			; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x double> [[IN:%.*]], <6 x double> poison, <3 x i32> <i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x double> [[IN]], <6 x double> poison, <3 x i32> <i32 3, i32 4, i32 5>			; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x double> [[IN]], <6 x double> poison, <3 x i32> <i32 3, i32 4, i32 5>
	; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]			; CHECK-NEXT: [[VEC_START:%.*]] = mul i64 0, [[STRIDE:%.*]]
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[OUT:%.*]], i64 [[VEC_START]]			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[OUT:%.*]], i64 [[VEC_START]]
	; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <3 x double>*			; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <3 x double>*
	; CHECK-NEXT: store <3 x double> [[SPLIT]], <3 x double>* [[VEC_CAST]], align 8			; CHECK-NEXT: store <3 x double> [[SPLIT]], <3 x double>* [[VEC_CAST]], align 8
	; CHECK-NEXT: [[VEC_START2:%.*]] = mul i64 1, [[STRIDE]]			; CHECK-NEXT: [[VEC_START2:%.*]] = mul i64 1, [[STRIDE]]
	; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr double, double* [[OUT]], i64 [[VEC_START2]]			; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr double, double* [[OUT]], i64 [[VEC_START2]]
	; CHECK-NEXT: [[VEC_CAST4:%.*]] = bitcast double* [[VEC_GEP3]] to <3 x double>*			; CHECK-NEXT: [[VEC_CAST4:%.*]] = bitcast double* [[VEC_GEP3]] to <3 x double>*
	; CHECK-NEXT: store <3 x double> [[SPLIT1]], <3 x double>* [[VEC_CAST4]], align 8			; CHECK-NEXT: store <3 x double> [[SPLIT1]], <3 x double>* [[VEC_CAST4]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.matrix.column.major.store.v6f64(<6 x double> %in, double* %out, i64 %stride, i1 false, i32 3, i32 2)			call void @llvm.matrix.column.major.store.v6f64.i64(<6 x double> %in, double* %out, i64 %stride, i1 false, i32 3, i32 2)
				ret void
				}

				define void @strided_store_3x2_nonconst_i32_stride(<6 x double> %in, i32 %stride, double* %out) {
				; CHECK-LABEL: @strided_store_3x2_nonconst_i32_stride(
				; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x double> [[IN:%.*]], <6 x double> poison, <3 x i32> <i32 0, i32 1, i32 2>
				; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x double> [[IN]], <6 x double> poison, <3 x i32> <i32 3, i32 4, i32 5>
				; CHECK-NEXT: [[VEC_START:%.*]] = mul i32 0, [[STRIDE:%.*]]
				; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, double* [[OUT:%.*]], i32 [[VEC_START]]
				; CHECK-NEXT: [[VEC_CAST:%.*]] = bitcast double* [[VEC_GEP]] to <3 x double>*
				; CHECK-NEXT: store <3 x double> [[SPLIT]], <3 x double>* [[VEC_CAST]], align 8
				; CHECK-NEXT: [[VEC_START2:%.*]] = mul i32 1, [[STRIDE]]
				; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr double, double* [[OUT]], i32 [[VEC_START2]]
				; CHECK-NEXT: [[VEC_CAST4:%.*]] = bitcast double* [[VEC_GEP3]] to <3 x double>*
				; CHECK-NEXT: store <3 x double> [[SPLIT1]], <3 x double>* [[VEC_CAST4]], align 8
				; CHECK-NEXT: ret void
				;
				call void @llvm.matrix.column.major.store.v6f64.i32(<6 x double> %in, double* %out, i32 %stride, i1 false, i32 3, i32 2)
	ret void			ret void
	}			}

	define void @strided_store_2x3(<10 x double> %in, double* %out) {			define void @strided_store_2x3(<10 x double> %in, double* %out) {
	; CHECK-LABEL: @strided_store_2x3(			; CHECK-LABEL: @strided_store_2x3(
	; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <10 x double> [[IN:%.*]], <10 x double> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <10 x double> [[IN:%.*]], <10 x double> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <10 x double> [[IN]], <10 x double> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <10 x double> [[IN]], <10 x double> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[SPLIT2:%.*]] = shufflevector <10 x double> [[IN]], <10 x double> poison, <2 x i32> <i32 4, i32 5>			; CHECK-NEXT: [[SPLIT2:%.*]] = shufflevector <10 x double> [[IN]], <10 x double> poison, <2 x i32> <i32 4, i32 5>
	Show All 10 Lines
	; CHECK-NEXT: [[VEC_GEP8:%.*]] = getelementptr double, double* [[OUT]], i64 12			; CHECK-NEXT: [[VEC_GEP8:%.*]] = getelementptr double, double* [[OUT]], i64 12
	; CHECK-NEXT: [[VEC_CAST9:%.*]] = bitcast double* [[VEC_GEP8]] to <2 x double>*			; CHECK-NEXT: [[VEC_CAST9:%.*]] = bitcast double* [[VEC_GEP8]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[SPLIT3]], <2 x double>* [[VEC_CAST9]], align 8			; CHECK-NEXT: store <2 x double> [[SPLIT3]], <2 x double>* [[VEC_CAST9]], align 8
	; CHECK-NEXT: [[VEC_GEP10:%.*]] = getelementptr double, double* [[OUT]], i64 16			; CHECK-NEXT: [[VEC_GEP10:%.*]] = getelementptr double, double* [[OUT]], i64 16
	; CHECK-NEXT: [[VEC_CAST11:%.*]] = bitcast double* [[VEC_GEP10]] to <2 x double>*			; CHECK-NEXT: [[VEC_CAST11:%.*]] = bitcast double* [[VEC_GEP10]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[SPLIT4]], <2 x double>* [[VEC_CAST11]], align 8			; CHECK-NEXT: store <2 x double> [[SPLIT4]], <2 x double>* [[VEC_CAST11]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.matrix.column.major.store.v10f64(<10 x double> %in, double* %out, i64 4, i1 false, i32 2, i32 5)			call void @llvm.matrix.column.major.store.v10f64.i64(<10 x double> %in, double* %out, i64 4, i1 false, i32 2, i32 5)
	ret void			ret void
	}			}

	declare void @llvm.matrix.column.major.store.v6f64(<6 x double>, double*, i64, i1, i32, i32)			declare void @llvm.matrix.column.major.store.v6f64.i64(<6 x double>, double*, i64, i1, i32, i32)
	declare void @llvm.matrix.column.major.store.v10f64(<10 x double>, double*, i64, i1, i32, i32)			declare void @llvm.matrix.column.major.store.v6f64.i32(<6 x double>, double*, i32, i1, i32, i32)
				declare void @llvm.matrix.column.major.store.v10f64.i64(<10 x double>, double*, i64, i1, i32, i32)

	; CHECK: declare void @llvm.matrix.column.major.store.v6f64(<6 x double>, double* nocapture writeonly, i64, i1 immarg, i32 immarg, i32 immarg) #0			; CHECK: declare void @llvm.matrix.column.major.store.v6f64.i64(<6 x double>, double* nocapture writeonly, i64, i1 immarg, i32 immarg, i32 immarg) #0
	; CHECK: declare void @llvm.matrix.column.major.store.v10f64(<10 x double>, double* nocapture writeonly, i64, i1 immarg, i32 immarg, i32 immarg) #0			; CHECK: declare void @llvm.matrix.column.major.store.v10f64.i64(<10 x double>, double* nocapture writeonly, i64, i1 immarg, i32 immarg, i32 immarg) #0
	; CHECK: attributes #0 = { argmemonly nofree nosync nounwind willreturn writeonly }			; CHECK: attributes #0 = { argmemonly nofree nosync nounwind willreturn writeonly }

llvm/test/Verifier/matrix-intrinsics.ll

Show All 33 Lines
}		}

define <4 x float> @column.major_load(float* %m, float* %n, i32 %arg) {		define <4 x float> @column.major_load(float* %m, float* %n, i32 %arg) {
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: immarg operand has non-immediate parameter		; CHECK-NEXT: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %arg		; CHECK-NEXT: i32 %arg
; CHECK-NEXT: %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32(float* %n, i64 2, i1 true, i32 3, i32 %arg)		; CHECK-NEXT: %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.i64(float* %n, i64 2, i1 true, i32 3, i32 %arg)
%result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32(float* %m, i64 0, i1 false, i32 0, i32 0)		%result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.i64(float* %m, i64 0, i1 false, i32 0, i32 0)
%result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32(float* %m, i64 2, i1 false, i32 1, i32 2)		%result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.i64(float* %m, i64 2, i1 false, i32 1, i32 2)
%result.2 = call <6 x float> @llvm.matrix.column.major.load.v6f32(float* %n, i64 2, i1 true, i32 3, i32 3)		%result.2 = call <6 x float> @llvm.matrix.column.major.load.v6f32.i64(float* %n, i64 2, i1 true, i32 3, i32 3)
%result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32(float* %n, i64 2, i1 true, i32 3, i32 %arg)		%result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.i64(float* %n, i64 2, i1 true, i32 3, i32 %arg)
ret <4 x float> %result.1		ret <4 x float> %result.1
}		}

define void @column.major_store(float* %m, float* %n, i64 %arg) {		define void @column.major_store(float* %m, float* %n, i64 %arg) {
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!		; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector!
call void @llvm.matrix.column.major.store.v4f32(<4 x float> zeroinitializer, float* %m, i64 0, i1 false, i32 0, i32 0)		call void @llvm.matrix.column.major.store.v4f32.i64(<4 x float> zeroinitializer, float* %m, i64 0, i1 false, i32 0, i32 0)
call void @llvm.matrix.column.major.store.v4f32(<4 x float> zeroinitializer, float* %m, i64 2, i1 false, i32 1, i32 2)		call void @llvm.matrix.column.major.store.v4f32.i64(<4 x float> zeroinitializer, float* %m, i64 2, i1 false, i32 1, i32 2)
call void @llvm.matrix.column.major.store.v6f32(<6 x float> zeroinitializer, float* %n, i64 2, i1 false, i32 3, i32 3)		call void @llvm.matrix.column.major.store.v6f32.i64(<6 x float> zeroinitializer, float* %n, i64 2, i1 false, i32 3, i32 3)
call void @llvm.matrix.column.major.store.v6f32(<6 x float> zeroinitializer, float* %n, i64 %arg, i1 false, i32 3, i32 3)		call void @llvm.matrix.column.major.store.v6f32.i64(<6 x float> zeroinitializer, float* %n, i64 %arg, i1 false, i32 3, i32 3)
ret void		ret void
}		}

define <4 x float> @transpose_mixed_types(<4 x float> %fvec, <4 x i32> %ivec, i32 %arg) {		define <4 x float> @transpose_mixed_types(<4 x float> %fvec, <4 x i32> %ivec, i32 %arg) {
;		;
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: <4 x float> (<4 x i32>, i32, i32)* @llvm.matrix.transpose.v4f32.v4i32		; CHECK-NEXT: <4 x float> (<4 x i32>, i32, i32)* @llvm.matrix.transpose.v4f32.v4i32
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
Show All 22 Lines	;
ret <4 x float> %result.3		ret <4 x float> %result.3
}		}

define <4 x float> @column.major_load_mixed_types(i32* %m, float* %n, i32 %arg) {		define <4 x float> @column.major_load_mixed_types(i32* %m, float* %n, i32 %arg) {
;		;
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: <4 x float> (i32*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.pi32		; CHECK-NEXT: <4 x float> (i32*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.pi32
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: <4 x i32> (float*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4i32		; CHECK-NEXT: <4 x i32> (float*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4i32.i64
;		;
%result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.pi32(i32* %m, i64 2, i1 false, i32 2, i32 2)		%result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.pi32(i32* %m, i64 2, i1 false, i32 2, i32 2)
%result.1 = call <4 x i32> @llvm.matrix.column.major.load.v4i32(float* %n, i64 2, i1 false, i32 2, i32 2)		%result.1 = call <4 x i32> @llvm.matrix.column.major.load.v4i32.i64(float* %n, i64 2, i1 false, i32 2, i32 2)
ret <4 x float> %result.0		ret <4 x float> %result.0
}		}

define void @column.major_store_mixed_types(float* %m, i32* %n, i64 %arg) {		define void @column.major_store_mixed_types(float* %m, i32* %n, i64 %arg) {
;		;
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: void (<4 x i32>, float*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4i32.vi32		; CHECK-NEXT: void (<4 x i32>, float*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4i32.vi32
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: void (<4 x float>, i32*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.pi32		; CHECK-NEXT: void (<4 x float>, i32*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.pi32
;		;
call void @llvm.matrix.column.major.store.v4i32.vi32(<4 x i32> zeroinitializer, float* %m, i64 2, i1 false, i32 2, i32 2)		call void @llvm.matrix.column.major.store.v4i32.vi32(<4 x i32> zeroinitializer, float* %m, i64 2, i1 false, i32 2, i32 2)
call void @llvm.matrix.column.major.store.v4f32.pi32(<4 x float> zeroinitializer, i32* %n, i64 2, i1 false, i32 2, i32 2)		call void @llvm.matrix.column.major.store.v4f32.pi32(<4 x float> zeroinitializer, i32* %n, i64 2, i1 false, i32 2, i32 2)
ret void		ret void
}		}

define void @column.major_store_non_int_float_type(<4 x float>* %m, <4 x float>* %n, i64 %arg) {		define void @column.major_store_non_int_float_type(<4 x float>* %m, <4 x float>* %n, i64 %arg) {
;		;
; CHECK-NEXT: Intrinsic has incorrect argument type!		; CHECK-NEXT: Intrinsic has incorrect argument type!
; CHECK-NEXT: void (<4 x float>, <4 x float>, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32p0.p0v4f32		; CHECK-NEXT: void (<4 x float>, <4 x float>, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32p0.p0v4f32
;		;
call void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float> zeroinitializer, <4 x float> %n, i64 2, i1 false, i32 2, i32 2)		call void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float> zeroinitializer, <4 x float> %n, i64 2, i1 false, i32 2, i32 2)
ret void		ret void
}		}

define <4 x float> @column.major_load_stride_too_small(float* %m, i32 %arg) {		define <4 x float> @column.major_load_stride_too_small(float* %m, i32 %arg) {
;		;
; CHECK-NEXT: Stride must be greater or equal than the number of rows!		; CHECK-NEXT: Stride must be greater or equal than the number of rows!
; CHECK-NEXT: <4 x float> (float*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32		; CHECK-NEXT: <4 x float> (float*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.i64
;		;
%result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32(float* %m, i64 1, i1 false, i32 2, i32 2)		%result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.i64(float* %m, i64 1, i1 false, i32 2, i32 2)
ret <4 x float> %result.1		ret <4 x float> %result.1
}		}

define void @column.major_store_stride_too_small(float* %m, i64 %arg) {		define void @column.major_store_stride_too_small(float* %m, i64 %arg) {
;		;
; CHECK-NEXT: Stride must be greater or equal than the number of rows!		; CHECK-NEXT: Stride must be greater or equal than the number of rows!
; CHECK-NEXT: void (<4 x float>, float*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32		; CHECK-NEXT: void (<4 x float>, float*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.i64
;		;
call void @llvm.matrix.column.major.store.v4f32(<4 x float> zeroinitializer, float* %m, i64 1, i1 false, i32 2, i32 2)		call void @llvm.matrix.column.major.store.v4f32.i64(<4 x float> zeroinitializer, float* %m, i64 1, i1 false, i32 2, i32 2)
ret void		ret void
}		}

declare <4 x i32> @llvm.matrix.column.major.load.v4i32(float*, i64, i1, i32, i32)		declare <4 x i32> @llvm.matrix.column.major.load.v4i32.i64(float*, i64, i1, i32, i32)
declare <4 x float> @llvm.matrix.column.major.load.v4f32.pi32(i32*, i64, i1, i32, i32)		declare <4 x float> @llvm.matrix.column.major.load.v4f32.pi32(i32*, i64, i1, i32, i32)
declare <4 x float> @llvm.matrix.column.major.load.v4f32(float*, i64, i1, i32, i32)		declare <4 x float> @llvm.matrix.column.major.load.v4f32.i64(float*, i64, i1, i32, i32)
declare <6 x float> @llvm.matrix.column.major.load.v6f32(float*, i64, i1, i32, i32)		declare <6 x float> @llvm.matrix.column.major.load.v6f32.i64(float*, i64, i1, i32, i32)

declare void @llvm.matrix.column.major.store.v4f32(<4 x float>, float*, i64, i1, i32, i32)		declare void @llvm.matrix.column.major.store.v4f32.i64(<4 x float>, float*, i64, i1, i32, i32)
declare void @llvm.matrix.column.major.store.v6f32(<6 x float>, float*, i64, i1, i32, i32)		declare void @llvm.matrix.column.major.store.v6f32.i64(<6 x float>, float*, i64, i1, i32, i32)
declare void @llvm.matrix.column.major.store.v4i32.vi32(<4 x i32>, float*, i64, i1, i32, i32)		declare void @llvm.matrix.column.major.store.v4i32.vi32(<4 x i32>, float*, i64, i1, i32, i32)
declare void @llvm.matrix.column.major.store.v4f32.pi32(<4 x float>, i32*, i64, i1, i32, i32)		declare void @llvm.matrix.column.major.store.v4f32.pi32(<4 x float>, i32*, i64, i1, i32, i32)
declare void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float>, <4 x float>, i64, i1, i32, i32)		declare void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float>, <4 x float>, i64, i1, i32, i32)

declare <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float>, i32, i32)		declare <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float>, i32, i32)
declare <4 x float> @llvm.matrix.transpose.v4f32(<4 x float>, i32, i32)		declare <4 x float> @llvm.matrix.transpose.v4f32(<4 x float>, i32, i32)
declare <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32>, i32, i32)		declare <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32>, i32, i32)

declare <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32)		declare <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32)
declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32>, <4 x float>, i32, i32, i32)		declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32>, <4 x float>, i32, i32, i32)
declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float>, <4 x i32>, i32, i32, i32)		declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float>, <4 x i32>, i32, i32, i32)
declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32)		declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32)
declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32)		declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32)
