This is an archive of the discontinued LLVM Phabricator instance.

Differential D96593

Reland "[mlir] add support for verification in integration tests"
ClosedPublic

Authored by gysit on Feb 12 2021, 4:31 AM.


Details

Reviewers

Kayjukh
aartbik
nicolasvasilache

Commits

rG99f3510b4137: Reland "[mlir] add support for verification in integration tests"

Summary

The patch extends the runner utils with verification methods that compare two memrefs. The methods compare the contents of the two memrefs and print success if the data is identical up to a small numerical error. They are meant to simplify the development of integration tests that compare results against a reference implementation (cf. the updates to the linalg matmul integration tests).

Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.

Changes:

Remove the max function that requires the "algorithm" header on Windows
Eliminate the truncation warning in the float specialization of verifyElem by using a float constant
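
Both changes concern the floating-point comparison. The following is a minimal C++ sketch of a relative-error check, not the patch's code: `relErrorAtMost`, `closeF32`, and `closeF64` are hypothetical names chosen for illustration, and the tolerances mirror the `1e-6f` and `1e-12` constants that appear in the diff.

```cpp
#include <cmath>

// Hypothetical helper (illustration only): returns true when
// |actual - expected| <= epsilon * |expected|.
template <typename T>
bool relErrorAtMost(T actual, T expected, T epsilon) {
  // Treat infinities and NaNs as errors.
  if (!std::isfinite(actual) || !std::isfinite(expected))
    return false;
  return std::abs(actual - expected) <= epsilon * std::abs(expected);
}

// The `f` suffix keeps the literal a float. A bare `1e-6` is a double, and
// converting it to the float parameter may trigger a truncation warning on
// some compilers.
inline bool closeF32(float actual, float expected) {
  return relErrorAtMost<float>(actual, expected, 1e-6f);
}

inline bool closeF64(double actual, double expected) {
  return relErrorAtMost<double>(actual, expected, 1e-12);
}
```

Note that a purely relative tolerance accepts an expected value of zero only when the actual value is exactly zero, because the right-hand side of the comparison is then zero as well.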

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gysit created this revision. Feb 12 2021, 4:31 AM

Herald added a reviewer: aartbik. Feb 12 2021, 4:31 AM

Herald added subscribers: mravishankar, teijeong, rdzhabarov and 14 others.

gysit requested review of this revision. Feb 12 2021, 4:31 AM

Herald added a reviewer: nicolasvasilache. Feb 12 2021, 4:31 AM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B88979: Diff 323284. Feb 12 2021, 6:15 AM

aartbik added inline comments. Feb 12 2021, 11:38 AM

mlir/include/mlir/ExecutionEngine/RunnerUtils.h
219	Please document what verify means in this case (returning true/false, not verify-failing)
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100	Did you verify that such a CHECK never has false positives or negatives? I often see a lot of stray numbers in the output, and just want to make sure this is not too brittle.

gysit added inline comments. Feb 12 2021, 12:39 PM

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100	Thank you for the comments. Are you referring to the numerical aspect of the verification? Or to the single character check? I verified for a few test cases that I get errors / no errors when I expect them (that's the reason why I introduced the ^ and $). The numerics should be fine for the use cases I have seen so far. The epsilon I chose is quite small compared to the machine epsilon meaning the verification should be sensitive. I believe that is what we want for now. The best choice for unit testing is of course working with integers. That avoids potential numerical issues.

aartbik added inline comments. Feb 12 2021, 1:49 PM

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100	I was indeed referring to the single character check (just because the output sometimes contains timestamps etc.). I may be overly pedantic but I remember being surprised in the past that a digit-based check actually passed when I did not expect it. I noticed the ^ and $ so I suspect you gave it some thought, but just wanted to double check. That is why I would prefer some harder check on output (like PASS/FAIL or return code), but if this works, I am fine. Thanks for confirming.

address aartbik's comments
return error by value

gysit marked 2 inline comments as done. Feb 12 2021, 11:57 PM

gysit added inline comments.

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100	I understand the concern. But actually returning the number of errors is sometimes really useful. E.g., you may be able to see that the values of one column or row are entirely wrong. It is still possible to extend the test with an scf.if error != 0 then printFail to be on the safer side.
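
The concern in this thread, that a bare digit check can match stray numbers in the output, can be illustrated with a small C++ sketch. It is an analogy for the difference between `CHECK: 0` and `CHECK: {{^0$}}`, not FileCheck itself; the helper names and the sample output strings are hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Unanchored check: true if any line of `output` contains a match for
// `pattern` somewhere in the line.
inline bool anyLineContains(const std::string &output,
                            const std::string &pattern) {
  const std::regex re(pattern);
  std::istringstream in(output);
  for (std::string line; std::getline(in, line);)
    if (std::regex_search(line, re))
      return true;
  return false;
}

// Anchored check: true only if some line matches `pattern` in its entirety,
// which corresponds to wrapping the pattern in ^ and $.
inline bool anyLineEquals(const std::string &output,
                          const std::string &pattern) {
  const std::regex re(pattern);
  std::istringstream in(output);
  for (std::string line; std::getline(in, line);)
    if (std::regex_match(line, re))
      return true;
  return false;
}
```

With a hypothetical output such as `"10 iters\n3\n"` (three errors reported), the unanchored check for `0` passes because of the `0` in `10`, while the anchored check correctly fails.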

lgtm, perhaps also ask @mehdi_amini to have a quick look, since he is busy refactoring our runner utils

In general I'm a bit worried that this directory is not well structured, but this addition is in line with the kind of things already in this file so I don't feel like there's a strong reason to hold on this.

Kayjukh added inline comments. Feb 14 2021, 1:42 AM

mlir/include/mlir/ExecutionEngine/RunnerUtils.h
248	nit: Could be condensed as `return (delta <= epsilon * std::abs(expected));`
288	This loop "feels wrong" to me because of the two occurrences of the `errors` variable, even though it seems to be correct when taking the time to think about it. If I'm not mistaken you could remove the `errors` argument to the function and always start from zero. This line would then become `errors += verify(os, ..., strides + 1);`, which makes more sense to me.
306	Would it make more sense to return a boolean indicating whether or not there were errors and update a reference argument accordingly? Otherwise, the fact that this function returns `-1` if there are no errors should be clearly documented somewhere, as it can easily be overlooked.

addressed Kayjukh's comments

gysit marked 2 inline comments as done. Feb 14 2021, 3:58 AM

gysit added inline comments.

mlir/include/mlir/ExecutionEngine/RunnerUtils.h
288	True, this was confusing. I am now using a separate printCounter passed by reference. Another option would be to return the error count by reference, but I currently prefer having a return value on all verify methods.
306	I went for documenting the -1 since I like the additional info provided by the return value.
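
The design settled on here (each recursive call returns its own error count, the print counter is shared by reference, and `-1` signals a shape mismatch) can be sketched in a self-contained form. This is a simplified illustration under assumptions, not the patch's code: `countMismatches`, `verifyRowMajor`, and `kPrintLimit` are hypothetical names, the data is a dense row-major `int32_t` buffer, and comparison is exact equality.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical limit on the number of mismatches printed.
constexpr int64_t kPrintLimit = 10;

// Recursively walk a strided buffer and return the number of mismatches.
// `printCounter` is shared across all recursive calls via the reference.
template <typename T>
int64_t countMismatches(std::ostream &os, const T *actual, const T *expected,
                        int64_t dim, int64_t offset, const int64_t *sizes,
                        const int64_t *strides, int64_t &printCounter) {
  int64_t errors = 0;
  if (dim == 0) {
    // Base case: compare the single element at the current offset.
    if (actual[offset] != expected[offset]) {
      if (printCounter < kPrintLimit) {
        os << actual[offset] << " != " << expected[offset]
           << " offset = " << offset << "\n";
        ++printCounter;
      }
      ++errors;
    }
    return errors;
  }
  // Iterate the outermost remaining dimension and accumulate the counts.
  for (int64_t i = 0; i < sizes[0]; ++i)
    errors += countMismatches(os, actual, expected, dim - 1,
                              offset + i * strides[0], sizes + 1, strides + 1,
                              printCounter);
  return errors;
}

// Returns the number of mismatches, or -1 if the shapes differ. Assumes both
// buffers hold at least as many elements as the shape describes.
inline int64_t verifyRowMajor(const std::vector<int32_t> &actual,
                              const std::vector<int32_t> &expected,
                              const std::vector<int64_t> &actualSizes,
                              const std::vector<int64_t> &expectedSizes) {
  if (actualSizes != expectedSizes)
    return -1;
  // Derive dense row-major strides from the sizes.
  std::vector<int64_t> strides(actualSizes.size(), 1);
  for (int64_t i = static_cast<int64_t>(actualSizes.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * actualSizes[i + 1];
  int64_t printCounter = 0;
  return countMismatches<int32_t>(
      std::cerr, actual.data(), expected.data(),
      static_cast<int64_t>(actualSizes.size()), /*offset=*/0,
      actualSizes.data(), strides.data(), printCounter);
}
```

Because `-1` shares the return channel with the error count, a caller that only tests for `!= 0` treats a shape mismatch as a failure too, which is one argument for documenting the sentinel rather than switching to a boolean.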

LGTM. Since @aartbik and @mehdi_amini seem to agree, I think you could land this patch.

This revision is now accepted and ready to land. Feb 14 2021, 10:07 AM

Closed by commit rG99f3510b4137: Reland "[mlir] add support for verification in integration tests" (authored by gysit). Feb 14 2021, 11:33 AM

This revision was automatically updated to reflect the committed changes.

gysit marked an inline comment as done.

gysit added a commit: rG99f3510b4137: Reland "[mlir] add support for verification in integration tests".

Revision Contents

Path	Size
mlir/include/mlir/ExecutionEngine/RunnerUtils.h	134 lines
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir	17 lines
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_column_major.mlir	17 lines
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_column_major_as_row_major.mlir	33 lines
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_i8_i8_i32.mlir	15 lines
mlir/lib/ExecutionEngine/RunnerUtils.cpp	39 lines

Diff 323628

mlir/include/mlir/ExecutionEngine/RunnerUtils.h

[25 lines not shown]
#define MLIR_RUNNERUTILS_EXPORT __declspec(dllimport)		#define MLIR_RUNNERUTILS_EXPORT __declspec(dllimport)
#endif // mlir_runner_utils_EXPORTS		#endif // mlir_runner_utils_EXPORTS
#endif // MLIR_RUNNERUTILS_EXPORT		#endif // MLIR_RUNNERUTILS_EXPORT
#else		#else
#define MLIR_RUNNERUTILS_EXPORT		#define MLIR_RUNNERUTILS_EXPORT
#endif // _WIN32		#endif // _WIN32

#include <assert.h>		#include <assert.h>
		#include <cmath>
#include <iostream>		#include <iostream>

#include "mlir/ExecutionEngine/CRunnerUtils.h"		#include "mlir/ExecutionEngine/CRunnerUtils.h"

template <typename T, typename StreamType>		template <typename T, typename StreamType>
void printMemRefMetaData(StreamType &os, const DynamicMemRefType<T> &V) {		void printMemRefMetaData(StreamType &os, const DynamicMemRefType<T> &V) {
os << "base@ = " << reinterpret_cast<void *>(V.data) << " rank = " << V.rank		os << "base@ = " << reinterpret_cast<void *>(V.data) << " rank = " << V.rank
<< " offset = " << V.offset;		<< " offset = " << V.offset;
[26 lines not shown]

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// Templated instantiation follows.		// Templated instantiation follows.
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
namespace impl {		namespace impl {
template <typename T, int M, int... Dims>		template <typename T, int M, int... Dims>
std::ostream &operator<<(std::ostream &os, const Vector<T, M, Dims...> &v);		std::ostream &operator<<(std::ostream &os, const Vector<T, M, Dims...> &v);

template <int... Dims> struct StaticSizeMult {		template <int... Dims>
		struct StaticSizeMult {
static constexpr int value = 1;		static constexpr int value = 1;
};		};

template <int N, int... Dims> struct StaticSizeMult<N, Dims...> {		template <int N, int... Dims>
		struct StaticSizeMult<N, Dims...> {
static constexpr int value = N * StaticSizeMult<Dims...>::value;		static constexpr int value = N * StaticSizeMult<Dims...>::value;
};		};

static inline void printSpace(std::ostream &os, int count) {		static inline void printSpace(std::ostream &os, int count) {
for (int i = 0; i < count; ++i) {		for (int i = 0; i < count; ++i) {
os << ' ';		os << ' ';
}		}
}		}

template <typename T, int M, int... Dims> struct VectorDataPrinter {		template <typename T, int M, int... Dims>
		struct VectorDataPrinter {
static void print(std::ostream &os, const Vector<T, M, Dims...> &val);		static void print(std::ostream &os, const Vector<T, M, Dims...> &val);
};		};

template <typename T, int M, int... Dims>		template <typename T, int M, int... Dims>
void VectorDataPrinter<T, M, Dims...>::print(std::ostream &os,		void VectorDataPrinter<T, M, Dims...>::print(std::ostream &os,
const Vector<T, M, Dims...> &val) {		const Vector<T, M, Dims...> &val) {
static_assert(M > 0, "0 dimensioned tensor");		static_assert(M > 0, "0 dimensioned tensor");
static_assert(sizeof(val) == M * StaticSizeMult<Dims...>::value * sizeof(T),		static_assert(sizeof(val) == M * StaticSizeMult<Dims...>::value * sizeof(T),
[107 lines not shown]	void printMemRef(StridedMemRefType<T, N> &M) {
printMemRef(DynamicMemRefType<T>(M));		printMemRef(DynamicMemRefType<T>(M));
}		}

template <typename T>		template <typename T>
void printMemRef(UnrankedMemRefType<T> &M) {		void printMemRef(UnrankedMemRefType<T> &M) {
std::cout << "Unranked Memref ";		std::cout << "Unranked Memref ";
printMemRef(DynamicMemRefType<T>(M));		printMemRef(DynamicMemRefType<T>(M));
}		}

		/// Verify the result of two computations are equivalent up to a small
		/// numerical error and return the number of errors.
		template <typename T>
		struct MemRefDataVerifier {
		/// Maximum number of errors printed by the verifier.
		static constexpr int printLimit = 10;

		/// Verify the relative difference of the values is smaller than epsilon.
		static bool verifyRelErrorSmallerThan(T actual, T expected, T epsilon);

		/// Verify the values are equivalent (integers) or are close (floating-point).
		static bool verifyElem(T actual, T expected);

		/// Verify the data element-by-element and return the number of errors.
		static int64_t verify(std::ostream &os, T *actualBasePtr, T *expectedBasePtr,
		int64_t dim, int64_t offset, const int64_t *sizes,
		const int64_t *strides, int64_t &printCounter);
		};

		template <typename T>
		bool MemRefDataVerifier<T>::verifyRelErrorSmallerThan(T actual, T expected,
		T epsilon) {
		// Return an error if one of the values is infinite or NaN.
		if (!std::isfinite(actual) || !std::isfinite(expected))
		return false;
		// Return true if the relative error is smaller than epsilon.
		T delta = std::abs(actual - expected);
		return (delta <= epsilon * std::abs(expected));
		}

		template <typename T>
		bool MemRefDataVerifier<T>::verifyElem(T actual, T expected) {
		return actual == expected;
		}

		template <>
		inline bool MemRefDataVerifier<double>::verifyElem(double actual,
		double expected) {
		return verifyRelErrorSmallerThan(actual, expected, 1e-12);
		}

		template <>
		inline bool MemRefDataVerifier<float>::verifyElem(float actual,
		float expected) {
		return verifyRelErrorSmallerThan(actual, expected, 1e-6f);
		}

		template <typename T>
		int64_t MemRefDataVerifier<T>::verify(std::ostream &os, T *actualBasePtr,
		T *expectedBasePtr, int64_t dim,
		int64_t offset, const int64_t *sizes,
		const int64_t *strides,
		int64_t &printCounter) {
		int64_t errors = 0;
		// Verify the elements at the current offset.
		if (dim == 0) {
		if (!verifyElem(actualBasePtr[offset], expectedBasePtr[offset])) {
		if (printCounter < printLimit) {
		os << actualBasePtr[offset] << " != " << expectedBasePtr[offset]
		<< " offset = " << offset << "\n";
		printCounter++;
		}
		errors++;
		}
		} else {
		// Iterate the current dimension and verify recursively.
		for (int64_t i = 0; i < sizes[0]; ++i) {
		errors +=
		verify(os, actualBasePtr, expectedBasePtr, dim - 1,
		offset + i * strides[0], sizes + 1, strides + 1, printCounter);
		}
		}
		return errors;
		}

		/// Verify the equivalence of two dynamic memrefs and return the number of
		/// errors or -1 if the shape of the memrefs do not match.
		template <typename T>
		int64_t verifyMemRef(const DynamicMemRefType<T> &actual,
		const DynamicMemRefType<T> &expected) {
		// Check if the memref shapes match.
		for (int64_t i = 0; i < actual.rank; ++i) {
		if (expected.rank != actual.rank || actual.offset != expected.offset ||
		actual.sizes[i] != expected.sizes[i] ||
		actual.strides[i] != expected.strides[i]) {
		printMemRefMetaData(std::cerr, actual);
		printMemRefMetaData(std::cerr, expected);
		return -1;
		}
		}
		// Return the number of errors.
		int64_t printCounter = 0;
		return MemRefDataVerifier<T>::verify(
		std::cerr, actual.basePtr, expected.basePtr, actual.rank, actual.offset,
		actual.sizes, actual.strides, printCounter);
		}

		/// Verify the equivalence of two unranked memrefs and return the number of
		/// errors or -1 if the shape of the memrefs do not match.
		template <typename T>
		int64_t verifyMemRef(UnrankedMemRefType<T> &actual,
		UnrankedMemRefType<T> &expected) {
		return verifyMemRef(DynamicMemRefType<T>(actual),
		DynamicMemRefType<T>(expected));
		}

} // namespace impl		} // namespace impl

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// Currently exposed C API.		// Currently exposed C API.
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
extern "C" MLIR_RUNNERUTILS_EXPORT void		extern "C" MLIR_RUNNERUTILS_EXPORT void
_mlir_ciface_print_memref_i8(UnrankedMemRefType<int8_t> *M);		_mlir_ciface_print_memref_i8(UnrankedMemRefType<int8_t> *M);
extern "C" MLIR_RUNNERUTILS_EXPORT void		extern "C" MLIR_RUNNERUTILS_EXPORT void
[20 lines not shown]
_mlir_ciface_print_memref_3d_f32(StridedMemRefType<float, 3> *M);		_mlir_ciface_print_memref_3d_f32(StridedMemRefType<float, 3> *M);
extern "C" MLIR_RUNNERUTILS_EXPORT void		extern "C" MLIR_RUNNERUTILS_EXPORT void
_mlir_ciface_print_memref_4d_f32(StridedMemRefType<float, 4> *M);		_mlir_ciface_print_memref_4d_f32(StridedMemRefType<float, 4> *M);

extern "C" MLIR_RUNNERUTILS_EXPORT void		extern "C" MLIR_RUNNERUTILS_EXPORT void
_mlir_ciface_print_memref_vector_4x4xf32(		_mlir_ciface_print_memref_vector_4x4xf32(
StridedMemRefType<Vector2D<4, 4, float>, 2> *M);		StridedMemRefType<Vector2D<4, 4, float>, 2> *M);

		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t _mlir_ciface_verifyMemRefI32(
		UnrankedMemRefType<int32_t> *actual, UnrankedMemRefType<int32_t> *expected);
		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t _mlir_ciface_verifyMemRefF32(
		UnrankedMemRefType<float> *actual, UnrankedMemRefType<float> *expected);
		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t _mlir_ciface_verifyMemRefF64(
		UnrankedMemRefType<double> *actual, UnrankedMemRefType<double> *expected);

		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t verifyMemRefI32(int64_t rank,
		void *actualPtr,
		void *expectedPtr);
		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t verifyMemRefF32(int64_t rank,
		void *actualPtr,
		void *expectedPtr);
		extern "C" MLIR_RUNNERUTILS_EXPORT int64_t verifyMemRefF64(int64_t rank,
		void *actualPtr,
		void *expectedPtr);

#endif // EXECUTIONENGINE_RUNNERUTILS_H_		#endif // EXECUTIONENGINE_RUNNERUTILS_H_

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir

// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \		// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \
// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \		// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.matmul register-tile-sizes=12,32,16 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.matmul register-tile-sizes=12,32,16 vectorize" | \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,32 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,32 vectorize" | \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,32 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,32 vectorize" | \

// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \		// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \
// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm | \		// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm | \
// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \		// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \
// Activate to dump assembly		// Activate to dump assembly
// R_UN: -dump-object-file -object-filename=/tmp/a.o \		// R_UN: -dump-object-file -object-filename=/tmp/a.o \
		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
// Use tee to both print to stderr and FileCheck		// Use tee to both print to stderr and FileCheck
// RUN: tee -a /dev/stderr | FileCheck %s		// RUN: tee -a /dev/stderr | FileCheck %s


!elem_type_a = type f32		!elem_type_a = type f32
!elem_type_b = type f32		!elem_type_b = type f32
!elem_type_c = type f32		!elem_type_c = type f32
[62 lines not shown]	scf.for %arg0 = %c0 to %iters step %c1 {
%z = constant 0.0 : !elem_type_c		%z = constant 0.0 : !elem_type_c
linalg.fill(%C, %z) : !row_major_C, !elem_type_c		linalg.fill(%C, %z) : !row_major_C, !elem_type_c
call @matmul(%A, %B, %C) : (!row_major_A, !row_major_B, !row_major_C) -> ()		call @matmul(%A, %B, %C) : (!row_major_A, !row_major_B, !row_major_C) -> ()
}		}
%t_end_matmul = call @rtclock() : () -> f64		%t_end_matmul = call @rtclock() : () -> f64
%tmatmul = subf %t_end_matmul, %t_start_matmul: f64		%tmatmul = subf %t_end_matmul, %t_start_matmul: f64
call @print_perf(%iters, %tmatmul) : (index, f64) -> ()		call @print_perf(%iters, %tmatmul) : (index, f64) -> ()

%res = load %C[%c0, %c0]: !row_major_C		// CHECK: {{^0$}}
// CHECK: 64		%C_ref = alloc() : !row_major_C
vector.print %res: f32		linalg.fill(%C_ref, %v0) : !row_major_C, !elem_type_c
		linalg.matmul ins(%A, %B : !row_major_A, !row_major_B)
		outs(%C_ref: !row_major_C)
		%act = memref_cast %C : !row_major_C to memref<*xf32>
		%exp = memref_cast %C_ref : !row_major_C to memref<*xf32>
		%errors = call @verifyMemRefF32(%act, %exp) : (memref<*xf32>, memref<*xf32>) -> i64
		vector.print %errors : i64
		dealloc %C_ref : !row_major_C

dealloc %A : !row_major_A		dealloc %A : !row_major_A
dealloc %B : !row_major_B		dealloc %B : !row_major_B
dealloc %C : !row_major_C		dealloc %C : !row_major_C

return		return
}		}

func private @rtclock() -> f64		func private @rtclock() -> f64
		func private @verifyMemRefF32(memref<*xf32>, memref<*xf32>) -> i64 attributes { llvm.emit_c_interface }

// TODO: init with random, run and check output.		// TODO: init with random, run and check output.
// func private @fill_random_f32(memref<*xf32>)		// func private @fill_random_f32(memref<*xf32>)

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_column_major.mlir

// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \		// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \
// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \		// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.matmul_column_major register-tile-sizes=16,0,32 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul_column_major anchor-op=linalg.matmul_column_major register-tile-sizes=16,0,32 vectorize" | \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,16 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,16 vectorize" | \

// TODO: linalg.copy vectorization in the presence of permutation map fails. Enable when addressed.		// TODO: linalg.copy vectorization in the presence of permutation map fails. Enable when addressed.
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,16 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,16 vectorize" | \

// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \		// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \
// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm | \		// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm | \
// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \		// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \
// Activate to dump assembly		// Activate to dump assembly
// R_UN: -dump-object-file -object-filename=/tmp/a.o \		// R_UN: -dump-object-file -object-filename=/tmp/a.o \
		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
// Use tee to both print to stderr and FileCheck		// Use tee to both print to stderr and FileCheck
// RUN: tee -a /dev/stderr | FileCheck %s		// RUN: tee -a /dev/stderr | FileCheck %s

!elem_type_a = type f32		!elem_type_a = type f32
!elem_type_b = type f32		!elem_type_b = type f32
!elem_type_c = type f32		!elem_type_c = type f32
!row_major_A = type memref<${M}x${K}x!elem_type_a>		!row_major_A = type memref<${M}x${K}x!elem_type_a>
[57 lines not shown]	scf.for %arg0 = %c0 to %iters step %c1 {
// be easy.		// be easy.
linalg.fill(%cC, %f0) : !column_major_C, !elem_type_c		linalg.fill(%cC, %f0) : !column_major_C, !elem_type_c
call @matmul_column_major(%cA, %cB, %cC) : (!column_major_A, !column_major_B, !column_major_C) -> ()		call @matmul_column_major(%cA, %cB, %cC) : (!column_major_A, !column_major_B, !column_major_C) -> ()
}		}
%t_end_matmul_column_major = call @rtclock() : () -> f64		%t_end_matmul_column_major = call @rtclock() : () -> f64
%tmatmul_column_major = subf %t_end_matmul_column_major, %t_start_matmul_column_major: f64		%tmatmul_column_major = subf %t_end_matmul_column_major, %t_start_matmul_column_major: f64
call @print_perf(%iters, %tmatmul_column_major) : (index, f64) -> ()		call @print_perf(%iters, %tmatmul_column_major) : (index, f64) -> ()

%res = load %cC[%c0, %c0]: !column_major_C		// CHECK: {{^0$}}
// CHECK: 64		%cC_ref = alloc() : !column_major_C
vector.print %res: !elem_type_c		linalg.fill(%cC_ref, %f0) : !column_major_C, !elem_type_c
		linalg.matmul_column_major ins(%cA, %cB : !column_major_A, !column_major_B)
		outs(%cC_ref: !column_major_C)
		%act = memref_cast %cC : !column_major_C to memref<*xf32>
		%exp = memref_cast %cC_ref : !column_major_C to memref<*xf32>
		%errors = call @verifyMemRefF32(%act, %exp) : (memref<*xf32>, memref<*xf32>) -> i64
		vector.print %errors : i64
		dealloc %cC_ref : !column_major_C

dealloc %cA : !column_major_A		dealloc %cA : !column_major_A
dealloc %cB : !column_major_B		dealloc %cB : !column_major_B
dealloc %cC : !column_major_C		dealloc %cC : !column_major_C

return		return
}		}

func private @rtclock() -> f64		func private @rtclock() -> f64
		func private @verifyMemRefF32(memref<*xf32>, memref<*xf32>) -> i64 attributes { llvm.emit_c_interface }

// TODO: init with random, run and check output.		// TODO: init with random, run and check output.
// func private @fill_random_f32(memref<*xf32>)		// func private @fill_random_f32(memref<*xf32>)

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_column_major_as_row_major.mlir

// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \		// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \
// RUN: cat %s \| sed 's@${M}@'"$M"'@g'\| sed 's@${K}@'"$K"'@g' \| sed 's@${N}@'"$N"'@g'\| sed 's@${ITERS}@'"$ITERS"'@g'\| \		// RUN: cat %s \| sed 's@${M}@'"$M"'@g'\| sed 's@${K}@'"$K"'@g' \| sed 's@${N}@'"$N"'@g'\| sed 's@${ITERS}@'"$ITERS"'@g'\| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.matmul_column_major register-tile-sizes=16,0,32 vectorize" \| \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul_column_major_as_row_major anchor-op=linalg.matmul_column_major register-tile-sizes=16,0,32 vectorize" \| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.matmul register-tile-sizes=12,32,16 vectorize" \| \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul_column_major_as_row_major anchor-op=linalg.matmul register-tile-sizes=12,32,16 vectorize" \| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,16 vectorize" \| \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.fill register-tile-sizes=4,16 vectorize" \| \

// TODO: linalg.copy vectorization in the presence of permutation map fails. Enable when addressed.		// TODO: linalg.copy vectorization in the presence of permutation map fails. Enable when addressed.
// R_UN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,16 vectorize" \| \		// R_UN: mlir-opt -test-linalg-codegen-strategy="anchor-op=linalg.copy register-tile-sizes=4,16 vectorize" \| \

// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops \| \		// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops \| \
// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm \| \		// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm \| \
// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \		// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \
// Activate to dump assembly		// Activate to dump assembly
// R_UN: -dump-object-file -object-filename=/tmp/a.o \		// R_UN: -dump-object-file -object-filename=/tmp/a.o \
		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
// Use tee to both print to stderr and FileCheck		// Use tee to both print to stderr and FileCheck
// RUN: tee -a /dev/stderr | FileCheck %s		// RUN: tee -a /dev/stderr | FileCheck %s

!elem_type_a = type f32		!elem_type_a = type f32
!elem_type_b = type f32		!elem_type_b = type f32
!elem_type_c = type f32		!elem_type_c = type f32
!row_major_A = type memref<${M}x${K}x!elem_type_a>		!row_major_A = type memref<${M}x${K}x!elem_type_a>
scf.for %arg0 = %c0 to %iters step %c1 {		scf.for %arg0 = %c0 to %iters step %c1 {
call @matmul_column_major_as_row_major(%cA, %cB, %cC, %A, %B, %C) :		call @matmul_column_major_as_row_major(%cA, %cB, %cC, %A, %B, %C) :
(!column_major_A, !column_major_B, !column_major_C,		(!column_major_A, !column_major_B, !column_major_C,
!row_major_A, !row_major_B, !row_major_C) -> ()		!row_major_A, !row_major_B, !row_major_C) -> ()
}		}
%t_end_matmul_column_major_as_row_major = call @rtclock() : () -> f64		%t_end_matmul_column_major_as_row_major = call @rtclock() : () -> f64
%tmatmul_column_major_as_row_major = subf %t_end_matmul_column_major_as_row_major, %t_start_matmul_column_major_as_row_major: f64		%tmatmul_column_major_as_row_major = subf %t_end_matmul_column_major_as_row_major, %t_start_matmul_column_major_as_row_major: f64
call @print_perf(%iters, %tmatmul_column_major_as_row_major) : (index, f64) -> ()		call @print_perf(%iters, %tmatmul_column_major_as_row_major) : (index, f64) -> ()

%res = load %cC[%c0, %c0]: !column_major_C		// CHECK: {{^0$}}
// CHECK: 64		%cC_ref = alloc() : !column_major_C
vector.print %res: !elem_type_c		linalg.fill(%cC_ref, %f0) : !column_major_C, !elem_type_c
%res2 = load %C[%c0, %c0]: !row_major_C		linalg.matmul_column_major ins(%cA, %cB : !column_major_A, !column_major_B)
// CHECK: 64		outs(%cC_ref: !column_major_C)
vector.print %res2: !elem_type_c		%act1 = memref_cast %cC : !column_major_C to memref<*xf32>
		%exp1 = memref_cast %cC_ref : !column_major_C to memref<*xf32>
		%errors1 = call @verifyMemRefF32(%act1, %exp1) : (memref<*xf32>, memref<*xf32>) -> i64
		vector.print %errors1 : i64
		dealloc %cC_ref : !column_major_C

		// CHECK: {{^0$}}
		%C_ref = alloc() : !row_major_C
		linalg.fill(%C_ref, %f0) : !row_major_C, !elem_type_c
		linalg.matmul ins(%A, %B : !row_major_A, !row_major_B)
		outs(%C_ref: !row_major_C)
		%act2 = memref_cast %C : !row_major_C to memref<*xf32>
		%exp2 = memref_cast %C_ref : !row_major_C to memref<*xf32>
		%errors2 = call @verifyMemRefF32(%act2, %exp2) : (memref<*xf32>, memref<*xf32>) -> i64
		vector.print %errors2 : i64
		dealloc %C_ref : !row_major_C

dealloc %A : !row_major_A		dealloc %A : !row_major_A
dealloc %B : !row_major_B		dealloc %B : !row_major_B
dealloc %C : !row_major_C		dealloc %C : !row_major_C

dealloc %cA : !column_major_A		dealloc %cA : !column_major_A
dealloc %cB : !column_major_B		dealloc %cB : !column_major_B
dealloc %cC : !column_major_C		dealloc %cC : !column_major_C

return		return
}		}

func private @rtclock() -> f64		func private @rtclock() -> f64
		func private @verifyMemRefF32(memref<*xf32>, memref<*xf32>) -> i64 attributes { llvm.emit_c_interface }

// TODO: init with random, run and check output.		// TODO: init with random, run and check output.
// func private @fill_random_f32(memref<*xf32>)		// func private @fill_random_f32(memref<*xf32>)

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul_i8_i8_i32.mlir

// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \		// RUN: export M=24 && export K=64 && export N=192 && export ITERS=10 && \
// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \		// RUN: cat %s | sed 's@${M}@'"$M"'@g'| sed 's@${K}@'"$K"'@g' | sed 's@${N}@'"$N"'@g'| sed 's@${ITERS}@'"$ITERS"'@g'| \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.matmul_i8_i8_i32 register-tile-sizes=12,32,16 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.matmul_i8_i8_i32 register-tile-sizes=12,32,16 vectorize" | \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.fill register-tile-sizes=4,32 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.fill register-tile-sizes=4,32 vectorize" | \
// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.copy register-tile-sizes=4,32 vectorize" | \		// RUN: mlir-opt -test-linalg-codegen-strategy="anchor-func=matmul anchor-op=linalg.copy register-tile-sizes=4,32 vectorize" | \
// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \		// RUN: mlir-opt -canonicalize -convert-vector-to-scf -lower-affine -convert-linalg-to-loops | \

// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm -mlir-disable-threading | \		// RUN: mlir-opt -canonicalize -convert-scf-to-std -convert-vector-to-llvm -convert-std-to-llvm -mlir-disable-threading | \
// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \		// RUN: mlir-cpu-runner -O3 -e main -entry-point-result=void \
// Activate to dump assembly		// Activate to dump assembly
// R_UN: -dump-object-file -object-filename=/tmp/a.o \		// R_UN: -dump-object-file -object-filename=/tmp/a.o \
		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \		// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
// Use tee to both print to stderr and FileCheck		// Use tee to both print to stderr and FileCheck
// RUN: tee -a /dev/stderr | FileCheck %s		// RUN: tee -a /dev/stderr | FileCheck %s


!elem_type_a = type i8		!elem_type_a = type i8
!elem_type_b = type i8		!elem_type_b = type i8
!elem_type_c = type i32		!elem_type_c = type i32
scf.for %arg0 = %c0 to %iters step %c1 {		scf.for %arg0 = %c0 to %iters step %c1 {
// be easy.		// be easy.
linalg.fill(%C, %v0) : !row_major_C, !elem_type_c		linalg.fill(%C, %v0) : !row_major_C, !elem_type_c
call @matmul(%A, %B, %C) : (!row_major_A, !row_major_B, !row_major_C) -> ()		call @matmul(%A, %B, %C) : (!row_major_A, !row_major_B, !row_major_C) -> ()
}		}
%t_end_matmul = call @rtclock() : () -> f64		%t_end_matmul = call @rtclock() : () -> f64
%tmatmul = subf %t_end_matmul, %t_start_matmul: f64		%tmatmul = subf %t_end_matmul, %t_start_matmul: f64
call @print_perf(%iters, %tmatmul) : (index, f64) -> ()		call @print_perf(%iters, %tmatmul) : (index, f64) -> ()

%res = load %C[%c0, %c0]: !row_major_C		// CHECK: {{^0$}}
// CHECK: 64		%C_ref = alloc() : !row_major_C
vector.print %res: !elem_type_c		linalg.fill(%C_ref, %v0) : !row_major_C, !elem_type_c
		linalg.matmul_i8_i8_i32 ins(%A, %B : !row_major_A, !row_major_B)
		outs(%C_ref: !row_major_C)
		%res = memref_cast %C : !row_major_C to memref<*xi32>
		%exp = memref_cast %C_ref : !row_major_C to memref<*xi32>
		%errors = call @verifyMemRefI32(%res, %exp) : (memref<*xi32>, memref<*xi32>) -> i64
		vector.print %errors : i64
		dealloc %C_ref : !row_major_C

dealloc %A : !row_major_A		dealloc %A : !row_major_A
dealloc %B : !row_major_B		dealloc %B : !row_major_B
dealloc %C : !row_major_C		dealloc %C : !row_major_C

return		return
}		}

func private @rtclock() -> f64		func private @rtclock() -> f64
		func private @verifyMemRefI32(memref<*xi32>, memref<*xi32>) -> i64 attributes { llvm.emit_c_interface }

// TODO: init with random, run and check output.		// TODO: init with random, run and check output.
// func private @fill_random_f32(memref<*xf32>)		// func private @fill_random_f32(memref<*xf32>)

mlir/lib/ExecutionEngine/RunnerUtils.cpp

	extern "C" void			extern "C" void
	_mlir_ciface_print_memref_3d_f32(StridedMemRefType<float, 3> *M) {			_mlir_ciface_print_memref_3d_f32(StridedMemRefType<float, 3> *M) {
	impl::printMemRef(*M);			impl::printMemRef(*M);
	}			}
	extern "C" void			extern "C" void
	_mlir_ciface_print_memref_4d_f32(StridedMemRefType<float, 4> *M) {			_mlir_ciface_print_memref_4d_f32(StridedMemRefType<float, 4> *M) {
	impl::printMemRef(*M);			impl::printMemRef(*M);
	}			}

				extern "C" int64_t
				_mlir_ciface_verifyMemRefI32(UnrankedMemRefType<int32_t> *actual,
				UnrankedMemRefType<int32_t> *expected) {
				return impl::verifyMemRef(actual, expected);
				}

				extern "C" int64_t
				_mlir_ciface_verifyMemRefF32(UnrankedMemRefType<float> *actual,
				UnrankedMemRefType<float> *expected) {
				return impl::verifyMemRef(actual, expected);
				}

				extern "C" int64_t
				_mlir_ciface_verifyMemRefF64(UnrankedMemRefType<double> *actual,
				UnrankedMemRefType<double> *expected) {
				return impl::verifyMemRef(actual, expected);
				}

				extern "C" int64_t verifyMemRefI32(int64_t rank, void *actualPtr,
				void *expectedPtr) {
				UnrankedMemRefType<int32_t> actualDesc = {rank, actualPtr};
				UnrankedMemRefType<int32_t> expectedDesc = {rank, expectedPtr};
				return _mlir_ciface_verifyMemRefI32(&actualDesc, &expectedDesc);
				}

				extern "C" int64_t verifyMemRefF32(int64_t rank, void *actualPtr,
				void *expectedPtr) {
				UnrankedMemRefType<float> actualDesc = {rank, actualPtr};
				UnrankedMemRefType<float> expectedDesc = {rank, expectedPtr};
				return _mlir_ciface_verifyMemRefF32(&actualDesc, &expectedDesc);
				}

				extern "C" int64_t verifyMemRefF64(int64_t rank, void *actualPtr,
				void *expectedPtr) {
				UnrankedMemRefType<double> actualDesc = {rank, actualPtr};
				UnrankedMemRefType<double> expectedDesc = {rank, expectedPtr};
				return _mlir_ciface_verifyMemRefF64(&actualDesc, &expectedDesc);
				}