Details

Reviewers

bondhugula
herhut
csigg
ftynse
ThomasRaoux

Commits

rGe552fa28da28: [MLIR][GPU] Add CUDA Tensor core WMMA test

Summary

Add a test case to test the complete execution of WMMA ops on an NVIDIA GPU with tensor cores.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

navdeepkk created this revision. Jan 24 2021, 11:27 PM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 15 others. Jan 24 2021, 11:27 PM

navdeepkk requested review of this revision. Jan 24 2021, 11:27 PM

Herald added a project: Restricted Project. Jan 24 2021, 11:27 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

navdeepkk added a parent revision: D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend. Jan 24 2021, 11:27 PM

navdeepkk added a reviewer: herhut. Jan 24 2021, 11:35 PM

Harbormaster completed remote builds in B86510: Diff 318909. Jan 25 2021, 12:37 AM

@csigg have we decided on moving such tests to integration_test/ ?

In D95334#2519637, @ftynse wrote:

@csigg have we decided on moving such tests to integration_test/ ?

Not that I'm aware of. I wouldn't hold back this revision, but let me know if you want me to move everything afterwards.

Looks great.

This revision is now accepted and ready to land. Jan 26 2021, 2:42 AM

bondhugula added inline comments. Jan 26 2021, 3:59 AM

mlir/test/mlir-cuda-runner/wmma-matmul.mlir
8–10 ↗	(On Diff #318909)	All alignment attributes have a typo.

Changes in this diff :-

1.) Modify the test case to use the !gpu.mmafragment type introduced in revision D95330.

Harbormaster completed remote builds in B87844: Diff 321333. Feb 4 2021, 12:11 AM

bondhugula accepted this revision. Feb 4 2021, 12:41 AM

Changes in this diff :-

1.) Change type of ldm attribute in load/store fragment from i32 to index.

Harbormaster completed remote builds in B89561: Diff 324332. Feb 17 2021, 9:06 AM

Changes in this diff :-

1.) Make changes to operate with the newly introduced gpu.mma_matrix type.

Herald added subscribers: dcaballe, cota. May 2 2021, 1:44 PM

Harbormaster completed remote builds in B102201: Diff 342271. May 2 2021, 1:45 PM

Rebase on upstream/main.

Harbormaster completed remote builds in B105647: Diff 347047. May 21 2021, 8:50 AM

bondhugula accepted this revision. May 21 2021, 8:53 AM

I just realized that getting these test cases in will mean check-mlir will fail for all those without tensor cores on GPUs (if they are configuring with NVPTX)! Can we add an -DMLIR_ENABLE_CUDA_TENSOR_CORES and have these tests run under that? @ftynse @ThomasRaoux

This revision now requires changes to proceed. May 21 2021, 9:01 AM

bondhugula added reviewers: ftynse, ThomasRaoux. May 21 2021, 9:02 AM

In D95334#2773990, @bondhugula wrote:

I just realized that getting these test cases in will mean check-mlir will fail for all those without tensor cores on GPUs (if they are configuring with NVPTX)! Can we add an -DMLIR_ENABLE_CUDA_TENSOR_CORES and have these tests run under that? @ftynse @ThomasRaoux

Correct, for example we have some systems on our CI running CUDA execution tests that don't have tensor cores.
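The concern here is that a machine can have a working CUDA setup and still lack tensor cores. The revision ends up handling this with a build flag rather than hardware detection. Purely as an illustration of the distinction, a check on the chip name could look like the sketch below. It assumes WMMA needs compute capability 7.0 (sm_70) or newer; `supports_wmma` is a hypothetical helper, and nothing in the patch calls it.

```python
import re


def supports_wmma(chip):
    """Return True if a chip name such as 'sm_75' is sm_70 or newer.

    Assumption: WMMA requires compute capability 7.0 or higher.
    """
    match = re.fullmatch(r"sm_(\d+)[a-z]?", chip)
    if match is None:
        raise ValueError(f"unrecognized chip name: {chip!r}")
    return int(match.group(1)) >= 70
```

Under that assumption, the `sm_75` target used by the tests in this revision qualifies, while an older part such as `sm_61` does not.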

bondhugula added inline comments. May 21 2021, 10:57 PM

mlir/test/Integration/GPU/CUDA/wmma-matmul-f16.mlir
1–7 ↗	(On Diff #347047)	Please configure to run these tests only under a `-DMLIR_ENABLE_CUDA_TENSOR_CORES`.

Changes in this diff:-

1.) Add flag to enable/disable Tensor core WMMA tests.
2.) Rebase on upstream/main.

Herald added a subscriber: mgorny. May 22 2021, 2:22 AM

Harbormaster completed remote builds in B105752: Diff 347183. May 22 2021, 2:22 AM

Changes in this diff :-

1.) Remove unnecessary alignment attribute.

Harbormaster completed remote builds in B105753: Diff 347185. May 22 2021, 2:53 AM

LGTM. Some minor comments.

mlir/test/CMakeLists.txt
35	Nit: Tensor core -> CUDA tensor core
mlir/test/Integration/GPU/CUDA/TensorCore/lit.local.cfg
4	enabled
5	...run_tensor_core... -> run_cuda_tensor_core... for better context?

This revision is now accepted and ready to land. May 22 2021, 2:56 AM

navdeepkk removed a parent revision: D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend. May 22 2021, 3:11 AM

Changes in this diff :-

1.) Address comments on previous diff.

Please update the commit summary. It's no longer accurate.

navdeepkk edited the summary of this revision. May 22 2021, 3:40 AM

navdeepkk retitled this revision from [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test to [MLIR][GPU] Add WMMA Tensor core matmul test.

navdeepkk retitled this revision from [MLIR][GPU] Add WMMA Tensor core matmul test to [MLIR][GPU] Add CUDA Tensor core WMMA test. May 22 2021, 3:43 AM

Fix commit summary and title.

This revision was landed with ongoing or failed builds. May 22 2021, 3:50 AM

Closed by commit rGe552fa28da28: [MLIR][GPU] Add CUDA Tensor core WMMA test (authored by navdeepkk, committed by bondhugula).

This revision was automatically updated to reflect the committed changes.

bondhugula added a commit: rGe552fa28da28: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Harbormaster completed remote builds in B105757: Diff 347193. May 22 2021, 4:23 AM

Diff 347195

mlir/test/CMakeLists.txt

(26 lines not shown)
 set(MLIR_SPIRV_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})
 set(MLIR_VULKAN_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})

 if (MLIR_INCLUDE_INTEGRATION_TESTS)
   set(INTEL_SDE_EXECUTABLE "" CACHE STRING
       "If set, arch-specific integration tests are run with Intel SDE.")
   option(MLIR_RUN_AMX_TESTS "Run AMX tests.")
   option(MLIR_RUN_X86VECTOR_TESTS "Run X86Vector tests.")
+  option(MLIR_RUN_CUDA_TENSOR_CORE_TESTS "Run CUDA Tensor core WMMA tests.")
   # Passed to lit.site.cfg.py.in to set up the path where to find the libraries.
   set(MLIR_INTEGRATION_TEST_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})

   # Copy test data over.
   file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.mtx
             ${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.tns
             ${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/wide.mtx
           DESTINATION ${MLIR_INTEGRATION_TEST_DIR}/data/)
(90 lines not shown)

mlir/test/Integration/GPU/CUDA/TensorCore/lit.local.cfg

This file was added.

import sys

# TensorCore tests must be enabled via build flag.
if config.mlir_run_cuda_tensor_core_tests != 'ON':
    config.unsupported = True

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f16.mlir

This file was added.

// RUN: mlir-opt %s \
// RUN: -gpu-kernel-outlining \
// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm{index-bitwidth=32},gpu-to-cubin{chip=sm_75})' \
// RUN: --convert-scf-to-std -gpu-to-llvm \
// RUN: | mlir-cpu-runner \
// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_cuda_runtime%shlibext \
// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
// RUN: --entry-point-result=void \
// RUN: | FileCheck %s
// Test case to check the working of Tensor cores on Nvidia GPUs. The kernel has already
// been outlined to prevent crashing due to introduction of an empty basic block by
// --gpu-kernel-outlining.
module attributes {gpu.container_module} {
  func @main() {
    %0 = memref.alloc() : memref<16x16xf16>
    %22 = memref.alloc() : memref<16x16xf16>
    %1 = memref.alloc() : memref<16x16xf32>

    %f1 = constant 1.0e+00 : f16
    %f0 = constant 0.0e+00 : f16
    %c0 = constant 0 : index
    %c16 = constant 16 : index
    %c32 = constant 32 : index
    %c1 = constant 1 : index

    // Initialize the input matrix with ones.
    scf.for %arg0 = %c0 to %c16 step %c1 {
      scf.for %arg1 = %c0 to %c16 step %c1 {
        memref.store %f1, %0[%arg0, %arg1] : memref<16x16xf16>
      }
    }
    // Initialize the accumulator matrix with zeros.
    scf.for %arg0 = %c0 to %c16 step %c1 {
      scf.for %arg1 = %c0 to %c16 step %c1 {
        memref.store %f0, %22[%arg0, %arg1] : memref<16x16xf16>
      }
    }

    %2 = memref.cast %0 : memref<16x16xf16> to memref<*xf16>
    %33 = memref.cast %22 : memref<16x16xf16> to memref<*xf16>
    %3 = memref.cast %1 : memref<16x16xf32> to memref<*xf32>
    gpu.host_register %2 : memref<*xf16>
    gpu.host_register %33 : memref<*xf16>

    gpu.launch_func @main_kernel::@main_kernel blocks in (%c1, %c1, %c1) threads in (%c32, %c1, %c1) args(%0 : memref<16x16xf16>, %22 : memref<16x16xf16>)

    // Convert the results from f16 to f32 for printing.
    scf.for %arg0 = %c0 to %c16 step %c1 {
      scf.for %arg1 = %c0 to %c16 step %c1 {
        %6 = memref.load %0[%arg0, %arg1] : memref<16x16xf16>
        %7 = fpext %6 : f16 to f32
        memref.store %7, %1[%arg0, %arg1] : memref<16x16xf32>
      }
    }

    // Print the memref after computation.
    call @print_memref_f32(%3) : (memref<*xf32>) -> ()
    // CHECK: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
    return
  }

  gpu.module @main_kernel {
    gpu.func @main_kernel(%arg0: memref<16x16xf16>, %arg22 : memref<16x16xf16>) kernel {
      %c0 = constant 0 : index

      %0 = gpu.subgroup_mma_load_matrix %arg0[%c0, %c0] {operand = "AOp", leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "AOp">
      %1 = gpu.subgroup_mma_load_matrix %arg0[%c0, %c0] {operand = "BOp", leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
      %2 = gpu.subgroup_mma_load_matrix %arg22[%c0, %c0] {operand = "COp", leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">

      %3 = gpu.subgroup_mma_compute %0, %1, %2 : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp">, !gpu.mma_matrix<16x16xf16, "COp"> -> !gpu.mma_matrix<16x16xf16, "DOp">

      gpu.subgroup_mma_store_matrix %3, %arg0[%c0, %c0] {leadDimension = 16 : index}: !gpu.mma_matrix<16x16xf16, "DOp">, memref<16x16xf16>

      gpu.return
    }
  }

  func private @print_memref_f32(memref<*xf32>)
}

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir

This file was added.

// RUN: mlir-opt %s \
// RUN: -gpu-kernel-outlining \
// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm{index-bitwidth=32},gpu-to-cubin{chip=sm_75})' \
// RUN: --convert-scf-to-std -gpu-to-llvm \
// RUN: | mlir-cpu-runner \
// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_cuda_runtime%shlibext \
// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
// RUN: --entry-point-result=void \
// RUN: | FileCheck %s
// Test case to check the working of Tensor cores on Nvidia GPUs. The kernel has already
// been outlined to prevent crashing due to introduction of an empty basic block by
// --gpu-kernel-outlining.
module attributes {gpu.container_module} {
  func @main() {
    %0 = memref.alloc() : memref<16x16xf16>
    %22 = memref.alloc() : memref<16x16xf32>
    %1 = memref.alloc() : memref<16x16xf32>

    %f1 = constant 1.0e+00 : f16
    %f0 = constant 0.0e+00 : f32
    %c0 = constant 0 : index
    %c16 = constant 16 : index
    %c32 = constant 32 : index
    %c1 = constant 1 : index

    // Initialize the input matrix with ones.
    scf.for %arg0 = %c0 to %c16 step %c1 {
      scf.for %arg1 = %c0 to %c16 step %c1 {
        memref.store %f1, %0[%arg0, %arg1] : memref<16x16xf16>
      }
    }
    // Initialize the accumulator matrix with zeros.
    scf.for %arg0 = %c0 to %c16 step %c1 {
      scf.for %arg1 = %c0 to %c16 step %c1 {
        memref.store %f0, %22[%arg0, %arg1] : memref<16x16xf32>
      }
    }

    %2 = memref.cast %0 : memref<16x16xf16> to memref<*xf16>
    %33 = memref.cast %22 : memref<16x16xf32> to memref<*xf32>
    %3 = memref.cast %1 : memref<16x16xf32> to memref<*xf32>
    gpu.host_register %2 : memref<*xf16>
    gpu.host_register %33 : memref<*xf32>

    gpu.launch_func @main_kernel::@main_kernel blocks in (%c1, %c1, %c1) threads in (%c32, %c1, %c1) args(%0 : memref<16x16xf16>, %22 : memref<16x16xf32>)

    // Print the memref after computation.
    call @print_memref_f32(%33) : (memref<*xf32>) -> ()
    // CHECK: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16],
    // CHECK-NEXT: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
    return
  }

  gpu.module @main_kernel {
    gpu.func @main_kernel(%arg0: memref<16x16xf16>, %arg22 : memref<16x16xf32>) kernel {
      %c0 = constant 0 : index

      %0 = gpu.subgroup_mma_load_matrix %arg0[%c0, %c0] {operand = "AOp", leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "AOp">
      %1 = gpu.subgroup_mma_load_matrix %arg0[%c0, %c0] {operand = "BOp", leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
      %2 = gpu.subgroup_mma_load_matrix %arg22[%c0, %c0] {operand = "COp", leadDimension = 16 : index} : memref<16x16xf32> -> !gpu.mma_matrix<16x16xf32, "COp">

      %3 = gpu.subgroup_mma_compute %0, %1, %2 : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp">, !gpu.mma_matrix<16x16xf32, "COp"> -> !gpu.mma_matrix<16x16xf32, "DOp">

      gpu.subgroup_mma_store_matrix %3, %arg22[%c0, %c0] {leadDimension = 16 : index}: !gpu.mma_matrix<16x16xf32, "DOp">, memref<16x16xf32>

      gpu.return
    }
  }

  func private @print_memref_f32(memref<*xf32>)
}
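
The CHECK lines in both tests expect every element of the result to be 16. A plain-Python reference of D = A * B + C shows where that value comes from, assuming (as in the kernels above) that A and B are both loaded from the same all-ones 16x16 buffer and C is the zero-initialized accumulator. `matmul_accumulate` is an illustrative helper and is not part of MLIR.

```python
N = 16

# A and B both come from the all-ones input buffer; C starts at zero.
a = [[1.0] * N for _ in range(N)]
b = [[1.0] * N for _ in range(N)]
c = [[0.0] * N for _ in range(N)]


def matmul_accumulate(a, b, c):
    """Return D = A * B + C for square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


d = matmul_accumulate(a, b, c)
```

Each output element is a dot product of sixteen ones plus zero, so the value 16 is exactly representable in both f16 and f32, which is why the two tests can share the same CHECK lines.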

mlir/test/lit.site.cfg.py.in

(43 lines not shown)
 config.enable_spirv_cpu_runner = @MLIR_SPIRV_CPU_RUNNER_ENABLED@
 config.vulkan_wrapper_library_dir = "@MLIR_VULKAN_WRAPPER_LIBRARY_DIR@"
 config.enable_vulkan_runner = @MLIR_VULKAN_RUNNER_ENABLED@
 config.enable_bindings_python = @MLIR_BINDINGS_PYTHON_ENABLED@
 config.mlir_integration_test_dir = "@MLIR_INTEGRATION_TEST_DIR@"
 config.intel_sde_executable = "@INTEL_SDE_EXECUTABLE@"
 config.mlir_run_amx_tests = "@MLIR_RUN_AMX_TESTS@"
 config.mlir_run_x86vector_tests = "@MLIR_RUN_X86VECTOR_TESTS@"
+config.mlir_run_cuda_tensor_core_tests = "@MLIR_RUN_CUDA_TENSOR_CORE_TESTS@"
 config.mlir_include_integration_tests = "@MLIR_INCLUDE_INTEGRATION_TESTS@"

 # Support substitution of the tools_dir with user parameters. This is
 # used when we can't determine the tool dir at configuration time.
 try:
     config.llvm_tools_dir = config.llvm_tools_dir % lit_config.params
     config.llvm_lib_dir = config.llvm_lib_dir % lit_config.params
     config.llvm_shlib_dir = config.llvm_shlib_dir % lit_config.params
(10 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Add CUDA Tensor core WMMA test
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 347195

mlir/test/CMakeLists.txt

mlir/test/Integration/GPU/CUDA/TensorCore/lit.local.cfg

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f16.mlir

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir

mlir/test/lit.site.cfg.py.in
