This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers.
ClosedPublic

Authored by csigg on Jan 11 2021, 4:04 AM.

Details

Reviewers

herhut
aartbik

Commits

rGcf50f4f76456: [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers.

Summary

The runtime-wrappers depend on LLVMSupport, pulling in static initialization code (e.g. command line arguments). Dynamically loading multiple such libraries results in ODR violations.

So far this has not been an issue, but in D94421 I would like to load both the async-runtime and the cuda-runtime-wrappers as part of a cuda-runner integration test. When doing so, code that asserts that an option category is registered only once fails. Note that I have only seen this in Google's Bazel build, where the async-runtime depends on LLVMSupport, but a similar issue would arise in CMake if more than one runtime-wrapper started to depend on LLVMSupport.
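To make the failure mode concrete, here is a minimal C++ sketch that models it. It is a simplified stand-in, not LLVM's actual command-line implementation: `Category`, `Registry`, `LibraryCopy`, and `countRejectedLoads` are hypothetical names, and the dynamic loader's symbol resolution is not modeled at all. The assumption is that every shared object carrying its own statically linked copy of the support code runs the same registration against one process-wide registry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an option category; not LLVM's cl::OptionCategory.
struct Category {
  std::string name;
};

// Hypothetical process-wide registry that accepts each category name once.
class Registry {
public:
  // Returns true if the category was registered, false if a category with
  // the same name is already present (the case the real code asserts on).
  bool add(const Category *category) {
    for (const Category *existing : categories)
      if (existing->name == category->name)
        return false;
    categories.push_back(category);
    return true;
  }

  std::size_t size() const { return categories.size(); }

private:
  std::vector<const Category *> categories;
};

// Models one shared object with its own statically linked copy of the
// support code, and therefore its own instance of the category object.
struct LibraryCopy {
  Category general{"General options"};

  // Models the static initializer that runs when the library is loaded.
  bool load(Registry &registry) { return registry.add(&general); }
};

// Loads `copies` libraries into a single registry and counts how many
// registrations were rejected as duplicates.
int countRejectedLoads(int copies) {
  Registry registry;
  std::vector<LibraryCopy> libraries(copies);
  int rejected = 0;
  for (LibraryCopy &library : libraries)
    if (!library.load(registry))
      ++rejected;
  return rejected;
}
```

In this model, a single combined shared object (`countRejectedLoads(1)`) registers its category without conflict, while each additional separately loaded copy is rejected as a duplicate. That is the situation this change tries to avoid by having the tests load one shared object only.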

The underlying issue is that we have a mix of static and dynamic linking. If all dependencies were loaded as shared objects (i.e. if LLVMSupport was linked dynamically to the runtime wrappers), each dependency would only get loaded once. However, linking dependencies dynamically would require special attention to paths (one could dynamically load the dependencies first given explicit paths). The simpler approach seems to be to link all dependencies statically into a single shared object.

This change basically applies the same logic that we have in the c_runner_utils: we have a shared object target that can be loaded dynamically, and we have a static library target that can be linked to other runtime-wrapper shared object targets.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

csigg created this revision. Jan 11 2021, 4:04 AM

Herald added a reviewer: aartbik. Jan 11 2021, 4:04 AM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 17 others.

csigg requested review of this revision. Jan 11 2021, 4:04 AM

Herald added a project: Restricted Project. Jan 11 2021, 4:04 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B84657: Diff 315749. Jan 11 2021, 4:30 AM

csigg retitled this revision from "Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers." to "[mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers.". Jan 11 2021, 4:54 AM

csigg added a child revision: D94421: [mlir] Add gpu async integration test. Jan 11 2021, 8:18 AM

Can you add a description of the motivation to the commit message?

csigg edited the summary of this revision. Jan 11 2021, 12:24 PM

Thanks for adding the description. I still don't quite get how making these static prevents ODR issues?

csigg edited the summary of this revision. Jan 12 2021, 1:48 AM

csigg mentioned this in D94421: [mlir] Add gpu async integration test. Jan 12 2021, 10:18 AM

If the runners depend on libSupport and we link them statically into the runtime wrappers that are dynamically loaded, don't we end up with two copies of libSupport: one in the statically linked runner executable and one in the dynamically loaded shared objects?

I don't really understand how this works, but my guess is that the JITDylibs loaded into the ExecutionEngine's module do not share any symbols with the main process.

I asked this in the other CL too, but I was just wondering if these tests are better candidates for mlir/integration_test rather than mlir/test?

This still seems a bit like a hack to me but it is better than twiddling with runtime paths.

This revision is now accepted and ready to land. Jan 19 2021, 1:24 AM

Closed by commit rGcf50f4f76456: [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers. (authored by csigg). Jan 20 2021, 3:10 AM

This revision was automatically updated to reflect the committed changes.

csigg added a commit: rGcf50f4f76456: [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers.

Revision Contents

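Path                                                      Size
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h          2 lines
mlir/lib/ExecutionEngine/CMakeLists.txt                   21 lines
mlir/test/mlir-cuda-runner/all-reduce-and.mlir            5 lines
mlir/test/mlir-cuda-runner/all-reduce-max.mlir            5 lines
mlir/test/mlir-cuda-runner/all-reduce-min.mlir            5 lines
mlir/test/mlir-cuda-runner/all-reduce-op.mlir             5 lines
mlir/test/mlir-cuda-runner/all-reduce-or.mlir             5 lines
mlir/test/mlir-cuda-runner/all-reduce-region.mlir         5 lines
mlir/test/mlir-cuda-runner/all-reduce-xor.mlir            5 lines
mlir/test/mlir-cuda-runner/gpu-to-cubin.mlir              5 lines
mlir/test/mlir-cuda-runner/multiple-all-reduce.mlir       5 lines
mlir/test/mlir-cuda-runner/shuffle.mlir                   5 lines
mlir/test/mlir-cuda-runner/two-modules.mlir               5 lines
mlir/test/mlir-rocm-runner/gpu-to-hsaco.mlir              5 lines
mlir/test/mlir-rocm-runner/two-modules.mlir               5 lines
mlir/test/mlir-rocm-runner/vecadd.mlir                    5 lines
mlir/test/mlir-rocm-runner/vector-transferops.mlir        5 lines
mlir/tools/mlir-cuda-runner/CMakeLists.txt                1 line
mlir/tools/mlir-rocm-runner/CMakeLists.txt                1 line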

Diff 317823

mlir/include/mlir/ExecutionEngine/CRunnerUtils.h

(20 lines not shown)
  // We are building this library
  #define MLIR_CRUNNERUTILS_EXPORT __declspec(dllexport)
  #define MLIR_CRUNNERUTILS_DEFINE_FUNCTIONS
  #else
  // We are using this library
  #define MLIR_CRUNNERUTILS_EXPORT __declspec(dllimport)
  #endif // mlir_c_runner_utils_EXPORTS
  #endif // MLIR_CRUNNERUTILS_EXPORT
- #else
+ #else // _WIN32
  #define MLIR_CRUNNERUTILS_EXPORT
  #define MLIR_CRUNNERUTILS_DEFINE_FUNCTIONS
  #endif // _WIN32

  #include <cstdint>

  //===----------------------------------------------------------------------===//
  // Codegen-compatible structures for Vector type.
(185 lines not shown)

mlir/lib/ExecutionEngine/CMakeLists.txt

(74 lines not shown)
  add_mlir_library(mlir_c_runner_utils
  SHARED
  CRunnerUtils.cpp
  SparseUtils.cpp

  EXCLUDE_FROM_LIBMLIR
  )
  set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11)
+ target_compile_definitions(mlir_c_runner_utils PRIVATE mlir_c_runner_utils_EXPORTS)

  add_mlir_library(mlir_c_runner_utils_static
  CRunnerUtils.cpp
  SparseUtils.cpp

  EXCLUDE_FROM_LIBMLIR
  )
  set_property(TARGET mlir_c_runner_utils_static PROPERTY CXX_STANDARD 11)
- target_compile_definitions(mlir_c_runner_utils PRIVATE mlir_c_runner_utils_EXPORTS)

  add_mlir_library(mlir_runner_utils
  SHARED
  RunnerUtils.cpp

  EXCLUDE_FROM_LIBMLIR

  LINK_LIBS PUBLIC
  mlir_c_runner_utils_static
  )
  target_compile_definitions(mlir_runner_utils PRIVATE mlir_runner_utils_EXPORTS)

+ add_mlir_library(mlir_runner_utils_static
+ RunnerUtils.cpp
+
+ EXCLUDE_FROM_LIBMLIR
+
+ LINK_LIBS PUBLIC
+ mlir_c_runner_utils_static
+ )
+
  add_mlir_library(mlir_async_runtime
  SHARED
  AsyncRuntime.cpp

  EXCLUDE_FROM_LIBMLIR

  LINK_LIBS PUBLIC
  mlir_c_runner_utils_static
  ${LLVM_PTHREAD_LIB}
  )
  set_property(TARGET mlir_async_runtime PROPERTY CXX_VISIBILITY_PRESET hidden)
  target_compile_definitions(mlir_async_runtime PRIVATE mlir_async_runtime_EXPORTS)
+
+ add_mlir_library(mlir_async_runtime_static
+ AsyncRuntime.cpp
+
+ EXCLUDE_FROM_LIBMLIR
+
+ LINK_LIBS PUBLIC
+ mlir_c_runner_utils_static
+ ${LLVM_PTHREAD_LIB}
+ )

mlir/test/mlir-cuda-runner/all-reduce-and.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xi32>
  %sum = alloc() : memref<2xi32>
  %cst0 = constant 0 : i32
  %cst1 = constant 1 : i32
  %cst2 = constant 2 : i32
  %cst4 = constant 4 : i32
(53 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-max.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xi32>
  %sum = alloc() : memref<2xi32>
  %cst0 = constant 0 : i32
  %cst1 = constant 1 : i32
  %cst2 = constant 2 : i32
  %cst4 = constant 4 : i32
(53 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-min.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xi32>
  %sum = alloc() : memref<2xi32>
  %cst0 = constant 0 : i32
  %cst1 = constant 1 : i32
  %cst2 = constant 2 : i32
  %cst4 = constant 4 : i32
(53 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-op.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  // CHECK-COUNT-8: [{{(5356, ){12}5356}}]
  func @main() {
  %arg = alloc() : memref<2x4x13xf32>
  %dst = memref_cast %arg : memref<2x4x13xf32> to memref<?x?x?xf32>
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index
(22 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-or.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xi32>
  %sum = alloc() : memref<2xi32>
  %cst0 = constant 0 : i32
  %cst1 = constant 1 : i32
  %cst2 = constant 2 : i32
  %cst4 = constant 4 : i32
(53 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-region.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  // CHECK: [{{(35, ){34}35}}]
  func @main() {
  %arg = alloc() : memref<35xf32>
  %dst = memref_cast %arg : memref<35xf32> to memref<?xf32>
  %one = constant 1 : index
  %c0 = constant 0 : index
  %sx = dim %dst, %c0 : memref<?xf32>
(19 lines not shown)

mlir/test/mlir-cuda-runner/all-reduce-xor.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xi32>
  %sum = alloc() : memref<2xi32>
  %cst0 = constant 0 : i32
  %cst1 = constant 1 : i32
  %cst2 = constant 2 : i32
  %cst4 = constant 4 : i32
(53 lines not shown)

mlir/test/mlir-cuda-runner/gpu-to-cubin.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @other_func(%arg0 : f32, %arg1 : memref<?xf32>) {
  %cst = constant 1 : index
  %c0 = constant 0 : index
  %cst2 = dim %arg1, %c0 : memref<?xf32>
  gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %cst, %grid_y = %cst, %grid_z = %cst)
  threads(%tx, %ty, %tz) in (%block_x = %cst2, %block_y = %cst, %block_z = %cst) {
  store %arg0, %arg1[%tx] : memref<?xf32>
(20 lines not shown)

mlir/test/mlir-cuda-runner/multiple-all-reduce.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @main() {
  %data = alloc() : memref<2x6xf32>
  %sum = alloc() : memref<2xf32>
  %mul = alloc() : memref<2xf32>
  %cst0 = constant 0.0 : f32
  %cst1 = constant 1.0 : f32
  %cst2 = constant 2.0 : f32
(60 lines not shown)

mlir/test/mlir-cuda-runner/shuffle.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  // CHECK: [4, 5, 6, 7, 0, 1, 2, 3, 12, -1, -1, -1, 8]
  func @main() {
  %arg = alloc() : memref<13xf32>
  %dst = memref_cast %arg : memref<13xf32> to memref<?xf32>
  %one = constant 1 : index
  %c0 = constant 0 : index
  %sx = dim %dst, %c0 : memref<?xf32>
(22 lines not shown)

mlir/test/mlir-cuda-runner/two-modules.mlir

- // RUN: mlir-cuda-runner %s --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-cuda-runner %s \
+ // RUN: --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  // CHECK: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  func @main() {
  %arg = alloc() : memref<13xi32>
  %dst = memref_cast %arg : memref<13xi32> to memref<?xi32>
  %one = constant 1 : index
  %c0 = constant 0 : index
  %sx = dim %dst, %c0 : memref<?xi32>
(19 lines not shown)

mlir/test/mlir-rocm-runner/gpu-to-hsaco.mlir

- // RUN: mlir-rocm-runner %s --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-rocm-runner %s \
+ // RUN: --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @other_func(%arg0 : f32, %arg1 : memref<?xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %block_dim = dim %arg1, %c0 : memref<?xf32>
  gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c1, %grid_y = %c1, %grid_z = %c1)
  threads(%tx, %ty, %tz) in (%block_x = %block_dim, %block_y = %c1, %block_z = %c1) {
  store %arg0, %arg1[%tx] : memref<?xf32>
(23 lines not shown)

mlir/test/mlir-rocm-runner/two-modules.mlir

- // RUN: mlir-rocm-runner %s --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-rocm-runner %s \
+ // RUN: --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  // CHECK: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  func @main() {
  %arg = alloc() : memref<13xi32>
  %dst = memref_cast %arg : memref<13xi32> to memref<?xi32>
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %sx = dim %dst, %c0 : memref<?xi32>
(21 lines not shown)

mlir/test/mlir-rocm-runner/vecadd.mlir

- // RUN: mlir-rocm-runner %s --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-rocm-runner %s \
+ // RUN: --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @vecadd(%arg0 : memref<?xf32>, %arg1 : memref<?xf32>, %arg2 : memref<?xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %block_dim = dim %arg0, %c0 : memref<?xf32>
  gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c1, %grid_y = %c1, %grid_z = %c1)
  threads(%tx, %ty, %tz) in (%block_x = %block_dim, %block_y = %c1, %block_z = %c1) {
  %a = load %arg0[%tx] : memref<?xf32>
(41 lines not shown)

mlir/test/mlir-rocm-runner/vector-transferops.mlir

- // RUN: mlir-rocm-runner %s --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s
+ // RUN: mlir-rocm-runner %s \
+ // RUN: --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext \
+ // RUN: --entry-point-result=void \
+ // RUN: | FileCheck %s

  func @vectransferx2(%arg0 : memref<?xf32>, %arg1 : memref<?xf32>) {
  %cst = constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %cst, %grid_y = %cst, %grid_z = %cst)
  threads(%tx, %ty, %tz) in (%block_x = %cst, %block_y = %cst, %block_z = %cst) {
  %f0 = constant 0.0: f32
  %base = constant 0 : index
  %f = vector.transfer_read %arg0[%base], %f0
(75 lines not shown)

mlir/tools/mlir-cuda-runner/CMakeLists.txt

(31 lines not shown; context: if(MLIR_CUDA_RUNNER_ENABLED))
  )
  target_include_directories(cuda-runtime-wrappers
  PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
  LLVMSupport
  )
  target_link_libraries(cuda-runtime-wrappers
  PUBLIC
  LLVMSupport
+ mlir_runner_utils_static
  ${CUDA_RUNTIME_LIBRARY}
  )

  get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
  get_property(conversion_libs GLOBAL PROPERTY MLIR_CONVERSION_LIBS)
  set(LIBS
  ${dialect_libs}
  ${conversion_libs}
(38 lines not shown)

mlir/tools/mlir-rocm-runner/CMakeLists.txt

(55 lines not shown; context: target_include_directories(rocm-runtime-wrappers)
  PRIVATE
  "${HIP_PATH}/../include"
  "${HIP_PATH}/include"
  LLVMSupport
  )
  target_link_libraries(rocm-runtime-wrappers
  PUBLIC
  LLVMSupport
+ mlir_runner_utils_static
  ${ROCM_RUNTIME_LIBRARY}
  )

  get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
  get_property(conversion_libs GLOBAL PROPERTY MLIR_CONVERSION_LIBS)
  set(LIBS
  ${dialect_libs}
  ${conversion_libs}
(53 lines not shown)
