This is an archive of the discontinued LLVM Phabricator instance.

Differential D120499

[NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)
ClosedPublic

Authored by krisb on Feb 24 2022, 9:38 AM.

Details

Reviewers

tra

Commits

rG57aaab3b17f0: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)

Summary

The NVVM IR specification defines them with an i32 return type [0]:

The following intrinsics synchronize a subset of threads in a warp and then
broadcast and compare a value across threads in the subset.

declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
...
The i32 return value is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.

as well as PTX ISA spec [1]:

9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
...
Syntax
match.any.sync.type  d, a, membermask;
match.all.sync.type  d[|p], a, membermask;

Description
...
Destination d is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.

So it doesn't make sense to define them with any other return type.
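
To make the quoted spec text concrete, here is a minimal host-side sketch of the match.any semantics for 64-bit values. It is only a model for illustration: `matchAny64`, `kWarpSize`, and the array-of-lane-values representation are hypothetical names that exist in neither LLVM nor CUDA, and exited threads are not modelled. It shows that the result has one bit per lane, so it fits in 32 bits regardless of the width of the compared value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWarpSize = 32;

// Hypothetical host-side model of match.any.sync.b64 as seen from one lane.
// Returns a mask with bit i set when lane i is in memberMask and holds the
// same 64-bit value as the calling lane.
std::uint32_t matchAny64(const std::array<std::uint64_t, kWarpSize> &values,
                         std::uint32_t memberMask, unsigned lane) {
  assert(lane < kWarpSize && "lane id must be below the warp size");
  assert(((memberMask >> lane) & 1u) && "calling lane must be in membermask");
  std::uint32_t result = 0;
  for (unsigned other = 0; other < kWarpSize; ++other) {
    const bool participates = ((memberMask >> other) & 1u) != 0;
    if (participates && values[other] == values[lane])
      result |= (1u << other); // one bit per lane id
  }
  return result;
}
```

With even lanes holding one value and odd lanes another, a call from lane 0 with a full membermask yields 0x55555555 in this model, even when the compared values themselves do not fit in 32 bits.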

Additionally, ptxas doesn't accept the instructions produced by the NVPTX
backend. Here is the ptxas output for llvm/test/CodeGen/NVPTX/match.ll:

ptxas match.ptx, line 44; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 45; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 46; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 47; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 98; error   : Arguments mismatch for instruction 'match'

After this patch, it compiles with no issues.

[0] https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#unique_341827171
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-match-sync

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krisb created this revision. Feb 24 2022, 9:38 AM

Herald added subscribers: asavonic, hiraditya. Feb 24 2022, 9:38 AM

krisb requested review of this revision. Feb 24 2022, 9:38 AM

Herald added projects: Restricted Project, Restricted Project. Feb 24 2022, 9:38 AM

Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.

Harbormaster completed remote builds in B151306: Diff 411167. Feb 24 2022, 10:38 AM

Good catch. Thank you for the fix.

clang/include/clang/Basic/BuiltinsNVPTX.def
476–477	I've also noticed that the PTX spec also says `Requires sm_70 or higher.`, so we may want to fix the constraint, too.

Add SM_70 requirement for 'match' builtins.
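
As a rough illustration of the combined constraint, the sketch below reads `AND(SM_70,PTX60)` as "sm_70 or higher and PTX 6.0 or higher", following the spec wording quoted in the comment above. `Target`, `supportsMatchSync`, and `checkRunLineTargets` are hypothetical names, and this is not how clang evaluates builtin feature strings internally.

```cpp
#include <cassert>

// Hypothetical description of a compilation target; not a clang type.
struct Target {
  unsigned smVersion;  // e.g. 70 for sm_70
  unsigned ptxVersion; // e.g. 60 for PTX ISA 6.0
};

// Models the AND(SM_70,PTX60) requirement: both conditions must hold.
constexpr bool supportsMatchSync(const Target &t) {
  return t.smVersion >= 70 && t.ptxVersion >= 60;
}

// Example check: sm_60 with ptx60 no longer qualifies, sm_70 with ptx60 does.
inline void checkRunLineTargets() {
  assert(!supportsMatchSync(Target{60, 60}));
  assert(supportsMatchSync(Target{70, 60}));
}
```

Under this reading, a test that exercises the match builtins has to target at least sm_70, which is consistent with the `-target-cpu sm_60` to `-target-cpu sm_70` change in the RUN lines of builtins-nvptx-ptx60.cu.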

Harbormaster completed remote builds in B151430: Diff 411341. Feb 25 2022, 1:04 AM

Fix a test.

Harbormaster completed remote builds in B151481: Diff 411416. Feb 25 2022, 8:13 AM

LGTM.

This revision is now accepted and ready to land. Feb 28 2022, 12:01 PM

This revision was landed with ongoing or failed builds. Mar 1 2022, 2:27 AM

Closed by commit rG57aaab3b17f0: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32) (authored by krisb).

This revision was automatically updated to reflect the committed changes.

krisb added a commit: rG57aaab3b17f0: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32).

@tra thank you for looking at this!

Revision Contents

Path                                             Size
clang/include/clang/Basic/BuiltinsNVPTX.def      8 lines
clang/lib/Headers/__clang_cuda_intrinsics.h      4 lines
clang/test/CodeGen/builtins-nvptx-ptx60.cu       8 lines
llvm/include/llvm/IR/IntrinsicsNVVM.td           4 lines
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td         32 lines
llvm/test/CodeGen/NVPTX/match.ll                 84 lines

Diff 411416

clang/include/clang/Basic/BuiltinsNVPTX.def

	BUILTIN(__nvvm_vote_ballot, "Uib", "")			BUILTIN(__nvvm_vote_ballot, "Uib", "")

	TARGET_BUILTIN(__nvvm_vote_all_sync, "bUib", "", PTX60)			TARGET_BUILTIN(__nvvm_vote_all_sync, "bUib", "", PTX60)
	TARGET_BUILTIN(__nvvm_vote_any_sync, "bUib", "", PTX60)			TARGET_BUILTIN(__nvvm_vote_any_sync, "bUib", "", PTX60)
	TARGET_BUILTIN(__nvvm_vote_uni_sync, "bUib", "", PTX60)			TARGET_BUILTIN(__nvvm_vote_uni_sync, "bUib", "", PTX60)
	TARGET_BUILTIN(__nvvm_vote_ballot_sync, "UiUib", "", PTX60)			TARGET_BUILTIN(__nvvm_vote_ballot_sync, "UiUib", "", PTX60)

	// Match			// Match
	TARGET_BUILTIN(__nvvm_match_any_sync_i32, "UiUiUi", "", PTX60)			TARGET_BUILTIN(__nvvm_match_any_sync_i32, "UiUiUi", "", AND(SM_70,PTX60))
	TARGET_BUILTIN(__nvvm_match_any_sync_i64, "WiUiWi", "", PTX60)			TARGET_BUILTIN(__nvvm_match_any_sync_i64, "UiUiWi", "", AND(SM_70,PTX60))
	// These return a pair {value, predicate}, which requires custom lowering.			// These return a pair {value, predicate}, which requires custom lowering.
	TARGET_BUILTIN(__nvvm_match_all_sync_i32p, "UiUiUii*", "", PTX60)			TARGET_BUILTIN(__nvvm_match_all_sync_i32p, "UiUiUii*", "", AND(SM_70,PTX60))
	TARGET_BUILTIN(__nvvm_match_all_sync_i64p, "WiUiWii*", "", PTX60)			TARGET_BUILTIN(__nvvm_match_all_sync_i64p, "UiUiWii*", "", AND(SM_70,PTX60))

	// Redux			// Redux
	TARGET_BUILTIN(__nvvm_redux_sync_add, "iii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_add, "iii", "", AND(SM_80,PTX70))
	TARGET_BUILTIN(__nvvm_redux_sync_min, "iii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_min, "iii", "", AND(SM_80,PTX70))
	TARGET_BUILTIN(__nvvm_redux_sync_max, "iii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_max, "iii", "", AND(SM_80,PTX70))
	TARGET_BUILTIN(__nvvm_redux_sync_umin, "UiUii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_umin, "UiUii", "", AND(SM_80,PTX70))
	TARGET_BUILTIN(__nvvm_redux_sync_umax, "UiUii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_umax, "UiUii", "", AND(SM_80,PTX70))
	TARGET_BUILTIN(__nvvm_redux_sync_and, "iii", "", AND(SM_80,PTX70))			TARGET_BUILTIN(__nvvm_redux_sync_and, "iii", "", AND(SM_80,PTX70))

clang/lib/Headers/__clang_cuda_intrinsics.h

	/*===--- __clang_cuda_intrinsics.h - Device-side CUDA intrinsic wrappers ---===			/*===--- __clang_cuda_intrinsics.h - Device-side CUDA intrinsic wrappers ---===
	*			*
	* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	* See https://llvm.org/LICENSE.txt for license information.			* See https://llvm.org/LICENSE.txt for license information.
	* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	*			*
	*===-----------------------------------------------------------------------===			*===-----------------------------------------------------------------------===
	*/			*/
	#ifndef __CLANG_CUDA_INTRINSICS_H__			#ifndef __CLANG_CUDA_INTRINSICS_H__
	[...]

	// Define __match* builtins CUDA-9 headers expect to see.			// Define __match* builtins CUDA-9 headers expect to see.
	#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700			#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
	inline __device__ unsigned int __match32_any_sync(unsigned int mask,			inline __device__ unsigned int __match32_any_sync(unsigned int mask,
	unsigned int value) {			unsigned int value) {
	return __nvvm_match_any_sync_i32(mask, value);			return __nvvm_match_any_sync_i32(mask, value);
	}			}

	inline __device__ unsigned long long			inline __device__ unsigned int
				Lint: Pre-merge checks: clang-format: please reformat the code
				-inline __device__ unsigned int
				-__match64_any_sync(unsigned int mask, unsigned long long value) {
				+inline __device__ unsigned int __match64_any_sync(unsigned int mask,
				+                                                  unsigned long long value) {
	__match64_any_sync(unsigned int mask, unsigned long long value) {			__match64_any_sync(unsigned int mask, unsigned long long value) {
	return __nvvm_match_any_sync_i64(mask, value);			return __nvvm_match_any_sync_i64(mask, value);
	}			}

	inline __device__ unsigned int			inline __device__ unsigned int
	__match32_all_sync(unsigned int mask, unsigned int value, int *pred) {			__match32_all_sync(unsigned int mask, unsigned int value, int *pred) {
	return __nvvm_match_all_sync_i32p(mask, value, pred);			return __nvvm_match_all_sync_i32p(mask, value, pred);
	}			}

	inline __device__ unsigned long long			inline __device__ unsigned int
	__match64_all_sync(unsigned int mask, unsigned long long value, int *pred) {			__match64_all_sync(unsigned int mask, unsigned long long value, int *pred) {
	return __nvvm_match_all_sync_i64p(mask, value, pred);			return __nvvm_match_all_sync_i64p(mask, value, pred);
	}			}
	#include "crt/sm_70_rt.hpp"			#include "crt/sm_70_rt.hpp"

	#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700			#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
	#endif // __CUDA_VERSION >= 9000			#endif // __CUDA_VERSION >= 9000

clang/test/CodeGen/builtins-nvptx-ptx60.cu

// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_60 \		// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_70 \
// RUN: -fcuda-is-device -target-feature +ptx60 \		// RUN: -fcuda-is-device -target-feature +ptx60 \
// RUN: -S -emit-llvm -o - -x cuda %s \		// RUN: -S -emit-llvm -o - -x cuda %s \
// RUN: | FileCheck -check-prefix=CHECK %s		// RUN: | FileCheck -check-prefix=CHECK %s
// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \		// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \
// RUN: -fcuda-is-device -target-feature +ptx65 \		// RUN: -fcuda-is-device -target-feature +ptx65 \
// RUN: -S -emit-llvm -o - -x cuda %s \		// RUN: -S -emit-llvm -o - -x cuda %s \
// RUN: | FileCheck -check-prefix=CHECK %s		// RUN: | FileCheck -check-prefix=CHECK %s
// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \		// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \
// RUN: -fcuda-is-device -target-feature +ptx70 \		// RUN: -fcuda-is-device -target-feature +ptx70 \
// RUN: -S -emit-llvm -o - -x cuda %s \		// RUN: -S -emit-llvm -o - -x cuda %s \
// RUN: | FileCheck -check-prefix=CHECK %s		// RUN: | FileCheck -check-prefix=CHECK %s
// RUN: %clang_cc1 -triple nvptx-unknown-unknown -target-cpu sm_60 \		// RUN: %clang_cc1 -triple nvptx-unknown-unknown -target-cpu sm_70 \
// RUN: -fcuda-is-device -S -o /dev/null -x cuda -verify %s		// RUN: -fcuda-is-device -S -o /dev/null -x cuda -verify %s

#define __device__ __attribute__((device))		#define __device__ __attribute__((device))
#define __global__ __attribute__((global))		#define __global__ __attribute__((global))
#define __shared__ __attribute__((shared))		#define __shared__ __attribute__((shared))
#define __constant__ __attribute__((constant))		#define __constant__ __attribute__((constant))

typedef unsigned long long uint64_t;		typedef unsigned long long uint64_t;
[...] (64 lines elided; the hunk below is inside: __device__ void nvvm_sync(unsigned mask, int i, float f, int a, int b, ...)

//		//
// MATCH.{ALL,ANY}.SYNC		// MATCH.{ALL,ANY}.SYNC
//		//

// CHECK: call i32 @llvm.nvvm.match.any.sync.i32(i32		// CHECK: call i32 @llvm.nvvm.match.any.sync.i32(i32
// expected-error@+1 {{'__nvvm_match_any_sync_i32' needs target feature ptx60}}		// expected-error@+1 {{'__nvvm_match_any_sync_i32' needs target feature ptx60}}
__nvvm_match_any_sync_i32(mask, i);		__nvvm_match_any_sync_i32(mask, i);
// CHECK: call i64 @llvm.nvvm.match.any.sync.i64(i32		// CHECK: call i32 @llvm.nvvm.match.any.sync.i64(i32
// expected-error@+1 {{'__nvvm_match_any_sync_i64' needs target feature ptx60}}		// expected-error@+1 {{'__nvvm_match_any_sync_i64' needs target feature ptx60}}
__nvvm_match_any_sync_i64(mask, i64);		__nvvm_match_any_sync_i64(mask, i64);
// CHECK: call { i32, i1 } @llvm.nvvm.match.all.sync.i32p(i32		// CHECK: call { i32, i1 } @llvm.nvvm.match.all.sync.i32p(i32
// expected-error@+1 {{'__nvvm_match_all_sync_i32p' needs target feature ptx60}}		// expected-error@+1 {{'__nvvm_match_all_sync_i32p' needs target feature ptx60}}
__nvvm_match_all_sync_i32p(mask, i, &i);		__nvvm_match_all_sync_i32p(mask, i, &i);
// CHECK: call { i64, i1 } @llvm.nvvm.match.all.sync.i64p(i32		// CHECK: call { i32, i1 } @llvm.nvvm.match.all.sync.i64p(i32
// expected-error@+1 {{'__nvvm_match_all_sync_i64p' needs target feature ptx60}}		// expected-error@+1 {{'__nvvm_match_all_sync_i64p' needs target feature ptx60}}
__nvvm_match_all_sync_i64p(mask, i64, &i);		__nvvm_match_all_sync_i64p(mask, i64, &i);

// CHECK: ret void		// CHECK: ret void
}		}

llvm/include/llvm/IR/IntrinsicsNVVM.td

	//			//
	// match.any.sync.b32 mask, value			// match.any.sync.b32 mask, value
	def int_nvvm_match_any_sync_i32 :			def int_nvvm_match_any_sync_i32 :
	Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],			Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
	[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.any.sync.i32">,			[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.any.sync.i32">,
	GCCBuiltin<"__nvvm_match_any_sync_i32">;			GCCBuiltin<"__nvvm_match_any_sync_i32">;
	// match.any.sync.b64 mask, value			// match.any.sync.b64 mask, value
	def int_nvvm_match_any_sync_i64 :			def int_nvvm_match_any_sync_i64 :
	Intrinsic<[llvm_i64_ty], [llvm_i32_ty, llvm_i64_ty],			Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
	[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.any.sync.i64">,			[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.any.sync.i64">,
	GCCBuiltin<"__nvvm_match_any_sync_i64">;			GCCBuiltin<"__nvvm_match_any_sync_i64">;

	// match.all instruction have two variants -- one returns a single value, another			// match.all instruction have two variants -- one returns a single value, another
	// returns a pair {value, predicate}. We currently only implement the latter as			// returns a pair {value, predicate}. We currently only implement the latter as
	// that's the variant exposed by CUDA API.			// that's the variant exposed by CUDA API.

	// match.all.sync.b32p mask, value			// match.all.sync.b32p mask, value
	def int_nvvm_match_all_sync_i32p :			def int_nvvm_match_all_sync_i32p :
	Intrinsic<[llvm_i32_ty, llvm_i1_ty], [llvm_i32_ty, llvm_i32_ty],			Intrinsic<[llvm_i32_ty, llvm_i1_ty], [llvm_i32_ty, llvm_i32_ty],
	[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.all.sync.i32p">;			[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.all.sync.i32p">;
	// match.all.sync.b64p mask, value			// match.all.sync.b64p mask, value
	def int_nvvm_match_all_sync_i64p :			def int_nvvm_match_all_sync_i64p :
	Intrinsic<[llvm_i64_ty, llvm_i1_ty], [llvm_i32_ty, llvm_i64_ty],			Intrinsic<[llvm_i32_ty, llvm_i1_ty], [llvm_i32_ty, llvm_i64_ty],
	[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.all.sync.i64p">;			[IntrInaccessibleMemOnly, IntrConvergent], "llvm.nvvm.match.all.sync.i64p">;

	//			//
	// REDUX.SYNC			// REDUX.SYNC
	//			//
	// redux.sync.min.u32 dst, src, membermask;			// redux.sync.min.u32 dst, src, membermask;
	def int_nvvm_redux_sync_umin : GCCBuiltin<"__nvvm_redux_sync_umin">,			def int_nvvm_redux_sync_umin : GCCBuiltin<"__nvvm_redux_sync_umin">,
	Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],			Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

	defm VOTE_SYNC_ALL : VOTE_SYNC<Int1Regs, "all.pred", int_nvvm_vote_all_sync>;			defm VOTE_SYNC_ALL : VOTE_SYNC<Int1Regs, "all.pred", int_nvvm_vote_all_sync>;
	defm VOTE_SYNC_ANY : VOTE_SYNC<Int1Regs, "any.pred", int_nvvm_vote_any_sync>;			defm VOTE_SYNC_ANY : VOTE_SYNC<Int1Regs, "any.pred", int_nvvm_vote_any_sync>;
	defm VOTE_SYNC_UNI : VOTE_SYNC<Int1Regs, "uni.pred", int_nvvm_vote_uni_sync>;			defm VOTE_SYNC_UNI : VOTE_SYNC<Int1Regs, "uni.pred", int_nvvm_vote_uni_sync>;
	defm VOTE_SYNC_BALLOT : VOTE_SYNC<Int32Regs, "ballot.b32", int_nvvm_vote_ballot_sync>;			defm VOTE_SYNC_BALLOT : VOTE_SYNC<Int32Regs, "ballot.b32", int_nvvm_vote_ballot_sync>;

	multiclass MATCH_ANY_SYNC<NVPTXRegClass regclass, string ptxtype, Intrinsic IntOp,			multiclass MATCH_ANY_SYNC<NVPTXRegClass regclass, string ptxtype, Intrinsic IntOp,
	Operand ImmOp> {			Operand ImmOp> {
	def ii : NVPTXInst<(outs regclass:$dest), (ins i32imm:$mask, ImmOp:$value),			def ii : NVPTXInst<(outs Int32Regs:$dest), (ins i32imm:$mask, ImmOp:$value),
	"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",			"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",
	[(set regclass:$dest, (IntOp imm:$mask, imm:$value))]>,			[(set Int32Regs:$dest, (IntOp imm:$mask, imm:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def ir : NVPTXInst<(outs regclass:$dest), (ins Int32Regs:$mask, ImmOp:$value),			def ir : NVPTXInst<(outs Int32Regs:$dest), (ins Int32Regs:$mask, ImmOp:$value),
	"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",			"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",
	[(set regclass:$dest, (IntOp Int32Regs:$mask, imm:$value))]>,			[(set Int32Regs:$dest, (IntOp Int32Regs:$mask, imm:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def ri : NVPTXInst<(outs regclass:$dest), (ins i32imm:$mask, regclass:$value),			def ri : NVPTXInst<(outs Int32Regs:$dest), (ins i32imm:$mask, regclass:$value),
	"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",			"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",
	[(set regclass:$dest, (IntOp imm:$mask, regclass:$value))]>,			[(set Int32Regs:$dest, (IntOp imm:$mask, regclass:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def rr : NVPTXInst<(outs regclass:$dest), (ins Int32Regs:$mask, regclass:$value),			def rr : NVPTXInst<(outs Int32Regs:$dest), (ins Int32Regs:$mask, regclass:$value),
	"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",			"match.any.sync." # ptxtype # " \t$dest, $value, $mask;",
	[(set regclass:$dest, (IntOp Int32Regs:$mask, regclass:$value))]>,			[(set Int32Regs:$dest, (IntOp Int32Regs:$mask, regclass:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	}			}

	defm MATCH_ANY_SYNC_32 : MATCH_ANY_SYNC<Int32Regs, "b32", int_nvvm_match_any_sync_i32,			defm MATCH_ANY_SYNC_32 : MATCH_ANY_SYNC<Int32Regs, "b32", int_nvvm_match_any_sync_i32,
	i32imm>;			i32imm>;
	defm MATCH_ANY_SYNC_64 : MATCH_ANY_SYNC<Int64Regs, "b64", int_nvvm_match_any_sync_i64,			defm MATCH_ANY_SYNC_64 : MATCH_ANY_SYNC<Int64Regs, "b64", int_nvvm_match_any_sync_i64,
	i64imm>;			i64imm>;

	multiclass MATCH_ALLP_SYNC<NVPTXRegClass regclass, string ptxtype, Intrinsic IntOp,			multiclass MATCH_ALLP_SYNC<NVPTXRegClass regclass, string ptxtype, Intrinsic IntOp,
	Operand ImmOp> {			Operand ImmOp> {
	def ii : NVPTXInst<(outs regclass:$dest, Int1Regs:$pred),			def ii : NVPTXInst<(outs Int32Regs:$dest, Int1Regs:$pred),
	(ins i32imm:$mask, ImmOp:$value),			(ins i32imm:$mask, ImmOp:$value),
	"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",			"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",
	[(set regclass:$dest, Int1Regs:$pred, (IntOp imm:$mask, imm:$value))]>,			[(set Int32Regs:$dest, Int1Regs:$pred, (IntOp imm:$mask, imm:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def ir : NVPTXInst<(outs regclass:$dest, Int1Regs:$pred),			def ir : NVPTXInst<(outs Int32Regs:$dest, Int1Regs:$pred),
	(ins Int32Regs:$mask, ImmOp:$value),			(ins Int32Regs:$mask, ImmOp:$value),
	"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",			"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",
	[(set regclass:$dest, Int1Regs:$pred, (IntOp Int32Regs:$mask, imm:$value))]>,			[(set Int32Regs:$dest, Int1Regs:$pred, (IntOp Int32Regs:$mask, imm:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def ri : NVPTXInst<(outs regclass:$dest, Int1Regs:$pred),			def ri : NVPTXInst<(outs Int32Regs:$dest, Int1Regs:$pred),
	(ins i32imm:$mask, regclass:$value),			(ins i32imm:$mask, regclass:$value),
	"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",			"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",
	[(set regclass:$dest, Int1Regs:$pred, (IntOp imm:$mask, regclass:$value))]>,			[(set Int32Regs:$dest, Int1Regs:$pred, (IntOp imm:$mask, regclass:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	def rr : NVPTXInst<(outs regclass:$dest, Int1Regs:$pred),			def rr : NVPTXInst<(outs Int32Regs:$dest, Int1Regs:$pred),
	(ins Int32Regs:$mask, regclass:$value),			(ins Int32Regs:$mask, regclass:$value),
	"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",			"match.all.sync." # ptxtype # " \t$dest|$pred, $value, $mask;",
	[(set regclass:$dest, Int1Regs:$pred, (IntOp Int32Regs:$mask, regclass:$value))]>,			[(set Int32Regs:$dest, Int1Regs:$pred, (IntOp Int32Regs:$mask, regclass:$value))]>,
	Requires<[hasPTX60, hasSM70]>;			Requires<[hasPTX60, hasSM70]>;
	}			}
	defm MATCH_ALLP_SYNC_32 : MATCH_ALLP_SYNC<Int32Regs, "b32", int_nvvm_match_all_sync_i32p,			defm MATCH_ALLP_SYNC_32 : MATCH_ALLP_SYNC<Int32Regs, "b32", int_nvvm_match_all_sync_i32p,
	i32imm>;			i32imm>;
	defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC<Int64Regs, "b64", int_nvvm_match_all_sync_i64p,			defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC<Int64Regs, "b64", int_nvvm_match_all_sync_i64p,
	i64imm>;			i64imm>;

	multiclass REDUX_SYNC<string BinOp, string PTXType, Intrinsic Intrin> {			multiclass REDUX_SYNC<string BinOp, string PTXType, Intrinsic Intrin> {

llvm/test/CodeGen/NVPTX/match.ll

; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 -mattr=+ptx60 | FileCheck %s		; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 -mattr=+ptx60 | FileCheck %s

declare i32 @llvm.nvvm.match.any.sync.i32(i32, i32)		declare i32 @llvm.nvvm.match.any.sync.i32(i32, i32)
declare i64 @llvm.nvvm.match.any.sync.i64(i32, i64)		declare i32 @llvm.nvvm.match.any.sync.i64(i32, i64)

; CHECK-LABEL: .func{{.*}}match.any.sync.i32		; CHECK-LABEL: .func{{.*}}match.any.sync.i32
define i32 @match.any.sync.i32(i32 %mask, i32 %value) {		define i32 @match.any.sync.i32(i32 %mask, i32 %value) {
; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.any.sync.i32_param_0];		; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.any.sync.i32_param_0];
; CHECK: ld.param.u32 [[VALUE:%r[0-9]+]], [match.any.sync.i32_param_1];		; CHECK: ld.param.u32 [[VALUE:%r[0-9]+]], [match.any.sync.i32_param_1];

; CHECK: match.any.sync.b32 [[V0:%r[0-9]+]], [[VALUE]], [[MASK]];		; CHECK: match.any.sync.b32 [[V0:%r[0-9]+]], [[VALUE]], [[MASK]];
%v0 = call i32 @llvm.nvvm.match.any.sync.i32(i32 %mask, i32 %value)		%v0 = call i32 @llvm.nvvm.match.any.sync.i32(i32 %mask, i32 %value)
; CHECK: match.any.sync.b32 [[V1:%r[0-9]+]], [[VALUE]], 1;		; CHECK: match.any.sync.b32 [[V1:%r[0-9]+]], [[VALUE]], 1;
%v1 = call i32 @llvm.nvvm.match.any.sync.i32(i32 1, i32 %value)		%v1 = call i32 @llvm.nvvm.match.any.sync.i32(i32 1, i32 %value)
; CHECK: match.any.sync.b32 [[V2:%r[0-9]+]], 2, [[MASK]];		; CHECK: match.any.sync.b32 [[V2:%r[0-9]+]], 2, [[MASK]];
%v2 = call i32 @llvm.nvvm.match.any.sync.i32(i32 %mask, i32 2)		%v2 = call i32 @llvm.nvvm.match.any.sync.i32(i32 %mask, i32 2)
; CHECK: match.any.sync.b32 [[V3:%r[0-9]+]], 4, 3;		; CHECK: match.any.sync.b32 [[V3:%r[0-9]+]], 4, 3;
%v3 = call i32 @llvm.nvvm.match.any.sync.i32(i32 3, i32 4)		%v3 = call i32 @llvm.nvvm.match.any.sync.i32(i32 3, i32 4)
%sum1 = add i32 %v0, %v1		%sum1 = add i32 %v0, %v1
%sum2 = add i32 %v2, %v3		%sum2 = add i32 %v2, %v3
%sum3 = add i32 %sum1, %sum2		%sum3 = add i32 %sum1, %sum2
ret i32 %sum3;		ret i32 %sum3;
}		}

; CHECK-LABEL: .func{{.*}}match.any.sync.i64		; CHECK-LABEL: .func{{.*}}match.any.sync.i64
define i64 @match.any.sync.i64(i32 %mask, i64 %value) {		define i32 @match.any.sync.i64(i32 %mask, i64 %value) {
; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.any.sync.i64_param_0];		; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.any.sync.i64_param_0];
; CHECK: ld.param.u64 [[VALUE:%rd[0-9]+]], [match.any.sync.i64_param_1];		; CHECK: ld.param.u64 [[VALUE:%rd[0-9]+]], [match.any.sync.i64_param_1];

; CHECK: match.any.sync.b64 [[V0:%rd[0-9]+]], [[VALUE]], [[MASK]];		; CHECK: match.any.sync.b64 [[V0:%r[0-9]+]], [[VALUE]], [[MASK]];
%v0 = call i64 @llvm.nvvm.match.any.sync.i64(i32 %mask, i64 %value)		%v0 = call i32 @llvm.nvvm.match.any.sync.i64(i32 %mask, i64 %value)
; CHECK: match.any.sync.b64 [[V1:%rd[0-9]+]], [[VALUE]], 1;		; CHECK: match.any.sync.b64 [[V1:%r[0-9]+]], [[VALUE]], 1;
%v1 = call i64 @llvm.nvvm.match.any.sync.i64(i32 1, i64 %value)		%v1 = call i32 @llvm.nvvm.match.any.sync.i64(i32 1, i64 %value)
; CHECK: match.any.sync.b64 [[V2:%rd[0-9]+]], 2, [[MASK]];		; CHECK: match.any.sync.b64 [[V2:%r[0-9]+]], 2, [[MASK]];
%v2 = call i64 @llvm.nvvm.match.any.sync.i64(i32 %mask, i64 2)		%v2 = call i32 @llvm.nvvm.match.any.sync.i64(i32 %mask, i64 2)
; CHECK: match.any.sync.b64 [[V3:%rd[0-9]+]], 4, 3;		; CHECK: match.any.sync.b64 [[V3:%r[0-9]+]], 4, 3;
%v3 = call i64 @llvm.nvvm.match.any.sync.i64(i32 3, i64 4)		%v3 = call i32 @llvm.nvvm.match.any.sync.i64(i32 3, i64 4)
%sum1 = add i64 %v0, %v1		%sum1 = add i32 %v0, %v1
%sum2 = add i64 %v2, %v3		%sum2 = add i32 %v2, %v3
%sum3 = add i64 %sum1, %sum2		%sum3 = add i32 %sum1, %sum2
ret i64 %sum3;		ret i32 %sum3;
}		}

declare {i32, i1} @llvm.nvvm.match.all.sync.i32p(i32, i32)		declare {i32, i1} @llvm.nvvm.match.all.sync.i32p(i32, i32)
declare {i64, i1} @llvm.nvvm.match.all.sync.i64p(i32, i64)		declare {i32, i1} @llvm.nvvm.match.all.sync.i64p(i32, i64)

; CHECK-LABEL: .func{{.*}}match.all.sync.i32p(		; CHECK-LABEL: .func{{.*}}match.all.sync.i32p(
define {i32,i1} @match.all.sync.i32p(i32 %mask, i32 %value) {		define {i32,i1} @match.all.sync.i32p(i32 %mask, i32 %value) {
; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.all.sync.i32p_param_0];		; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.all.sync.i32p_param_0];
; CHECK: ld.param.u32 [[VALUE:%r[0-9]+]], [match.all.sync.i32p_param_1];		; CHECK: ld.param.u32 [[VALUE:%r[0-9]+]], [match.all.sync.i32p_param_1];

; CHECK: match.all.sync.b32 {{%r[0-9]+\|%p[0-9]+}}, [[VALUE]], [[MASK]];		; CHECK: match.all.sync.b32 {{%r[0-9]+\|%p[0-9]+}}, [[VALUE]], [[MASK]];
%r1 = call {i32, i1} @llvm.nvvm.match.all.sync.i32p(i32 %mask, i32 %value)		%r1 = call {i32, i1} @llvm.nvvm.match.all.sync.i32p(i32 %mask, i32 %value)
[...] (22 lines elided; still inside: define {i32,i1} @match.all.sync.i32p(i32 %mask, i32 %value) {)
%psum2 = add i1 %p3, %p4		%psum2 = add i1 %p3, %p4
%psum3 = add i1 %psum1, %psum2		%psum3 = add i1 %psum1, %psum2
%ret0 = insertvalue {i32, i1} undef, i32 %vsum3, 0		%ret0 = insertvalue {i32, i1} undef, i32 %vsum3, 0
%ret1 = insertvalue {i32, i1} %ret0, i1 %psum3, 1		%ret1 = insertvalue {i32, i1} %ret0, i1 %psum3, 1
ret {i32, i1} %ret1;		ret {i32, i1} %ret1;
}		}

; CHECK-LABEL: .func{{.*}}match.all.sync.i64p(		; CHECK-LABEL: .func{{.*}}match.all.sync.i64p(
define {i64,i1} @match.all.sync.i64p(i32 %mask, i64 %value) {		define {i32,i1} @match.all.sync.i64p(i32 %mask, i64 %value) {
; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.all.sync.i64p_param_0];		; CHECK: ld.param.u32 [[MASK:%r[0-9]+]], [match.all.sync.i64p_param_0];
; CHECK: ld.param.u64 [[VALUE:%rd[0-9]+]], [match.all.sync.i64p_param_1];		; CHECK: ld.param.u64 [[VALUE:%rd[0-9]+]], [match.all.sync.i64p_param_1];

; CHECK: match.all.sync.b64 {{%rd[0-9]+\|%p[0-9]+}}, [[VALUE]], [[MASK]];		; CHECK: match.all.sync.b64 {{%r[0-9]+\|%p[0-9]+}}, [[VALUE]], [[MASK]];
%r1 = call {i64, i1} @llvm.nvvm.match.all.sync.i64p(i32 %mask, i64 %value)		%r1 = call {i32, i1} @llvm.nvvm.match.all.sync.i64p(i32 %mask, i64 %value)
%v1 = extractvalue {i64, i1} %r1, 0		%v1 = extractvalue {i32, i1} %r1, 0
%p1 = extractvalue {i64, i1} %r1, 1		%p1 = extractvalue {i32, i1} %r1, 1

; CHECK: match.all.sync.b64 {{%rd[0-9]+\|%p[0-9]+}}, 1, [[MASK]];		; CHECK: match.all.sync.b64 {{%r[0-9]+\|%p[0-9]+}}, 1, [[MASK]];
%r2 = call {i64, i1} @llvm.nvvm.match.all.sync.i64p(i32 %mask, i64 1)		%r2 = call {i32, i1} @llvm.nvvm.match.all.sync.i64p(i32 %mask, i64 1)
%v2 = extractvalue {i64, i1} %r2, 0		%v2 = extractvalue {i32, i1} %r2, 0
%p2 = extractvalue {i64, i1} %r2, 1		%p2 = extractvalue {i32, i1} %r2, 1

; CHECK: match.all.sync.b64 {{%rd[0-9]+\|%p[0-9]+}}, [[VALUE]], 2;		; CHECK: match.all.sync.b64 {{%r[0-9]+\|%p[0-9]+}}, [[VALUE]], 2;
%r3 = call {i64, i1} @llvm.nvvm.match.all.sync.i64p(i32 2, i64 %value)		%r3 = call {i32, i1} @llvm.nvvm.match.all.sync.i64p(i32 2, i64 %value)
%v3 = extractvalue {i64, i1} %r3, 0		%v3 = extractvalue {i32, i1} %r3, 0
%p3 = extractvalue {i64, i1} %r3, 1		%p3 = extractvalue {i32, i1} %r3, 1

; CHECK: match.all.sync.b64 {{%rd[0-9]+\|%p[0-9]+}}, 4, 3;		; CHECK: match.all.sync.b64 {{%r[0-9]+\|%p[0-9]+}}, 4, 3;
%r4 = call {i64, i1} @llvm.nvvm.match.all.sync.i64p(i32 3, i64 4)		%r4 = call {i32, i1} @llvm.nvvm.match.all.sync.i64p(i32 3, i64 4)
%v4 = extractvalue {i64, i1} %r4, 0		%v4 = extractvalue {i32, i1} %r4, 0
%p4 = extractvalue {i64, i1} %r4, 1		%p4 = extractvalue {i32, i1} %r4, 1

%vsum1 = add i64 %v1, %v2		%vsum1 = add i32 %v1, %v2
%vsum2 = add i64 %v3, %v4		%vsum2 = add i32 %v3, %v4
%vsum3 = add i64 %vsum1, %vsum2		%vsum3 = add i32 %vsum1, %vsum2
%psum1 = add i1 %p1, %p2		%psum1 = add i1 %p1, %p2
%psum2 = add i1 %p3, %p4		%psum2 = add i1 %p3, %p4
%psum3 = add i1 %psum1, %psum2		%psum3 = add i1 %psum1, %psum2
%ret0 = insertvalue {i64, i1} undef, i64 %vsum3, 0		%ret0 = insertvalue {i32, i1} undef, i32 %vsum3, 0
%ret1 = insertvalue {i64, i1} %ret0, i1 %psum3, 1		%ret1 = insertvalue {i32, i1} %ret0, i1 %psum3, 1
ret {i64, i1} %ret1;		ret {i32, i1} %ret1;
}		}
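
The `{i32, i1}` pair checked in the match.all tests above can be pictured with a small host-side model. This is a sketch under assumptions, not the backend's lowering: `matchAll64` and `kWarpSize` are hypothetical names, lanes are represented as an array of values, and exited threads are ignored. It follows the PTX description in which the destination holds the membermask when every participating lane has the same value and 0 otherwise, with the predicate reporting which case occurred.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

constexpr unsigned kWarpSize = 32;

// Hypothetical host-side model of match.all.sync.b64 with a predicate output.
// Returns {mask, pred}: {memberMask, true} when all participating lanes hold
// the same 64-bit value as the calling lane, {0, false} otherwise.
std::pair<std::uint32_t, bool>
matchAll64(const std::array<std::uint64_t, kWarpSize> &values,
           std::uint32_t memberMask, unsigned lane) {
  assert(lane < kWarpSize && "lane id must be below the warp size");
  assert(((memberMask >> lane) & 1u) && "calling lane must be in membermask");
  for (unsigned other = 0; other < kWarpSize; ++other) {
    const bool participates = ((memberMask >> other) & 1u) != 0;
    if (participates && values[other] != values[lane])
      return {0u, false}; // at least one participating lane disagrees
  }
  return {memberMask, true}; // the mask is 32 bits wide even for b64 inputs
}
```

In this model the first element of the pair is always a lane mask, which is why an `i32` first member is sufficient for the `i64p` variant as well.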