Details

Reviewers

herhut

Commits

rGad398164bac0: [mlir][gpu] Refactor functions for workgroup and private buffer attributions.

Summary

Consolidate interfaces adding workgroup and private buffer attributions in GPU
dialect.

Note all private buffer attributions must follow workgroup buffer attributions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

whchung created this revision. · May 6 2020, 10:48 AM

Herald added a reviewer: herhut. · May 6 2020, 10:48 AM

Herald added subscribers: llvm-commits, Kayjukh, frgossen and 14 others.

Harbormaster completed remote builds in B55958: Diff 262425. · May 6 2020, 11:52 AM

Thanks for adding this!

mlir/include/mlir/Dialect/GPU/GPUOps.td
215	It is not clear to me why we have two versions of this function. While you have not caused this, could you clean it up and unify them? One could implement `addWorkgroupAttribution(ArrayRef<int64_t> shape, Type elementType)` by means of `BlockArgument addWorkgroupAttribution(Type type)`. I see only one use of the former form in `MemoryPromotion`. Changing that use to the type based one and removing the shape/elementType versions would be best. Also, both of these could be implemented in the cpp file and only be declared here.
220	Use `getNumWorkgroupAttributions` here?
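
The unification suggested above can be sketched with a toy model: keep a single type-based entry point and let the caller build the buffer type first. Everything named `Toy...` below is a hypothetical stand-in rather than an MLIR class, and the element type is reduced to a bit width for illustration. The address space value 3 is the literal that appears in the old `AllReduceLowering.cpp` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified buffer type; stands in for a memref type.
struct ToyBufferType {
  std::vector<std::int64_t> shape;
  int elementBits;   // assumption: element type reduced to a bit width
  int addressSpace;
};

// Literal taken from the old AllReduceLowering code shown in this revision.
constexpr int kToyWorkgroupAddressSpace = 3;

// Hypothetical stand-in for a GPU function; not an MLIR class.
struct ToyFunc {
  std::vector<ToyBufferType> workgroupAttributions;

  // The only entry point: it takes a fully formed type and returns the index
  // of the new attribution.
  std::size_t addWorkgroupAttribution(const ToyBufferType &type) {
    workgroupAttributions.push_back(type);
    return workgroupAttributions.size() - 1;
  }
};

// What a caller of the removed shape/elementType overload does instead:
// construct the type, then call the type-based function.
inline std::size_t promoteExample(ToyFunc &func,
                                  const std::vector<std::int64_t> &shape,
                                  int elementBits) {
  ToyBufferType type{shape, elementBits, kToyWorkgroupAddressSpace};
  return func.addWorkgroupAttribution(type);
}
```

With a single entry point, the one former user of the shape-based form only needs a line or two of type construction at the call site.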

This revision now requires changes to proceed. · May 7 2020, 5:03 AM

Address review comments.

whchung marked 3 inline comments as done. · May 7 2020, 7:34 AM

whchung added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
215	@herhut I was somewhat puzzled by the two versions of the interface too. I've revised the patch so that the interfaces are consolidated. Could you help conduct another round of review? Thanks.

whchung retitled this revision from "[mlir][gpu] Add utility functions to add private buffer attributions." to "[mlir][gpu] Refactor functions for workgroup and private buffer attributions." · May 7 2020, 7:35 AM

whchung edited the summary of this revision.

Harbormaster completed remote builds in B56066: Diff 262647. · May 7 2020, 8:29 AM

Thanks for the cleanup.

mlir/include/mlir/Dialect/GPU/GPUOps.td
195	These getters that give access to a range should remain in the header to enable inlining (and maybe avoid materializing the ArrayRef).
222	I don't think this is needed. The contract could simply be that all trailing operands after function arguments and workgroup attributions are private. Storing an extra count does not add any benefit. You would then need to verify that arguments + all attributions = operand count. So let's just drop the extra attribute.
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
478	This would just be `{begin, getBody().front().args_end()}`

@whchung Are you still interested in moving this forward?

Herald added a subscriber: jurahul. · May 19 2020, 2:54 AM

Address code review comments.

In D79508#2043577, @herhut wrote:

@whchung Are you still interested in moving this forward?

@herhut I just revised the patch to address your review comments. Getting rid of the counter for private allocations is less than ideal in my applications but I've found ways to get around them. Could you give this patch another look? Thanks.

Harbormaster failed remote builds in B57205: Diff 264899! · May 19 2020, 8:07 AM

I think you misunderstood my comment. I liked having the getters and interface for private attributions. My point was that you do not need to store a count as an attribute for the private attributions but instead could always compute the number of private attributes via getNumOperands() - getType().getNumInputs() - getNumWorkgroupAttributions(). Sorry for not being clear.

Using that approach should also cover your use cases?

@herhut Yes, that's exactly how I compute the number of private attributes right now in my application.

BTW it seems nowadays pre-merge checks would always fail on Windows due to logic in SPIR-V dialect.

[2020-05-19T14:57:38.942Z] tools\mlir\include\mlir/Dialect/SPIRV/SPIRVSerialization.inc(4334): fatal error C1061: compiler limit: blocks nested too deeply

This should be fixed by patch https://github.com/llvm/llvm-project/commit/d5b1643c74eeae327d85c75fe79fd98edb1014f9

Why not add a getNumPrivateAttributions to the GPUFunc for this? Then the interface is exactly as before.

With that change, I think this is good to land.

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
478	I would prefer if new private attributions would also be added at the end of the existing private attributes, rather than at the front. You could just use `addArgument` for this.

Reinstate getNumPrivateAttributions() in GPUFunc.

Use it in verifier logic, and addPrivateAttribution() logic.

whchung marked an inline comment as done. · May 20 2020, 8:18 AM

whchung added inline comments.

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
478	@herhut `addArgument` might not guard against the case where `addPrivateAttribution` is used prior to `addWorkgroupAttribution`. Now that I have revised the patch to re-introduce `getNumPrivateAttributions`, we can leverage it here.

herhut added inline comments. · May 20 2020, 9:40 AM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
478	I don't understand this. Private attributions are always at the end. So using `getBody().front().addArgument(...)` would insert them at the end, where they belong. If you then use `addWorkgroupAttribution`, it will insert in front of the private one, as `getNumWorkgroupAttributions` would return 0, so it inserts directly after the function arguments.
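
The ordering argument above can be checked with a toy model. This is only a sketch under stated assumptions: `ToyFunc` and its members are hypothetical stand-ins rather than MLIR classes, and block arguments are modelled as strings laid out as function arguments, then workgroup attributions, then private attributions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU function's entry block; not an MLIR class.
struct ToyFunc {
  std::size_t numInputs = 0;           // function arguments proper
  std::size_t numWorkgroup = 0;        // stored workgroup attribution count
  std::vector<std::string> blockArgs;  // inputs, then workgroup, then private

  // Insert directly after the existing workgroup attributions, which places
  // the new argument in front of any private attribution already present.
  void addWorkgroupAttribution(const std::string &name) {
    auto pos = blockArgs.begin() +
               static_cast<std::ptrdiff_t>(numInputs + numWorkgroup);
    blockArgs.insert(pos, name);
    ++numWorkgroup;
  }

  // Private attributions are always trailing, so appending is sufficient.
  void addPrivateAttribution(const std::string &name) {
    blockArgs.push_back(name);
  }
};

// Adds a private attribution before any workgroup attribution exists, then a
// workgroup one, then another private one, and returns the final layout.
inline std::vector<std::string> exampleOrder() {
  ToyFunc func;
  func.numInputs = 2;
  func.blockArgs = {"arg0", "arg1"};
  func.addPrivateAttribution("priv0");
  func.addWorkgroupAttribution("wg0");
  func.addPrivateAttribution("priv1");
  assert(func.numWorkgroup == 1);
  return func.blockArgs;
}
```

In this model the call order does not matter: the workgroup attribution lands right after the function arguments, and both private attributions stay at the end in the order they were added.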

Harbormaster completed remote builds in B57386: Diff 265256. · May 20 2020, 9:49 AM

Simplify logic.

whchung marked 2 inline comments as done. · May 20 2020, 10:21 AM

whchung added inline comments.

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
478	Yes you are right. I've revised the patch with simplified logic.

Thanks!

This revision is now accepted and ready to land. · May 20 2020, 11:04 AM

Harbormaster completed remote builds in B57412: Diff 265295. · May 20 2020, 11:27 AM

Closed by commit rGad398164bac0: [mlir][gpu] Refactor functions for workgroup and private buffer attributions. (authored by whchung). · May 20 2020, 2:53 PM

This revision was automatically updated to reflect the committed changes.

whchung marked an inline comment as done.

rriddle added inline comments. · May 27 2020, 11:52 AM

mlir/include/mlir/Dialect/GPU/GPUOps.td
203	This doesn't make sense to me, when does GPUFuncOp ever have operands? Did you mean to use numArguments here? This looks like a lack of testing coverage if so.

whchung added inline comments. · May 27 2020, 12:46 PM

mlir/include/mlir/Dialect/GPU/GPUOps.td
203	Thanks for catching this. Indeed this is a bug not covered by any existing tests. I'll submit another patch addressing this with a test.
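
The issue raised here can be illustrated with a toy model: the private attribution count has to be derived from the entry block's arguments, since a function-like op does not take operands. `ToyFuncOp` and its fields are hypothetical stand-ins rather than MLIR classes; the actual fix is the one submitted in D80766.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a function-like op; not an MLIR class.
struct ToyFuncOp {
  std::size_t numBlockArgs = 0;  // arguments of the entry block
  std::size_t numInputs = 0;     // inputs of the function type
  std::size_t numWorkgroup = 0;  // stored workgroup attribution count

  // Everything after the inputs and the workgroup attributions is private.
  // Starting from an operand count instead (zero in this model) would make
  // the unsigned subtraction wrap around to a huge value.
  std::size_t numPrivateAttributions() const {
    assert(numBlockArgs >= numInputs + numWorkgroup &&
           "malformed argument list");
    return numBlockArgs - numInputs - numWorkgroup;
  }
};

// Six block arguments: two inputs, one workgroup buffer, the rest private.
inline std::size_t examplePrivateCount() {
  ToyFuncOp op;
  op.numBlockArgs = 6;
  op.numInputs = 2;
  op.numWorkgroup = 1;
  return op.numPrivateAttributions();
}
```

A test that builds a function with at least one private attribution and queries the count would exercise this path, which is the coverage gap noted above.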

whchung mentioned this in D80766: [mlir][gpu] Fix logic error in D79508 computing number of private attributions. · May 28 2020, 3:30 PM

whchung marked 3 inline comments as done. · May 28 2020, 3:31 PM

whchung added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
203	@rriddle I submitted D80766 to fix this.

whchung mentioned this in rG603b974cf710: [mlir][gpu] Fix logic error in D79508 computing number of private attributions. · Jun 8 2020, 5:58 AM

Diff 264899

mlir/include/mlir/Dialect/GPU/GPUOps.td

[155 lines not shown] def GPU_GPUFuncOp : GPU_Op<"func", [HasParent<"GPUModuleOp">,
   let builders = [
     OpBuilder<"OpBuilder &builder, OperationState &result, StringRef name, "
               "FunctionType type, ArrayRef<Type> workgroupAttributions = {}, "
               "ArrayRef<Type> privateAttributions = {}, "
               "ArrayRef<NamedAttribute> attrs = {}">
   ];

   let extraClassDeclaration = [{
-    /// Adds a workgroup attribution of the MemRef type with the given shape and
-    /// element type.
-    Value addWorkgroupAttribution(ArrayRef<int64_t> shape, Type elementType);
-
     /// Returns `true` if the GPU function defined by this Op is a kernel, i.e.
     /// it is intended to be launched from host.
     bool isKernel() {
       return getAttrOfType<UnitAttr>(GPUDialect::getKernelFuncAttrName()) !=
              nullptr;
     }

     /// Change the type of this function in place. This is an extremely
[15 lines not shown] let extraClassDeclaration = [{

     /// Returns a list of block arguments that correspond to buffers located in
     /// the workgroup memory
     ArrayRef<BlockArgument> getWorkgroupAttributions() {
       auto begin =
           std::next(getBody().front().args_begin(), getType().getNumInputs());
       auto end = std::next(begin, getNumWorkgroupAttributions());
       return {begin, end};
     }

-    // Adds a new block argument that corresponds to buffers located in
-    // workgroup memory.
-    BlockArgument addWorkgroupAttribution(Type type) {
-      auto attrName = getNumWorkgroupAttributionsAttrName();
-      auto attr = getAttrOfType<IntegerAttr>(attrName);
-      setAttr(attrName, IntegerAttr::get(attr.getType(), attr.getValue() + 1));
-      return getBody().front().insertArgument(
-          getType().getNumInputs() + attr.getInt(), type);
-    }
+    /// Adds a new block argument that corresponds to buffers located in
+    /// workgroup memory.
+    BlockArgument addWorkgroupAttribution(Type type);

     /// Returns a list of block arguments that correspond to buffers located in
     /// the private memory.
     ArrayRef<BlockArgument> getPrivateAttributions() {
+      // Buffers on the private memory always come after buffers on the workgroup
+      // memory.
       auto begin =
           std::next(getBody().front().args_begin(),
                     getType().getNumInputs() + getNumWorkgroupAttributions());
       return {begin, getBody().front().args_end()};
     }

+    /// Adds a new block argument that corresponds to buffers located in
+    /// private memory.
+    BlockArgument addPrivateAttribution(Type type);
+
     /// Returns the name of the attribute containing the number of buffers
     /// located in the workgroup memory.
     static StringRef getNumWorkgroupAttributionsAttrName() {
       return "workgroup_attributions";
     }

     // FunctionLike trait needs access to the functions below.
     friend class OpTrait::FunctionLike<GPUFuncOp>;

     /// Hooks for the input/output type enumeration in FunctionLike .
     unsigned getNumFuncArguments() { return getType().getNumInputs(); }
     unsigned getNumFuncResults() { return getType().getNumResults(); }

     /// Returns the keywords used in the custom syntax for this Op.
     static StringRef getWorkgroupKeyword() { return "workgroup"; }
[460 lines not shown]

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

[451 lines not shown] static LogicalResult verify(LaunchFuncOp op) {

   return success();
 }

 //===----------------------------------------------------------------------===//
 // GPUFuncOp
 //===----------------------------------------------------------------------===//

-/// Adds a workgroup attribution to "op" of the MemRef type with the given shape
-/// and element type.
-Value GPUFuncOp::addWorkgroupAttribution(ArrayRef<int64_t> shape,
-                                         Type elementType) {
-  unsigned pos = getNumFuncArguments() + getNumWorkgroupAttributions();
-  Block &bodyBlock = body().front();
-  Value attribution = bodyBlock.insertArgument(
-      std::next(bodyBlock.args_begin(), pos),
-      MemRefType::get(shape, elementType, /*affineMapComposition=*/{},
-                      GPUDialect::getWorkgroupAddressSpace()));
-  auto numWorkgroupBuffersAttr =
-      getAttrOfType<IntegerAttr>(getNumWorkgroupAttributionsAttrName());
-  setAttr(getNumWorkgroupAttributionsAttrName(),
-          IntegerAttr::get(numWorkgroupBuffersAttr.getType(),
-                           numWorkgroupBuffersAttr.getValue() + 1));
-  return attribution;
+/// Adds a new block argument that corresponds to buffers located in
+/// workgroup memory.
+BlockArgument GPUFuncOp::addWorkgroupAttribution(Type type) {
+  auto attrName = getNumWorkgroupAttributionsAttrName();
+  auto attr = getAttrOfType<IntegerAttr>(attrName);
+  setAttr(attrName, IntegerAttr::get(attr.getType(), attr.getValue() + 1));
+  return getBody().front().insertArgument(
+      getType().getNumInputs() + attr.getInt(), type);
+}
+
+/// Adds a new block argument that corresponds to buffers located in
+/// private memory.
+BlockArgument GPUFuncOp::addPrivateAttribution(Type type) {
+  // Buffers on the private memory always come after buffers on the workgroup
+  // memory.
+  auto workgroupAttrCount = getNumWorkgroupAttributions();
+
+  return getBody().front().insertArgument(
+      getType().getNumInputs() + workgroupAttrCount, type);
 }

 void GPUFuncOp::build(OpBuilder &builder, OperationState &result,
                       StringRef name, FunctionType type,
                       ArrayRef<Type> workgroupAttributions,
                       ArrayRef<Type> privateAttributions,
                       ArrayRef<NamedAttribute> attrs) {
   result.addAttribute(SymbolTable::getSymbolAttrName(),
[305 lines not shown]

mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp

[145 lines not shown] private:
   // Creates dimension op of type T, with the result casted to int32.
   template <typename T> Value getDimOp(StringRef dimension) {
     Value dim = create<T>(indexType, rewriter.getStringAttr(dimension));
     return create<IndexCastOp>(int32Type, dim);
   }

   /// Adds type to funcOp's workgroup attributions.
   Value createWorkgroupBuffer() {
-    int workgroupMemoryAddressSpace = 3;
+    int workgroupMemoryAddressSpace =
+        gpu::GPUDialect::getWorkgroupAddressSpace();
     auto bufferType =
         MemRefType::get({kSubgroupSize}, valueType, ArrayRef<AffineMap>{},
                         workgroupMemoryAddressSpace);
     return funcOp.addWorkgroupAttribution(bufferType);
   }

   /// Returns an accumulator factory using either the op attribute or the body
   /// region.
[240 lines not shown]

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp

[154 lines not shown]

 /// Promotes a function argument to workgroup memory in the given function. The
 /// copies will be inserted in the beginning and in the end of the function.
 void mlir::promoteToWorkgroupMemory(GPUFuncOp op, unsigned arg) {
   Value value = op.getArgument(arg);
   auto type = value.getType().dyn_cast<MemRefType>();
   assert(type && type.hasStaticShape() && "can only promote memrefs");

-  Value attribution =
-      op.addWorkgroupAttribution(type.getShape(), type.getElementType());
+  // Get the type of the buffer in the workgroup memory.
+  int workgroupMemoryAddressSpace = gpu::GPUDialect::getWorkgroupAddressSpace();
+  auto bufferType = MemRefType::get(type.getShape(), type.getElementType(), {},
+                                    workgroupMemoryAddressSpace);
+
+  Value attribution = op.addWorkgroupAttribution(bufferType);

   // Replace the uses first since only the original uses are currently present.
   // Then insert the copies.
   value.replaceAllUsesWith(attribution);
   insertCopies(op.getBody(), op.getLoc(), value, attribution);
 }

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Refactor functions for workgroup and private buffer attributions.
ClosedPublic
