
Details

Reviewers

ftynse
bondhugula

Commits

rGab95ba704da4: [mlir][memref] Implement fast lowering of memref.copy

Summary

In the absence of maps, we can lower memref.copy to a memcpy.
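To illustrate the idea in the summary: with an identity layout the elements sit contiguously in row-major order, so the whole copy reduces to one bulk transfer of `numElements * sizeof(element)` bytes. The following is a minimal sketch in plain C++ using `std::memcpy`, not the LLVM dialect code from this revision; `numElements` and `copyContiguous` are hypothetical helper names, and treating a rank-0 shape as one element is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: number of elements described by a shape.
// Assumption of this sketch: an empty shape (rank 0) holds one scalar.
std::size_t numElements(const std::vector<std::size_t> &sizes) {
  std::size_t n = 1;
  for (std::size_t s : sizes)
    n *= s;
  return n;
}

// Hypothetical helper: copies a contiguous (identity-layout) float buffer
// with a single memcpy. A shape containing a zero extent copies nothing.
void copyContiguous(const float *src, float *dst,
                    const std::vector<std::size_t> &sizes) {
  std::size_t bytes = numElements(sizes) * sizeof(float);
  if (bytes == 0)
    return; // Nothing to copy; also avoids calling memcpy with null pointers.
  assert(src != nullptr && dst != nullptr);
  std::memcpy(dst, src, bytes);
}
```

Calling `copyContiguous(src, dst, {2, 3})` on two six-element buffers copies all six floats in one call, whereas a strided target would need per-element index computation.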

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

herhut created this revision. · Dec 21 2021, 4:00 AM

Herald added subscribers: sdasgup3, wenzhicui, wrengr and 20 others. · Dec 21 2021, 4:00 AM

herhut requested review of this revision. · Dec 21 2021, 4:00 AM

Herald added a project: Restricted Project. · Dec 21 2021, 4:00 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B140224: Diff 395636. · Dec 21 2021, 4:17 AM

Do you want to instead create one pattern with two functions in it? It'll lead to less overhead in the greedy rewrite driver and perhaps also make it easier to choose between the two later.

mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
825	Doc comment here - along the lines of your revision summary and title.

bondhugula added a reviewer: bondhugula. · Dec 21 2021, 10:10 AM

mehdi_amini added inline comments. · Dec 21 2021, 11:49 AM

mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
763	This should likely be documented, I think notifyFailure would do it here.

Add comments and failure reason.

In D116099#3205178, @bondhugula wrote:

Do you want to instead create one pattern with two functions in it? It'll lead to less overhead in the greedy rewrite driver and perhaps also make it easier to choose between the two later.

I thought about that, too. Another way would be to give the memcpy-based pattern higher benefit and actually make them non-exclusive. That way one could optionally blend in the memcpy-based pattern.

One issue I currently see with this entire thing is that it relies on maps being present as a way to understand whether the memref has identity strides and offset. I am not sure whether this assumption actually holds for all users.

Harbormaster completed remote builds in B140381: Diff 395851. · Dec 22 2021, 5:35 AM

In D116099#3206432, @herhut wrote:

In D116099#3205178, @bondhugula wrote:

Do you want to instead create one pattern with two functions in it? It'll lead to less overhead in the greedy rewrite driver and perhaps also make it easier to choose between the two later.

I thought about that, too. Another way would be to give the memcpy-based pattern higher benefit and actually make them non-exclusive. That way one could optionally blend in the memcpy-based pattern.

All of this still leads to two patterns.

One issue I currently see with this entire thing is that it relies on maps being present as a way to understand whether the memref has identity strides and offset. I am not sure whether this assumption actually holds for all users.

You are just using isIdentity() -- the API doesn't name map outside. In the inside, it's still a map.

mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
716–717	non-identity maps -> non-identity layout. There is no reference to a map in the API or in your code below, and there is no need for one. The `MemRefLayoutAttrInterface` hides it.

In D116099#3206683, @bondhugula wrote:

In D116099#3206432, @herhut wrote:

In D116099#3205178, @bondhugula wrote:

Do you want to instead create one pattern with two functions in it? It'll lead to less overhead in the greedy rewrite driver and perhaps also make it easier to choose between the two later.

I thought about that, too. Another way would be to give the memcpy-based pattern higher benefit and actually make them non-exclusive. That way one could optionally blend in the memcpy-based pattern.

All of this still leads to two patterns.

Yes, my question is more whether that is useful, to have it as two patterns vs. one. The benefit modelling would be one reason. If everybody thinks that this is not useful enough for the cost, I can merge them.

One issue I currently see with this entire thing is that it relies on maps being present as a way to understand whether the memref has identity strides and offset. I am not sure whether this assumption actually holds for all users.

You are just using isIdentity() -- the API doesn't name map outside. In the inside, it's still a map.

I was not clear. I was wondering about the case where the static type of the memref does not have a map but the descriptor at runtime still does not have identity strides. Currently, one can create this setting using memref.reinterpret_cast by omitting the map from the target type. Maybe that should be illegal.

You are just using isIdentity() -- the API doesn't name map outside. In the inside, it's still a map.

I was not clear. I was wondering about the case where the static type of the memref does not have a map but the descriptor at runtime still does not have identity strides. Currently, one can create this setting using memref.reinterpret_cast by omitting the map from the target type. Maybe that should be illegal.

This would be an invalid op and should have failed the verifier in the first place. (Note that the memref always has a map -- it's not printed if it's identity as you know. Also, identity strides (all ones) don't correspond to an identity map -- N^2, N, 1 for example would correspond to an identity map for a 3-d memref for example.)
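The stride remark above can be made concrete. For a row-major identity layout, the stride of dimension i is the product of the sizes of all later dimensions, so an N x N x N memref has strides N^2, N, 1. The sketch below is plain C++ and not part of this revision; `canonicalStrides` and `hasCanonicalLayout` are hypothetical names. It computes those strides and checks whether a runtime descriptor's offset and strides match them, which is the condition under which a flat memcpy would be safe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: row-major strides implied by an identity layout.
// The innermost stride is 1; each outer stride is the product of inner sizes.
std::vector<int64_t> canonicalStrides(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  for (int i = static_cast<int>(sizes.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * sizes[i + 1];
  return strides;
}

// Hypothetical helper: true if a runtime descriptor (offset, sizes, strides)
// describes the same addressing as an identity layout.
bool hasCanonicalLayout(const std::vector<int64_t> &sizes,
                        const std::vector<int64_t> &strides, int64_t offset) {
  assert(sizes.size() == strides.size());
  return offset == 0 && strides == canonicalStrides(sizes);
}
```

For sizes [2, 3] the canonical strides are [3, 1]; the strides [1, 2] used by the `memref.reinterpret_cast` in the test below describe a transposed view and fail this check.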

Combined the two patterns.

In D116099#3225198, @bondhugula wrote:

You are just using isIdentity() -- the API doesn't name map outside. In the inside, it's still a map.

I was not clear. I was wondering about the case where the static type of the memref does not have a map but the descriptor at runtime still does not have identity strides. Currently, one can create this setting using memref.reinterpret_cast by omitting the map from the target type. Maybe that should be illegal.

This would be an invalid op and should have failed the verifier in the first place. (Note that the memref always has a map -- it's not printed if it's identity as you know. Also, identity strides (all ones) don't correspond to an identity map -- N^2, N, 1 for example would correspond to an identity map for a 3-d memref for example.)

I have sent https://reviews.llvm.org/D116601 to improve verification. One can still create this case dynamically but as we consider that illegal, the result of memref.copy is undefined in such cases, so it is fine to just use the intrinsics.

Harbormaster completed remote builds in B142032: Diff 398072. · Jan 7 2022, 1:21 AM

LGTM - thanks!

This revision is now accepted and ready to land. · Jan 7 2022, 7:17 AM

rebase

Herald added a subscriber: awarzynski. · Jan 11 2022, 7:31 AM

Harbormaster completed remote builds in B142655: Diff 398942. · Jan 11 2022, 7:46 AM

fix test

Harbormaster completed remote builds in B143119: Diff 399613. · Jan 13 2022, 3:42 AM

Closed by commit rGab95ba704da4: [mlir][memref] Implement fast lowering of memref.copy (authored by herhut). · Jan 14 2022, 5:22 AM

This revision was automatically updated to reflect the committed changes.

herhut added a commit: rGab95ba704da4: [mlir][memref] Implement fast lowering of memref.copy.

Diff 399959

mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp

(700 lines not shown; context: if (srcType.isa<MemRefType>() && dstType.isa<UnrankedMemRefType>()) {)
       auto loadOp = rewriter.create<LLVM::LoadOp>(loc, castPtr);
       rewriter.replaceOp(memRefCastOp, loadOp.getResult());
     } else {
       llvm_unreachable("Unsupported unranked memref to unranked memref cast");
     }
   }
 };

+/// Pattern to lower a `memref.copy` to llvm.
+///
+/// For memrefs with identity layouts, the copy is lowered to the llvm
+/// `memcpy` intrinsic. For non-identity layouts, the copy is lowered to a call
+/// to the generic `MemrefCopyFn`.
 struct MemRefCopyOpLowering : public ConvertOpToLLVMPattern<memref::CopyOp> {
   using ConvertOpToLLVMPattern<memref::CopyOp>::ConvertOpToLLVMPattern;

   LogicalResult
-  matchAndRewrite(memref::CopyOp op, OpAdaptor adaptor,
-                  ConversionPatternRewriter &rewriter) const override {
+  lowerToMemCopyIntrinsic(memref::CopyOp op, OpAdaptor adaptor,
+                          ConversionPatternRewriter &rewriter) const {
+    auto loc = op.getLoc();
+    auto srcType = op.source().getType().dyn_cast<MemRefType>();
+
+    MemRefDescriptor srcDesc(adaptor.source());
+
+    // Compute number of elements.
+    Value numElements;
+    for (int pos = 0; pos < srcType.getRank(); ++pos) {
+      auto size = srcDesc.size(rewriter, loc, pos);
+      numElements = numElements
+                        ? rewriter.create<LLVM::MulOp>(loc, numElements, size)
+                        : size;
+    }
+    // Get element size.
+    auto sizeInBytes = getSizeInBytes(loc, srcType.getElementType(), rewriter);
+    // Compute total.
+    Value totalSize =
+        rewriter.create<LLVM::MulOp>(loc, numElements, sizeInBytes);
+
+    Value srcBasePtr = srcDesc.alignedPtr(rewriter, loc);
+    MemRefDescriptor targetDesc(adaptor.target());
+    Value targetBasePtr = targetDesc.alignedPtr(rewriter, loc);
+    Value isVolatile = rewriter.create<LLVM::ConstantOp>(
+        loc, typeConverter->convertType(rewriter.getI1Type()),
+        rewriter.getBoolAttr(false));
+    rewriter.create<LLVM::MemcpyOp>(loc, targetBasePtr, srcBasePtr, totalSize,
+                                    isVolatile);
+    rewriter.eraseOp(op);
+
+    return success();
+  }
+
+  LogicalResult
+  lowerToMemCopyFunctionCall(memref::CopyOp op, OpAdaptor adaptor,
+                             ConversionPatternRewriter &rewriter) const {
     auto loc = op.getLoc();
     auto srcType = op.source().getType().cast<BaseMemRefType>();
     auto targetType = op.target().getType().cast<BaseMemRefType>();

     // First make sure we have an unranked memref descriptor representation.
     auto makeUnranked = [&, this](Value ranked, BaseMemRefType type) {
       auto rank = rewriter.create<LLVM::ConstantOp>(
           loc, getIndexType(), rewriter.getIndexAttr(type.getRank()));
       auto *typeConverter = getTypeConverter();
       auto ptr =
           typeConverter->promoteOneMemRefDescriptor(loc, ranked, rewriter);
       auto voidPtr =
           rewriter.create<LLVM::BitcastOp>(loc, getVoidPtrType(), ptr)
               .getResult();
       auto unrankedType =
           UnrankedMemRefType::get(type.getElementType(), type.getMemorySpace());
       return UnrankedMemRefDescriptor::pack(rewriter, loc, *typeConverter,
(28 lines not shown; context: lowerToMemCopyFunctionCall(memref::CopyOp op, OpAdaptor adaptor,)
     auto copyFn = LLVM::lookupOrCreateMemRefCopyFn(
         op->getParentOfType<ModuleOp>(), getIndexType(), sourcePtr.getType());
     rewriter.create<LLVM::CallOp>(loc, copyFn,
                                   ValueRange{elemSize, sourcePtr, targetPtr});
     rewriter.eraseOp(op);

     return success();
   }
+
+  LogicalResult
+  matchAndRewrite(memref::CopyOp op, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    auto srcType = op.source().getType().cast<BaseMemRefType>();
+    auto targetType = op.target().getType().cast<BaseMemRefType>();
+
+    if (srcType.hasRank() &&
+        srcType.cast<MemRefType>().getLayout().isIdentity() &&
+        targetType.hasRank() &&
+        targetType.cast<MemRefType>().getLayout().isIdentity())
+      return lowerToMemCopyIntrinsic(op, adaptor, rewriter);
+
+    return lowerToMemCopyFunctionCall(op, adaptor, rewriter);
+  }
 };

 /// Extracts allocated, aligned pointers and offset from a ranked or unranked
 /// memref type. In unranked case, the fields are extracted from the underlying
 /// ranked descriptor.
 static void extractPointersAndOffset(Location loc,
                                      ConversionPatternRewriter &rewriter,
                                      LLVMTypeConverter &typeConverter,
                                      Value originalOperand,
                                      Value convertedOperand,
                                      Value allocatedPtr, Value alignedPtr,
(Remaining 903 lines not shown.)
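The selection logic in the combined `matchAndRewrite` can be summarized outside of MLIR. The sketch below is a toy model in plain C++ under stated assumptions: `MemRefTypeInfo`, `CopyLowering`, and `chooseLowering` are hypothetical names that only mirror the `hasRank()` and `getLayout().isIdentity()` checks in the diff; they are not MLIR APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two lowering paths in the pattern.
enum class CopyLowering { MemcpyIntrinsic, GenericCopyFunction };

// Hypothetical summary of the static type information the pattern inspects.
struct MemRefTypeInfo {
  bool hasRank;        // false models an unranked memref (memref<*xT>)
  bool identityLayout; // only meaningful when hasRank is true
};

// Mirrors the condition in the diff: the fast path is taken only when both
// source and target are ranked and both have an identity layout.
CopyLowering chooseLowering(const MemRefTypeInfo &src,
                            const MemRefTypeInfo &dst) {
  auto isContiguous = [](const MemRefTypeInfo &t) {
    // An unranked type has no statically known layout in this model.
    assert(t.hasRank || !t.identityLayout);
    return t.hasRank && t.identityLayout;
  };
  return isContiguous(src) && isContiguous(dst)
             ? CopyLowering::MemcpyIntrinsic
             : CopyLowering::GenericCopyFunction;
}
```

Keeping the choice in one function, as the review settled on, means there is a single pattern for the rewrite driver to try, and the generic function call remains the fallback for every case the fast path rejects.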

mlir/test/mlir-cpu-runner/copy.mlir

(29 lines not shown; context: func @main() -> () {)
   memref.copy %input, %copy : memref<2x3xf32> to memref<2x3xf32>
   %unranked_copy = memref.cast %copy : memref<2x3xf32> to memref<*xf32>
   call @print_memref_f32(%unranked_copy) : (memref<*xf32>) -> ()
   // CHECK: rank = 2 offset = 0 sizes = [2, 3] strides = [3, 1]
   // CHECK-NEXT: [0, 1, 2]
   // CHECK-NEXT: [3, 4, 5]

   %copy_two = memref.alloc() : memref<3x2xf32>
-  %copy_two_casted = memref.reinterpret_cast %copy_two to offset: [0], sizes: [2, 3], strides:[1, 2]
+  %copy_two_casted = memref.reinterpret_cast %copy_two to offset: [0], sizes: [2, 3], strides: [1, 2]
     : memref<3x2xf32> to memref<2x3xf32, offset: 0, strides: [1, 2]>
   memref.copy %input, %copy_two_casted : memref<2x3xf32> to memref<2x3xf32, offset: 0, strides: [1, 2]>
   %unranked_copy_two = memref.cast %copy_two : memref<3x2xf32> to memref<*xf32>
   call @print_memref_f32(%unranked_copy_two) : (memref<*xf32>) -> ()
   // CHECK: rank = 2 offset = 0 sizes = [3, 2] strides = [2, 1]
   // CHECK-NEXT: [0, 3]
   // CHECK-NEXT: [1, 4]
   // CHECK-NEXT: [2, 5]

   %input_empty = memref.alloc() : memref<3x0x1xf32>
   %copy_empty = memref.alloc() : memref<3x0x1xf32>
   // Copying an empty shape should do nothing (and should not crash).
   memref.copy %input_empty, %copy_empty : memref<3x0x1xf32> to memref<3x0x1xf32>

+  %input_empty_casted = memref.reinterpret_cast %input_empty to offset: [0], sizes: [0, 3, 1], strides: [3, 1, 1]
+    : memref<3x0x1xf32> to memref<0x3x1xf32, offset: 0, strides: [3, 1, 1]>
+  %copy_empty_casted = memref.alloc() : memref<0x3x1xf32>
+  // Copying a casted empty shape should do nothing (and should not crash).
+  memref.copy %input_empty_casted, %copy_empty_casted : memref<0x3x1xf32, offset: 0, strides: [3, 1, 1]> to memref<0x3x1xf32>
+
   memref.dealloc %copy_empty : memref<3x0x1xf32>
   memref.dealloc %input_empty : memref<3x0x1xf32>
   memref.dealloc %copy_two : memref<3x2xf32>
   memref.dealloc %copy : memref<2x3xf32>
   memref.dealloc %input : memref<2x3xf32>
   return
 }
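The second CHECK block in the test shows why a strided target cannot use the flat memcpy path: the 2x3 input is written through a view with strides [1, 2], so the underlying 3x2 buffer ends up holding the transpose. The sketch below reproduces that addressing in plain C++; `copyStrided2D` is a hypothetical helper, not the runtime `MemrefCopyFn`, and it assumes a contiguous row-major source with zero offsets on both sides.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a contiguous row-major rows x cols source into
// a destination buffer addressed as dst[i * dstStride0 + j * dstStride1].
void copyStrided2D(const std::vector<float> &src, std::vector<float> &dst,
                   int64_t rows, int64_t cols, int64_t dstStride0,
                   int64_t dstStride1) {
  assert(static_cast<int64_t>(src.size()) == rows * cols);
  for (int64_t i = 0; i < rows; ++i) {
    for (int64_t j = 0; j < cols; ++j) {
      int64_t idx = i * dstStride0 + j * dstStride1;
      assert(idx >= 0 && idx < static_cast<int64_t>(dst.size()));
      dst[idx] = src[i * cols + j];
    }
  }
}
```

With the input [0, 1, 2, 3, 4, 5], sizes [2, 3], and target strides [1, 2], the destination buffer becomes [0, 3, 1, 4, 2, 5], which printed as a 3x2 memref gives the rows [0, 3], [1, 4], [2, 5] expected by the test. A zero-sized shape executes no loop iterations, matching the "should do nothing" cases.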

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][memref] Implement fast lowering of memref.copy
ClosedPublic
