This is an archive of the discontinued LLVM Phabricator instance.

Differential D77528

[MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm
ClosedPublic

Authored by bondhugula on Apr 6 2020, 12:55 AM.

Details

Reviewers

nicolasvasilache
ftynse
dcaballe

Commits

rG01d97a35493a: [MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm

Summary

Support to recognize and deal with aligned_alloc was recently added to
LLVM's TLI/MemoryBuiltins and its various optimization passes. This
revision adds support for generation of aligned_alloc's when lowering
AllocOp from std to LLVM. Setting 'use-aligned-alloc=1' will lead to
aligned_alloc being used for all heap allocations. An alignment and size
that works with the constraints of aligned_alloc is chosen.

Using aligned_alloc is preferable to "using malloc and adjusting the
allocated pointer to align for indexing" because the pointer access
arithmetic done for the latter only makes it harder for LLVM passes to
deal with for analysis, optimization, attribute deduction, and rewrites.
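
To make the contrast in the summary concrete, here is a minimal standalone sketch of the two allocation strategies. It is not the code generated by the lowering; the helper names (`mallocAndAlign`, `allocAligned`, `isAligned`) are hypothetical, and it assumes a C++17 standard library that provides `std::aligned_alloc` and a power-of-two alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Approach 1: over-allocate with malloc, then bump the pointer to the next
// aligned address. `alignment` must be a power of two. The caller has to keep
// `*rawOut` around, because that is the pointer that must be passed to free().
inline void *mallocAndAlign(std::size_t size, std::size_t alignment,
                            void **rawOut) {
  // In the worst case `alignment - 1` bytes of slack are needed.
  void *raw = std::malloc(size + alignment - 1);
  if (!raw)
    return nullptr;
  *rawOut = raw;
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t aligned = (addr + alignment - 1) & ~(alignment - 1);
  return reinterpret_cast<void *>(aligned);
}

// Approach 2: ask the allocator for an aligned block directly. The size is
// rounded up first because aligned_alloc expects a multiple of the alignment.
inline void *allocAligned(std::size_t size, std::size_t alignment) {
  std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  return std::aligned_alloc(alignment, rounded);
}

// Returns true if `p` is a multiple of `alignment`.
inline bool isAligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With the first approach the program carries two pointers (the raw one and the adjusted one) plus the masking arithmetic; with the second, the returned pointer is both the one that is indexed and the one that is freed.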

Depends on D76602.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bondhugula created this revision. Apr 6 2020, 12:55 AM

Herald added a project: Restricted Project. Apr 6 2020, 12:55 AM

Herald added subscribers: llvm-commits, grosul1, Joonsoo and 12 others.

bondhugula edited the summary of this revision. Apr 6 2020, 12:55 AM

bondhugula added reviewers: nicolasvasilache, ftynse, dcaballe.

mehdi_amini added inline comments. Apr 6 2020, 1:10 AM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
50	Please don't introduce any new global command line options; thread them through the APIs instead. We're looking to eliminate all global initializers under mlir/lib/... : https://bugs.llvm.org/show_bug.cgi?id=45437

Harbormaster completed remote builds in B51890: Diff 255244. Apr 6 2020, 1:36 AM

ftynse added inline comments. Apr 6 2020, 1:44 AM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	Can't you just rely on the Optional result being None to signal that aligned alloc is not required?
1425	Prefer early return
1427	I wouldn't add a lambda that is only called once immediately after its definition.
1482	Nit: putting the comment above `if` makes it less surprising in terms of indentation. Otherwise, it feels like there is incorrect indentation because of elided braces and the actual statement does not belong to the `if`.
1594	I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?

Thanks for the review.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	We still use aligned_alloc if the elt size is > 16 bytes (in spite of no alignment attribute). The code would otherwise crash since the loads/stores won't have alignment guaranteed (with malloc) but they would be lowered by LLVM via aligned load/stores. (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).
1425	Thanks.
1427	Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.
1482	Sure, thanks.
1594	I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts.
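
As an illustration of the alignment concern in the 1404 reply above, the following sketch compares what malloc is required to guarantee with what a 32-byte vector element would need. `Vec8f` is a hypothetical stand-in for a `vector<8xf32>` element, and `alignas(32)` is an assumption used here to model its ABI alignment; none of these names come from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an 8 x f32 vector element (32 bytes, 256 bits).
struct alignas(32) Vec8f {
  float lanes[8];
};

// The alignment malloc must provide for any fundamental type. This is 16 on
// common 64-bit glibc systems, but the value is platform dependent.
constexpr std::size_t kMallocGuaranteedAlignment = alignof(std::max_align_t);

// True when an element type needs more alignment than malloc guarantees, in
// which case aligned loads/stores on malloc'ed memory are not safe.
template <typename T> constexpr bool needsOverAlignedAllocation() {
  return alignof(T) > kMallocGuaranteedAlignment;
}
```

On a platform where `alignof(std::max_align_t)` is 16, `needsOverAlignedAllocation<Vec8f>()` is true while `needsOverAlignedAllocation<float>()` is false, which is the case the reply describes.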

bondhugula updated this revision to Diff 255270. Apr 6 2020, 3:06 AM

Address review comments.

bondhugula marked an inline comment as done. Apr 6 2020, 3:06 AM

Harbormaster completed remote builds in B51906: Diff 255270. Apr 6 2020, 3:45 AM

ftynse added inline comments. Apr 6 2020, 5:08 AM

mlir/include/mlir/Conversion/Passes.td
228–229	Nit: I suppose this option should be removed by the previous commit, instead of rewritten by this one.
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	"We still use aligned_alloc if the elt size is > 16 bytes"
At which point, this function returns the alignment value rather than `llvm::None`. The only situation where it returns a non-`None` value and sets `useAlignedAlloc` to `false` is for `AllocaOp`. I would consider refactoring this function to only work on `AllocOp`, and querying the `AllocaOp` allocation separately in the call place. This would make the API simpler and shorten the code. And you wouldn't need an extra boolean result.
"(LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries)."
I suppose we can just take the ABI alignment into account when lowering the `alloc` operation (we already do so for the `alignment` attribute). Not all platforms have `aligned_alloc`.
1427	This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing
    int64_t constEltSizeBytes = 0;
    if (auto vectorType = elementType.template dyn_cast<VectorType>())
      constEltSizeBytes =
          vectorType.getNumElements() *
          llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
    else
      constEltSizeBytes =
          llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);

    // Use aligned_alloc if elt_size > malloc's alignment.
    bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
    useAlignedAlloc = isMallocAlignmentSufficient;
Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.
1463–1464	I would just have
    auto allocaOp = dyn_cast<AllocaOp>(op);
    if (allocaOp) {
      allocatedBytePtr = nullptr;
      accessAlignment = nullptr;
      return rewriter.create<LLVM::AllocaOp>(
          loc, elementPtrType, cumulativeSize,
          allocaOp.alignment() ? allocaOp.alignment().getValue() : 0);
    }
and sink the `getAllocationAlignment` below, all while making it handle only `AllocOp`.
1594	The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".

Address some of the review comments.

bondhugula marked 7 inline comments as done. Apr 6 2020, 7:35 AM

bondhugula added inline comments.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	"I would consider refactoring this function to only work on AllocOp, and querying the AllocaOp allocation separately in the ..."
Sure, sounds good.
"I suppose we can just take the ABI alignment into account when lowering the alloc operation (we already do so for the ..."
Yes, we should do this with malloc as well. This can be a TODO and should be done in a separate revision (because it isn't to do with aligned_alloc but fixing malloc lowering).
1427	Hmm... the demarcation/isolation is important I feel. I'm fine with changing to the straightline style but out of curiosity and for future purposes as well, it'll be good to have a third person view here on coding style as far as such patterns go: @mehdi_amini - is there a guideline here?
1463–1464	Sure.
1594	Sorry, I didn't quite understand. What should the pass options be and what should the behavior and the default behavior be?

Address most of the review comments.

bondhugula added inline comments. Apr 6 2020, 8:03 AM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	"Not all platforms have aligned_alloc."
Btw, aligned_alloc is a C11 and C++11 standard, and it's available on Windows too (it's posix_memalign that isn't). And of course there are systems where there isn't any dynamic memory allocation at all.

Harbormaster completed remote builds in B51933: Diff 255320. Apr 6 2020, 8:06 AM

Remove stale comment.

Harbormaster completed remote builds in B51943: Diff 255336. Apr 6 2020, 8:38 AM

It is my understanding this CL does not change previous behavior, is this accurate?

mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h
44	Same comment as in the parent revision, this is a silently breaking API change that will impact integrations. I'd just use the struct that you'd have introduced in the previous revision and document that this is only using 2 fields.
66–67	same discussion re silently breaking API changes
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1415	Alignment needed is really a target-specific thing isn't it? I would have expected something that looks at DataLayout and that brings in the can of worms we have been punting on (I understand once flang is in MLIR core we will want to reopen it). Can this part be dropped from this revision, esp in light of @ftynse's comments? I also have some fun micro ARM targets I will need to test some of this on, the smaller the baked in assumptions the better.
1427	I am generally a fan of such style (esp. when mixed with functional combinators), so I'd vote for +1 it when it makes sense.
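
One way to read the suggestion on ConvertStandardToLLVMPass.h above (use the options struct rather than adding more positional `bool` parameters) is sketched below. The field names mirror the `LowerToLLVMOptions` struct in this revision; the function `allocatorFor` is hypothetical and exists only to show that a caller sets the fields it cares about and leaves the rest at their defaults.

```cpp
#include <cassert>
#include <string>

// Field names follow the LowerToLLVMOptions struct from this revision.
// Everything else in this sketch is a hypothetical illustration.
struct LowerToLLVMOptions {
  bool useBarePtrCallConv = false;
  bool emitCWrappers = false;
  unsigned indexBitwidth = 0; // 0 stands for "derive from the data layout".
  bool useAlignedAlloc = false;
};

// Hypothetical consumer: reports which allocation function the lowering
// would call for heap allocations under the given options.
inline std::string allocatorFor(const LowerToLLVMOptions &options) {
  return options.useAlignedAlloc ? "aligned_alloc" : "malloc";
}

// Hypothetical call site: only the field of interest is spelled out, so
// adding another field to the struct later does not change this code.
inline LowerToLLVMOptions makeAlignedAllocOptions() {
  LowerToLLVMOptions options;
  options.useAlignedAlloc = true;
  return options;
}
```

Because every field has a default and is named at the point of use, a new option does not reorder or reinterpret the arguments of existing callers.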

Harbormaster completed remote builds in B51957: Diff 255355. Apr 6 2020, 9:45 AM

mehdi_amini added inline comments. Apr 7 2020, 2:58 AM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1427	@bondhugula this seems too detailed to have a guideline :) I wouldn't say it is "common", but probably not unheard of? I have been doing this myself but in general not calling it right after, rather to outline a block of code outside of a loop to make the loop shorter and easier to read, or similar situation (getting large boilerplate out of the way and "naming it"). @rriddle ?

ftynse added inline comments. Apr 7 2020, 2:59 AM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1427	Generally, I would be strongly opposed to defining style guidelines based on a single use of a construct in a single diff, where only a small subset of contributors could participate (and before you can object that review history is public, are you reading all comments on all diffs?). I would be also opposed to having to define additional rules until we have to. I am not generally opposed to helper lambdas, I just don't see any benefit from this specific one, only drawbacks. And a lambda that captures the entire environment by reference is not exactly my definition of isolation.
1594	Normally, you would have two pass options (and a configuration `struct` for the constructors like Nicolas proposed in another patch to decrease the amount of churn in pass constructor APIs): `-use-aligned-alloc` and `-assume-malloc-alignment`. If you don't want two separate options, you could get away with one `-use-aligned-alloc-and-assume-malloc-alignment` (did not think about a better name). If it is set to zero (default), the conversion doesn't use aligned_alloc at all. If it is set to non-zero, the conversion uses aligned_alloc and treats the option value as malloc alignment in order to also use aligned_alloc in relevant cases.

LowerToLLVM struct for options.

@nicolasvasilache wrote:

It is my understanding this CL does not change previous behavior, is this accurate?

It actually changes it in one case: if the alloc op didn't have alignment specified *and* its elt type was larger than 16 bytes, this revision would make it use aligned_alloc. I can change this and let it continue to use malloc, but note that the malloc in such cases would have generated code that would crash. So this is only fixing things for the previous behavior it's changing. And we should also change malloc path to do the alignment to elt type boundaries for > 16 bytes by default.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1427	As Mehdi now confirms, this is too detailed to have a style guideline. The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.

bondhugula marked 2 inline comments as done. Apr 7 2020, 3:55 AM

bondhugula added inline comments.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1594	The configuration struct is now done (in the parent revision). How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? This revision is not about tinkering with malloc alignment handling. Update - PTAL. Thanks for all the feedback.

Missed change.

Remove debug code.

Harbormaster failed remote builds in B52133: Diff 255627! Apr 7 2020, 4:17 AM

ftynse marked an inline comment as done. Apr 7 2020, 4:44 AM

ftynse added inline comments.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1404	Yes, I know it's C11 (but C++17, most changes in C11 did not make it to C++11), but unfortunately, I don't think we can assume all target systems have C11, let alone C++17.
1427	"The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability."
Well, you have a block comment right above it (modulo the variable declaration that is only used inside the lambda), so I wouldn't call it auto-documenting since you felt like you needed to write documentation for it. And having a named variable of lambda-type or an identically-named variable of boolean type still gives you exactly the same naming scheme.
"Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it."
I read Mehdi's comment differently: "but in general not calling it right after" and "rather to outline a block of code outside of a loop to make the loop shorter" do not seem to necessarily support your usage here (neither do they contradict it). Readability is a very subjective thing. I was reading your code for review purposes and this construct did make me lose time and expend more energy than straight-line code would have here, so for me personally it decreased readability. Namely because it (a) mutates an implicitly captured variable and (b) requires unwrapping more abstractions mentally. Anyway, I won't block the commit just because of a stylistic discussion.
I can suggest an alternative that would address part of my readability concerns:
    int64_t constEltSizeBytes = [elementType]() {
      if (auto vectorType = elementType.template dyn_cast<VectorType>())
        return vectorType.getNumElements() *
               llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
      else
        return llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
    }();
    bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
This removes implicit by-reference capture, makes it clear that you do not intend for the lambda to be reused (named lambda would be also okay since it doesn't store references anyway, but there's no point), and this way of using lambdas is actually considered a C++11 idiom for complex constant initialization (https://groups.google.com/a/isocpp.org/g/std-discussion/c/FBjcR4WJlkU/m/nQnsSOziq04J) so one can claim it's "common enough".
1594	How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? Works for me!
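
The immediately-invoked lambda idiom suggested in the 1427 comment above can be tried outside MLIR with the following self-contained sketch. `ElementInfo`, `divideCeil` and `elementSizeInBytes` are hypothetical simplifications introduced for this example; they only model "scalar or vector element with a bit width" and are not the MLIR or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified description of an element type: a scalar has
// numElements == 1, a vector has numElements > 1.
struct ElementInfo {
  unsigned bitWidth;
  unsigned numElements;
};

// Integer division rounding up (local helper, not llvm::divideCeil).
constexpr std::int64_t divideCeil(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

inline std::int64_t elementSizeInBytes(ElementInfo elementType) {
  // Immediately-invoked lambda: it captures by value, runs exactly once, and
  // allows the result to be declared const despite the branching.
  const std::int64_t sizeBytes = [elementType]() -> std::int64_t {
    if (elementType.numElements > 1)
      return elementType.numElements * divideCeil(elementType.bitWidth, 8);
    return divideCeil(elementType.bitWidth, 8);
  }();
  return sizeBytes;
}
```

The explicit `-> std::int64_t` return type avoids relying on both branches deducing the same type, and the by-value capture addresses the objection about mutating captured state.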

Harbormaster completed remote builds in B52135: Diff 255631. Apr 7 2020, 4:50 AM

Harbormaster completed remote builds in B52138: Diff 255634.

rriddle added inline comments. Apr 7 2020, 1:17 PM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1427	FWIW, there is already a guide on using lambdas for computing predicates: https://llvm.org/docs/CodingStandards.html#turn-predicate-loops-into-predicate-functions

mehdi_amini added inline comments. Apr 7 2020, 9:43 PM

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1427	While I haven't seen this pattern used as it is here widely in MLIR (and I like consistency), I just didn't feel any reason for me to really oppose it, but I wasn't really supporting it as-is either. The fact that @ftynse opposed this on the principle of readability seems to hint that it would be better not to do it. I would likely have written it in a form around this (with a comment possibly):
    int64_t constEltSizeBytes =
        llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
    if (auto vectorType = elementType.template dyn_cast<VectorType>())
      constEltSizeBytes = constEltSizeBytes * vectorType.getNumElements();
    bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
Everything is named as well and you avoid the readability overhead of the lambda.
1427	@rriddle: the link is about free functions I believe, which I see differently than lambdas: the rationale explained there is close to what I mentioned before about moving the predicate code "out of the way" of the main logic (and reducing the indentation, etc.), which you don't get with the immediately local lambda.

@ftynse Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment. So I have additional handling (and test cases for that) now to bump the allocation size to the next multiple of alignment. Also, if the elt size is not a power of two (and an alignment attribute doesn't exist), the next power of two is used for alignment (although this isn't an interesting case for aligned allocation, didn't want to punt to malloc for this to keep things simple/clear). So, all heap allocations use aligned_alloc whenever -use-aligned-alloc is set.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1415	All of this now dropped. -use-aligned-alloc is now independent of what malloc can do. The malloc path remains untouched.
1427	Thank you all for commenting. This discussion is now moot since the aligned_alloc generation is now untangled from being contingent on this conditional / what malloc supports, and so this whole lambda and conditionals are gone.
1594	Done - aligned_alloc is now used only with -use-aligned-alloc, and for all heap allocations whenever that cmd line flag exists.

Address review comments. Also, handle when the size may not be a multiple of alignment.

Harbormaster failed remote builds in B52292: Diff 255893! Apr 7 2020, 9:47 PM

Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment.

I was going to ask what happened in case of memref<?xvector<3xf32>> but got distracted by the lambda discussion :)

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1596	Typo: Aigned

This revision is now accepted and ready to land. Apr 8 2020, 2:08 AM

Add another test case.

In D77528#1968908, @ftynse wrote:

Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment.

I was going to ask what happened in case of memref<?xvector<3xf32>> but got distracted by the lambda discussion :)

The allocation size will be bumped to a multiple of the alignment. With memref<?xvector<3xf32>>, if there is no alignment attribute, the next power of two larger than 12, i.e., 16, will be the alignment, and the allocation size will be bumped to a multiple of 16 ('cumulativeSize' will be bumped up).
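
The arithmetic described above can be checked with a small standalone sketch. The helper names (`nextPowerOfTwo`, `roundUpToMultiple`, `computeRequest`) and the convention that an alignment of 0 means "no alignment attribute" are assumptions made for this example; the actual lowering emits the equivalent computation as LLVM dialect operations rather than calling functions like these.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= value (value must be non-zero).
inline std::uint64_t nextPowerOfTwo(std::uint64_t value) {
  std::uint64_t result = 1;
  while (result < value)
    result <<= 1;
  return result;
}

// Rounds `size` up to the next multiple of `alignment`.
inline std::uint64_t roundUpToMultiple(std::uint64_t size,
                                       std::uint64_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

struct AlignedAllocRequest {
  std::uint64_t alignment;
  std::uint64_t sizeBytes;
};

// requestedAlignment == 0 models "no alignment attribute on the alloc".
inline AlignedAllocRequest computeRequest(std::uint64_t eltSizeBytes,
                                          std::uint64_t numElements,
                                          std::uint64_t requestedAlignment = 0) {
  // Without an attribute, fall back to the element size rounded up to a
  // power of two, since aligned_alloc needs a power-of-two alignment.
  std::uint64_t alignment = requestedAlignment
                                ? requestedAlignment
                                : nextPowerOfTwo(eltSizeBytes);
  // aligned_alloc also needs the size to be a multiple of the alignment.
  return {alignment, roundUpToMultiple(eltSizeBytes * numElements, alignment)};
}
```

For five `vector<3xf32>` elements (12 bytes each), `computeRequest(12, 5)` gives an alignment of 16 and bumps the 60-byte size to 64.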

Address review comments.

bondhugula edited the summary of this revision. Apr 8 2020, 2:37 AM

bondhugula edited the summary of this revision.

Closed by commit rG01d97a35493a: [MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm (authored by bondhugula). Apr 8 2020, 2:41 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B52318: Diff 255932! Apr 8 2020, 3:12 AM

Harbormaster completed remote builds in B52316: Diff 255929.

This seems to have broken the build with gcc-5, https://buildkite.com/mlir/mlir-core/builds/4023#0cd4c474-dbf0-4b26-a3f5-d2a63b983ac9.

Herald added a subscriber: frgossen. Apr 8 2020, 3:45 AM

In D77528#1969067, @ftynse wrote:

This seems to have broken the build with gcc-5, https://buildkite.com/mlir/mlir-core/builds/4023#0cd4c474-dbf0-4b26-a3f5-d2a63b983ac9.

I don't have GCC-5 at hand to be sure of the fix. Any insights?

In D77528#1969077, @bondhugula wrote:

I don't have GCC-5 at hand to be sure of the fix. Any insights?

Neither do I... I would expect using AllocaOpLowering = AllocLikeOpLowering<AllocaOp>; instead of creating a derived class could get rid of the problem with constructors, and this is the right thing to have anyway.
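
The difference between the two options mentioned in this reply can be sketched as follows. All types here are hypothetical stand-ins (the real `AllocLikeOpLowering` takes different constructor arguments), and the sketch does not reproduce the gcc-5 failure itself; it only shows why an alias needs no constructor forwarding at all.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the op classes.
struct AllocOp {};
struct AllocaOp {};

// Hypothetical, simplified lowering pattern parameterized on the op type.
template <typename OpType> class AllocLikeOpLowering {
public:
  explicit AllocLikeOpLowering(bool useAlignedAlloc = false)
      : useAlignedAlloc(useAlignedAlloc) {}
  bool usesAlignedAlloc() const { return useAlignedAlloc; }

private:
  bool useAlignedAlloc;
};

// Derived-class style: base constructors must be re-exposed explicitly, which
// is the kind of code that older compilers may handle differently.
struct DerivedAllocOpLowering : AllocLikeOpLowering<AllocOp> {
  using AllocLikeOpLowering<AllocOp>::AllocLikeOpLowering;
};

// Alias style: the same type under another name, so there are no
// constructors to inherit or forward.
using AllocOpLowering = AllocLikeOpLowering<AllocOp>;
using AllocaOpLowering = AllocLikeOpLowering<AllocaOp>;

static_assert(
    std::is_same<AllocaOpLowering, AllocLikeOpLowering<AllocaOp>>::value,
    "an alias does not introduce a new type");
```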

bondhugula mentioned this in D77719: [MLIR] Fix gcc-5 build failure cause by D77528. Apr 8 2020, 4:56 AM

In D77528#1969104, @ftynse wrote:

In D77528#1969077, @bondhugula wrote:

I don't have GCC-5 at hand to be sure of the fix. Any insights?

Neither do I... I would expect using AllocaOpLowering = AllocLikeOpLowering<AllocaOp>; instead of creating a derived class could get rid of the problem with constructors, and this is the right thing to have anyway.

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

bondhugula mentioned this in rGa59008a3a5b0: [MLIR] Fix gcc-5 build failure cause by D77528. Apr 8 2020, 5:23 AM

In D77528#1969205, @bondhugula wrote:

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

Thanks! The build is performed hourly, using all the commits landed in the last hour, so we'll have to wait a bit. The overall status is here https://buildkite.com/mlir/mlir-core

bondhugula mentioned this in D77726: [MLIR] Fix more gcc-5 build failure issues by D77528. Apr 8 2020, 6:45 AM

bondhugula mentioned this in rG3156b5422e6c: [MLIR] Fix more gcc-5 build issues from D77528. Apr 8 2020, 7:03 AM

In D77528#1969290, @ftynse wrote:

In D77528#1969205, @bondhugula wrote:

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

Thanks! The build is performed hourly, using all the commits landed in the last hour, so we'll have to wait a bit. The overall status is here https://buildkite.com/mlir/mlir-core

That didn't work. I made a couple more cleanup changes in D77726 and that has fixed it.

mehdi_amini added inline comments. Jun 9 2020, 9:09 PM

mlir/test/Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir
186	Can you split this in a new test file? We will process the entire file twice while the tests between the two invocations are actually entirely disjoint.

Herald added a project: Restricted Project. Jun 9 2020, 9:09 PM

Herald added subscribers: msifontes, jurahul, Kayjukh, stephenneuendorffer.

bondhugula marked an inline comment as done. Jun 11 2020, 1:30 AM

bondhugula added inline comments.

mlir/test/Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir
186	Sure - this makes sense. Given a similar approach and a tendency to put everything in one file at other places, we missed that. The only benefit of having it here is that the generated code follows the pattern of the test case right above and so it was easy to add it here and update both together when needed. But thinking about this: what about the additional proofing this provides given that it is also running (without crashing, etc.) on the other cases even if we don't have a FileCheck for the rest? I think there is some benefit there without having to copy over that stuff to another file? How do you weigh these?

Revision Contents

Path

Size

mlir/

include/

mlir/

Conversion/

Passes.td

2 lines

StandardToLLVM/

ConvertStandardToLLVMPass.h

26 lines

lib/

Analysis/

Utils.cpp

1 line

Conversion/

StandardToLLVM/

StandardToLLVM.cpp

175 lines

Dialect/

StandardOps/

IR/

Ops.cpp

2 lines

test/

Conversion/

StandardToLLVM/

convert-dynamic-memref-ops.mlir

47 lines

Diff 255936

mlir/include/mlir/Conversion/Passes.td

@@ let description = [{
     Functions converted to LLVM IR. Function arguments types are converted
     one-to-one. Function results are converted one-to-one and, in case more than
     1 value is returned, packed into an LLVM IR struct type. Function calls and
     returns are updated accordingly. Block argument types are updated to use
     LLVM IR types.
   }];
   let constructor = "mlir::createLowerToLLVMPass()";
   let options = [
+    Option<"useAlignedAlloc", "use-aligned-alloc", "bool", /*default=*/"false",
+           "Use aligned_alloc in place of malloc for heap allocations">,
     Option<"useBarePtrCallConv", "use-bare-ptr-memref-call-conv", "bool",
            /*default=*/"false",
            "Replace FuncOp's MemRef arguments with bare pointers to the MemRef "
            "element types">,
     Option<"emitCWrappers", "emit-c-wrappers", "bool", /*default=*/"false",
            "Emit wrappers for C-compatible pointer-to-struct memref "
            "descriptors">,
     Option<"indexBitwidth", "index-bitwidth", "unsigned",
            /*default=kDeriveIndexBitwidthFromDataLayout*/"0",

mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h

	Show All 14 Lines
	class LLVMTypeConverter;			class LLVMTypeConverter;
	class ModuleOp;			class ModuleOp;
	template <typename T> class OperationPass;			template <typename T> class OperationPass;
	class OwningRewritePatternList;			class OwningRewritePatternList;

	/// Collect a set of patterns to convert memory-related operations from the			/// Collect a set of patterns to convert memory-related operations from the
	/// Standard dialect to the LLVM dialect, excluding non-memory-related			/// Standard dialect to the LLVM dialect, excluding non-memory-related
	/// operations and FuncOp.			/// operations and FuncOp.
	void populateStdToLLVMMemoryConversionPatters(			void populateStdToLLVMMemoryConversionPatterns(
	LLVMTypeConverter &converter, OwningRewritePatternList &patterns);			LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
				bool useAlignedAlloc);

	/// Collect a set of patterns to convert from the Standard dialect to the LLVM			/// Collect a set of patterns to convert from the Standard dialect to the LLVM
	/// dialect, excluding the memory-related operations.			/// dialect, excluding the memory-related operations.
	void populateStdToLLVMNonMemoryConversionPatterns(			void populateStdToLLVMNonMemoryConversionPatterns(
	LLVMTypeConverter &converter, OwningRewritePatternList &patterns);			LLVMTypeConverter &converter, OwningRewritePatternList &patterns);

	/// Collect the default pattern to convert a FuncOp to the LLVM dialect. If			/// Collect the default pattern to convert a FuncOp to the LLVM dialect. If
	/// `emitCWrappers` is set, the pattern will also produce functions			/// `emitCWrappers` is set, the pattern will also produce functions
	/// that pass memref descriptors by pointer-to-structure in addition to the			/// that pass memref descriptors by pointer-to-structure in addition to the
	/// default unpacked form.			/// default unpacked form.
	void populateStdToLLVMDefaultFuncOpConversionPattern(			void populateStdToLLVMDefaultFuncOpConversionPattern(
	LLVMTypeConverter &converter, OwningRewritePatternList &patterns,			LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
	bool emitCWrappers = false);			bool emitCWrappers = false);

	/// Collect a set of default patterns to convert from the Standard dialect to			/// Collect a set of default patterns to convert from the Standard dialect to
	/// LLVM.			/// LLVM.
	void populateStdToLLVMConversionPatterns(LLVMTypeConverter &converter,			void populateStdToLLVMConversionPatterns(LLVMTypeConverter &converter,
	OwningRewritePatternList &patterns,			OwningRewritePatternList &patterns,
	bool emitCWrappers = false);			bool emitCWrappers = false,
				nicolasvasilacheUnsubmitted Done Reply Inline Actions Same comment as in the parent revision, this is a silently breaking API change that will impact integrations. I'd just use the struct that you'd have introduced in the previous revision and document that this is only using 2 fields. nicolasvasilache: Same comment as in the parent revision, this is a silently breaking API change that will impact…
				bool useAlignedAlloc = false);

	/// Collect a set of patterns to convert from the Standard dialect to			/// Collect a set of patterns to convert from the Standard dialect to
	/// LLVM using the bare pointer calling convention for MemRef function			/// LLVM using the bare pointer calling convention for MemRef function
	/// arguments.			/// arguments.
	void populateStdToLLVMBarePtrConversionPatterns(			void populateStdToLLVMBarePtrConversionPatterns(
	LLVMTypeConverter &converter, OwningRewritePatternList &patterns);			LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
				bool useAlignedAlloc);

	/// Value to pass as bitwidth for the index type when the converter is expected			/// Value to pass as bitwidth for the index type when the converter is expected
	/// to derive the bitwidth from the LLVM data layout.			/// to derive the bitwidth from the LLVM data layout.
	static constexpr unsigned kDeriveIndexBitwidthFromDataLayout = 0;			static constexpr unsigned kDeriveIndexBitwidthFromDataLayout = 0;

	struct LowerToLLVMOptions {			struct LowerToLLVMOptions {
	bool useBarePtrCallConv = false;			bool useBarePtrCallConv = false;
	bool emitCWrappers = false;			bool emitCWrappers = false;
	unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout;			unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout;
				/// Use aligned_alloc for heap allocations.
				bool useAlignedAlloc = false;
	};			};

	/// Creates a pass to convert the Standard dialect into the LLVMIR dialect.			/// Creates a pass to convert the Standard dialect into the LLVMIR dialect.
	/// stdlib malloc/free is used for allocating memrefs allocated with std.alloc,			/// stdlib malloc/free is used by default for allocating memrefs allocated with
				nicolasvasilacheUnsubmitted Done Reply Inline Actions same discussion re silently breaking API changes nicolasvasilache: same discussion re silently breaking API changes
	/// while LLVM's alloca is used for those allocated with std.alloca.			/// std.alloc, while LLVM's alloca is used for those allocated with std.alloca.
	std::unique_ptr<OperationPass<ModuleOp>> createLowerToLLVMPass(			std::unique_ptr<OperationPass<ModuleOp>>
	const LowerToLLVMOptions &options = {			createLowerToLLVMPass(const LowerToLLVMOptions &options = {
	/*useBarePtrCallConv=*/false, /*emitCWrappers=*/false,			/*useBarePtrCallConv=*/false, /*emitCWrappers=*/false,
	/*indexBitwidth=*/kDeriveIndexBitwidthFromDataLayout});			/*indexBitwidth=*/kDeriveIndexBitwidthFromDataLayout,
				/*useAlignedAlloc=*/false});

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_CONVERSION_STANDARDTOLLVM_CONVERTSTANDARDTOLLVMPASS_H_			#endif // MLIR_CONVERSION_STANDARDTOLLVM_CONVERTSTANDARDTOLLVMPASS_H_

mlir/lib/Analysis/Utils.cpp

[...]	LogicalResult MemRefRegion::compute(Operation *op, unsigned loopDepth,
}		}
cst.removeTrivialRedundancy();		cst.removeTrivialRedundancy();

LLVM_DEBUG(llvm::dbgs() << "Memory region:\n");		LLVM_DEBUG(llvm::dbgs() << "Memory region:\n");
LLVM_DEBUG(cst.dump());		LLVM_DEBUG(cst.dump());
return success();		return success();
}		}

// TODO(mlir-team): improve/complete this when we have target data.
static unsigned getMemRefEltSizeInBytes(MemRefType memRefType) {		static unsigned getMemRefEltSizeInBytes(MemRefType memRefType) {
auto elementType = memRefType.getElementType();		auto elementType = memRefType.getElementType();

unsigned sizeInBits;		unsigned sizeInBits;
if (elementType.isIntOrFloat()) {		if (elementType.isIntOrFloat()) {
sizeInBits = elementType.getIntOrFloatBitWidth();		sizeInBits = elementType.getIntOrFloatBitWidth();
} else {		} else {
auto vectorType = elementType.cast<VectorType>();		auto vectorType = elementType.cast<VectorType>();
[...]

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp

[...]
// Extract an LLVM IR type from the LLVM IR dialect type.		// Extract an LLVM IR type from the LLVM IR dialect type.
static LLVM::LLVMType unwrap(Type type) {		static LLVM::LLVMType unwrap(Type type) {
if (!type)		if (!type)
return nullptr;		return nullptr;
auto *mlirContext = type.getContext();		auto *mlirContext = type.getContext();
auto wrappedLLVMType = type.dyn_cast<LLVM::LLVMType>();		auto wrappedLLVMType = type.dyn_cast<LLVM::LLVMType>();
if (!wrappedLLVMType)		if (!wrappedLLVMType)
emitError(UnknownLoc::get(mlirContext),		emitError(UnknownLoc::get(mlirContext),
"conversion resulted in a non-LLVM type");		"conversion resulted in a non-LLVM type");
		mehdi_amini: Please don't introduce any new global command-line options; thread them through the APIs instead. We're looking to eliminate every global initializer under mlir/lib/... : https://bugs.llvm.org/show_bug.cgi?id=45437
return wrappedLLVMType;		return wrappedLLVMType;
}		}

/// Initialize customization to default callbacks.		/// Initialize customization to default callbacks.
LLVMTypeConverterCustomization::LLVMTypeConverterCustomization()		LLVMTypeConverterCustomization::LLVMTypeConverterCustomization()
: funcArgConverter(structFuncArgTypeConverter),		: funcArgConverter(structFuncArgTypeConverter),
indexBitwidth(kDeriveIndexBitwidthFromDataLayout) {}		indexBitwidth(kDeriveIndexBitwidthFromDataLayout) {}

[...]
struct AllocLikeOpLowering : public ConvertOpToLLVMPattern<AllocLikeOp> {		struct AllocLikeOpLowering : public ConvertOpToLLVMPattern<AllocLikeOp> {
using ConvertOpToLLVMPattern<AllocLikeOp>::ConvertOpToLLVMPattern;		using ConvertOpToLLVMPattern<AllocLikeOp>::ConvertOpToLLVMPattern;
using Base = AllocLikeOpLowering<AllocLikeOp>;		using Base = AllocLikeOpLowering<AllocLikeOp>;
using ConvertOpToLLVMPattern<AllocLikeOp>::createIndexConstant;		using ConvertOpToLLVMPattern<AllocLikeOp>::createIndexConstant;
using ConvertOpToLLVMPattern<AllocLikeOp>::getIndexType;		using ConvertOpToLLVMPattern<AllocLikeOp>::getIndexType;
using ConvertOpToLLVMPattern<AllocLikeOp>::typeConverter;		using ConvertOpToLLVMPattern<AllocLikeOp>::typeConverter;
using ConvertOpToLLVMPattern<AllocLikeOp>::getVoidPtrType;		using ConvertOpToLLVMPattern<AllocLikeOp>::getVoidPtrType;

explicit AllocLikeOpLowering(LLVMTypeConverter &converter)		explicit AllocLikeOpLowering(LLVMTypeConverter &converter,
: ConvertOpToLLVMPattern<AllocLikeOp>(converter) {}		bool useAlignedAlloc = false)
		: ConvertOpToLLVMPattern<AllocLikeOp>(converter),
		useAlignedAlloc(useAlignedAlloc) {}

LogicalResult match(Operation *op) const override {		LogicalResult match(Operation *op) const override {
MemRefType memRefType = cast<AllocLikeOp>(op).getType();		MemRefType memRefType = cast<AllocLikeOp>(op).getType();
if (isSupportedMemRefType(memRefType))		if (isSupportedMemRefType(memRefType))
return success();		return success();

int64_t offset;		int64_t offset;
SmallVector<int64_t, 4> strides;		SmallVector<int64_t, 4> strides;
auto successStrides = getStridesAndOffset(memRefType, strides, offset);		auto successStrides = getStridesAndOffset(memRefType, strides, offset);
if (failed(successStrides))		if (failed(successStrides))
return failure();		return failure();

// Dynamic strides are ok if they can be deduced from dynamic sizes (which		// Dynamic strides are ok if they can be deduced from dynamic sizes (which
// is guaranteed when succeeded(successStrides)). Dynamic offset however can		// is guaranteed when succeeded(successStrides)). Dynamic offset however can
// never be alloc'ed.		// never be alloc'ed.
if (offset == MemRefType::getDynamicStrideOrOffset())		if (offset == MemRefType::getDynamicStrideOrOffset())
return failure();		return failure();

return success();		return success();
}		}

		// Returns bump = (alignment - (input % alignment))% alignment, which is the
		// increment necessary to align `input` to `alignment` boundary.
		// TODO: this can be made more efficient by just using a single addition
		// and two bit shifts: (ptr + align - 1)/align, align is always power of 2.
		Value createBumpToAlign(Location loc, OpBuilder b, Value input,
		Value alignment) const {
		Value modAlign = b.create<LLVM::URemOp>(loc, input, alignment);
		Value diff = b.create<LLVM::SubOp>(loc, alignment, modAlign);
		Value shift = b.create<LLVM::URemOp>(loc, diff, alignment);
		return shift;
		}
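
As a plain-integer illustration of the arithmetic that createBumpToAlign emits as LLVM dialect ops, here is a minimal sketch. The function names are hypothetical and not part of the patch. The second variant is one possible power-of-two shortcut based on masking; it is an assumption of this note, not the formula named in the TODO above.

```cpp
#include <cassert>
#include <cstdint>

// Generic form, mirroring the urem / sub / urem sequence in the diff.
// Works for any nonzero alignment.
constexpr uint64_t bumpToAlign(uint64_t input, uint64_t alignment) {
  return (alignment - (input % alignment)) % alignment;
}

// Shortcut that is only valid when `alignment` is a nonzero power of two
// (assumption): the bump is the low bits of the two's complement of `input`.
constexpr uint64_t bumpToAlignPow2(uint64_t input, uint64_t alignment) {
  return (uint64_t{0} - input) & (alignment - 1);
}

// Both variants agree for power-of-two alignments.
static_assert(bumpToAlign(13, 8) == bumpToAlignPow2(13, 8),
              "variants must agree for power-of-two alignments");
```

For an input that is already aligned, both variants return 0, so adding the bump never moves an aligned value.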

/// Creates and populates the memref descriptor struct given all its fields.		/// Creates and populates the memref descriptor struct given all its fields.
/// This method also performs any post allocation alignment needed for heap		/// This method also performs any post allocation alignment needed for heap
/// allocations when `accessAlignment` is non null. This is used with		/// allocations when `accessAlignment` is non null. This is used with
/// allocators that do not support alignment.		/// allocators that do not support alignment.
MemRefDescriptor createMemRefDescriptor(		MemRefDescriptor createMemRefDescriptor(
Location loc, ConversionPatternRewriter &rewriter, MemRefType memRefType,		Location loc, ConversionPatternRewriter &rewriter, MemRefType memRefType,
Value allocatedTypePtr, Value allocatedBytePtr, Value accessAlignment,		Value allocatedTypePtr, Value allocatedBytePtr, Value accessAlignment,
uint64_t offset, ArrayRef<int64_t> strides, ArrayRef<Value> sizes) const {		uint64_t offset, ArrayRef<int64_t> strides, ArrayRef<Value> sizes) const {
auto elementPtrType = getElementPtrType(memRefType);		auto elementPtrType = getElementPtrType(memRefType);
auto structType = typeConverter.convertType(memRefType);		auto structType = typeConverter.convertType(memRefType);
auto memRefDescriptor = MemRefDescriptor::undef(rewriter, loc, structType);		auto memRefDescriptor = MemRefDescriptor::undef(rewriter, loc, structType);

// Field 1: Allocated pointer, used for malloc/free.		// Field 1: Allocated pointer, used for malloc/free.
memRefDescriptor.setAllocatedPtr(rewriter, loc, allocatedTypePtr);		memRefDescriptor.setAllocatedPtr(rewriter, loc, allocatedTypePtr);

// Field 2: Actual aligned pointer to payload.		// Field 2: Actual aligned pointer to payload.
Value alignedBytePtr = allocatedTypePtr;		Value alignedBytePtr = allocatedTypePtr;
if (accessAlignment) {		if (accessAlignment) {
// offset = (align - (ptr % align))% align		// offset = (align - (ptr % align))% align
Value intVal = rewriter.create<LLVM::PtrToIntOp>(		Value intVal = rewriter.create<LLVM::PtrToIntOp>(
loc, this->getIndexType(), allocatedBytePtr);		loc, this->getIndexType(), allocatedBytePtr);
Value ptrModAlign =		Value offset = createBumpToAlign(loc, rewriter, intVal, accessAlignment);
rewriter.create<LLVM::URemOp>(loc, intVal, accessAlignment);
Value subbed =
rewriter.create<LLVM::SubOp>(loc, accessAlignment, ptrModAlign);
Value offset =
rewriter.create<LLVM::URemOp>(loc, subbed, accessAlignment);
Value aligned = rewriter.create<LLVM::GEPOp>(		Value aligned = rewriter.create<LLVM::GEPOp>(
loc, allocatedBytePtr.getType(), allocatedBytePtr, offset);		loc, allocatedBytePtr.getType(), allocatedBytePtr, offset);
alignedBytePtr = rewriter.create<LLVM::BitcastOp>(		alignedBytePtr = rewriter.create<LLVM::BitcastOp>(
loc, elementPtrType, ArrayRef<Value>(aligned));		loc, elementPtrType, ArrayRef<Value>(aligned));
}		}
memRefDescriptor.setAlignedPtr(rewriter, loc, alignedBytePtr);		memRefDescriptor.setAlignedPtr(rewriter, loc, alignedBytePtr);

// Field 3: Offset in aligned pointer.		// Field 3: Offset in aligned pointer.
[...]	struct AllocLikeOpLowering : public ConvertOpToLLVMPattern<AllocLikeOp> {
/// Returns the type of a pointer to an element of the memref.		/// Returns the type of a pointer to an element of the memref.
Type getElementPtrType(MemRefType memRefType) const {		Type getElementPtrType(MemRefType memRefType) const {
auto elementType = memRefType.getElementType();		auto elementType = memRefType.getElementType();
auto structElementType = typeConverter.convertType(elementType);		auto structElementType = typeConverter.convertType(elementType);
return structElementType.template cast<LLVM::LLVMType>().getPointerTo(		return structElementType.template cast<LLVM::LLVMType>().getPointerTo(
memRefType.getMemorySpace());		memRefType.getMemorySpace());
}		}

		/// Returns the memref's element size in bytes.
		// TODO: there are other places where this is used. Expose publicly?
		static unsigned getMemRefEltSizeInBytes(MemRefType memRefType) {
		auto elementType = memRefType.getElementType();

		ftynse: Can't you just rely on the Optional result being None to signal that aligned alloc is not required?
		bondhugula (Author): We still use aligned_alloc if the elt size is > 16 bytes (in spite of no alignment attribute). The code would otherwise crash since the loads/stores won't have alignment guaranteed (with malloc) but they would be lowered by LLVM via aligned load/stores. (LLVM, in the absence of alignment attributes on load/store, would assume ABI alignment, which for, say, vector<8xf32> elt types would be 256-bit boundaries.)
		ftynse:
		> We still use aligned_alloc if the elt size is > 16 bytes
		At which point, this function returns the alignment value rather than `llvm::None`. The only situation where it returns a non-`None` value and sets `useAlignedAlloc` to `false` is for `AllocaOp`. I would consider refactoring this function to only work on `AllocOp`, and querying the `AllocaOp` allocation separately in the call place. This would make the API simpler and shorten the code. And you wouldn't need an extra boolean result.
		> (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).
		I suppose we can just take the ABI alignment into account when lowering the `alloc` operation (we already do so for the `alignment` attribute). Not all platforms have `aligned_alloc`.
		bondhugula (Author):
		> I would consider refactoring this function to only work on AllocOp, and querying the AllocaOp allocation separately in the
		Sure, sounds good.
		> I suppose we can just take the ABI alignment into account when lowering the alloc operation (we already do so for the
		Yes, we should do this with malloc as well. This can be a TODO and should be done in a separate revision (because it isn't to do with aligned_alloc but fixing malloc lowering).
		bondhugula (Author):
		> Not all platforms have aligned_alloc.
		Btw, aligned_alloc is a C11 and C++11 standard, and it's available on Windows too (it's posix_memalign that isn't). And of course there are systems where there isn't any dynamic memory allocation at all.
		ftynse: Yes, I know it's C11 (but C++17, most changes in C11 did not make it to C++11), but unfortunately, I don't think we can assume all target systems have C11, let alone C++17.
		unsigned sizeInBits;
		if (elementType.isIntOrFloat()) {
		sizeInBits = elementType.getIntOrFloatBitWidth();
		} else {
		auto vectorType = elementType.cast<VectorType>();
		sizeInBits =
		vectorType.getElementTypeBitWidth() * vectorType.getNumElements();
		}
		return llvm::divideCeil(sizeInBits, 8);
		}

		nicolasvasilache: Alignment needed is really a target-specific thing isn't it? I would have expected something that looks at DataLayout and that brings in the can of worms we have been punting on (I understand once flang is in MLIR core we will want to reopen it). Can this part be dropped from this revision, esp in light of @ftynse's comments? I also have some fun micro ARM targets I will need to test some of this on, the smaller the baked in assumptions the better.
		bondhugula (Author): All of this now dropped. -use-aligned-alloc is now independent of what malloc can do. The malloc path remains untouched.
		/// Returns the alignment to be used for the allocation call itself.
		/// aligned_alloc requires the alignment to be a power of two, and the
		/// allocation size to be a multiple of alignment.
		Optional<int64_t> getAllocationAlignment(AllocOp allocOp) const {
		// No alignment can be used for the 'malloc' call itself.
		if (!useAlignedAlloc)
		return None;

		if (allocOp.alignment())
		return allocOp.alignment().getValue().getSExtValue();
		ftynse: Prefer early return
		bondhugula (Author): Thanks.

		// Whenever we don't have alignment set, we will use an alignment
		ftynse: I wouldn't add a lambda that is only called once immediately after its definition.
		bondhugula (Author): Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.
		ftynse: This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing
		  int64_t constEltSizeBytes = 0;
		  if (auto vectorType = elementType.template dyn_cast<VectorType>())
		    constEltSizeBytes =
		        vectorType.getNumElements() *
		        llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
		  else
		    constEltSizeBytes =
		        llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);

		  // Use aligned_alloc if elt_size > malloc's alignment.
		  bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
		  useAlignedAlloc = isMallocAlignmentSufficient;
		Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.
		bondhugula (Author): Hmm... the demarcation/isolation is important I feel. I'm fine with changing to the straightline style but out of curiosity and for future purposes as well, it'll be good to have a third person view here on coding style as far as such patterns go: @mehdi_amini - is there a guideline here?
		nicolasvasilache: I am generally a fan of such style (esp. when mixed with functional combinators), so I'd vote for +1 it when it makes sense.
		ftynse: Generally, I would be strongly opposed to defining style guidelines based on a single use of a construct in a single diff, where only a small subset of contributors could participate (and before you can object that review history is public, are you reading all comments on all diffs?). I would be also opposed to having to define additional rules until we have to. I am not generally opposed to helper lambdas, I just don't see any benefit from this specific one, only drawbacks. And a lambda that captures the entire environment by reference is not exactly my definition of isolation.
		bondhugula (Author): As Mehdi now confirms, this is too detailed to have a style guideline. The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.
		ftynse:
		> The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability.
		Well, you have a block comment right above it (modulo the variable declaration that is only used inside the lambda), so I wouldn't call it auto-documenting since you felt like you needed to write documentation for it. And having a named variable of lambda-type or an identically-named variable of boolean type still gives you exactly the same naming scheme.
		> Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.
		I read Mehdi's comment differently:
		> but in general not calling it right after rather to outline a block of code outside of a loop to make the loop shorter
		does not seem to necessarily support your usage here (neither does it contradict). Readability is a very subjective thing. I was reading your code for review purposes and this construct did make me lose time and expend more energy than for a straight-line code here, so for me personally it decreased readability. Namely because it (a) mutates an implicitly captured variable and (b) requires to unwrap more abstractions mentally. Anyway, I won't block the commit just because of a stylistic discussion.
		I can suggest an alternative that would address part of my readability concerns:
		  int64_t constEltSizeBytes = [elementType]() {
		    if (auto vectorType = elementType.template dyn_cast<VectorType>())
		      return vectorType.getNumElements() *
		             llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
		    else
		      return llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
		  }();
		  bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
		This removes implicit by-reference capture, makes it clear that you do not intend for the lambda to be reused (named lambda would be also okay since it doesn't store references anyway, but there's no point), and this way of using lambdas is actually considered a C++11 idiom for complex constant initialization (https://groups.google.com/a/isocpp.org/g/std-discussion/c/FBjcR4WJlkU/m/nQnsSOziq04J) so one can claim it's "common enough".
		mehdi_amini: While I haven't seen this pattern used as it is here widely in MLIR (and I like consistency), I just didn't feel any reason for me to really oppose it, but I wasn't really supporting it as-is either. The fact that @ftynse is opposed to this on principle of readability seems to hint that it would be better to not do it. I would likely have written it in a form around this (with a comment possibly):
		  int64_t constEltSizeBytes = llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
		  if (auto vectorType = elementType.template dyn_cast<VectorType>())
		    constEltSizeBytes = constEltSizeBytes * vectorType.getNumElements();
		  bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
		Everything is named as well and you avoid the readability overhead of the lambda.
		mehdi_amini: @bondhugula this seems too detailed to have a guideline :) I wouldn't say it is "common", but probably not unheard of? I have been doing this myself but in general not calling it right after, rather to outline a block of code outside of a loop to make the loop shorter and easier to read, or similar situation (getting large boilerplate out of the way and "naming it"). @rriddle ?
		rriddle: FWIW, there is already a guide on using lambdas for computing predicates: https://llvm.org/docs/CodingStandards.html#turn-predicate-loops-into-predicate-functions
		bondhugula (Author): Thank you all for commenting. This discussion is now moot since the aligned_alloc generation is now untangled from being contingent on this conditional / what malloc supports, and so this whole lambda and conditionals are gone.
		mehdi_amini: @rriddle: the link is about free functions I believe, which I see differently than lambda: the rationale explained there is close to what I mention before of moving the predicate code "out of the way" of the main logic (and reducing the indentation, etc), which you don't get with the immediately local lambda.
		// consistent with the element type; since the alignment has to be a
		// power of two, we will bump to the next power of two if it already isn't.
		auto eltSizeBytes = getMemRefEltSizeInBytes(allocOp.getType());
		return std::max(kMinAlignedAllocAlignment,
		llvm::PowerOf2Ceil(eltSizeBytes));
		}
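
The alignment choice made by getAllocationAlignment can be sketched on plain integers as follows. This is a simplified illustration, not the patch's code: `powerOf2Ceil` is a hand-written stand-in for llvm::PowerOf2Ceil, `chooseAlignment` is a hypothetical name, and the floor value of 16 for `kMinAlignment` is an assumption, since the value of kMinAlignedAllocAlignment is not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= value. Stand-in for llvm::PowerOf2Ceil;
// intended for small inputs such as element sizes (value <= 2^63).
uint64_t powerOf2Ceil(uint64_t value) {
  uint64_t result = 1;
  while (result < value)
    result <<= 1;
  return result;
}

// Assumed floor for the alignment handed to aligned_alloc (hypothetical value).
constexpr uint64_t kMinAlignment = 16;

// `explicitAlignment` == 0 models an alloc op without an alignment attribute.
uint64_t chooseAlignment(uint64_t explicitAlignment, uint64_t eltSizeBytes) {
  if (explicitAlignment != 0)
    return explicitAlignment;
  return std::max(kMinAlignment, powerOf2Ceil(eltSizeBytes));
}
```

Under these assumptions, a 4-byte element with no attribute gets the floor of 16, while a 24-byte element is rounded up to 32.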

		/// Returns true if the memref size in bytes is known to be a multiple of
		/// factor.
		static bool isMemRefSizeMultipleOf(MemRefType type, uint64_t factor) {
		uint64_t sizeDivisor = getMemRefEltSizeInBytes(type);
		for (unsigned i = 0, e = type.getRank(); i < e; i++) {
		if (type.isDynamic(type.getDimSize(i)))
		continue;
		sizeDivisor = sizeDivisor * type.getDimSize(i);
		}
		return sizeDivisor % factor == 0;
		}
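
The reasoning behind isMemRefSizeMultipleOf is that the element size times the product of the static dimensions divides the total size in bytes whatever the dynamic dimensions turn out to be, so if that partial product is already a multiple of `factor`, no padding is ever needed. A minimal sketch on a plain shape vector, where `kDynamic` and `isSizeMultipleOf` are hypothetical names and -1 is an assumed marker for a dynamic dimension:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed marker for a dimension whose extent is only known at runtime.
constexpr int64_t kDynamic = -1;

// Returns true if the buffer size in bytes is known to be a multiple of
// `factor`, looking only at the element size and the static dimensions.
bool isSizeMultipleOf(uint64_t eltSizeBytes, const std::vector<int64_t> &shape,
                      uint64_t factor) {
  uint64_t knownDivisor = eltSizeBytes;
  for (int64_t dim : shape) {
    if (dim == kDynamic)
      continue; // Unknown extent: contributes nothing we can rely on.
    knownDivisor *= static_cast<uint64_t>(dim);
  }
  return knownDivisor % factor == 0;
}
```

A `false` result is conservative: the runtime size may still be a multiple of `factor`, which is why the lowering falls back to computing a bump at runtime in that case.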

/// Allocates the underlying buffer using the right call. `allocatedBytePtr`		/// Allocates the underlying buffer using the right call. `allocatedBytePtr`
/// is set to null for stack allocations. `accessAlignment` is set if		/// is set to null for stack allocations. `accessAlignment` is set if
/// alignment is needed post allocation (e.g., in conjunction with malloc).		/// alignment is needed post allocation (e.g., in conjunction with malloc).
/// TODO(bondhugula): next revision will support std lib func aligned_alloc.
Value allocateBuffer(Location loc, Value cumulativeSize, Operation *op,		Value allocateBuffer(Location loc, Value cumulativeSize, Operation *op,
MemRefType memRefType, Value one, Value &accessAlignment,		MemRefType memRefType, Value one, Value &accessAlignment,
Value &allocatedBytePtr,		Value &allocatedBytePtr,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
auto elementPtrType = getElementPtrType(memRefType);		auto elementPtrType = getElementPtrType(memRefType);

// Whether to use std lib function aligned_alloc that supports alignment.
Optional<APInt> allocationAlignment = cast<AllocLikeOp>(op).alignment();

// With alloca, one gets a pointer to the element type right away.		// With alloca, one gets a pointer to the element type right away.
bool onStack = isa<AllocaOp>(op);		// For stack allocations.
if (onStack) {		if (auto allocaOp = dyn_cast<AllocaOp>(op)) {
allocatedBytePtr = nullptr;		allocatedBytePtr = nullptr;
accessAlignment = nullptr;		accessAlignment = nullptr;
return rewriter.create<LLVM::AllocaOp>(		return rewriter.create<LLVM::AllocaOp>(
loc, elementPtrType, cumulativeSize,		loc, elementPtrType, cumulativeSize,
allocationAlignment ? allocationAlignment.getValue().getSExtValue()		allocaOp.alignment() ? allocaOp.alignment().getValue().getSExtValue()
: 0);		: 0);
		ftynse: I would just have
		  auto allocaOp = dyn_cast<AllocaOp>(op);
		  if (allocaOp) {
		    allocatedBytePtr = nullptr;
		    accessAlignment = nullptr;
		    return rewriter.create<LLVM::AllocaOp>(
		        loc, elementPtrType, cumulativeSize,
		        allocaOp.alignment() ? allocaOp.alignment().getValue() : 0);
		  }
		and sink the `getAllocationAlignment` below, all while making it handle only `AllocOp`.
		bondhugula (Author): Sure.
}		}

// Use malloc. Insert the malloc declaration if it is not already present.		// Heap allocations.
auto allocFuncName = "malloc";
AllocOp allocOp = cast<AllocOp>(op);		AllocOp allocOp = cast<AllocOp>(op);

		Optional<int64_t> allocationAlignment = getAllocationAlignment(allocOp);
		// Whether to use std lib function aligned_alloc that supports alignment.
		bool useAlignedAlloc = allocationAlignment.hasValue();

		// Insert the malloc/aligned_alloc declaration if it is not already present.
		auto allocFuncName = useAlignedAlloc ? "aligned_alloc" : "malloc";
auto module = allocOp.getParentOfType<ModuleOp>();		auto module = allocOp.getParentOfType<ModuleOp>();
auto allocFunc = module.lookupSymbol<LLVM::LLVMFuncOp>(allocFuncName);		auto allocFunc = module.lookupSymbol<LLVM::LLVMFuncOp>(allocFuncName);
if (!allocFunc) {		if (!allocFunc) {
OpBuilder moduleBuilder(op->getParentOfType<ModuleOp>().getBodyRegion());		OpBuilder moduleBuilder(op->getParentOfType<ModuleOp>().getBodyRegion());
SmallVector<LLVM::LLVMType, 2> callArgTypes = {getIndexType()};		SmallVector<LLVM::LLVMType, 2> callArgTypes = {getIndexType()};
		// aligned_alloc(size_t alignment, size_t size)
		if (useAlignedAlloc)
		ftynse: Nit: putting the comment above `if` makes it less surprising in terms of indentation. Otherwise, it feels like there is incorrect indentation because of elided braces and the actual statement does not belong to the `if`.
		bondhugula (Author): Sure, thanks.
		callArgTypes.push_back(getIndexType());
allocFunc = moduleBuilder.create<LLVM::LLVMFuncOp>(		allocFunc = moduleBuilder.create<LLVM::LLVMFuncOp>(
rewriter.getUnknownLoc(), allocFuncName,		rewriter.getUnknownLoc(), allocFuncName,
LLVM::LLVMType::getFunctionTy(getVoidPtrType(), callArgTypes,		LLVM::LLVMType::getFunctionTy(getVoidPtrType(), callArgTypes,
/*isVarArg=*/false));		/*isVarArg=*/false));
}		}

// Allocate the underlying buffer and store a pointer to it in the MemRef		// Allocate the underlying buffer and store a pointer to it in the MemRef
// descriptor.		// descriptor.
SmallVector<Value, 2> callArgs;		SmallVector<Value, 2> callArgs;
		if (useAlignedAlloc) {
		// Use aligned_alloc.
		assert(allocationAlignment && "allocation alignment should be present");
		auto alignedAllocAlignmentValue = rewriter.create<LLVM::ConstantOp>(
		loc, typeConverter.convertType(rewriter.getIntegerType(64)),
		rewriter.getI64IntegerAttr(allocationAlignment.getValue()));
		// aligned_alloc requires size to be a multiple of alignment; we will pad
		// the size to the next multiple if necessary.
		if (!isMemRefSizeMultipleOf(memRefType, allocationAlignment.getValue())) {
		Value bump = createBumpToAlign(loc, rewriter, cumulativeSize,
		alignedAllocAlignmentValue);
		cumulativeSize =
		rewriter.create<LLVM::AddOp>(loc, cumulativeSize, bump);
		}
		callArgs = {alignedAllocAlignmentValue, cumulativeSize};
		} else {
// Adjust the allocation size to consider alignment.		// Adjust the allocation size to consider alignment.
if (allocOp.alignment()) {		if (allocOp.alignment()) {
accessAlignment = createIndexConstant(		accessAlignment = createIndexConstant(
rewriter, loc, allocOp.alignment().getValue().getSExtValue());		rewriter, loc, allocOp.alignment().getValue().getSExtValue());
cumulativeSize = rewriter.create<LLVM::SubOp>(		cumulativeSize = rewriter.create<LLVM::SubOp>(
loc,		loc,
rewriter.create<LLVM::AddOp>(loc, cumulativeSize, accessAlignment),		rewriter.create<LLVM::AddOp>(loc, cumulativeSize, accessAlignment),
one);		one);
}		}
callArgs.push_back(cumulativeSize);		callArgs.push_back(cumulativeSize);
		}
auto allocFuncSymbol = rewriter.getSymbolRefAttr(allocFunc);		auto allocFuncSymbol = rewriter.getSymbolRefAttr(allocFunc);
allocatedBytePtr = rewriter		allocatedBytePtr = rewriter
.create<LLVM::CallOp>(loc, getVoidPtrType(),		.create<LLVM::CallOp>(loc, getVoidPtrType(),
allocFuncSymbol, callArgs)		allocFuncSymbol, callArgs)
.getResult(0);		.getResult(0);
// For heap allocations, the allocated pointer is a cast of the byte pointer		// For heap allocations, the allocated pointer is a cast of the byte pointer
// to the type pointer.		// to the type pointer.
return rewriter.create<LLVM::BitcastOp>(loc, elementPtrType,		return rewriter.create<LLVM::BitcastOp>(loc, elementPtrType,
[...]	void rewrite(Operation *op, ArrayRef<Value> operands,
// Create the MemRef descriptor.		// Create the MemRef descriptor.
auto memRefDescriptor = createMemRefDescriptor(		auto memRefDescriptor = createMemRefDescriptor(
loc, rewriter, memRefType, allocatedTypePtr, allocatedBytePtr,		loc, rewriter, memRefType, allocatedTypePtr, allocatedBytePtr,
accessAlignment, offset, strides, sizes);		accessAlignment, offset, strides, sizes);

// Return the final value of the descriptor.		// Return the final value of the descriptor.
rewriter.replaceOp(op, {memRefDescriptor});		rewriter.replaceOp(op, {memRefDescriptor});
}		}

		protected:
		/// Use aligned_alloc instead of malloc for all heap allocations.
		bool useAlignedAlloc;
		ftynse: I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?
		bondhugula: I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts.
		ftynse: The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".
		bondhugula: Sorry, I didn't quite understand. What should the pass options be and what should the behavior and the default behavior be?
		ftynse: Normally, you would have two pass options (and a configuration `struct` for the constructors like Nicolas proposed in another patch to decrease the amount of churn in pass constructor APIs): `-use-aligned-alloc` and `-assume-malloc-alignment`. If you don't want two separate options, you could get away with one `-use-aligned-alloc-and-assume-malloc-alignment` (did not think about a better name). If it is set to zero (default), the conversion doesn't use aligned_alloc at all. If it is set to non-zero, the conversion uses aligned_alloc and treats the option value as malloc alignment in order to also use aligned_alloc in relevant cases.
		bondhugula: The configuration struct is now done (in the parent revision). How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? This revision is not about tinkering with malloc alignment handling. Update - PTAL. Thanks for all the feedback.
		ftynse: "How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided?" Works for me!
		bondhugula: Done - aligned_alloc is now used only with -use-aligned-alloc, and for all heap allocations whenever that cmd line flag exists.
		/// The minimum alignment to use with aligned_alloc (has to be a power of 2).
		uint64_t kMinAlignedAllocAlignment = 16UL;
		ftynse: Typo: Aigned
};		};

struct AllocOpLowering : public AllocLikeOpLowering<AllocOp> {		struct AllocOpLowering : public AllocLikeOpLowering<AllocOp> {
using Base::Base;		explicit AllocOpLowering(LLVMTypeConverter &converter,
		bool useAlignedAlloc = false)
		: AllocLikeOpLowering<AllocOp>(converter, useAlignedAlloc) {}
};		};

struct AllocaOpLowering : public AllocLikeOpLowering<AllocaOp> {		struct AllocaOpLowering : public AllocLikeOpLowering<AllocaOp> {
using Base::Base;		using Base::Base;
};		};

// A CallOp automatically promotes MemRefType to a sequence of alloca/store and		// A CallOp automatically promotes MemRefType to a sequence of alloca/store and
// passes the pointer to the MemRef across function boundaries.		// passes the pointer to the MemRef across function boundaries.
template <typename CallOpType>		template <typename CallOpType>
struct CallOpInterfaceLowering : public ConvertOpToLLVMPattern<CallOpType> {		struct CallOpInterfaceLowering : public ConvertOpToLLVMPattern<CallOpType> {
[... 1,207 lines not shown ...]	patterns.insert<
UnsignedDivIOpLowering,		UnsignedDivIOpLowering,
UnsignedRemIOpLowering,		UnsignedRemIOpLowering,
UnsignedShiftRightOpLowering,		UnsignedShiftRightOpLowering,
XOrOpLowering,		XOrOpLowering,
ZeroExtendIOpLowering>(converter);		ZeroExtendIOpLowering>(converter);
// clang-format on		// clang-format on
}		}

void mlir::populateStdToLLVMMemoryConversionPatters(		void mlir::populateStdToLLVMMemoryConversionPatterns(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {		LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
		bool useAlignedAlloc) {
// clang-format off		// clang-format off
patterns.insert<		patterns.insert<
AssumeAlignmentOpLowering,		AssumeAlignmentOpLowering,
		DeallocOpLowering,
DimOpLowering,		DimOpLowering,
LoadOpLowering,		LoadOpLowering,
MemRefCastOpLowering,		MemRefCastOpLowering,
StoreOpLowering,		StoreOpLowering,
SubViewOpLowering,		SubViewOpLowering,
ViewOpLowering>(converter);		ViewOpLowering>(converter);
patterns.insert<		patterns.insert<
AllocOpLowering,		AllocOpLowering
DeallocOpLowering>(converter);		>(converter, useAlignedAlloc);
// clang-format on		// clang-format on
}		}

void mlir::populateStdToLLVMDefaultFuncOpConversionPattern(		void mlir::populateStdToLLVMDefaultFuncOpConversionPattern(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns,		LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
bool emitCWrappers) {		bool emitCWrappers) {
patterns.insert<FuncOpConversion>(converter, emitCWrappers);		patterns.insert<FuncOpConversion>(converter, emitCWrappers);
}		}

void mlir::populateStdToLLVMConversionPatterns(		void mlir::populateStdToLLVMConversionPatterns(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns,		LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
bool emitCWrappers) {		bool emitCWrappers, bool useAlignedAlloc) {
populateStdToLLVMDefaultFuncOpConversionPattern(converter, patterns,		populateStdToLLVMDefaultFuncOpConversionPattern(converter, patterns,
emitCWrappers);		emitCWrappers);
populateStdToLLVMNonMemoryConversionPatterns(converter, patterns);		populateStdToLLVMNonMemoryConversionPatterns(converter, patterns);
populateStdToLLVMMemoryConversionPatters(converter, patterns);		populateStdToLLVMMemoryConversionPatterns(converter, patterns,
		useAlignedAlloc);
}		}

static void populateStdToLLVMBarePtrFuncOpConversionPattern(		static void populateStdToLLVMBarePtrFuncOpConversionPattern(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {		LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {
patterns.insert<BarePtrFuncOpConversion>(converter);		patterns.insert<BarePtrFuncOpConversion>(converter);
}		}

void mlir::populateStdToLLVMBarePtrConversionPatterns(		void mlir::populateStdToLLVMBarePtrConversionPatterns(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {		LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
		bool useAlignedAlloc) {
populateStdToLLVMBarePtrFuncOpConversionPattern(converter, patterns);		populateStdToLLVMBarePtrFuncOpConversionPattern(converter, patterns);
populateStdToLLVMNonMemoryConversionPatterns(converter, patterns);		populateStdToLLVMNonMemoryConversionPatterns(converter, patterns);
populateStdToLLVMMemoryConversionPatters(converter, patterns);		populateStdToLLVMMemoryConversionPatterns(converter, patterns,
		useAlignedAlloc);
}		}

// Create an LLVM IR structure type if there is more than one result.		// Create an LLVM IR structure type if there is more than one result.
Type LLVMTypeConverter::packFunctionResults(ArrayRef<Type> types) {		Type LLVMTypeConverter::packFunctionResults(ArrayRef<Type> types) {
assert(!types.empty() && "expected non-empty list of type");		assert(!types.empty() && "expected non-empty list of type");

if (types.size() == 1)		if (types.size() == 1)
return convertType(types.front());		return convertType(types.front());
[... 54 lines not shown ...]	LLVMTypeConverter::promoteMemRefDescriptors(Location loc, ValueRange opOperands,
return promotedOperands;		return promotedOperands;
}		}

namespace {		namespace {
/// A pass converting MLIR operations into the LLVM IR dialect.		/// A pass converting MLIR operations into the LLVM IR dialect.
struct LLVMLoweringPass : public ConvertStandardToLLVMBase<LLVMLoweringPass> {		struct LLVMLoweringPass : public ConvertStandardToLLVMBase<LLVMLoweringPass> {
LLVMLoweringPass() = default;		LLVMLoweringPass() = default;
LLVMLoweringPass(bool useBarePtrCallConv, bool emitCWrappers,		LLVMLoweringPass(bool useBarePtrCallConv, bool emitCWrappers,
unsigned indexBitwidth) {		unsigned indexBitwidth, bool useAlignedAlloc) {
this->useBarePtrCallConv = useBarePtrCallConv;		this->useBarePtrCallConv = useBarePtrCallConv;
this->emitCWrappers = emitCWrappers;		this->emitCWrappers = emitCWrappers;
this->indexBitwidth = indexBitwidth;		this->indexBitwidth = indexBitwidth;
		this->useAlignedAlloc = useAlignedAlloc;
}		}

/// Run the dialect converter on the module.		/// Run the dialect converter on the module.
void runOnOperation() override {		void runOnOperation() override {
if (useBarePtrCallConv && emitCWrappers) {		if (useBarePtrCallConv && emitCWrappers) {
getOperation().emitError()		getOperation().emitError()
<< "incompatible conversion options: bare-pointer calling convention "		<< "incompatible conversion options: bare-pointer calling convention "
"and C wrapper emission";		"and C wrapper emission";
signalPassFailure();		signalPassFailure();
return;		return;
}		}

ModuleOp m = getOperation();		ModuleOp m = getOperation();

LLVMTypeConverterCustomization customs;		LLVMTypeConverterCustomization customs;
customs.funcArgConverter = useBarePtrCallConv ? barePtrFuncArgTypeConverter		customs.funcArgConverter = useBarePtrCallConv ? barePtrFuncArgTypeConverter
: structFuncArgTypeConverter;		: structFuncArgTypeConverter;
customs.indexBitwidth = indexBitwidth;		customs.indexBitwidth = indexBitwidth;
LLVMTypeConverter typeConverter(&getContext(), customs);		LLVMTypeConverter typeConverter(&getContext(), customs);

OwningRewritePatternList patterns;		OwningRewritePatternList patterns;
if (useBarePtrCallConv)		if (useBarePtrCallConv)
populateStdToLLVMBarePtrConversionPatterns(typeConverter, patterns);		populateStdToLLVMBarePtrConversionPatterns(typeConverter, patterns,
		useAlignedAlloc);
else		else
populateStdToLLVMConversionPatterns(typeConverter, patterns,		populateStdToLLVMConversionPatterns(typeConverter, patterns,
emitCWrappers);		emitCWrappers, useAlignedAlloc);

LLVMConversionTarget target(getContext());		LLVMConversionTarget target(getContext());
if (failed(applyPartialConversion(m, target, patterns, &typeConverter)))		if (failed(applyPartialConversion(m, target, patterns, &typeConverter)))
signalPassFailure();		signalPassFailure();
}		}
};		};
} // end namespace		} // end namespace

mlir::LLVMConversionTarget::LLVMConversionTarget(MLIRContext &ctx)		mlir::LLVMConversionTarget::LLVMConversionTarget(MLIRContext &ctx)
: ConversionTarget(ctx) {		: ConversionTarget(ctx) {
this->addLegalDialect<LLVM::LLVMDialect>();		this->addLegalDialect<LLVM::LLVMDialect>();
this->addIllegalOp<LLVM::DialectCastOp>();		this->addIllegalOp<LLVM::DialectCastOp>();
this->addIllegalOp<TanhOp>();		this->addIllegalOp<TanhOp>();
}		}

std::unique_ptr<OperationPass<ModuleOp>>		std::unique_ptr<OperationPass<ModuleOp>>
mlir::createLowerToLLVMPass(const LowerToLLVMOptions &options) {		mlir::createLowerToLLVMPass(const LowerToLLVMOptions &options) {
return std::make_unique<LLVMLoweringPass>(		return std::make_unique<LLVMLoweringPass>(
options.useBarePtrCallConv, options.emitCWrappers, options.indexBitwidth);		options.useBarePtrCallConv, options.emitCWrappers, options.indexBitwidth,
		options.useAlignedAlloc);
}		}
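
For readers following the StandardToLLVM.cpp changes above, the two heap-allocation strategies that `useAlignedAlloc` selects between can be sketched in plain C++. This is only an illustration, not the lowering itself: the `Allocation` struct and both helper names are hypothetical, the pointer adjustment in the malloc path is an assumption based on the revision summary ("using malloc and adjusting the allocated pointer"), and `alignment` is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the two pointers kept in a memref descriptor:
// the pointer handed to free() and the pointer used for indexing.
struct Allocation {
  void *allocated;
  void *aligned;
};

// malloc path: over-allocate by (alignment - 1) bytes, mirroring the
// "size + alignment - 1" computation in the diff, then round the pointer up.
inline Allocation allocWithMalloc(std::size_t size, std::size_t alignment) {
  void *raw = std::malloc(size + alignment - 1);
  if (!raw)
    return {nullptr, nullptr};
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t bump = (alignment - addr % alignment) % alignment;
  return {raw, static_cast<char *>(raw) + bump};
}

// aligned_alloc path: pad the size up to a multiple of the alignment, as
// aligned_alloc requires, and use the returned pointer directly.
inline Allocation allocWithAlignedAlloc(std::size_t size,
                                        std::size_t alignment) {
  std::size_t padded = size + (alignment - size % alignment) % alignment;
  void *p = std::aligned_alloc(alignment, padded);
  return {p, p};
}
```

In the second path the allocated and aligned pointers coincide, so no pointer arithmetic sits between the allocation call and its uses, which is the property the summary argues makes the result easier for LLVM passes to analyze.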

mlir/lib/Dialect/StandardOps/IR/Ops.cpp

[... 1,062 lines not shown ...]	static LogicalResult verify(DimOp op) {
}		}

return success();		return success();
}		}

OpFoldResult DimOp::fold(ArrayRef<Attribute> operands) {		OpFoldResult DimOp::fold(ArrayRef<Attribute> operands) {
// Constant fold dim when the size along the index referred to is a constant.		// Constant fold dim when the size along the index referred to is a constant.
auto opType = memrefOrTensor().getType();		auto opType = memrefOrTensor().getType();
int64_t indexSize = -1;		int64_t indexSize = ShapedType::kDynamicSize;
if (auto tensorType = opType.dyn_cast<RankedTensorType>())		if (auto tensorType = opType.dyn_cast<RankedTensorType>())
indexSize = tensorType.getShape()[getIndex()];		indexSize = tensorType.getShape()[getIndex()];
else if (auto memrefType = opType.dyn_cast<MemRefType>())		else if (auto memrefType = opType.dyn_cast<MemRefType>())
indexSize = memrefType.getShape()[getIndex()];		indexSize = memrefType.getShape()[getIndex()];

if (!ShapedType::isDynamic(indexSize))		if (!ShapedType::isDynamic(indexSize))
return IntegerAttr::get(IndexType::get(getContext()), indexSize);		return IntegerAttr::get(IndexType::get(getContext()), indexSize);

[... 1,463 lines not shown ...]

mlir/test/Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir

	// RUN: mlir-opt -convert-std-to-llvm %s | FileCheck %s			// RUN: mlir-opt -convert-std-to-llvm %s | FileCheck %s
				// RUN: mlir-opt -convert-std-to-llvm='use-aligned-alloc=1' %s | FileCheck %s --check-prefix=ALIGNED-ALLOC

	// CHECK-LABEL: func @check_strided_memref_arguments(			// CHECK-LABEL: func @check_strided_memref_arguments(
	// CHECK-COUNT-2: !llvm<"float*">			// CHECK-COUNT-2: !llvm<"float*">
	// CHECK-COUNT-5: !llvm.i64			// CHECK-COUNT-5: !llvm.i64
	// CHECK-COUNT-2: !llvm<"float*">			// CHECK-COUNT-2: !llvm<"float*">
	// CHECK-COUNT-5: !llvm.i64			// CHECK-COUNT-5: !llvm.i64
	// CHECK-COUNT-2: !llvm<"float*">			// CHECK-COUNT-2: !llvm<"float*">
	// CHECK-COUNT-5: !llvm.i64			// CHECK-COUNT-5: !llvm.i64
	[... 123 lines not shown ...]
	func @dynamic_dealloc(%arg0: memref<?x?xf32>) {			func @dynamic_dealloc(%arg0: memref<?x?xf32>) {
	// CHECK: %[[ptr:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }">			// CHECK: %[[ptr:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }">
	// CHECK-NEXT: %[[ptri8:.*]] = llvm.bitcast %[[ptr]] : !llvm<"float*"> to !llvm<"i8*">			// CHECK-NEXT: %[[ptri8:.*]] = llvm.bitcast %[[ptr]] : !llvm<"float*"> to !llvm<"i8*">
	// CHECK-NEXT: llvm.call @free(%[[ptri8]]) : (!llvm<"i8*">) -> ()			// CHECK-NEXT: llvm.call @free(%[[ptri8]]) : (!llvm<"i8*">) -> ()
	dealloc %arg0 : memref<?x?xf32>			dealloc %arg0 : memref<?x?xf32>
	return			return
	}			}

				// CHECK-LABEL: func @stdlib_aligned_alloc({{.*}}) -> !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }"> {
				// ALIGNED-ALLOC-LABEL: func @stdlib_aligned_alloc({{.*}}) -> !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }"> {
				func @stdlib_aligned_alloc(%N : index) -> memref<32x18xf32> {
				// ALIGNED-ALLOC-NEXT: %[[sz1:.*]] = llvm.mlir.constant(32 : index) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[sz2:.*]] = llvm.mlir.constant(18 : index) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[num_elems:.*]] = llvm.mul %0, %1 : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[null:.*]] = llvm.mlir.null : !llvm<"float*">
				// ALIGNED-ALLOC-NEXT: %[[one:.*]] = llvm.mlir.constant(1 : index) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[gep:.*]] = llvm.getelementptr %[[null]][%[[one]]] : (!llvm<"float*">, !llvm.i64) -> !llvm<"float*">
				// ALIGNED-ALLOC-NEXT: %[[sizeof:.*]] = llvm.ptrtoint %[[gep]] : !llvm<"float*"> to !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[bytes:.*]] = llvm.mul %[[num_elems]], %[[sizeof]] : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[alignment:.*]] = llvm.mlir.constant(32 : i64) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: %[[allocated:.*]] = llvm.call @aligned_alloc(%[[alignment]], %[[bytes]]) : (!llvm.i64, !llvm.i64) -> !llvm<"i8*">
				// ALIGNED-ALLOC-NEXT: llvm.bitcast %[[allocated]] : !llvm<"i8*"> to !llvm<"float*">
				%0 = alloc() {alignment = 32} : memref<32x18xf32>
				// Do another alloc just to test that we have a unique declaration for
				// aligned_alloc.
				// ALIGNED-ALLOC: llvm.call @aligned_alloc
				%1 = alloc() {alignment = 64} : memref<4096xf32>

				// Alignment is to element type boundaries (minimum 16 bytes).
				// ALIGNED-ALLOC: %[[c32:.*]] = llvm.mlir.constant(32 : i64) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: llvm.call @aligned_alloc(%[[c32]]
				%2 = alloc() : memref<4096xvector<8xf32>>
				// The minimum alignment is 16 bytes unless explicitly specified.
				// ALIGNED-ALLOC: %[[c16:.*]] = llvm.mlir.constant(16 : i64) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: llvm.call @aligned_alloc(%[[c16]],
				%3 = alloc() : memref<4096xvector<2xf32>>
				// ALIGNED-ALLOC: %[[c8:.*]] = llvm.mlir.constant(8 : i64) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: llvm.call @aligned_alloc(%[[c8]],
				%4 = alloc() {alignment = 8} : memref<1024xvector<4xf32>>
				// Bump the memref allocation size if its size is not a multiple of alignment.
				// ALIGNED-ALLOC: %[[c32:.*]] = llvm.mlir.constant(32 : i64) : !llvm.i64
				// ALIGNED-ALLOC-NEXT: llvm.urem
				// ALIGNED-ALLOC-NEXT: llvm.sub
				// ALIGNED-ALLOC-NEXT: llvm.urem
				// ALIGNED-ALLOC-NEXT: %[[SIZE_ALIGNED:.*]] = llvm.add
				// ALIGNED-ALLOC-NEXT: llvm.call @aligned_alloc(%[[c32]], %[[SIZE_ALIGNED]])
				%5 = alloc() {alignment = 32} : memref<100xf32>
				// Bump alignment to the next power of two if it isn't.
				// ALIGNED-ALLOC: %[[c128:.*]] = llvm.mlir.constant(128 : i64) : !llvm.i64
				// ALIGNED-ALLOC: llvm.call @aligned_alloc(%[[c128]]
				%6 = alloc(%N) : memref<?xvector<18xf32>>
				return %0 : memref<32x18xf32>
				}
				mehdi_amini: Can you split this in a new test file? We will process the entire file twice while the tests between the two invocations are actually entirely disjoint.
				bondhugula: Sure - this makes sense. Given a similar approach and a tendency to put everything in one file at other places, we missed that. The only benefit of having it here is that the generated code follows the pattern of the test case right above and so it was easy to add it here and update both together when needed. But thinking about this: what about the additional proofing this provides given that it is also running (without crashing, etc.) on the other cases even if we don't have a FileCheck for the rest? I think there is some benefit there without having to copy over that stuff to another file? How do you weigh these?
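
The alignment and size rules that the ALIGNED-ALLOC checks in this test exercise can be sketched in standalone C++. This is a reading of the test comments and of the `urem`/`sub`/`urem`/`add` sequence the test expects, not the code from the patch: the function names are hypothetical, an explicit alignment of 0 stands in for "no alignment attribute", and the element sizes used in the checks (8, 32 and 72 bytes) are assumed from the vector shapes.

```cpp
#include <cstdint>

// Minimum alignment used when the alloc carries no explicit alignment.
constexpr uint64_t kMinAlignedAllocAlignment = 16;

// Smallest power of two that is >= v (for v > 0).
constexpr uint64_t nextPowerOfTwo(uint64_t v) {
  uint64_t p = 1;
  while (p < v)
    p <<= 1;
  return p;
}

// An explicit alignment wins; otherwise align to the element size rounded
// up to a power of two, with a floor of kMinAlignedAllocAlignment.
constexpr uint64_t chooseAlignment(uint64_t elementSizeBytes,
                                   uint64_t explicitAlignment) {
  if (explicitAlignment != 0)
    return explicitAlignment;
  uint64_t a = nextPowerOfTwo(elementSizeBytes);
  return a < kMinAlignedAllocAlignment ? kMinAlignedAllocAlignment : a;
}

// Bytes to add so that size becomes a multiple of alignment.
constexpr uint64_t bumpToAlign(uint64_t size, uint64_t alignment) {
  return (alignment - size % alignment) % alignment;
}

constexpr uint64_t paddedSize(uint64_t size, uint64_t alignment) {
  return size + bumpToAlign(size, alignment);
}

// Cases mirroring the test: vector<8xf32>, vector<2xf32>, explicit 8,
// vector<18xf32>, and memref<100xf32> with alignment 32.
static_assert(chooseAlignment(32, 0) == 32, "element-size alignment");
static_assert(chooseAlignment(8, 0) == 16, "minimum of 16 bytes");
static_assert(chooseAlignment(16, 8) == 8, "explicit alignment wins");
static_assert(chooseAlignment(72, 0) == 128, "bumped to a power of two");
static_assert(paddedSize(400, 32) == 416, "size bumped to a multiple");
```

The outer modulo in `bumpToAlign` keeps the bump at zero when the size is already a multiple of the alignment, which matches the first allocation in the test (2304 bytes, alignment 32) being passed to `aligned_alloc` unchanged.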

	// CHECK-LABEL: func @mixed_load(			// CHECK-LABEL: func @mixed_load(
	// CHECK-COUNT-2: !llvm<"float*">,			// CHECK-COUNT-2: !llvm<"float*">,
	// CHECK-COUNT-5: {{%[a-zA-Z0-9]*}}: !llvm.i64			// CHECK-COUNT-5: {{%[a-zA-Z0-9]*}}: !llvm.i64
	// CHECK: %[[I:.*]]: !llvm.i64,			// CHECK: %[[I:.*]]: !llvm.i64,
	// CHECK: %[[J:.*]]: !llvm.i64)			// CHECK: %[[J:.*]]: !llvm.i64)
	func @mixed_load(%mixed : memref<42x?xf32>, %i : index, %j : index) {			func @mixed_load(%mixed : memref<42x?xf32>, %i : index, %j : index) {
	// CHECK: %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }">			// CHECK: %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }">
	// CHECK-NEXT: %[[off:.*]] = llvm.mlir.constant(0 : index) : !llvm.i64			// CHECK-NEXT: %[[off:.*]] = llvm.mlir.constant(0 : index) : !llvm.i64
	[... 174 lines not shown ...]

Diff 255936
