This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm
ClosedPublic

Authored by bondhugula on Apr 6 2020, 12:55 AM.

Details

Summary

Support to recognize and deal with aligned_alloc was recently added to
LLVM's TLI/MemoryBuiltins and its various optimization passes. This
revision adds support for generation of aligned_alloc's when lowering
AllocOp from std to LLVM. Setting 'use-aligned_alloc=1' will lead to
aligned_alloc being used for all heap allocations. An alignment and size
that works with the constraints of aligned_alloc is chosen.

Using aligned_alloc is preferable to "using malloc and adjusting the
allocated pointer to align for indexing" because the pointer access
arithmetic done for the latter only makes it harder for LLVM passes to
deal with for analysis, optimization, attribute deduction, and rewrites.

Depends on D76602.

Diff Detail

Event Timeline

bondhugula created this revision.Apr 6 2020, 12:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2020, 12:55 AM
bondhugula edited the summary of this revision. (Show Details)Apr 6 2020, 12:55 AM
mehdi_amini added inline comments.Apr 6 2020, 1:10 AM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
50

Please don't introduce any new global command line option and thread them through the APIs.
We're looking to eliminate every global initializers under mlir/lib/... : https://bugs.llvm.org/show_bug.cgi?id=45437

ftynse added inline comments.Apr 6 2020, 1:44 AM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

Can't you just rely on the Optional result being None to signal that aligned alloc is not required?

1418

Prefer early return

1420

I wouldn't add a lambda that is only called once immediately after its definition.

1478

Nit: putting the comment above if makes it less surprising in terms of indentation. Otherwise, it feels like there is incorrect indentation because of elided braces and the actual statement does not belong to the`if`.

1582

I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?

bondhugula marked 7 inline comments as done.Apr 6 2020, 2:55 AM

Thanks for the review.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

We still use aligned_alloc if the elt size is > 16 bytes (in spite of no alignment attribute). The code would otherwise crash since the loads/stores won't have alignment guaranteed (with malloc) but the they would be lowered by LLVM via aligned load/stores. (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).

1418

Thanks.

1420

Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.

1478

Sure, thanks.

1582

I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts.

bondhugula updated this revision to Diff 255270.Apr 6 2020, 3:06 AM
bondhugula marked 4 inline comments as done.Apr 6 2020, 3:06 AM

Address review comments.

bondhugula marked an inline comment as done.Apr 6 2020, 3:06 AM
ftynse added inline comments.Apr 6 2020, 5:08 AM
mlir/include/mlir/Conversion/Passes.td
228

Nit: I suppose this option should be removed by the previous commit, instead of rewritten by this one.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

We still use aligned_alloc if the elt size is > 16 bytes

At which point, this function returns the alignment value rather than llvm::None.
The only situation where it returns a non-None value and sets useAlignedAlloc to false is for AllocaOp. I would consider refactoring this function to only work on AllocOp, and querying the AllocaOp allocation separately in the call place. This would make the API simpler and shorten the code. And you wouldn't need an extra boolean result.

(LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).

I suppose we can just take the ABI alignment into account when lowering the alloc operation (we already do so for the alignment attribute). Not all platforms have aligned_alloc.

1420

This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing

int64_t constEltSizeBytes = 0;
if (auto vectorType = elementType.template dyn_cast<VectorType>())
  constEltSizeBytes =
      vectorType.getNumElements() *
      llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
else
  constEltSizeBytes =
      llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);

// Use aligned_alloc if elt_size > malloc's alignment.
bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
useAlignedAlloc = isMallocAlignmentSufficient;

Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.

1464

I would just have

auto allocaOp = dyn_cast<AllocaOp>(op);
if (allocaOp) {
  allocatedBytePtr = nullptr;
  accessAlignment = nullptr;
  return rewriter.create<LLVM::AllocaOp>(
          loc, elementPtrType, cumulativeSize,
          allocaOp.alignment() ? allocaOp.alignemnt().getValue() : 0);
}

and sink the getAllocationAlignment below, all while making it handle only AllocOp.

1582

The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".

bondhugula updated this revision to Diff 255320.Apr 6 2020, 7:03 AM
bondhugula marked an inline comment as done.

Address some of the review comments.

bondhugula marked 7 inline comments as done.Apr 6 2020, 7:35 AM
bondhugula added inline comments.
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

I would consider refactoring this function to only work on
AllocOp, and querying the AllocaOp allocation separately in the

Sure, sounds good.

I suppose we can just take the ABI alignment into account when
lowering the alloc operation (we already do so for the

Yes, we should do this with malloc as well. This can be a TODO and should be done in a separate revision (because it isn't to do with aligned_alloc but fixing malloc lowering).

1420

Hmm... the demarcation/isolation is important I feel. I'm fine with changing to the straightline style but out of curiosity and for future purposes as well, it'll be good to have a third person view here on coding style as far as such patterns go: @mehdi_amini - is there a guideline here?

1464

Sure.

1582

Sorry, I didn't quite understand. What should the pass options be and what should the behavior and the default behavior be?

bondhugula updated this revision to Diff 255336.Apr 6 2020, 7:36 AM
bondhugula marked 2 inline comments as done.

Address most of the review comments.

bondhugula added inline comments.Apr 6 2020, 8:03 AM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

Not all platforms have aligned_alloc.

Btw, aligned_alloc is a C11 and C++11 standard, and it's available on Windows too (it's posix_memalign that isn't). And of course there are systems where there isn't any dynamic memory allocation at all.

bondhugula updated this revision to Diff 255355.Apr 6 2020, 8:22 AM

Remove stale comment.

It is my understanding this CL does not change previous behavior, is this accurate?

mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h
45

Same comment as in the parent revision, this is a silently breaking API change that will impact integrations.
I'd just use the struct that you'd have introduced in the previous revision and document that this is only using 2 fields.

66–67

same discussion re silently breaking API changes

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1408

Alignment needed is really a target-specific thing isn't it?
I would have expected something that looks at DataLayout and that brings in the can of worms we have been punting on (I understand once flang is in MLIR core we will want to reopen it).

Can this part be dropped from this revision, esp in light of @ftynse's comments?

I also have some fun micro ARM targets I will need to test some of this on, the smaller the baked in assumptions the better.

1420

I am generally a fan of such style (esp. when mixed with functional combinators), so I'd vote for +1 it when it makes sense.

mehdi_amini added inline comments.Apr 7 2020, 2:58 AM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1420

@bondhugula this seems too detailed to have a guideline :)
I wouldn't say it is "common", but probably not unheard of?
I have been doing this myself but in general not calling it right after, rather to outline a block of code outside of a loop to make the loop shorter and easier to read, or similar situation (getting large boilerplate out of the way and "naming it").

@rriddle ?

ftynse added inline comments.Apr 7 2020, 2:59 AM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1420

Generally, I would be strongly opposed to defining style guidelines based on a single use of a construct in a single diff, where only a small subset of contributors could participate (and before you can object that review history is public, are you reading all comments on all diffs?). I would be also opposed to having to define additional rules until we have to. I am not generally opposed to helper lambdas, I just don't see any benefit from this specific one, only drawbacks. And a lambda that the entire environment by reference is not exactly my definition of isolation.

1582

Normally, you would have two pass options (and a configuration struct for the constructors like Nicolas proposed in another patch to decrease the amount of churn in pass constructor APIs): -use-aligned-alloc and -assume-malloc-alignment. If you don't want two separate options, you could get away with one -use-aligned-alloc-and-assume-malloc-alignment (did not think about a better name). If it is set to zero (default), the conversion doesn't use aligned_alloc at all. If it is set to non-zero, the conversion uses aligned_alloc and treats the option value as malloc alignment in order to also use aligned_alloc in relevant cases.

bondhugula updated this revision to Diff 255627.Apr 7 2020, 3:39 AM
bondhugula marked 2 inline comments as done.

LowerToLLVM struct for options.

bondhugula marked an inline comment as done.Apr 7 2020, 3:44 AM

@nicolasvasilache wrote:

It is my understanding this CL does not change previous behavior, is this accurate?

It actually changes it in one case: if the alloc op didn't have alignment specified *and* its elt type was larger than 16 bytes, this revision would make it use aligned_alloc. I can change this and let it continue to use malloc, but note that the malloc in such cases would have generated code that would crash. So this is only fixing things for the previous behavior it's changing. And we should also change malloc path to do the alignment to elt type boundaries for > 16 bytes by default.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1420

As Mehdi now confirms, this is too detailed to have a style guideline. The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.

bondhugula marked 2 inline comments as done.Apr 7 2020, 3:55 AM
bondhugula added inline comments.
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1582

The configuration struct is now done (in the parent revision). How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? This revision is not about tinkering with malloc alignment handling.

Update - PTAL. Thanks for all the feedback.

bondhugula updated this revision to Diff 255631.Apr 7 2020, 3:55 AM
bondhugula marked an inline comment as done.

Missed change.

bondhugula updated this revision to Diff 255634.Apr 7 2020, 4:15 AM

Remove debug code.

ftynse marked an inline comment as done.Apr 7 2020, 4:44 AM
ftynse added inline comments.
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1397

Yes, I know it's C11 (but C++17, most changes in C11 did not make it to C++11), but unfortunately, I don't think we can assume all target systems have C11, let alone C++17.

1420

The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability.

Well, you have a block comment right above it (modulo the variable declaration that is only used inside the lambda), so I wouldn't call it auto-documenting since you felt like you needed to write documentation for it. And having a named variable of lambda-type or an identically-named variable of boolean type still gives you exactly the same naming scheme.

Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.

I read Mehdi's comment differently:

but in general not calling it right after

rather to outline a block of code outside of a loop to make the loop shorter

do not seem to necessarily support your usage here (neither does it contradict).

Readability is a very subjective thing. I was reading your code for review purposes and this construct did make me lose time and expand more energy than for a straight-line code here, so for me personally it decreased readability. Namely because it (a) mutates an implicitly captured variable and (b) requires to unwrap more abstractions mentally. Anyway, I won't block the commit just because of a stylistic discussion.

I can suggest an alternative that would address part of my readability concerns:

int64_t constEltSizeBytes = [elementType]() {
  if (auto vectorType = elementType.template dyn_cast<VectorType>())
    return vectorType.getNumElements() *
        llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
  else
    return llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
}();
bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;

This removes implicit by-reference capture, makes it clear that you do not intend for the lambda to be reused (named lambda would be also okay since it doesn't store references anyway, but there's no point), and this way of using lambdas is actually considered a C++11 idiom for comlex constant initialization (https://groups.google.com/a/isocpp.org/g/std-discussion/c/FBjcR4WJlkU/m/nQnsSOziq04J) so one can claim it's "common enough".

1582

How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided?

Works for me!

Harbormaster completed remote builds in B52138: Diff 255634.
rriddle added inline comments.Apr 7 2020, 1:17 PM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1420

FWIWI, there is already a guide on using lamdas for computing predicates:

https://llvm.org/docs/CodingStandards.html#turn-predicate-loops-into-predicate-functions

mehdi_amini added inline comments.Apr 7 2020, 9:43 PM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1420

While I haven't seen this pattern used as it is here widely in MLIR (and I like consistency), I just didn't feel any reason for me to really oppose it, but I wasn't really supporting it as-is either. The fact that @ftynse opposed to this on principle of readability seems to hint that it would be better to not do it.

I would like have written it in a form around this (with a comment possibly):

int64_t constEltSizeBytes = llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
if (auto vectorType = elementType.template dyn_cast<VectorType>())
  constEltSizeBytes = constEltSizeBytes * vectorType.getNumElements();
bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;

Everything is named as well and you avoid the readability overhead of the lambda.

1420

@rriddle: the link is about free functions I believe, which I see differently than lambda: the rational explained there is close to what I mention before of moving the predicate code "out of the way" of the main logic (and reducing the indentation, etc), which you don't get with the immediately local lambda.

bondhugula marked 14 inline comments as done.Apr 7 2020, 9:45 PM

@ftynse Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment. So I have additional handling (and test cases for that) now to bump the allocation size to the next multiple of alignment. Also, if the elt size is not a power of two (and an alignment attribute doesn't exist), the next power of two is used for alignment (although this isn't an interesting case for aligned allocation, didn't want to punt to malloc for this to keep things simple/clear). So, all heap allocations use aligned_alloc whenever -use-aligned-alloc is set.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1408

All of this now dropped. -use-aligned-alloc is now independent of what malloc can do. The malloc path remains untouched.

1420

Thank you all for commenting. This discussion is now moot since the aligned_alloc generation is now untangled from being contingent on this conditional / what malloc supports, and so this whole lambda and conditionals are gone.

1582

Done - aligned_alloc is now used only with -use-aligned-alloc, and for all heap allocations whenever that cmd line flag exists.

bondhugula updated this revision to Diff 255893.Apr 7 2020, 9:46 PM
bondhugula marked 2 inline comments as done.

Address review comments. Also, handle when the size may not be a multiple of alignment.

ftynse accepted this revision.Apr 8 2020, 2:08 AM

Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment.

I was going to ask what happened in case of memref<?xvector<3xf32>> but got distracted by the lambda discussion :)

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1584

Typo: Aigned

This revision is now accepted and ready to land.Apr 8 2020, 2:08 AM
bondhugula updated this revision to Diff 255929.Apr 8 2020, 2:10 AM

Add another test case.

bondhugula marked 3 inline comments as done.Apr 8 2020, 2:31 AM

Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment.

I was going to ask what happened in case of memref<?xvector<3xf32>> but got distracted by the lambda discussion :)

The allocation size will be bumped to a multiple of the alignment. With memref<?xvector<3xf32>, if there is no alignment attribute, the next power of two larger than 12, i.e., 16, will be the alignment, and the allocation size will be bumped to a multiple of 16 ('cumulativeSize' will bumped up).

bondhugula updated this revision to Diff 255932.Apr 8 2020, 2:32 AM

Address review comments.

bondhugula edited the summary of this revision. (Show Details)Apr 8 2020, 2:37 AM
bondhugula edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.
Harbormaster completed remote builds in B52316: Diff 255929.

I don't have GCC-5 at hand to be sure of the fix. Any insights?

ftynse added a comment.Apr 8 2020, 4:21 AM

I don't have GCC-5 at hand to be sure of the fix. Any insights?

Neither do I... I would expect using AllocaOpLowering = AllocLikeOpLowering<AllocaOp>; instead of creating a derived class could get rid of the problem with constructors, and this is the right thing to have anyway.

I don't have GCC-5 at hand to be sure of the fix. Any insights?

Neither do I... I would expect using AllocaOpLowering = AllocLikeOpLowering<AllocaOp>; instead of creating a derived class could get rid of the problem with constructors, and this is the right thing to have anyway.

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

ftynse added a comment.Apr 8 2020, 5:52 AM

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

Thanks! The build is performed hourly, using all the commits landed in the last hour, so we'll have to wait a bit. The overall status is here https://buildkite.com/mlir/mlir-core

Thanks - I just went ahead and committed this: D77719. Where can I get the buildkite URL for a commit?

Thanks! The build is performed hourly, using all the commits landed in the last hour, so we'll have to wait a bit. The overall status is here https://buildkite.com/mlir/mlir-core

That didn't work. I made a couple more cleanup changes in D77726 and that has fixed it.

mehdi_amini added inline comments.Jun 9 2020, 9:09 PM
mlir/test/Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir
186

Can you split this in a new test file?

We will process the entire file twice while the tests between the two invocations are actually entirely disjoint.

bondhugula marked an inline comment as done.Jun 11 2020, 1:30 AM
bondhugula added inline comments.
mlir/test/Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir
186

Sure - this make sense. Given a similar approach and a tendency to put everything in one file at other places, we missed that. The only benefit of having it here is that the generated ocde follows the pattern of the test case right above and so it was easy to add it here and update both together when needed. But thinking about this: what about the additional proofing this provides given that it is also running (without crashing, etc.) on the other cases even if we don't have a FileCheck for the rest? I think there is some benefit there without having to copy over that stuff to another file? How do you weigh these?