This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion] Make a non-byval promotion attempt first
ClosedPublic

Authored by psamolysov on Apr 27 2022, 2:58 AM.

Details

Summary

It makes sense to make a non-byval promotion attempt first and then fall
back to the byval one. The non-byval ('usual') promotion is generally
better, for example it does promotion even when a structure has more
elements than 'MaxElements' but not all of them are actually used in the
function.

Diff Detail

Event Timeline

psamolysov created this revision.Apr 27 2022, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 2:58 AM
psamolysov requested review of this revision.Apr 27 2022, 2:58 AM

Change the test to use the legacy pass manager, unfortunately the buildbot doesn't accept tests with arguments for the opt tool in the following format: -passes=function(sroa),cgscc(argpromotion).

Failed Tests (2):
  LLVM :: DebugInfo/NVPTX/debug-file-loc-only.ll
  LLVM :: DebugInfo/NVPTX/debug-file-loc.ll

These tests look like unrelated to the changes.

nikic added a comment.Apr 28 2022, 3:51 AM

Might make sense to move the non-byval promotion attempt first and then fall back to byval? I think non-byval promotion is generally better if it's possible.

Though it would probably make sense to handle byval promotion in the same way as non-byval promotion (but allowing stores) -- I believe the current byval implementation is not correct under opaque pointers.

Might make sense to move the non-byval promotion attempt first and then fall back to byval? I think non-byval promotion is generally better if it's possible.

Let's start from this change. I'm going to update the patch moving the non-byval attempt first.

psamolysov updated this revision to Diff 425812.EditedApr 28 2022, 9:18 AM
psamolysov retitled this revision from [ArgPromotion] Allow pointer promotion if byval adds to many arguments to [ArgPromotion] Make a non-byval promotion attempt first.
psamolysov edited the summary of this revision. (Show Details)

Make an attempt to do non-byval promotion first.

nikic added inline comments.Apr 28 2022, 10:07 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
857

This std::transform makes the code both longer and harder to read.

I've updated the tests to forbid non-byval promotion and let byval chance to work. As I see, not only 'store' is allowed for byval (but not every 'store', for not-densely packed structures, with x86_fp80, for example, stores makes 'canPaddingBeAccessed' to return 'true') but GEPs with non constant indexes too. I think there could be other instructions that is allowed for byval but not allowed for 'usual' promoting, in fact, if isDenselyPacked returns 'true', any user is possible: store, cmp, etc. because byval promotion just replaces passing a structure by value with passing it's elements without care about how they are used if the structure is densely packed or a GEP with non-const index is the only user.

By the way, should GEP with non constant index be a flag that padding can be accessed and in this case the canPaddingBeAccessed function should return 'true'?

@psamolysov Right, byval promotion currently doesn't look at how the argument is used -- however, the promotion will only actually be profitable if the introduced alloca can be mem2reg promoted. This will happen if the argument is only used in load/store operations at constant offset, but will generally not happen in other cases, in which case we've just made argument passing less efficient for no real benefit.

psamolysov marked an inline comment as done.Apr 29 2022, 2:20 AM

@nikic Thank you for the explanation, I think the IsSafeToPromote variable should be used as a flag whether using the argument in stores is allowed and taking into account within the findArgParts function. That refactoring is much larger than the proposed here, so it deserves another review.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
857

True, I've returned the loop back.

psamolysov marked an inline comment as done.Apr 29 2022, 2:20 AM

Use the loop to add types into the Types vector for the areTypesABICompatible call instead of the std::transform algorithm.

nikic accepted this revision.May 2 2022, 1:47 AM

LGTM

This revision is now accepted and ready to land.May 2 2022, 1:47 AM

Change the test to use the legacy pass manager, unfortunately the buildbot doesn't accept tests with arguments for the opt tool in the following format: -passes=function(sroa),cgscc(argpromotion).

this should work if you add quotes, like -passes='function(sroa),cgscc(argpromotion)'

but is there a reason we're not using the output of -passes=sroa as the input IR and only running argpromotion rather than running the test through both sroa and argpromotion?

@aeubanks Hm... thank you for the explanation why my first attempt to run two passes didn't work. I used two passes instead of just run argpromotion on an optimized IR because I saw examples of such approach in the tests, llvm\test\Transforms\ArgumentPromotion\2008-09-07-CGUpdate.ll for example runs inline before argpromotion. I've updated the test to run the argpromotion pass only.

Update the byval-through-pointer-promotion.ll test: remove sroa from the RUN clause because the optimized input is in use.

psamolysov added a comment.EditedMay 4 2022, 4:31 AM

Colleagues, if the changes look good for you, could you help me with landing? Thank you.

If you have no objections, I would like to use the comments under this review to ask a question about changing the byval promotion scheme with the "usual" one. The following code:

define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b, %struct.ss** align 8 %acc) nounwind  {
entry:
  store %struct.ss* %b, %struct.ss** %acc, align 8
  ret void
}

Currently is optimized to the following one:

define internal void @f(i32 %b.0, i64 %b.1, %struct.ss** align 8 %acc) #0 {
entry:
  %b = alloca %struct.ss, align 4
  %.0 = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  store i32 %b.0, i32* %.0, align 4
  %.1 = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 1
  store i64 %b.1, i64* %.1, align 4
 
  store %struct.ss* %b, %struct.ss** %acc, align 8
  ret void
}

So, actually we store not the original pointer of %struct.ss but a pointer to a temporary allocated on the function f's stack frame. Maybe I get byval wrong and %struct.ss* byval(%struct.ss) align 4 %b is also a pointer to a temporary (but an implicitly created) and therefore there is no difference?

And I see that the "usual" (non-byval) promotion uses i64 for the second argument of the generated GEP instructions:

  %2 = alloca %struct.A, align 32
  %3 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 0
  store float %0, float* %3, align 32
  %4 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 2
  store i64 2, i64* %4, align 16

; the struct is ready, use GEP and load to prepare new values for the arguments
  %5 = getelementptr %struct.A, %struct.A* %2, i64 0, i32 0 ; i64 0, not i32 0
  %.val = load float, float* %5, align 32
  %6 = getelementptr %struct.A, %struct.A* %2, i64 0, i32 2 ; i64 0, not i32 0
  %.val1 = load i64, i64* %6, align 16

Does this work as designed?

I've prepared a patch that replaces byval promotion with usual (non-byval) one with allowed stores. The patch is based on this one, so I will be able to open the review after landing this one. If you could help me with landing, it would be great. Thanks a lot in advance.

This revision was landed with ongoing or failed builds.May 12 2022, 7:45 AM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.May 12 2022, 7:45 AM

Sorry for the delay, I have landed the change now. I'd recommend you to request commit access, see https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access for instructions.

@nikic Thank you for landing and suggestion.

nikic added a comment.May 12 2022, 7:47 AM

And I see that the "usual" (non-byval) promotion uses i64 for the second argument of the generated GEP instructions:
[...]
Does this work as designed?

The index type to use is specified by DataLayout -- however, other index types will be automatically sign extended or truncated. So it's fine either way, though using the correct index type (i64 by default) is preferred. Struct indices always use i32 though.