It makes sense to make a non-byval promotion attempt first and then fall
back to the byval one. The non-byval ('usual') promotion is generally
better, for example it does promotion even when a structure has more
elements than 'MaxElements' but not all of them are actually used in the
function.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Change the test to use the legacy pass manager, unfortunately the buildbot doesn't accept tests with arguments for the opt tool in the following format: -passes=function(sroa),cgscc(argpromotion).
Failed Tests (2): LLVM :: DebugInfo/NVPTX/debug-file-loc-only.ll LLVM :: DebugInfo/NVPTX/debug-file-loc.ll
These tests look like unrelated to the changes.
Might make sense to move the non-byval promotion attempt first and then fall back to byval? I think non-byval promotion is generally better if it's possible.
Though it would probably make sense to handle byval promotion in the same way as non-byval promotion (but allowing stores) -- I believe the current byval implementation is not correct under opaque pointers.
Might make sense to move the non-byval promotion attempt first and then fall back to byval? I think non-byval promotion is generally better if it's possible.
Let's start from this change. I'm going to update the patch moving the non-byval attempt first.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
857 | This std::transform makes the code both longer and harder to read. |
I've updated the tests to forbid non-byval promotion and let byval chance to work. As I see, not only 'store' is allowed for byval (but not every 'store', for not-densely packed structures, with x86_fp80, for example, stores makes 'canPaddingBeAccessed' to return 'true') but GEPs with non constant indexes too. I think there could be other instructions that is allowed for byval but not allowed for 'usual' promoting, in fact, if isDenselyPacked returns 'true', any user is possible: store, cmp, etc. because byval promotion just replaces passing a structure by value with passing it's elements without care about how they are used if the structure is densely packed or a GEP with non-const index is the only user.
By the way, should GEP with non constant index be a flag that padding can be accessed and in this case the canPaddingBeAccessed function should return 'true'?
@psamolysov Right, byval promotion currently doesn't look at how the argument is used -- however, the promotion will only actually be profitable if the introduced alloca can be mem2reg promoted. This will happen if the argument is only used in load/store operations at constant offset, but will generally not happen in other cases, in which case we've just made argument passing less efficient for no real benefit.
@nikic Thank you for the explanation, I think the IsSafeToPromote variable should be used as a flag whether using the argument in stores is allowed and taking into account within the findArgParts function. That refactoring is much larger than the proposed here, so it deserves another review.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
857 | True, I've returned the loop back. |
Use the loop to add types into the Types vector for the areTypesABICompatible call instead of the std::transform algorithm.
this should work if you add quotes, like -passes='function(sroa),cgscc(argpromotion)'
but is there a reason we're not using the output of -passes=sroa as the input IR and only running argpromotion rather than running the test through both sroa and argpromotion?
@aeubanks Hm... thank you for the explanation why my first attempt to run two passes didn't work. I used two passes instead of just run argpromotion on an optimized IR because I saw examples of such approach in the tests, llvm\test\Transforms\ArgumentPromotion\2008-09-07-CGUpdate.ll for example runs inline before argpromotion. I've updated the test to run the argpromotion pass only.
Update the byval-through-pointer-promotion.ll test: remove sroa from the RUN clause because the optimized input is in use.
Colleagues, if the changes look good for you, could you help me with landing? Thank you.
If you have no objections, I would like to use the comments under this review to ask a question about changing the byval promotion scheme with the "usual" one. The following code:
define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b, %struct.ss** align 8 %acc) nounwind { entry: store %struct.ss* %b, %struct.ss** %acc, align 8 ret void }
Currently is optimized to the following one:
define internal void @f(i32 %b.0, i64 %b.1, %struct.ss** align 8 %acc) #0 { entry: %b = alloca %struct.ss, align 4 %.0 = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0 store i32 %b.0, i32* %.0, align 4 %.1 = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 1 store i64 %b.1, i64* %.1, align 4 store %struct.ss* %b, %struct.ss** %acc, align 8 ret void }
So, actually we store not the original pointer of %struct.ss but a pointer to a temporary allocated on the function f's stack frame. Maybe I get byval wrong and %struct.ss* byval(%struct.ss) align 4 %b is also a pointer to a temporary (but an implicitly created) and therefore there is no difference?
And I see that the "usual" (non-byval) promotion uses i64 for the second argument of the generated GEP instructions:
%2 = alloca %struct.A, align 32 %3 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 0 store float %0, float* %3, align 32 %4 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 2 store i64 2, i64* %4, align 16 ; the struct is ready, use GEP and load to prepare new values for the arguments %5 = getelementptr %struct.A, %struct.A* %2, i64 0, i32 0 ; i64 0, not i32 0 %.val = load float, float* %5, align 32 %6 = getelementptr %struct.A, %struct.A* %2, i64 0, i32 2 ; i64 0, not i32 0 %.val1 = load i64, i64* %6, align 16
Does this work as designed?
I've prepared a patch that replaces byval promotion with usual (non-byval) one with allowed stores. The patch is based on this one, so I will be able to open the review after landing this one. If you could help me with landing, it would be great. Thanks a lot in advance.
Sorry for the delay, I have landed the change now. I'd recommend you to request commit access, see https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access for instructions.
The index type to use is specified by DataLayout -- however, other index types will be automatically sign extended or truncated. So it's fine either way, though using the correct index type (i64 by default) is preferred. Struct indices always use i32 though.
This std::transform makes the code both longer and harder to read.