This is an archive of the discontinued LLVM Phabricator instance.

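Paths

llvm/include/llvm/Transforms/IPO/ArgumentPromotion.h
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
llvm/test/Transforms/ArgumentPromotion/attrs.ll
llvm/test/Transforms/ArgumentPromotion/byval-2.ll
llvm/test/Transforms/ArgumentPromotion/byval-through-pointer-promotion.ll
llvm/test/Transforms/ArgumentPromotion/byval-with-padding.ll
llvm/test/Transforms/ArgumentPromotion/byval.ll
llvm/test/Transforms/ArgumentPromotion/dbg.ll
llvm/test/Transforms/ArgumentPromotion/fp80.ll
llvm/test/Transforms/ArgumentPromotion/metadata.ll
llvm/test/Transforms/ArgumentPromotion/store-after-load.ll
llvm/test/Transforms/ArgumentPromotion/store-into-inself.ll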

Differential D125485

[ArgPromotion] Unify byval promotion with non-byval
ClosedPublic

Authored by psamolysov on May 12 2022, 10:58 AM.

Details

Reviewers

aeubanks
nikic
jdoerfert
sstefan1
chandlerc
eli.friedman

Commits

rG170c4d21bd94: [ArgPromotion] Unify byval promotion with non-byval

Summary

It makes sense to handle byval promotion in the same way as non-byval
but also allowing store instructions. However, these should
use the same checks as the load instructions do, i.e. be part of the
ArgsToPromote collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted alloca instructions. To optimize
these new allocas out, the PromoteMemToReg function from
Transforms/Utils/PromoteMemoryToRegister.cpp file is invoked after
promotion.

In order to let the PromoteMemToReg promote as many allocas as it
is possible, there should be no GEPs from the allocas. To
eliminate the GEPs, its own alloca is generated for every argument
part because a single alloca for the whole argument (that significantly
simplifies the code of the pass though) unfortunately cannot be used.

The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
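
To make the "rewrite accesses to an alloca, then promote" idea concrete, here is a minimal toy sketch in plain C++. It is not LLVM's PromoteMemToReg and uses no LLVM API: the Inst struct, the Op enum and promoteSlots are hypothetical names, and the model assumes a single straight-line block whose slots are only accessed by direct loads and stores, with each slot initialized by a store before any load.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical, not LLVM's IR classes): a straight-line block in
// which stack slots are only touched by direct loads and stores.
enum class Op { Store, Load, Other };

struct Inst {
  Op Kind;
  std::string Slot;                  // slot accessed by a Load/Store
  std::string Value;                 // stored value, or the name this inst defines
  std::vector<std::string> Operands; // values read by an Other instruction
};

// Forward each load to the most recent store to the same slot and drop all
// memory operations. Returns the remaining instructions, operands rewritten.
std::vector<Inst> promoteSlots(const std::vector<Inst> &Block) {
  std::map<std::string, std::string> Current; // slot -> current value name
  std::map<std::string, std::string> Replace; // load result -> forwarded value
  std::vector<Inst> Out;
  for (const Inst &I : Block) {
    if (I.Kind == Op::Store) {
      auto It = Replace.find(I.Value);
      Current[I.Slot] = It == Replace.end() ? I.Value : It->second;
    } else if (I.Kind == Op::Load) {
      auto It = Current.find(I.Slot);
      assert(It != Current.end() && "slot must be stored before it is loaded");
      Replace[I.Value] = It->second;
    } else {
      Inst Copy = I;
      for (std::string &Opnd : Copy.Operands) {
        auto It = Replace.find(Opnd);
        if (It != Replace.end())
          Opnd = It->second;
      }
      Out.push_back(Copy);
    }
  }
  return Out;
}
```

Fed a block shaped like the load/add/store/load sequence discussed in this review (with the slot first initialized from the promoted argument value), only the add survives, reading the argument value directly.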

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

psamolysov created this revision. May 12 2022, 10:58 AM

Herald added a project: Restricted Project. May 12 2022, 10:58 AM

Herald added subscribers: ormris, hiraditya.

psamolysov requested review of this revision. May 12 2022, 10:58 AM

Herald added a subscriber: llvm-commits. May 12 2022, 10:58 AM

Harbormaster completed remote builds in B164156: Diff 429014. May 12 2022, 12:29 PM

for the byval case, a store can change the value loaded by a later load, so it's not completely dead in that regard

@aeubanks Thank you for the comment. I'm not sure I understand what "loaded by a later load" means. If it means any load instruction that follows the store in the same function, then this case won't be optimized:

define internal void @k(%struct.ss* byval(%struct.ss) align 4 %b) nounwind  {
entry:
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  %temp1 = load i32, i32* %temp, align 4
  %temp2 = add i32 %temp1, 1
  store i32 %temp2, i32* %temp, align 4
  %temp3 = load i32, i32* %temp, align 4
  ret void
}

because the AAR.canInstructionRangeModRef check (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L658) will return true, which means the pointer was invalidated. If the load instruction is in another basic block, the AAR.canBasicBlockModify(*TranspBB, Loc) check will catch it.

But I've found that the new unified approach has a disadvantage here: whereas the original byval promotion reconstructs the argument's structure in the callee and leaves its users unchanged, the new approach simply does nothing. So the original byval promotion leaves room for the SROA optimization, and the new scheme does not. The example above is optimized down to a single line by the -sroa pass after the "old" -argpromotion one:

define internal void @k(i32 %b.0, i64 %b.1) #0 {
entry:
  %temp2 = add i32 %b.0, 1
  ret void
}

and will not be optimized at all with the -sroa pass after the "new" -argpromotion one.

Currently I have no idea how to restore the optimization room for SROA in that case.

Reformat the source code because the pre-merge run on Linux failed with the following error message:

ERROR   git-clang-format returned an non-zero exit code 1

@psamolysov What I had in mind is something along the following lines:

1. For byval, allow stores -- however, these should still use the same checks as loads, i.e. be part of ArgParts collection. We need to make sure that different loads/stores don't have any partial overlaps.
2. For byval, drop the check for interfering mods.
3. Change promotion to work by rewriting accesses to an alloca (as is currently done for byval promotion) and then call promoteMemoryToRegister(). Our earlier checks have ensured that promoteMemoryToRegister() will succeed. (For the non-byval case, we don't really need alloca+mem2reg, but it would help with unifying the implementations.)
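
The "no partial overlaps" requirement can be sketched as follows. This is only an illustration under stated assumptions, not the code of the pass: Part and addAccess are hypothetical names, a part is reduced to its byte size, and the check is done eagerly on insertion. The only detail taken from this review is that parts are keyed by a signed int64_t offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the pass's ArgPart: only the access size is kept.
struct Part {
  uint64_t Size;
};

// Record an access to bytes [Offset, Offset + Size) of the argument.
// Returns false if the access partially overlaps an already recorded part,
// or reuses a known offset with a different size. Offsets are signed.
bool addAccess(std::map<int64_t, Part> &Parts, int64_t Offset, uint64_t Size) {
  auto It = Parts.find(Offset);
  if (It != Parts.end())
    return It->second.Size == Size; // same part accessed again
  int64_t End = Offset + static_cast<int64_t>(Size);
  for (const auto &KV : Parts) {
    int64_t PartEnd = KV.first + static_cast<int64_t>(KV.second.Size);
    if (Offset < PartEnd && KV.first < End)
      return false; // the byte ranges intersect without being identical
  }
  Parts.emplace(Offset, Part{Size});
  return true;
}
```

Loads and stores would go through the same function, so a store to bytes 2..5 is rejected once a 4-byte part at offset 0 is known, while repeated accesses to the identical part are accepted.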

@nikic Thank you for the detailed description of the idea. I'm not sure I understand it correctly, so let me go through your comment point by point:

I believe this is implemented in the patch.

If stores are allowed (IsStoresAllowed in the code), the findArgParts function should disable the AAR.canInstructionRangeModRef and AAR.canBasicBlockModify(*TranspBB, Loc) checks because the pointers may be invalidated.

This is the most difficult part for me. As I understand it, all accesses (loads and stores) to the arguments must be replaced with accesses to new allocas of the corresponding types. So instead of just eliminating loads, for every promotable pointer argument (byval or not) we should create an alloca, store the new value argument into it, keep all the allowed stores and loads (which will now work on this alloca), and then call the 'promoteMemoryToRegister' function to do part of mem2reg's work.

Is my understanding correct?

In D125485#3510990, @psamolysov wrote:

@nikic Thank you for the detailed description of the idea. I'm not sure I understand it correctly, so let me go through your comment point by point:

I believe this is implemented in the patch.

Not quite: We should set IsStoresAllowed == true for all byval+align arguments (independently of "densely packed" and "padding access" predicates), and stores shouldn't be unconditionally allowed, but go through HandleLoad (well, a slight generalization that works on both load&store, but basically does the same thing).

If stores are allowed (IsStoresAllowed in the code), the findArgParts function should disable the AAR.canInstructionRangeModRef and AAR.canBasicBlockModify(*TranspBB, Loc) checks because the pointers may be invalidated.

Correct.

This is the most difficult part for me. As I understand it, all accesses (loads and stores) to the arguments must be replaced with accesses to new allocas of the corresponding types. So instead of just eliminating loads, for every promotable pointer argument (byval or not) we should create an alloca, store the new value argument into it, keep all the allowed stores and loads (which will now work on this alloca), and then call the 'promoteMemoryToRegister' function to do part of mem2reg's work.

Yeah, exactly. Basically do what byval promotion currently does, just based on ArgParts and not byval structure, and add a mem2reg call at the end.

Harbormaster completed remote builds in B164262: Diff 429164. May 13 2022, 2:38 AM

@nikic Thank you very much, it is crystal clear. I'm going to implement the suggested plan.

@nikic If you have time, could you clarify why the OffsetAndArgPart alias is defined as std::pair<int64_t, ArgPart>, that is, why int64_t rather than uint64_t is used for Offset? All of LLVM's APIs for working with offsets, such as APInt (created from a uint64_t value) and StructLayout::getElementOffset (returns uint64_t), use unsigned values. Do we really use negative offsets, and if not, can the offset part of the pair be replaced with uint64_t? Thank you.

In D125485#3515516, @psamolysov wrote:

@nikic If you have time, could you clarify why the OffsetAndArgPart alias is defined as std::pair<int64_t, ArgPart>, that is, why int64_t rather than uint64_t is used for Offset? All of LLVM's APIs for working with offsets, such as APInt (created from a uint64_t value) and StructLayout::getElementOffset (returns uint64_t), use unsigned values. Do we really use negative offsets, and if not, can the offset part of the pair be replaced with uint64_t? Thank you.

This is an int64_t, because GEP offsets in LLVM are signed. This is only supported "because we can" though, so if it makes things complicated it could be dropped (this would require bailing out of the transform for negative offsets, not just changing the type). Note that for byval, an access at negative offset is UB, so we're free to transform it even if it makes no sense in that case.

@nikic Thank you for the good explanation. I'm currently using the createByteGEP function to emit GEPs, and the signed offset causes no problems (I believe there is some signed-to-unsigned conversion in the calls inside createByteGEP, but I get no warnings).

Unfortunately, PromoteMemToReg optimizes almost nothing. If there is a GEP from an alloca and this GEP is used by a store, for example, the isAllocaPromotable function returns false because the if (!onlyUsedByLifetimeMarkersOrDroppableInsts(GEPI)) check (which allows only intrinsics as GEP users) fails. I always use store instructions to write the new argument values into the alloca, so this check will always fail for structures and pass only for primitives. I've tried running the mem2reg pass with the opt utility and got the same result: the IR generated by the ArgumentPromotion pass is left unoptimized. The SROA pass works fine, but I'm not sure whether it is a good idea to run SROA directly from another pass (the PromoteMemToReg function creates an instance of the PromoteMem2Reg class and calls its run member function, but this class is not a pass).

In D125485#3515850, @psamolysov wrote:

Unfortunately, PromoteMemToReg optimizes almost nothing. If there is a GEP from an alloca and this GEP is used by a store, for example, the isAllocaPromotable function returns false because the if (!onlyUsedByLifetimeMarkersOrDroppableInsts(GEPI)) check (which allows only intrinsics as GEP users) fails. I always use store instructions to write the new argument values into the alloca, so this check will always fail for structures and pass only for primitives. I've tried running the mem2reg pass with the opt utility and got the same result: the IR generated by the ArgumentPromotion pass is left unoptimized. The SROA pass works fine, but I'm not sure whether it is a good idea to run SROA directly from another pass (the PromoteMemToReg function creates an instance of the PromoteMem2Reg class and calls its run member function, but this class is not a pass).

When rewriting to an alloca, I believe that you should end up with only direct loads and stores to the alloca, without any remaining GEPs. There should be one alloca for each ArgPart, not one alloca for all parts. In that case PromoteMemToReg should be able to promote it.
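
A small sketch of the "one alloca per ArgPart" bookkeeping, in plain C++ rather than LLVM API. The helper names createSlots and slotFor are hypothetical; the slot naming only mirrors the %x.0.allc pattern that appears in IR quoted in this review. Each access is retargeted by looking up its byte offset, so no GEP from a slot is ever needed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Create one named stack slot per promoted part, keyed by the part's byte
// offset into the argument (hypothetical helper, not the pass's code).
std::map<int64_t, std::string>
createSlots(const std::string &ArgName, const std::vector<int64_t> &Offsets) {
  std::map<int64_t, std::string> OffsetToSlot;
  for (int64_t Off : Offsets)
    OffsetToSlot[Off] = "%" + ArgName + "." + std::to_string(Off) + ".allc";
  return OffsetToSlot;
}

// Retarget an access at Offset to its slot. This is a read-only lookup: an
// unknown offset yields an empty string instead of inserting a new entry.
std::string slotFor(const std::map<int64_t, std::string> &OffsetToSlot,
                    int64_t Offset) {
  auto It = OffsetToSlot.find(Offset);
  return It == OffsetToSlot.end() ? std::string() : It->second;
}
```

Because every load and store ends up addressing a whole slot directly, each slot only has direct load/store users, which is the shape a mem2reg-style promotion can handle.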

Hm, I thought about this. Having one alloca for every ArgPart looks like a solution; the one thing to overcome is how to retarget all the old load/store instructions from the old GEPs into a large argument to the new allocas. I wasn't sure how to do this, but I believe the same approach as in non-byval promotion, with dead instruction detection and elimination, could work.

ormris removed a subscriber: ormris. May 16 2022, 11:08 AM

I've implemented the approach proposed by @nikic (thank you again and again!). What I see is that after PromoteMemToReg, some of the newly inserted function arguments might become unused (that is, the argument has no users after promotion). Theoretically, we could move the code that generates GEPs and loads in the callers after the code that performs promotion in the callee and take those unused arguments into account (that is, simply not generate the instructions for such arguments), but this requires generating a new definition of the callee (newer than the already generated NF) without the unused arguments. I tried this, and it makes the code of the pass very difficult to read and understand, so I believe this is a job for another IPO pass (maybe one that already exists) paired with DCE.

Implement the approach proposed by @nikic

Herald added a reviewer: sstefan1. May 17 2022, 5:30 AM

Harbormaster completed remote builds in B164849: Diff 430002. May 17 2022, 6:07 AM

psamolysov mentioned this in D123669: [ArgPromotion] Use a Visited set to protect dead instruction collection. May 18 2022, 1:21 AM

Reformat the code to let the git-clang-format check pass.

Harbormaster completed remote builds in B165100: Diff 430344. May 18 2022, 5:58 AM

Unfortunately, a CallGraphSCCPass (the legacy ArgPromotion pass derives from this class) cannot use a function analysis pass such as DominatorTreeWrapperPass. Interestingly, an opt tool built with the new version of the ArgumentPromotion pass crashes for this reason only when running the Polly tests (you can see the crash in the pre-merge checks on Windows).

A possible workaround is to convert the legacy ArgPromotion pass into a module pass and use <llvm/ADT/SCCIterator.h>; the solution is described on Stack Overflow: https://stackoverflow.com/questions/30059622/using-dominatortreewrapperpass-in-callgraphsccpass This approach has a few disadvantages: the code of the CallGraphSCCPass::getAnalysisUsage(AnalysisUsage &) method has to be copied into ArgPromotion::getAnalysisUsage (it is only 2 lines, though), and skipSCC(SCC) won't be available anymore, nor will CallGraphSCCPass::doInitialization(), but that is an empty method.

-I'm going to implement this workaround to check if this works at all. This has no impact on the ArgumentPromotionPass pass that is used by the new pass manager.-

Fortunately, this is not required: the DominatorTree must be computed for the new function, so a cached DominatorTree, even if we were able to get it in the pass, just wouldn't work.

psamolysov updated this revision to Diff 430601. May 19 2022, 1:52 AM

Create a DominatorTree for a new function

The new function requires its own dominator tree for the call to the
PromoteMem2Reg function. The dominator tree calculated for the old
function is no longer valid after argument promotion, and using that
tree may lead to UB inside PromoteMem2Reg.

I'm not sure what to do about the Assumption Cache (AC). Currently I use
the AC pre-built for the old function. Our pass does nothing with
the @llvm.assume intrinsics, so I believe the cache should remain valid
for the new function too. Do you agree?

Harbormaster completed remote builds in B165272: Diff 430601. May 19 2022, 2:31 AM

Colleagues, the patch is ready for review. Could you have a look again? Thank you.

psamolysov added inline comments. May 23 2022, 1:48 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621	I'm not familiar with speculative stores. Should we handle stores in exactly the same way as loads here, so should the parameter be set to `false`?

Fix a typo in the comment on the no-longer-used isDenselyPacked static member function.

Harbormaster completed remote builds in B166428: Diff 432213. May 26 2022, 2:29 AM

Ping.

Unfortunately, it looks like @nikic and @aeubanks have no time to review the patch (or maybe I'm getting something wrong and should have tagged you with @ in my ping comments; I'm sorry, colleagues, if you simply didn't see my requests). @chandlerc Could you, as a code owner and one of the pass's authors, have a look at the patch? @jdoerfert @eli.friedman Or you? Thanks a lot in advance.

@nikic @aeubanks Colleagues, ping, could you have a look at the changes?

Sorry for the long delay here. This looks pretty nice!

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
323	`SmallVector<AllocaInst *>`
327	I doubt this code is performance-critical enough to need this accumulate + reserve :)
367	Can drop the `false` argument, non-volatile store is the default.
374–380	This comment looks a bit out of place, as this closure doesn't remove any dead instructions itself.
382	Mild preference for `OffsetToAlloca.lookup()` here, which makes it clear that we don't intend to modify the map (which `operator[]` can do).
398	It's possible to reuse one IRBuilder instance using `IRB.SetInsertPoint()`.
400	Can omit empty name, it's the default.
414	Volatile operations can't be promoted, can assert that `!SI->isVolatile()`.
419	Can we `assert(isAllocaPromotable())` here? I think that by design, we should only produce promotable allocas.
486–498	The "the" can be dropped here.
490	IsStoresAllowed -> AreStoresAllowed
539	I would replace this with "accessed as" to cover both. Otherwise it sounds like it's either both loads or both stores with conflicting types, but it can also be a mix.
573	else if
621	I think the parameter should be false here, because the store is not guaranteed to execute, and it would be confusing otherwise. The relevant part here is that we're not going to speculate any stores, so we don't care about the alignment at all -- we can explicitly skip the alignment code in HandleEndUser for stores to make this clear.
627	What happens if we're storing the argument into itself? `store ptr %arg, ptr %arg` or similar. I don't think your code handles this correctly right now? It might be best to walk over `Use`s rather than `User`s, so we can check which operand the use is on.
653	I think it's safe to drop these TODOs -- this is not really compatible with the new promotion approach that uses separate allocas (we'd have to reconstruct the result from multiple allocas, which makes little sense).

aeubanks added inline comments. Jun 23 2022, 10:29 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432	creating the DominatorTree should go through something like `function_ref<DominatorTree &(Function &F)> DTGetter`, see `AARGetter`. but we run SROA (including mem2reg) right after argpromotion, is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? then we wouldn't need to worry about AssumptionCache/DominatorTree here since they're purely used for mem2reg

@nikic Thank you very much for the review and comments. I've addressed almost all of them and asked a couple of questions about the alignment check for loads in the HandleEndUser lambda. Could you have a look and give your answers?

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
327	I agree, removed this `accumulate` and `reserve`.
374–380	Thank you. I've moved the comment just before the dead code elimination loop.
398	I wasn't aware about this method, thanks!
419	I've added the assertion for `isAllocaPromotable`.
431–432	About running `mem2reg` in the pass: I tried removing the `mem2reg` call and running a few tests with `opt -O3` (after adding the `noinline` attribute to the callee), and I see that almost all of the callees' code is optimized out, so the idea of simply removing the `mem2reg` call from the pass looks viable. @nikic what is your opinion? Should we call `mem2reg` inside the argument promotion pass or let the compilation pipeline call it? P.S. I'm also planning to add the `Dead Argument Elimination` pass to the pipeline after landing this patch, because `mem2reg` after the new argument promotion leaves unused arguments.
431–432	@aeubanks Thank you for the suggestion about `DTGetter`. Unfortunately, I don't understand why the `DTGetter` should be used. As I understand it, `AARGetter` is a wrapper over `FAM.getResult<AAManager>(F)` for the new PM and is just an instance of the `LegacyAARGetter` class for the legacy PM. Its goal is to give the correct AAR depending on the pass manager in use. In our case, the `DominatorTree` is simply built again for the new function after the argument promotion and doesn't depend on the pass manager in use.
539	Oops... I'd forgotten that `ArgParts` collects all the promotable parts for loads as well as stores, so a mix is possible.
621	Thank you for the suggestion. I've changed the argument's value to `true` and fixed the alignment check in `HandleEndUser` so that it applies to loads only. I've also edited the comment before the check to make it clear that only loads are checked. Two questions, please. Currently `Part` can be either a load or a store previously saved in the `ArgParts` map at the same offset. Should we also add the condition that the `MustExecInstr` in the previously seen `ArgPart` is a load (or at least not a store)? Also, when the `Part.Alignment` field is recalculated (`Part.Alignment = std::max(Part.Alignment, I->getAlign());`), should we do this only when `I` is a load or in every case? Thank you.
627	Good catch! You are right: when I tried `store ptr %arg, ptr %arg`, an access violation occurred in the `llvm::Instruction::eraseFromParent()` function, because the dead instruction elimination loop in the `doPromotion` function handled the `store` instruction twice. Thank you for the idea! I've rewritten the walk to use `Use`s instead of `User`s and to check that the value the `store` is a user of is not the `store`'s value operand. If it is, this `store` is an "unknown user", because we may store something through the pointer but not store the pointer's value somewhere else. A LIT test was also added. I tried adding a similar check for the `load` instructions (that the `load`'s pointer operand is exactly the value the `load` is a user of) and removing the check in `HandleEndUser` that `Ptr` is equal to `Arg`. Unfortunately, that does not make sense, because before the walk there is a loop where we try to handle every `load` instruction in the entry BB for every argument, and this check actually determines whether the current `load` instruction is related to the current argument. So the check is still required.
653	I believe this comment was added because it wasn't clear for me whether stores can be speculative. Thank you for the answer, the situation is clear for now and I've removed my TODO comment as well as the previous one.

Address almost all of the reviewer's comments.

Harbormaster completed remote builds in B171814: Diff 439686. Jun 24 2022, 3:55 AM

nikic added inline comments. Jun 24 2022, 4:00 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432	I'm okay with not doing the promotion here. We should probably add `mem2reg` to the tests though, to make sure we generate trivially promotable code.
599	Put `U->getUser()` into a variable, as it's used so much?
618	I think this works, but the robust way to check this is `U->getOperandNo() == SI->getPointerOperandIndex()`. (We will separately visit the use in the pointer and value operands, and only consider the pointer operand a known use, while the value operand will fall through to the bailout below.)
619	And with the previous change, I believe you can change this to `if (!*HandleEndUser` like for LoadInst, as we should now be guaranteed that this is a store based on the argument.
621	Hm, so I guess it's not quite true that we don't speculate stores, in the sense that we do insert a store to store the argument into the alloca. Of course, as this store gets promoted later, the actual alignment doesn't really matter. It's probably still best to produce correct alignment for it. So I think the safest thing to do here is just not special case the store case at all -- or would that cause any regressions in tests?
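
The operand-index check suggested for line 618 can be illustrated with a toy model. ToyUse, UserKind and isKnownAccess are hypothetical names and this is not llvm::Use; the only facts borrowed from LLVM are the operand layouts (a load's pointer is operand 0, a store has the stored value at operand 0 and the pointer at operand 1).

```cpp
#include <cassert>

// Toy model of a use of the argument pointer (hypothetical, not llvm::Use):
// the kind of the using instruction and the operand slot the pointer fills.
enum class UserKind { Load, Store, Other };

struct ToyUse {
  UserKind Kind;
  unsigned OperandNo;
};

// Operand layout as in LLVM's LoadInst and StoreInst.
constexpr unsigned LoadPointerOperandIndex = 0;
constexpr unsigned StorePointerOperandIndex = 1;

// True if the use is a memory access through the pointer. A store that uses
// the pointer as its value operand stores the pointer itself somewhere, so it
// is treated as an unknown user and blocks promotion.
bool isKnownAccess(const ToyUse &U, bool AreStoresAllowed) {
  switch (U.Kind) {
  case UserKind::Load:
    return U.OperandNo == LoadPointerOperandIndex;
  case UserKind::Store:
    return AreStoresAllowed && U.OperandNo == StorePointerOperandIndex;
  case UserKind::Other:
    return false;
  }
  return false;
}
```

For `store ptr %arg, ptr %arg` the walk over uses sees the same store twice, once per operand; the pointer-operand use is accepted and the value-operand use falls through to the bailout.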

nikic added inline comments. Jun 24 2022, 5:05 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621	Though we could also explicitly set the align of the alloca to the same value, in which case those would always match. Probably a good idea to do that anyway.

psamolysov marked 6 inline comments as done. Jun 24 2022, 5:24 AM

psamolysov added inline comments.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432	I've started removing the promotion (mem2reg) from the pass and found the following problem: the `chained.ll` test no longer works as designed. Running the `argpromotion` pass and then `mem2reg` is not the same as running `mem2reg` inside the `argpromotion` pass, because the pass runs `mem2reg` after every promotion attempt while the pass manager runs passes one by one. Consider the `chained.ll` case: after the `argpromotion` pass without the `mem2reg` call inside the pass, the generated code is the following:

define internal i32 @test(i32* %x.0.val) {
entry:
  %x.0.allc = alloca i32*, align 8
  store i32* %x.0.val, i32** %x.0.allc, align 8
  %y = load i32*, i32** %x.0.allc, align 8
  %z = load i32, i32* %y, align 4
  ret i32 %z
}

define i32 @caller() {
entry:
  %G2.val = load i32*, i32** @G2, align 8
  %x = call i32 @test(i32* %G2.val)
  ret i32 %x
}

And the pass cannot promote the pointer `%x.0.val` again because there is a `store`:

ArgPromotion of i32* %x.0.val failed: unknown user store i32* %x.0.val, i32** %x.0.allc, align 8

and the `byval` attribute is not present, so stores aren't allowed. If we run `mem2reg` from the pass, the `alloca` and the corresponding `store`s are promoted and the `argpromotion` pass gets a chance to do another iteration. So running `mem2reg` from the pass makes sense.
599	OK, thank you for the suggestion.
618	I've applied the suggestion. Thanks.
621	No, it adds no regressions, so I've just removed the check that the instruction is a `load`, so the same check is now used for `load` and `store` instructions alike, and reworded the comment a bit to cover both cases.

Apply suggestions from the code review.

The condition that the instruction is a load was removed from the alignment check in the HandleEndUser lambda.

psamolysov added inline comments. Jun 24 2022, 5:29 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621	set the align of the alloca to the same value... Should the value be `Pair.second.Alignment` as for the `store` into the alloca instruction on line 365?

nikic added inline comments. Jun 24 2022, 6:04 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432	Ah, good point. So we do need to run promotion here.
	@psamolysov wrote: "Unfortunately, I don't understand why the `DTGetter` should be used. As I understand it, `AARGetter` is a wrapper over `FAM.getResult<AAManager>(F)` for the new PM and is just an instance of the `LegacyAARGetter` class for the legacy PM. Its goal is to give the correct AAR depending on the pass manager in use."
	The general motivation is to cache the DT, so the next pass using it won't have to recompute it. But I don't know what complications the legacy pass manager introduces here. I didn't see any compile-time impact from your current code (though I expect this is mainly because argument promotion triggers rarely.)
621	Yeah, that's what I had in mind.

Harbormaster completed remote builds in B171831: Diff 439709. Jun 24 2022, 6:26 AM

psamolysov marked an inline comment as done. Jun 24 2022, 7:37 AM

psamolysov added inline comments.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432	When I tried to reuse the DominatorTree from the legacy pass manager, I ran into problems with some Polly tests. It looks like my mistake was that I built the Dominator Tree for the old function and tried to reuse it to promote the allocas in the new one. If the Dominator Tree is built lazily for the new function using the `DTGetter`, everything works with the new pass manager (though, as I remember, it worked even when I passed the DT for the old function; I believe that is because we don't actually change the CFG). For the legacy PM, the situation is more dramatic because `ArgPromotion` is a `CallGraphSCCPass` and not a `FunctionPass`, and this code: `getAnalysis<DominatorTreeWrapperPass>(F).getDomTree()` doesn't work as expected. The following error occurs on the legacy PM:

Unable to schedule 'Dominator Tree Construction' required by 'Promote 'by reference' arguments to scalars'
Unable to schedule pass

As a workaround, I just create a new `DominatorTree` instance from the `DTGetter` created from the legacy pass. The idea is borrowed from the `lib/Analysis/MustExecute.cpp` file. Unfortunately, this requires storing all the created `DominatorTree` instances while the pass is running.
621	Done.
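
The "getter callback plus storage for every lazily created tree" workaround can be sketched in standard C++ as follows. Everything here is a stand-in: ToyFunction, ToyDomTree, LazyDomTrees and DTGetterFn are hypothetical names, and std::function replaces LLVM's function_ref so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Stand-ins for llvm::Function and llvm::DominatorTree (hypothetical).
struct ToyFunction {
  std::string Name;
};
struct ToyDomTree {
  const ToyFunction *F; // the function this tree was built for
};

using DTGetterFn = std::function<ToyDomTree &(const ToyFunction &)>;

// Builds a tree the first time a function is queried and keeps ownership of
// every tree for as long as the cache (i.e. the running pass) is alive.
class LazyDomTrees {
  std::map<const ToyFunction *, std::unique_ptr<ToyDomTree>> Trees;

public:
  unsigned NumBuilt = 0;

  ToyDomTree &get(const ToyFunction &F) {
    std::unique_ptr<ToyDomTree> &Slot = Trees[&F];
    if (!Slot) {
      Slot.reset(new ToyDomTree{&F}); // "compute" the tree for this function
      ++NumBuilt;
    }
    return *Slot;
  }

  // The callback handed to the promotion code in place of a concrete tree.
  DTGetterFn getter() {
    return [this](const ToyFunction &F) -> ToyDomTree & { return get(F); };
  }
};
```

The promotion code only ever asks the getter for the tree of the function it is about to promote, so a tree built for the old function is never handed out for the new one.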

Implement the proposed solution with DTGetter to reuse DominatorTrees from the pass manager. This works fine on the new PM but not on the legacy one. As a workaround, a new DominatorTree is created for every newly generated function whenever the Argument Promotion Pass is used through the legacy pass manager.

nikic mentioned this in D128536: [ArgPromotion] Remove legacy PM support. Jun 24 2022, 8:14 AM

We should be fetching AssumptionCache for the new function rather than the old function as well. I think it is working out in practice because there won't be any assumes, but if it were queried, it would be on the wrong function.

For the legacy pass manager, I would like to propose this innovative solution: D128536
IMHO, at this point, if the legacy pass manager is causing issues, we should be fixing them by dropping support for it.

@nikic I believe it is a very good idea to just drop the legacy PM support in ArgPromotion. Thank you for the patch; once it has landed, I'll just leave the single DTGetter.

We should be fetching AssumptionCache for the new function rather than the old function as well.

For AssumptionCache there should be no problem with the legacy PM because the following construct is supported:

getAnalysis<AssumptionCacheTracker>().getAssumptionCache(*OldF)

I've added the ACGetter lambdas to the modern pass as well as the legacy one and I'm ready to remove the getter for the legacy pass as soon as the pass has been removed.

Fetch AssumptionCache for the new function rather than the old one.

Harbormaster completed remote builds in B171878: Diff 439781. Jun 24 2022, 9:51 AM

nikic mentioned this in rG217e85761cd1: [ArgPromotion] Remove legacy PM support. Jun 27 2022, 12:42 AM

Okay, D128536 has landed, can you please rebase this patch?

Rebase the patch on main, remove all the stuff related to the legacy pass manager.

@nikic thank you for landing D128536. The patch has been rebased.

LGTM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402	Just realized that we can probably just do a `LI->setOperand(0, GetAlloca(Ptr))` here and don't really need to create a new instruction and RAUW.
632	The first line of this comment can be kept.

This revision is now accepted and ready to land. Jun 27 2022, 3:29 AM

Apply the suggestion about doing a LI->setOperand(0, GetAlloca(Ptr)) and not really creating a new instruction and RAUW.

psamolysov added inline comments. Jun 27 2022, 4:04 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402	Replaced with `LI->setOperand(LoadInst::getPointerOperandIndex(), GetAlloca(Ptr))`. Thank you very much for the suggestion. I've applied it to `store` instructions too. One thing: when the `LI` was replaced with `OffsetToArg[Offset.getSExtValue()]` in the old code, or with the `NewLI` in the patch, the instruction's metadata was not copied. Now that we no longer replace the instruction, the metadata is retained, and in the `metadata.ll` test the following code is generated:

  %1 = icmp ne i32* %p2.0.val, null
  call void @llvm.assume(i1 %1)

for this line of code:

  %v2 = load i32*, i32** %p2, !nonnull !1

I've updated the test to take this behavior into account, but is this correct and expected?

nikic accepted this revision. Jun 27 2022, 4:16 AM

nikic added inline comments.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402	Yes, this is exactly the effect I wanted to see :)

@nikic Thank you for accepting. I've got commit access and am able to land patches. I'm going to land the patch in a day or two.

Harbormaster completed remote builds in B172154: Diff 440160. Jun 27 2022, 5:27 AM

This revision was landed with ongoing or failed builds. Jun 28 2022, 5:23 AM

Closed by commit rG170c4d21bd94: [ArgPromotion] Unify byval promotion with non-byval (authored by psamolysov).

This revision was automatically updated to reflect the committed changes.

psamolysov added a commit: rG170c4d21bd94: [ArgPromotion] Unify byval promotion with non-byval.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/IPO/ArgumentPromotion.h	2 lines
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp	373 lines
llvm/test/Transforms/ArgumentPromotion/attrs.ll	22 lines
llvm/test/Transforms/ArgumentPromotion/byval-2.ll	23 lines
llvm/test/Transforms/ArgumentPromotion/byval-through-pointer-promotion.ll

	byval-with-padding.ll
	byval-through-pointer-promotion.ll

116 lines

22 lines

45 lines

4 lines

30 lines

102 lines

Diff 440571

llvm/include/llvm/Transforms/IPO/ArgumentPromotion.h

	(22 lines not shown)
	/// direct (by-value) arguments.			/// direct (by-value) arguments.
	class ArgumentPromotionPass : public PassInfoMixin<ArgumentPromotionPass> {			class ArgumentPromotionPass : public PassInfoMixin<ArgumentPromotionPass> {
	unsigned MaxElements;			unsigned MaxElements;

	public:			public:
	ArgumentPromotionPass(unsigned MaxElements = 2u) : MaxElements(MaxElements) {}			ArgumentPromotionPass(unsigned MaxElements = 2u) : MaxElements(MaxElements) {}

	/// Checks if a type could have padding bytes.			/// Checks if a type could have padding bytes.
				// TODO the function aren't used in the ArgumentPromotionPass anymore and
				// should be moved into AttributorAttributes.cpp as the single known user.
	static bool isDenselyPacked(Type *Ty, const DataLayout &DL);			static bool isDenselyPacked(Type *Ty, const DataLayout &DL);

	PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,			PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
	LazyCallGraph &CG, CGSCCUpdateResult &UR);			LazyCallGraph &CG, CGSCCUpdateResult &UR);
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_TRANSFORMS_IPO_ARGUMENTPROMOTION_H			#endif // LLVM_TRANSFORMS_IPO_ARGUMENTPROMOTION_H

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp

(23 lines not shown)
// Note that this transformation could also be done for arguments that are only		// Note that this transformation could also be done for arguments that are only
// stored to (returning the value instead), but does not currently. This case		// stored to (returning the value instead), but does not currently. This case
// would be best handled when and if LLVM begins supporting multiple return		// would be best handled when and if LLVM begins supporting multiple return
// values from functions.		// values from functions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/ArgumentPromotion.h"		#include "llvm/Transforms/IPO/ArgumentPromotion.h"

#include "llvm/ADT/DepthFirstIterator.h"		#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
(11 lines not shown)
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/NoFolder.h"		#include "llvm/IR/NoFolder.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Use.h"		#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
		#include "llvm/Transforms/Utils/PromoteMemToReg.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "argpromotion"		#define DEBUG_TYPE "argpromotion"

STATISTIC(NumArgumentsPromoted, "Number of pointer arguments promoted");		STATISTIC(NumArgumentsPromoted, "Number of pointer arguments promoted");
STATISTIC(NumByValArgsPromoted, "Number of byval arguments promoted");
STATISTIC(NumArgumentsDead, "Number of dead pointer args eliminated");		STATISTIC(NumArgumentsDead, "Number of dead pointer args eliminated");

namespace {		namespace {

struct ArgPart {		struct ArgPart {
Type *Ty;		Type *Ty;
Align Alignment;		Align Alignment;
/// A representative guaranteed-executed load instruction for use by		/// A representative guaranteed-executed load or store instruction for use by
/// metadata transfer.		/// metadata transfer.
LoadInst *MustExecLoad;		Instruction *MustExecInstr;
};		};

using OffsetAndArgPart = std::pair<int64_t, ArgPart>;		using OffsetAndArgPart = std::pair<int64_t, ArgPart>;

} // end anonymous namespace		} // end anonymous namespace

static Value *createByteGEP(IRBuilderBase &IRB, const DataLayout &DL,		static Value *createByteGEP(IRBuilderBase &IRB, const DataLayout &DL,
Value *Ptr, Type *ResElemTy, int64_t Offset) {		Value *Ptr, Type *ResElemTy, int64_t Offset) {
(41 lines not shown)	static Value *createByteGEP(IRBuilderBase &IRB, const DataLayout &DL,
}		}
return IRB.CreateBitCast(Ptr, ResElemTy->getPointerTo(AddrSpace));		return IRB.CreateBitCast(Ptr, ResElemTy->getPointerTo(AddrSpace));
}		}

/// DoPromotion - This method actually performs the promotion of the specified		/// DoPromotion - This method actually performs the promotion of the specified
/// arguments, and returns the new function. At this point, we know that it's		/// arguments, and returns the new function. At this point, we know that it's
/// safe to do so.		/// safe to do so.
static Function *doPromotion(		static Function *doPromotion(
Function *F,		Function *F, function_ref<DominatorTree &(Function &F)> DTGetter,
		function_ref<AssumptionCache *(Function &F)> ACGetter,
const DenseMap<Argument *, SmallVector<OffsetAndArgPart, 4>> &ArgsToPromote,		const DenseMap<Argument *, SmallVector<OffsetAndArgPart, 4>> &ArgsToPromote,
SmallPtrSetImpl<Argument *> &ByValArgsToTransform,
Optional<function_ref<void(CallBase &OldCS, CallBase &NewCS)>>		Optional<function_ref<void(CallBase &OldCS, CallBase &NewCS)>>
ReplaceCallSite) {		ReplaceCallSite) {
// Start by computing a new prototype for the function, which is the same as		// Start by computing a new prototype for the function, which is the same as
// the old function, but has modified arguments.		// the old function, but has modified arguments.
FunctionType *FTy = F->getFunctionType();		FunctionType *FTy = F->getFunctionType();
std::vector<Type *> Params;		std::vector<Type *> Params;

// Attribute - Keep track of the parameter attributes for the arguments		// Attribute - Keep track of the parameter attributes for the arguments
// that we are not promoting. For the ones that we do promote, the parameter		// that we are not promoting. For the ones that we do promote, the parameter
// attributes are lost		// attributes are lost
SmallVector<AttributeSet, 8> ArgAttrVec;		SmallVector<AttributeSet, 8> ArgAttrVec;
AttributeList PAL = F->getAttributes();		AttributeList PAL = F->getAttributes();

// First, determine the new argument list		// First, determine the new argument list
unsigned ArgNo = 0;		unsigned ArgNo = 0;
for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;		for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;
++I, ++ArgNo) {		++I, ++ArgNo) {
if (ByValArgsToTransform.count(&*I)) {		if (!ArgsToPromote.count(&*I)) {
// Simple byval argument? Just add all the struct element types.
Type *AgTy = I->getParamByValType();
StructType *STy = cast<StructType>(AgTy);
llvm::append_range(Params, STy->elements());
ArgAttrVec.insert(ArgAttrVec.end(), STy->getNumElements(),
AttributeSet());
++NumByValArgsPromoted;
} else if (!ArgsToPromote.count(&*I)) {
// Unchanged argument		// Unchanged argument
Params.push_back(I->getType());		Params.push_back(I->getType());
ArgAttrVec.push_back(PAL.getParamAttrs(ArgNo));		ArgAttrVec.push_back(PAL.getParamAttrs(ArgNo));
} else if (I->use_empty()) {		} else if (I->use_empty()) {
// Dead argument (which are always marked as promotable)		// Dead argument (which are always marked as promotable)
++NumArgumentsDead;		++NumArgumentsDead;
} else {		} else {
const auto &ArgParts = ArgsToPromote.find(&*I)->second;		const auto &ArgParts = ArgsToPromote.find(&*I)->second;
(51 lines not shown)	while (!F->use_empty()) {
const AttributeList &CallPAL = CB.getAttributes();		const AttributeList &CallPAL = CB.getAttributes();
IRBuilder<NoFolder> IRB(&CB);		IRBuilder<NoFolder> IRB(&CB);

// Loop over the operands, inserting GEP and loads in the caller as		// Loop over the operands, inserting GEP and loads in the caller as
// appropriate.		// appropriate.
auto *AI = CB.arg_begin();		auto *AI = CB.arg_begin();
ArgNo = 0;		ArgNo = 0;
for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;		for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;
++I, ++AI, ++ArgNo)		++I, ++AI, ++ArgNo) {
if (!ArgsToPromote.count(&*I) && !ByValArgsToTransform.count(&*I)) {		if (!ArgsToPromote.count(&*I)) {
Args.push_back(*AI); // Unmodified argument		Args.push_back(*AI); // Unmodified argument
ArgAttrVec.push_back(CallPAL.getParamAttrs(ArgNo));		ArgAttrVec.push_back(CallPAL.getParamAttrs(ArgNo));
} else if (ByValArgsToTransform.count(&*I)) {
// Emit a GEP and load for each element of the struct.
Type *AgTy = I->getParamByValType();
StructType *STy = cast<StructType>(AgTy);
Value *Idxs[2] = {
ConstantInt::get(Type::getInt32Ty(F->getContext()), 0), nullptr};
const StructLayout *SL = DL.getStructLayout(STy);
Align StructAlign = *I->getParamAlign();
for (unsigned J = 0, Elems = STy->getNumElements(); J != Elems; ++J) {
Idxs[1] = ConstantInt::get(Type::getInt32Ty(F->getContext()), J);
auto *Idx =
IRB.CreateGEP(STy, *AI, Idxs, (*AI)->getName() + "." + Twine(J));
// TODO: Tell AA about the new values?
Align Alignment =
commonAlignment(StructAlign, SL->getElementOffset(J));
Args.push_back(IRB.CreateAlignedLoad(
STy->getElementType(J), Idx, Alignment, Idx->getName() + ".val"));
ArgAttrVec.push_back(AttributeSet());
}
} else if (!I->use_empty()) {		} else if (!I->use_empty()) {
Value *V = *AI;		Value *V = *AI;
const auto &ArgParts = ArgsToPromote.find(&*I)->second;		const auto &ArgParts = ArgsToPromote.find(&*I)->second;
for (const auto &Pair : ArgParts) {		for (const auto &Pair : ArgParts) {
LoadInst *LI = IRB.CreateAlignedLoad(		LoadInst *LI = IRB.CreateAlignedLoad(
Pair.second.Ty,		Pair.second.Ty,
createByteGEP(IRB, DL, V, Pair.second.Ty, Pair.first),		createByteGEP(IRB, DL, V, Pair.second.Ty, Pair.first),
Pair.second.Alignment, V->getName() + ".val");		Pair.second.Alignment, V->getName() + ".val");
if (Pair.second.MustExecLoad) {		if (Pair.second.MustExecInstr) {
LI->setAAMetadata(Pair.second.MustExecLoad->getAAMetadata());		LI->setAAMetadata(Pair.second.MustExecInstr->getAAMetadata());
LI->copyMetadata(*Pair.second.MustExecLoad,		LI->copyMetadata(*Pair.second.MustExecInstr,
{LLVMContext::MD_range, LLVMContext::MD_nonnull,		{LLVMContext::MD_range, LLVMContext::MD_nonnull,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null,		LLVMContext::MD_dereferenceable_or_null,
LLVMContext::MD_align, LLVMContext::MD_noundef});		LLVMContext::MD_align, LLVMContext::MD_noundef});
}		}
Args.push_back(LI);		Args.push_back(LI);
ArgAttrVec.push_back(AttributeSet());		ArgAttrVec.push_back(AttributeSet());
}		}
}		}
		}

// Push any varargs arguments on the list.		// Push any varargs arguments on the list.
for (; AI != CB.arg_end(); ++AI, ++ArgNo) {		for (; AI != CB.arg_end(); ++AI, ++ArgNo) {
Args.push_back(*AI);		Args.push_back(*AI);
ArgAttrVec.push_back(CallPAL.getParamAttrs(ArgNo));		ArgAttrVec.push_back(CallPAL.getParamAttrs(ArgNo));
}		}

SmallVector<OperandBundleDef, 1> OpBundles;		SmallVector<OperandBundleDef, 1> OpBundles;
(33 lines not shown)	while (!F->use_empty()) {
CB.eraseFromParent();		CB.eraseFromParent();
}		}

// Since we have now created the new function, splice the body of the old		// Since we have now created the new function, splice the body of the old
// function right into the new function, leaving the old rotting hulk of the		// function right into the new function, leaving the old rotting hulk of the
// function empty.		// function empty.
NF->getBasicBlockList().splice(NF->begin(), F->getBasicBlockList());		NF->getBasicBlockList().splice(NF->begin(), F->getBasicBlockList());

		// We will collect all the new created allocas to promote them into registers
		// after the following loop
		SmallVector<AllocaInst *, 4> Allocas;
		nikic: `SmallVector<AllocaInst *>`

// Loop over the argument list, transferring uses of the old arguments over to		// Loop over the argument list, transferring uses of the old arguments over to
// the new arguments, also transferring over the names as well.		// the new arguments, also transferring over the names as well.
Function::arg_iterator I2 = NF->arg_begin();		Function::arg_iterator I2 = NF->arg_begin();
		nikic: I doubt this code is performance-critical enough to need this accumulate + reserve :)
		psamolysov: I agree, removed this `accumulate` and `reserve`.
for (Argument &Arg : F->args()) {		for (Argument &Arg : F->args()) {
if (!ArgsToPromote.count(&Arg) && !ByValArgsToTransform.count(&Arg)) {		if (!ArgsToPromote.count(&Arg)) {
// If this is an unmodified argument, move the name and users over to the		// If this is an unmodified argument, move the name and users over to the
// new version.		// new version.
Arg.replaceAllUsesWith(&*I2);		Arg.replaceAllUsesWith(&*I2);
I2->takeName(&Arg);		I2->takeName(&Arg);
++I2;		++I2;
continue;		continue;
}		}

if (ByValArgsToTransform.count(&Arg)) {
// In the callee, we create an alloca, and store each of the new incoming
// arguments into the alloca.
Instruction *InsertPt = &NF->begin()->front();

// Just add all the struct element types.
Type *AgTy = Arg.getParamByValType();
Align StructAlign = *Arg.getParamAlign();
Value *TheAlloca = new AllocaInst(AgTy, DL.getAllocaAddrSpace(), nullptr,
StructAlign, "", InsertPt);
StructType *STy = cast<StructType>(AgTy);
Value *Idxs[2] = {ConstantInt::get(Type::getInt32Ty(F->getContext()), 0),
nullptr};
const StructLayout *SL = DL.getStructLayout(STy);

for (unsigned J = 0, Elems = STy->getNumElements(); J != Elems; ++J) {
Idxs[1] = ConstantInt::get(Type::getInt32Ty(F->getContext()), J);
Value *Idx = GetElementPtrInst::Create(
AgTy, TheAlloca, Idxs, TheAlloca->getName() + "." + Twine(J),
InsertPt);
I2->setName(Arg.getName() + "." + Twine(J));
Align Alignment = commonAlignment(StructAlign, SL->getElementOffset(J));
new StoreInst(&*I2++, Idx, false, Alignment, InsertPt);
}

// Anything that used the arg should now use the alloca.
Arg.replaceAllUsesWith(TheAlloca);
TheAlloca->takeName(&Arg);
continue;
}

// There potentially are metadata uses for things like llvm.dbg.value.		// There potentially are metadata uses for things like llvm.dbg.value.
// Replace them with undef, after handling the other regular uses.		// Replace them with undef, after handling the other regular uses.
auto RauwUndefMetadata = make_scope_exit(		auto RauwUndefMetadata = make_scope_exit(
[&]() { Arg.replaceAllUsesWith(UndefValue::get(Arg.getType())); });		[&]() { Arg.replaceAllUsesWith(UndefValue::get(Arg.getType())); });

if (Arg.use_empty())		if (Arg.use_empty())
continue;		continue;

SmallDenseMap<int64_t, Argument *> OffsetToArg;		// Otherwise, if we promoted this argument, we have to create an alloca in
		// the callee for every promotable part and store each of the new incoming
		// arguments into the corresponding alloca, what lets the old code (the
		// store instructions if they are allowed especially) a chance to work as
		// before.
		assert(Arg.getType()->isPointerTy() &&
		"Only arguments with a pointer type are promotable");

		IRBuilder<NoFolder> IRB(&NF->begin()->front());

		// Add only the promoted elements, so parts from ArgsToPromote
		SmallDenseMap<int64_t, AllocaInst *> OffsetToAlloca;
for (const auto &Pair : ArgsToPromote.find(&Arg)->second) {		for (const auto &Pair : ArgsToPromote.find(&Arg)->second) {
Argument &NewArg = *I2++;		int64_t Offset = Pair.first;
NewArg.setName(Arg.getName() + "." + Twine(Pair.first) + ".val");		const ArgPart &Part = Pair.second;
OffsetToArg.insert({Pair.first, &NewArg});
		Argument *NewArg = I2++;
		NewArg->setName(Arg.getName() + "." + Twine(Offset) + ".val");

		AllocaInst *NewAlloca = IRB.CreateAlloca(
		Part.Ty, nullptr, Arg.getName() + "." + Twine(Offset) + ".allc");
		NewAlloca->setAlignment(Pair.second.Alignment);
		nikic: Can drop the `false` argument, non-volatile store is the default.
		IRB.CreateAlignedStore(NewArg, NewAlloca, Pair.second.Alignment);

		// Collect the alloca to retarget the users to
		OffsetToAlloca.insert({Offset, NewAlloca});
}		}

// Otherwise, if we promoted this argument, then all users are load		auto GetAlloca = [&](Value *Ptr) {
// instructions (with possible casts and GEPs in between).		APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);
		Ptr = Ptr->stripAndAccumulateConstantOffsets(DL, Offset,
		/* AllowNonInbounds */ true);
		assert(Ptr == &Arg && "Not constant offset from arg?");
		return OffsetToAlloca.lookup(Offset.getSExtValue());
		};
		nikic: This comment looks a bit out of place, as this closure doesn't remove any dead instructions itself.
		psamolysov: Thank you. I've moved the comment just before the dead code elimination loop.

		// Cleanup the code from the dead instructions: GEPs and BitCasts in between
		nikic: Mild preference for `OffsetToAlloca.lookup()` here, which makes it clear that we don't intend to modify the map (which `operator[]` can do).
		// the original argument and its users: loads and stores. Retarget every
		// user to the new created alloca.
SmallVector<Value *, 16> Worklist;		SmallVector<Value *, 16> Worklist;
SmallVector<Instruction *, 16> DeadInsts;		SmallVector<Instruction *, 16> DeadInsts;
append_range(Worklist, Arg.users());		append_range(Worklist, Arg.users());
while (!Worklist.empty()) {		while (!Worklist.empty()) {
Value *V = Worklist.pop_back_val();		Value *V = Worklist.pop_back_val();
if (isa<BitCastInst>(V) || isa<GetElementPtrInst>(V)) {		if (isa<BitCastInst>(V) || isa<GetElementPtrInst>(V)) {
DeadInsts.push_back(cast<Instruction>(V));		DeadInsts.push_back(cast<Instruction>(V));
append_range(Worklist, V->users());		append_range(Worklist, V->users());
continue;		continue;
}		}

if (auto *LI = dyn_cast<LoadInst>(V)) {		if (auto *LI = dyn_cast<LoadInst>(V)) {
Value *Ptr = LI->getPointerOperand();		Value *Ptr = LI->getPointerOperand();
APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);		LI->setOperand(LoadInst::getPointerOperandIndex(), GetAlloca(Ptr));
		nikic: It's possible to reuse one IRBuilder instance using `IRB.SetInsertPoint()`.
		psamolysov: I wasn't aware of this method, thanks!
Ptr =		continue;
Ptr->stripAndAccumulateConstantOffsets(DL, Offset,		}
		nikic: Can omit empty name, it's the default.
/* AllowNonInbounds */ true);
assert(Ptr == &Arg && "Not constant offset from arg?");		if (auto *SI = dyn_cast<StoreInst>(V)) {
		nikic: Just realized that we can probably just do a `LI->setOperand(0, GetAlloca(Ptr))` here and don't really need to create a new instruction and RAUW.
		psamolysov: Replaced with `LI->setOperand(LoadInst::getPointerOperandIndex(), GetAlloca(Ptr))`. Thank you very much for the suggestion. I've applied it to `store` instructions too.
		One thing: when the `LI` was replaced with `OffsetToArg[Offset.getSExtValue()]` in the old code, or with the `NewLI` in the patch, the instruction's metadata was not copied. Now that we don't actually replace the instruction, the metadata is reused, and in the `metadata.ll` test the following code is generated:
		    %1 = icmp ne i32* %p2.0.val, null
		    call void @llvm.assume(i1 %1)
		for this line of code:
		    %v2 = load i32*, i32** %p2, !nonnull !1
		I've updated the test to take this behavior into account, but is this correct and expected?
		nikic: Yes, this is exactly the effect I wanted to see :)
LI->replaceAllUsesWith(OffsetToArg[Offset.getSExtValue()]);		assert(!SI->isVolatile() && "Volatile operations can't be promoted.");
DeadInsts.push_back(LI);		Value *Ptr = SI->getPointerOperand();
		SI->setOperand(StoreInst::getPointerOperandIndex(), GetAlloca(Ptr));
continue;		continue;
}		}

llvm_unreachable("Unexpected user");		llvm_unreachable("Unexpected user");
}		}

for (Instruction *I : DeadInsts) {		for (Instruction *I : DeadInsts) {
I->replaceAllUsesWith(PoisonValue::get(I->getType()));		I->replaceAllUsesWith(PoisonValue::get(I->getType()));
I->eraseFromParent();		I->eraseFromParent();
		nikic: Volatile operations can't be promoted, can assert that `!SI->isVolatile()`.
}		}

		// Collect the allocas for promotion
		for (const auto &Pair : OffsetToAlloca) {
		assert(isAllocaPromotable(Pair.second) &&
		nikic: Can we `assert(isAllocaPromotable())` here? I think that by design, we should only produce promotable allocas.
		psamolysov: I've added the assertion for `isAllocaPromotable`.
		"By design, only promotable allocas should be produced.");
		Allocas.push_back(Pair.second);
		}
		}

		LLVM_DEBUG(dbgs() << "ARG PROMOTION: " << Allocas.size()
		<< " alloca(s) are promotable by Mem2Reg\n");

		if (!Allocas.empty()) {
		// And we are able to call the `promoteMemoryToRegister()` function.
		// Our earlier checks have ensured that PromoteMemToReg() will
		// succeed.
		PromoteMemToReg(Allocas, DTGetter(*NF), ACGetter(*NF));
		aeubanks: Creating the DominatorTree should go through something like `function_ref<DominatorTree &(Function &F)> DTGetter`, see `AARGetter`.
		But we run SROA (including mem2reg) right after argpromotion. Is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? Then we wouldn't need to worry about AssumptionCache/DominatorTree here, since they're purely used for mem2reg.
		psamolysov: About running `mem2reg` in the pass: I tried removing the `mem2reg` call and running a few tests with `opt -O3` (after adding the `noinline` attribute to the callee), and I see that almost all of the callees' code is optimized out, so simply removing the `mem2reg` call from the pass looks suitable. @nikic what is your opinion? Should we call `mem2reg` inside the argument promotion pass or let the compilation pipeline call it?
		P.S. I'm also planning to add the `Dead Argument Elimination` pass to the pipeline after landing this patch, because `mem2reg` after the new argument promotion leaves unused arguments.
		psamolysov: @aeubanks Thank you for the suggestion about `DTGetter`. Unfortunately, I don't understand why the `DTGetter` should be used. As I understand it, `AARGetter` is a wrapper over `FAM.getResult<AAManager>(F)` for the new PM and is just an instance of the `LegacyAARGetter` class for the legacy PM. Its goal is to give the correct AAR depending on the pass manager in use. In our case, the `DominatorTree` is just built again for the new function after the argument promotion and doesn't depend on the pass manager in use.
		nikic: I'm okay with not doing the promotion here. We should probably add `mem2reg` to the tests though, to make sure we generate trivially promotable code.
		psamolysov: I started removing the promotion (mem2reg) from the pass and found the following problem: the `chained.ll` test no longer works as designed. Running the `argpromotion` pass and then `mem2reg` is not the same as running `mem2reg` inside the `argpromotion` pass, because the pass runs `mem2reg` after every promotion attempt while the pass manager runs passes one by one. In the `chained.ll` case, after the `argpromotion` pass without the `mem2reg` call inside the pass, the generated code is the following:
		    define internal i32 @test(i32* %x.0.val) {
		    entry:
		      %x.0.allc = alloca i32*, align 8
		      store i32* %x.0.val, i32** %x.0.allc, align 8
		      %y = load i32*, i32** %x.0.allc, align 8
		      %z = load i32, i32* %y, align 4
		      ret i32 %z
		    }
		    define i32 @caller() {
		    entry:
		      %G2.val = load i32*, i32** @G2, align 8
		      %x = call i32 @test(i32* %G2.val)
		      ret i32 %x
		    }
		And the pass cannot promote the pointer `%x.0.val` again because there is a `store`:
		    ArgPromotion of i32* %x.0.val failed: unknown user store i32* %x.0.val, i32** %x.0.allc, align 8
		and the `byval` attribute is not present, so stores aren't allowed. If we run `mem2reg` from the pass, the `alloca` and the corresponding `store`s are promoted and the `argpromotion` pass gets a chance to do another iteration. So running `mem2reg` from the pass makes sense.
		nikic: Ah, good point. So we do need to run promotion here.
		> Unfortunately, I cannot get why the DTGetter should be used. As I get, AARGetter is a wrapper over the FAM.getResult<AAManager>(F) for the new PM and is just an instance of the LegacyAARGetter class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.
		The general motivation is to cache the DT, so the next pass using it won't have to recompute it. But I don't know what complications the legacy pass manager introduces here. I didn't see any compile-time impact from your current code (though I expect this is mainly because argument promotion triggers rarely.)
		psamolysov: When I tried to reuse the DominatorTree from the legacy pass manager, I got problems on some tests from polly. It looks like my problem was that I built the Dominator Tree for the old function and tried to reuse it to promote the allocas in the new one. If the Dominator Tree is lazily built for the new function using the `DTGetter`, everything works for the new pass manager (but, as I remember, it worked even when I tried to pass the DT for the old function; I believe it worked because we don't actually change the CFG).
		For the legacy PM, the situation is more dramatic because `ArgPromotion` is a `CallGraphSCCPass` and not a `FunctionPass`, and this code: `getAnalysis<DominatorTreeWrapperPass>(F).getDomTree()` doesn't work as expected. The following error occurs on the legacy PM:
		    Unable to schedule 'Dominator Tree Construction' required by 'Promote 'by reference' arguments to scalars'
		    Unable to schedule pass
		As a workaround, I just create a new `DominatorTree` instance from the `DTGetter` created from the legacy pass. The idea is borrowed from the `lib/Analysis/MustExecute.cpp` file. Unfortunately, this requires storing all the created `DominatorTree` instances while the pass is running.
}		}

return NF;		return NF;
}		}

/// Return true if we can prove that all callees pass in a valid pointer for the		/// Return true if we can prove that all callees pass in a valid pointer for the
/// specified function argument.		/// specified function argument.
static bool allCallersPassValidPointerForArgument(Argument *Arg,		static bool allCallersPassValidPointerForArgument(Argument *Arg,
Align NeededAlign,		Align NeededAlign,
uint64_t NeededDerefBytes) {		uint64_t NeededDerefBytes) {
Function *Callee = Arg->getParent();		Function *Callee = Arg->getParent();
const DataLayout &DL = Callee->getParent()->getDataLayout();		const DataLayout &DL = Callee->getParent()->getDataLayout();
APInt Bytes(64, NeededDerefBytes);		APInt Bytes(64, NeededDerefBytes);

// Check if the argument itself is marked dereferenceable and aligned.		// Check if the argument itself is marked dereferenceable and aligned.
if (isDereferenceableAndAlignedPointer(Arg, NeededAlign, Bytes, DL))		if (isDereferenceableAndAlignedPointer(Arg, NeededAlign, Bytes, DL))
return true;		return true;

// Look at all call sites of the function. At this point we know we only have		// Look at all call sites of the function. At this point we know we only have
// direct callees.		// direct callees.
return all_of(Callee->users(), [&](User *U) {		return all_of(Callee->users(), [&](User *U) {
CallBase &CB = cast<CallBase>(*U);		CallBase &CB = cast<CallBase>(*U);
return isDereferenceableAndAlignedPointer(		return isDereferenceableAndAlignedPointer(CB.getArgOperand(Arg->getArgNo()),
CB.getArgOperand(Arg->getArgNo()), NeededAlign, Bytes, DL);		NeededAlign, Bytes, DL);
});		});
}		}

/// Determine that this argument is safe to promote, and find the argument		/// Determine that this argument is safe to promote, and find the argument
/// parts it can be promoted into.		/// parts it can be promoted into.
static bool findArgParts(Argument *Arg, const DataLayout &DL, AAResults &AAR,		static bool findArgParts(Argument *Arg, const DataLayout &DL, AAResults &AAR,
unsigned MaxElements, bool IsRecursive,		unsigned MaxElements, bool IsRecursive,
SmallVectorImpl<OffsetAndArgPart> &ArgPartsVec) {		SmallVectorImpl<OffsetAndArgPart> &ArgPartsVec) {
// Quick exit for unused arguments		// Quick exit for unused arguments
if (Arg->use_empty())		if (Arg->use_empty())
return true;		return true;

// We can only promote this argument if all of the uses are loads at known		// We can only promote this argument if all the uses are loads at known
// offsets.		// offsets.
//		//
// Promoting the argument causes it to be loaded in the caller		// Promoting the argument causes it to be loaded in the caller
// unconditionally. This is only safe if we can prove that either the load		// unconditionally. This is only safe if we can prove that either the load
// would have happened in the callee anyway (ie, there is a load in the entry		// would have happened in the callee anyway (ie, there is a load in the entry
// block) or the pointer passed in at every call site is guaranteed to be		// block) or the pointer passed in at every call site is guaranteed to be
// valid.		// valid.
// In the former case, invalid loads can happen, but would have happened		// In the former case, invalid loads can happen, but would have happened
// anyway, in the latter case, invalid loads won't happen. This prevents us		// anyway, in the latter case, invalid loads won't happen. This prevents us
// from introducing an invalid load that wouldn't have happened in the		// from introducing an invalid load that wouldn't have happened in the
// original code.		// original code.

SmallDenseMap<int64_t, ArgPart, 4> ArgParts;		SmallDenseMap<int64_t, ArgPart, 4> ArgParts;
Align NeededAlign(1);		Align NeededAlign(1);
uint64_t NeededDerefBytes = 0;		uint64_t NeededDerefBytes = 0;

// Returns None if this load is not based on the argument. Return true if		// And if this is a byval argument we also allow to have store instructions.
// we can promote the load, false otherwise.		// Only handle in such way arguments with specified alignment;
auto HandleLoad = [&](LoadInst *LI,		// if it's unspecified, the actual alignment of the argument is
		// target-specific.
		bool AreStoresAllowed = Arg->getParamByValType() && Arg->getParamAlign();
		nikic: IsStoresAllowed -> AreStoresAllowed

		// An end user of a pointer argument is a load or store instruction.
		// Returns None if this load or store is not based on the argument. Return
		// true if we can promote the instruction, false otherwise.
		auto HandleEndUser = [&](auto *I, Type *Ty,
bool GuaranteedToExecute) -> Optional<bool> {		bool GuaranteedToExecute) -> Optional<bool> {
// Don't promote volatile or atomic loads.		// Don't promote volatile or atomic instructions.
if (!LI->isSimple())		if (!I->isSimple())
		nikic: The "the" can be dropped here.
return false;		return false;

Value *Ptr = LI->getPointerOperand();		Value *Ptr = I->getPointerOperand();
APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);		APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);
Ptr = Ptr->stripAndAccumulateConstantOffsets(DL, Offset,		Ptr = Ptr->stripAndAccumulateConstantOffsets(DL, Offset,
/* AllowNonInbounds */ true);		/* AllowNonInbounds */ true);
if (Ptr != Arg)		if (Ptr != Arg)
return None;		return None;

if (Offset.getSignificantBits() >= 64)		if (Offset.getSignificantBits() >= 64)
return false;		return false;

Type *Ty = LI->getType();
TypeSize Size = DL.getTypeStoreSize(Ty);		TypeSize Size = DL.getTypeStoreSize(Ty);
// Don't try to promote scalable types.		// Don't try to promote scalable types.
if (Size.isScalable())		if (Size.isScalable())
return false;		return false;

// If this is a recursive function and one of the types is a pointer,		// If this is a recursive function and one of the types is a pointer,
// then promoting it might lead to recursive promotion.		// then promoting it might lead to recursive promotion.
if (IsRecursive && Ty->isPointerTy())		if (IsRecursive && Ty->isPointerTy())
return false;		return false;

int64_t Off = Offset.getSExtValue();		int64_t Off = Offset.getSExtValue();
auto Pair = ArgParts.try_emplace(		auto Pair = ArgParts.try_emplace(
Off, ArgPart{Ty, LI->getAlign(), GuaranteedToExecute ? LI : nullptr});		Off, ArgPart{Ty, I->getAlign(), GuaranteedToExecute ? I : nullptr});
ArgPart &Part = Pair.first->second;		ArgPart &Part = Pair.first->second;
bool OffsetNotSeenBefore = Pair.second;		bool OffsetNotSeenBefore = Pair.second;

// We limit promotion to only promoting up to a fixed number of elements of		// We limit promotion to only promoting up to a fixed number of elements of
// the aggregate.		// the aggregate.
if (MaxElements > 0 && ArgParts.size() > MaxElements) {		if (MaxElements > 0 && ArgParts.size() > MaxElements) {
LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "		LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
<< "more than " << MaxElements << " parts\n");		<< "more than " << MaxElements << " parts\n");
return false;		return false;
}		}

// For now, we only support loading one specific type at a given offset.		// For now, we only support loading/storing one specific type at a given
		// offset.
if (Part.Ty != Ty) {		if (Part.Ty != Ty) {
LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "		LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
<< "loaded via both " << Part.Ty << " and " << Ty		<< "accessed as both " << Part.Ty << " and " << Ty
		nikic: I would replace this with "accessed as" to cover both. Otherwise it sounds like it's either both loads or both stores with conflicting types, but it can also be a mix.
		psamolysov: Oops... I forgot that `ArgParts` collects all the promotable parts for loads as well as for stores, so a mix is possible.
<< " at offset " << Off << "\n");		<< " at offset " << Off << "\n");
return false;		return false;
}		}

// If this load is not guaranteed to execute, and we haven't seen a load at		// If this instruction is not guaranteed to execute, and we haven't seen a
// this offset before (or it had lower alignment), then we need to remember		// load or store at this offset before (or it had lower alignment), then we
// that requirement.		// need to remember that requirement.
// Note that skipping loads of previously seen offsets is only correct		// Note that skipping instructions of previously seen offsets is only
// because we only allow a single type for a given offset, which also means		// correct because we only allow a single type for a given offset, which
// that the number of accessed bytes will be the same.		// also means that the number of accessed bytes will be the same.
if (!GuaranteedToExecute &&		if (!GuaranteedToExecute &&
(OffsetNotSeenBefore || Part.Alignment < LI->getAlign())) {		(OffsetNotSeenBefore || Part.Alignment < I->getAlign())) {
// We won't be able to prove dereferenceability for negative offsets.		// We won't be able to prove dereferenceability for negative offsets.
if (Off < 0)		if (Off < 0)
return false;		return false;

// If the offset is not aligned, an aligned base pointer won't help.		// If the offset is not aligned, an aligned base pointer won't help.
if (!isAligned(LI->getAlign(), Off))		if (!isAligned(I->getAlign(), Off))
return false;		return false;

NeededDerefBytes = std::max(NeededDerefBytes, Off + Size.getFixedValue());		NeededDerefBytes = std::max(NeededDerefBytes, Off + Size.getFixedValue());
NeededAlign = std::max(NeededAlign, LI->getAlign());		NeededAlign = std::max(NeededAlign, I->getAlign());
}		}

Part.Alignment = std::max(Part.Alignment, LI->getAlign());		Part.Alignment = std::max(Part.Alignment, I->getAlign());
return true;		return true;
};		};
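The bookkeeping for accesses that are not guaranteed to execute can be illustrated in isolation. The following is a minimal sketch, assuming plain integers in place of `Align` and `TypeSize`; `ToyRequirement` and `addSpeculatedAccess` are hypothetical names used only for illustration and do not exist in the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the NeededDerefBytes / NeededAlign pair.
struct ToyRequirement {
  uint64_t DerefBytes = 0; // bytes that must be dereferenceable from the base
  uint64_t Align = 1;      // alignment the base pointer must have
};

// Record an access of Size bytes at byte offset Off with alignment Align
// (assumed to be a power of two). Returns false if the access cannot be
// covered by a dereferenceable/aligned guarantee on the base pointer.
bool addSpeculatedAccess(ToyRequirement &Req, int64_t Off, uint64_t Size,
                         uint64_t Align) {
  // Dereferenceability cannot be proven for negative offsets.
  if (Off < 0)
    return false;
  // If the offset is not aligned, an aligned base pointer won't help.
  if (static_cast<uint64_t>(Off) % Align != 0)
    return false;
  Req.DerefBytes = std::max(Req.DerefBytes, static_cast<uint64_t>(Off) + Size);
  Req.Align = std::max(Req.Align, Align);
  return true;
}

// Small usage example: two accesses widen the requirement monotonically.
void speculatedAccessExample() {
  ToyRequirement Req;
  assert(addSpeculatedAccess(Req, 0, 4, 4));
  assert(addSpeculatedAccess(Req, 8, 8, 8));
  assert(Req.DerefBytes == 16 && Req.Align == 8);
}
```

In the pass itself the accumulated requirement is then checked once per argument against every call site, rather than once per access.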

// Look for loads that are guaranteed to execute on entry.		// Look for loads and stores that are guaranteed to execute on entry.
for (Instruction &I : Arg->getParent()->getEntryBlock()) {		for (Instruction &I : Arg->getParent()->getEntryBlock()) {
		Optional<bool> Res{};
if (LoadInst *LI = dyn_cast<LoadInst>(&I))		if (LoadInst *LI = dyn_cast<LoadInst>(&I))
if (Optional<bool> Res = HandleLoad(LI, /* GuaranteedToExecute */ true))		Res = HandleEndUser(LI, LI->getType(), /* GuaranteedToExecute */ true);
if (!*Res)		else if (StoreInst *SI = dyn_cast<StoreInst>(&I))
		nikic: else if
		Res = HandleEndUser(SI, SI->getValueOperand()->getType(),
		/* GuaranteedToExecute */ true);
		if (Res && !*Res)
return false;		return false;

if (!isGuaranteedToTransferExecutionToSuccessor(&I))		if (!isGuaranteedToTransferExecutionToSuccessor(&I))
break;		break;
}		}

// Now look at all loads of the argument. Remember the load instructions		// Now look at all loads of the argument. Remember the load instructions
// for the aliasing check below.		// for the aliasing check below.
SmallVector<Value *, 16> Worklist;		SmallVector<const Use *, 16> Worklist;
SmallPtrSet<Value *, 16> Visited;		SmallPtrSet<const Use *, 16> Visited;
SmallVector<LoadInst *, 16> Loads;		SmallVector<LoadInst *, 16> Loads;
auto AppendUsers = [&](Value *V) {		auto AppendUses = [&](const Value *V) {
for (User *U : V->users())		for (const Use &U : V->uses())
if (Visited.insert(U).second)		if (Visited.insert(&U).second)
Worklist.push_back(U);		Worklist.push_back(&U);
};		};
AppendUsers(Arg);		AppendUses(Arg);
while (!Worklist.empty()) {		while (!Worklist.empty()) {
Value *V = Worklist.pop_back_val();		const Use *U = Worklist.pop_back_val();
		Value *V = U->getUser();
if (isa<BitCastInst>(V)) {		if (isa<BitCastInst>(V)) {
AppendUsers(V);		AppendUses(V);
continue;		continue;
		nikic: Put `U->getUser()` into a variable, as it's used so much?
		psamolysov: OK, thank you for the suggestion.
}		}

if (auto *GEP = dyn_cast<GetElementPtrInst>(V)) {		if (auto *GEP = dyn_cast<GetElementPtrInst>(V)) {
if (!GEP->hasAllConstantIndices())		if (!GEP->hasAllConstantIndices())
return false;		return false;
AppendUsers(V);		AppendUses(V);
continue;		continue;
}		}

if (auto *LI = dyn_cast<LoadInst>(V)) {		if (auto *LI = dyn_cast<LoadInst>(V)) {
if (!*HandleLoad(LI, /* GuaranteedToExecute */ false))		if (!*HandleEndUser(LI, LI->getType(), /* GuaranteedToExecute */ false))
return false;		return false;
Loads.push_back(LI);		Loads.push_back(LI);
continue;		continue;
}		}

		// Stores are allowed for byval arguments
		auto *SI = dyn_cast<StoreInst>(V);
		if (AreStoresAllowed && SI &&
		nikic: I think this works, but the robust way to check this is `U->getOperandNo() == SI->getPointerOperandIndex()`. (We will separately visit the uses in the pointer and value operands, and only consider the pointer operand a known use, while the value operand will fall through to the bailout below.)
		psamolysov: I've applied the suggestion. Thanks.
		U->getOperandNo() == StoreInst::getPointerOperandIndex()) {
		nikic: And with the previous change, I believe you can change this to `if (!*HandleEndUser` like for LoadInst, as we should now be guaranteed that this is a store based on the argument.
		if (!*HandleEndUser(SI, SI->getValueOperand()->getType(),
		/* GuaranteedToExecute */ false))
		psamolysov: I'm not familiar with speculative stores. Should we handle stores in exactly the same way as loads here, i.e. should the parameter be set to `false`?
		nikic: I think the parameter should be false here, because the store is not guaranteed to execute, and it would be confusing otherwise. The relevant part here is that we're not going to speculate any stores, so we don't care about the alignment at all -- we can explicitly skip the alignment code in HandleEndUser for stores to make this clear.
		psamolysov: Thank you for the suggestion. I've changed the argument's value to `true` and fixed the alignment check in `HandleEndUser` so that it applies to loads only. I've also edited the comment before the check to make it clear that only loads are checked. Two questions, please. Currently `Part` can be either a load or a store previously saved in the `ArgParts` map at the same offset. Should a condition that the `MustExecInstr` in the previously seen `ArgPart` is a load (or at least not a store) also be added? Also, when the `Part.Alignment` field is recalculated (`Part.Alignment = std::max(Part.Alignment, I->getAlign());`), should we do this only when `I` is a load, or in any case? Thank you.
		nikic: Hm, so I guess it's not quite true that we don't speculate stores, in the sense that we do insert a store to store the argument into the alloca. Of course, as this store gets promoted later, the actual alignment doesn't really matter. It's probably still best to produce correct alignment for it. So I think the safest thing to do here is just not special case the store case at all -- or would that cause any regressions in tests?
		psamolysov: No, it adds no regressions, so I've just removed the check that the instruction is a `load`, used the same check for `load` as well as `store` instructions, and reworded the comment a bit to cover both cases.
		nikic: Though we could also explicitly set the align of the alloca to the same value, in which case those would always match. Probably a good idea to do that anyway.
		psamolysov: "set the align of the alloca to the same value..." Should the value be `Pair.second.Alignment`, as for the `store` into the alloca instruction on line 365?
		nikic: Yeah, that's what I had in mind.
		psamolysov: Done.
		return false;
		continue;
		// Only stores TO the argument is allowed, all the other stores are
		// unknown users
		}

		nikic: What happens if we're storing the argument into itself? `store ptr %arg, ptr %arg` or similar. I don't think your code handles this correctly right now. It might be best to walk over `Use`s rather than `User`s, so we can check which operand the use is on.
		psamolysov: Good catch! This is correct: when I tried `store ptr %arg, ptr %arg`, an access violation occurred in the `llvm::Instruction::eraseFromParent()` function, because the dead instruction elimination loop in the `doPromotion` function handles the `store` instruction twice. Thank you for the idea! I've rewritten the walk to use `Use`s instead of `User`s and to check that the value the `store` is a user of is not the `store`'s value operand. If it is, this `store` is an "unknown user", because we can store something into the pointer but not the pointer's value somewhere else. A LIT test was also added. I tried to add a similar check for `load` instructions (that the `load`'s pointer operand is exactly the value the `load` is a user of) and to remove the check in `HandleEndUser` that `Ptr` is equal to `Arg`. Unfortunately, this does not work, because before the walk there is a loop where we try to handle every `load` instruction in the entry BB for every argument, and this check actually determines whether the current `load` instruction is related to the current argument. So the check is still required.
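The difference between walking users and walking uses can be shown with a toy model. This is a minimal sketch, not the LLVM classes: `ToyInst`, `ToyUse`, `isKnownUse` and `allUsesKnown` are hypothetical names, and the only fact borrowed from LLVM is that a store keeps its value in operand 0 and its pointer in operand 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction and llvm::Use.
enum class ToyKind { Load, Store, Other };

struct ToyInst {
  ToyKind Kind;
};

struct ToyUse {
  const ToyInst *User; // instruction holding the operand
  unsigned OperandNo;  // operand slot that refers to the argument
};

// For a store, operand 0 is the stored value and operand 1 is the pointer.
constexpr unsigned StorePointerOperandIndex = 1;

// A use is acceptable if it is a load, or a store whose *pointer* operand
// is the argument. A store that uses the argument as the stored value
// captures the pointer and must be treated as an unknown user.
bool isKnownUse(const ToyUse &U) {
  switch (U.User->Kind) {
  case ToyKind::Load:
    return true;
  case ToyKind::Store:
    return U.OperandNo == StorePointerOperandIndex;
  case ToyKind::Other:
    return false;
  }
  return false;
}

bool allUsesKnown(const std::vector<ToyUse> &Uses) {
  return std::all_of(Uses.begin(), Uses.end(), isKnownUse);
}

// Models `store ptr %arg, ptr %arg`: one instruction, two uses of %arg.
void selfStoreExample() {
  ToyInst Store{ToyKind::Store};
  std::vector<ToyUse> Uses = {{&Store, 0}, {&Store, 1}};
  assert(!allUsesKnown(Uses)); // the value-operand use is rejected
}
```

A walk over users would see the store only once and could not tell which operand referred to the argument; keeping the operand number per use is what makes the self-store case detectable.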
// Unknown user.		// Unknown user.
LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "		LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
<< "unknown user " << *V << "\n");		<< "unknown user " << *V << "\n");
return false;		return false;
}		}

if (NeededDerefBytes || NeededAlign > 1) {		if (NeededDerefBytes || NeededAlign > 1) {
// Try to prove a required deref / aligned requirement.		// Try to prove a required deref / aligned requirement.
if (!allCallersPassValidPointerForArgument(Arg, NeededAlign,		if (!allCallersPassValidPointerForArgument(Arg, NeededAlign,
NeededDerefBytes)) {		NeededDerefBytes)) {
LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "		LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
<< "not dereferenceable or aligned\n");		<< "not dereferenceable or aligned\n");
return false;		return false;
}		}
}		}

if (ArgParts.empty())		if (ArgParts.empty())
return true; // No users, this is a dead argument.		return true; // No users, this is a dead argument.

// Sort parts by offset.		// Sort parts by offset.
append_range(ArgPartsVec, ArgParts);		append_range(ArgPartsVec, ArgParts);
sort(ArgPartsVec,		sort(ArgPartsVec,
[](const auto &A, const auto &B) { return A.first < B.first; });		[](const auto &A, const auto &B) { return A.first < B.first; });

// Make sure the parts are non-overlapping.		// Make sure the parts are non-overlapping.
nikic: The first line of this comment can be kept.
// TODO: As we're doing pure load promotion here, overlap should be fine from
// a correctness perspective. Profitability is less obvious though.
int64_t Offset = ArgPartsVec[0].first;		int64_t Offset = ArgPartsVec[0].first;
		nikic: I think it's safe to drop these TODOs -- this is not really compatible with the new promotion approach that uses separate allocas (we'd have to reconstruct the result from multiple allocas, which makes little sense).
		psamolysov: I believe this comment was added because it wasn't clear to me whether stores can be speculative. Thank you for the answer; the situation is clear now, and I've removed my TODO comment as well as the previous one.
for (const auto &Pair : ArgPartsVec) {		for (const auto &Pair : ArgPartsVec) {
if (Pair.first < Offset)		if (Pair.first < Offset)
return false; // Overlap with previous part.		return false; // Overlap with previous part.

Offset = Pair.first + DL.getTypeStoreSize(Pair.second.Ty);		Offset = Pair.first + DL.getTypeStoreSize(Pair.second.Ty);
}		}
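The sort-then-scan overlap check can be tried out on its own. Below is a minimal sketch, assuming each part is reduced to an (offset, store size in bytes) pair; `ToyPart` and `partsAreDisjoint` are hypothetical names, and the size is passed in directly instead of being queried from a `DataLayout`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (byte offset, store size in bytes); a simplified stand-in for
// OffsetAndArgPart.
using ToyPart = std::pair<int64_t, uint64_t>;

// Returns true if no two parts cover the same byte.
bool partsAreDisjoint(std::vector<ToyPart> Parts) {
  if (Parts.empty())
    return true;

  // Sort parts by offset.
  std::sort(Parts.begin(), Parts.end(),
            [](const ToyPart &A, const ToyPart &B) {
              return A.first < B.first;
            });

  // End is the first byte past the previous part.
  int64_t End = Parts.front().first;
  for (const ToyPart &P : Parts) {
    if (P.first < End)
      return false; // Overlap with previous part.
    End = P.first + static_cast<int64_t>(P.second);
  }
  return true;
}

// Small usage example: adjacent parts are fine, nested parts are not.
void overlapExample() {
  assert(partsAreDisjoint({{0, 4}, {4, 8}}));
  assert(!partsAreDisjoint({{0, 8}, {4, 4}}));
}
```

With one alloca per part, as this revision does, disjointness is what lets each access be rewritten to exactly one alloca.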

		// If store instructions are allowed, the path from the entry of the function
		// to each load may be not free of instructions that potentially invalidate
		// the load, and this is an admissible situation.
		if (AreStoresAllowed)
		return true;

// Okay, now we know that the argument is only used by load instructions, and		// Okay, now we know that the argument is only used by load instructions, and
// it is safe to unconditionally perform all of them. Use alias analysis to		// it is safe to unconditionally perform all of them. Use alias analysis to
// check to see if the pointer is guaranteed to not be modified from entry of		// check to see if the pointer is guaranteed to not be modified from entry of
// the function to each of the load instructions.		// the function to each of the load instructions.

// Because there could be several/many load instructions, remember which		// Because there could be several/many load instructions, remember which
// blocks we know to be transparent to the load.		// blocks we know to be transparent to the load.
df_iterator_default_set<BasicBlock *, 16> TranspBlocks;		df_iterator_default_set<BasicBlock *, 16> TranspBlocks;
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	for (unsigned I = 0, E = StructTy->getNumElements(); I < E; ++I) {
if (StartPos != Layout->getElementOffsetInBits(I))		if (StartPos != Layout->getElementOffsetInBits(I))
return false;		return false;
StartPos += DL.getTypeAllocSizeInBits(ElTy);		StartPos += DL.getTypeAllocSizeInBits(ElTy);
}		}

return true;		return true;
}		}

/// Checks if the padding bytes of an argument could be accessed.
static bool canPaddingBeAccessed(Argument *Arg) {
assert(Arg->hasByValAttr());

// Track all the pointers to the argument to make sure they are not captured.
SmallPtrSet<Value *, 16> PtrValues;
PtrValues.insert(Arg);

// Track all of the stores.
SmallVector<StoreInst *, 16> Stores;

// Scan through the uses recursively to make sure the pointer is always used
// sanely.
SmallVector<Value *, 16> WorkList(Arg->users());
while (!WorkList.empty()) {
Value *V = WorkList.pop_back_val();
if (isa<GetElementPtrInst>(V) || isa<PHINode>(V)) {
if (PtrValues.insert(V).second)
append_range(WorkList, V->users());
} else if (StoreInst *Store = dyn_cast<StoreInst>(V)) {
Stores.push_back(Store);
} else if (!isa<LoadInst>(V)) {
return true;
}
}

// Check to make sure the pointers aren't captured
for (StoreInst *Store : Stores)
if (PtrValues.count(Store->getValueOperand()))
return true;

return false;
}

/// Check if callers and callee agree on how promoted arguments would be		/// Check if callers and callee agree on how promoted arguments would be
/// passed.		/// passed.
static bool areTypesABICompatible(ArrayRef<Type *> Types, const Function &F,		static bool areTypesABICompatible(ArrayRef<Type *> Types, const Function &F,
const TargetTransformInfo &TTI) {		const TargetTransformInfo &TTI) {
return all_of(F.uses(), [&](const Use &U) {		return all_of(F.uses(), [&](const Use &U) {
CallBase *CB = dyn_cast<CallBase>(U.getUser());		CallBase *CB = dyn_cast<CallBase>(U.getUser());
if (!CB)		if (!CB)
return false;		return false;

const Function *Caller = CB->getCaller();		const Function *Caller = CB->getCaller();
const Function *Callee = CB->getCalledFunction();		const Function *Callee = CB->getCalledFunction();
return TTI.areTypesABICompatible(Caller, Callee, Types);		return TTI.areTypesABICompatible(Caller, Callee, Types);
});		});
}		}

/// PromoteArguments - This method checks the specified function to see if there		/// PromoteArguments - This method checks the specified function to see if there
/// are any promotable arguments and if it is safe to promote the function (for		/// are any promotable arguments and if it is safe to promote the function (for
/// example, all callers are direct). If safe to promote some arguments, it		/// example, all callers are direct). If safe to promote some arguments, it
/// calls the DoPromotion method.		/// calls the DoPromotion method.
static Function *		static Function *
promoteArguments(Function *F, function_ref<AAResults &(Function &F)> AARGetter,		promoteArguments(Function *F, function_ref<AAResults &(Function &F)> AARGetter,
		function_ref<DominatorTree &(Function &F)> DTGetter,
		function_ref<AssumptionCache *(Function &F)> ACGetter,
unsigned MaxElements,		unsigned MaxElements,
Optional<function_ref<void(CallBase &OldCS, CallBase &NewCS)>>		Optional<function_ref<void(CallBase &OldCS, CallBase &NewCS)>>
ReplaceCallSite,		ReplaceCallSite,
const TargetTransformInfo &TTI, bool IsRecursive) {		const TargetTransformInfo &TTI, bool IsRecursive) {
// Don't perform argument promotion for naked functions; otherwise we can end		// Don't perform argument promotion for naked functions; otherwise we can end
// up removing parameters that are seemingly 'not used' as they are referred		// up removing parameters that are seemingly 'not used' as they are referred
// to in the assembly.		// to in the assembly.
if(F->hasFnAttribute(Attribute::Naked))		if (F->hasFnAttribute(Attribute::Naked))
return nullptr;		return nullptr;

// Make sure that it is local to this module.		// Make sure that it is local to this module.
if (!F->hasLocalLinkage())		if (!F->hasLocalLinkage())
return nullptr;		return nullptr;

// Don't promote arguments for variadic functions. Adding, removing, or		// Don't promote arguments for variadic functions. Adding, removing, or
// changing non-pack parameters can change the classification of pack		// changing non-pack parameters can change the classification of pack
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	promoteArguments(Function *F, function_ref<AAResults &(Function &F)> AARGetter,

const DataLayout &DL = F->getParent()->getDataLayout();		const DataLayout &DL = F->getParent()->getDataLayout();

AAResults &AAR = AARGetter(*F);		AAResults &AAR = AARGetter(*F);

// Check to see which arguments are promotable. If an argument is promotable,		// Check to see which arguments are promotable. If an argument is promotable,
// add it to ArgsToPromote.		// add it to ArgsToPromote.
DenseMap<Argument *, SmallVector<OffsetAndArgPart, 4>> ArgsToPromote;		DenseMap<Argument *, SmallVector<OffsetAndArgPart, 4>> ArgsToPromote;
SmallPtrSet<Argument *, 8> ByValArgsToTransform;
for (Argument *PtrArg : PointerArgs) {		for (Argument *PtrArg : PointerArgs) {
// Replace sret attribute with noalias. This reduces register pressure by		// Replace sret attribute with noalias. This reduces register pressure by
// avoiding a register copy.		// avoiding a register copy.
if (PtrArg->hasStructRetAttr()) {		if (PtrArg->hasStructRetAttr()) {
unsigned ArgNo = PtrArg->getArgNo();		unsigned ArgNo = PtrArg->getArgNo();
F->removeParamAttr(ArgNo, Attribute::StructRet);		F->removeParamAttr(ArgNo, Attribute::StructRet);
F->addParamAttr(ArgNo, Attribute::NoAlias);		F->addParamAttr(ArgNo, Attribute::NoAlias);
for (Use &U : F->uses()) {		for (Use &U : F->uses()) {
CallBase &CB = cast<CallBase>(*U.getUser());		CallBase &CB = cast<CallBase>(*U.getUser());
CB.removeParamAttr(ArgNo, Attribute::StructRet);		CB.removeParamAttr(ArgNo, Attribute::StructRet);
CB.addParamAttr(ArgNo, Attribute::NoAlias);		CB.addParamAttr(ArgNo, Attribute::NoAlias);
}		}
}		}

// If we can promote the pointer to its value.		// If we can promote the pointer to its value.
SmallVector<OffsetAndArgPart, 4> ArgParts;		SmallVector<OffsetAndArgPart, 4> ArgParts;

if (findArgParts(PtrArg, DL, AAR, MaxElements, IsRecursive, ArgParts)) {		if (findArgParts(PtrArg, DL, AAR, MaxElements, IsRecursive, ArgParts)) {
SmallVector<Type *, 4> Types;		SmallVector<Type *, 4> Types;
for (const auto &Pair : ArgParts)		for (const auto &Pair : ArgParts)
Types.push_back(Pair.second.Ty);		Types.push_back(Pair.second.Ty);

if (areTypesABICompatible(Types, *F, TTI)) {		if (areTypesABICompatible(Types, *F, TTI)) {
ArgsToPromote.insert({PtrArg, std::move(ArgParts)});		ArgsToPromote.insert({PtrArg, std::move(ArgParts)});
continue;
}		}
}		}

// Otherwise, if this is a byval argument, and if the aggregate type is
// small, just pass the elements, which is always safe, if the passed value
// is densely packed or if we can prove the padding bytes are never
// accessed.
//
// Only handle arguments with specified alignment; if it's unspecified, the
// actual alignment of the argument is target-specific.
Type *ByValTy = PtrArg->getParamByValType();
bool IsSafeToPromote =
ByValTy && PtrArg->getParamAlign() &&
(ArgumentPromotionPass::isDenselyPacked(ByValTy, DL) ||
!canPaddingBeAccessed(PtrArg));
if (!IsSafeToPromote) {
LLVM_DEBUG(dbgs() << "ArgPromotion disables passing the elements of"
<< " the argument '" << PtrArg->getName()
<< "' because it is not safe.\n");
continue;
}
if (StructType *STy = dyn_cast<StructType>(ByValTy)) {
if (MaxElements > 0 && STy->getNumElements() > MaxElements) {
LLVM_DEBUG(dbgs() << "ArgPromotion disables passing the elements of"
<< " the argument '" << PtrArg->getName()
<< "' because it would require adding more"
<< " than " << MaxElements
<< " arguments to the function.\n");
continue;
}
SmallVector<Type *, 4> Types;
append_range(Types, STy->elements());

// If all the elements are single-value types, we can promote it.
bool AllSimple =
all_of(Types, [](Type *Ty) { return Ty->isSingleValueType(); });

// Safe to transform. Passing the elements as a scalar will allow sroa to
// hack on the new alloca we introduce.
if (AllSimple && areTypesABICompatible(Types, *F, TTI))
ByValArgsToTransform.insert(PtrArg);
}
}		}

// No promotable pointer arguments.		// No promotable pointer arguments.
if (ArgsToPromote.empty() && ByValArgsToTransform.empty())		if (ArgsToPromote.empty())
return nullptr;		return nullptr;

return doPromotion(F, ArgsToPromote, ByValArgsToTransform, ReplaceCallSite);		return doPromotion(F, DTGetter, ACGetter, ArgsToPromote, ReplaceCallSite);
}		}

PreservedAnalyses ArgumentPromotionPass::run(LazyCallGraph::SCC &C,		PreservedAnalyses ArgumentPromotionPass::run(LazyCallGraph::SCC &C,
CGSCCAnalysisManager &AM,		CGSCCAnalysisManager &AM,
LazyCallGraph &CG,		LazyCallGraph &CG,
CGSCCUpdateResult &UR) {		CGSCCUpdateResult &UR) {
bool Changed = false, LocalChange;		bool Changed = false, LocalChange;

Show All 10 Lines	for (LazyCallGraph::Node &N : C) {

// FIXME: This lambda must only be used with this function. We should		// FIXME: This lambda must only be used with this function. We should
// skip the lambda and just get the AA results directly.		// skip the lambda and just get the AA results directly.
auto AARGetter = [&](Function &F) -> AAResults & {		auto AARGetter = [&](Function &F) -> AAResults & {
assert(&F == &OldF && "Called with an unexpected function!");		assert(&F == &OldF && "Called with an unexpected function!");
return FAM.getResult<AAManager>(F);		return FAM.getResult<AAManager>(F);
};		};

const TargetTransformInfo &TTI = FAM.getResult<TargetIRAnalysis>(OldF);		auto DTGetter = [&](Function &F) -> DominatorTree & {
Function *NewF = promoteArguments(&OldF, AARGetter, MaxElements, None,		assert(&F != &OldF && "Called with the obsolete function!");
TTI, IsRecursive);		return FAM.getResult<DominatorTreeAnalysis>(F);
		};

		auto ACGetter = [&](Function &F) -> AssumptionCache * {
		assert(&F != &OldF && "Called with the obsolete function!");
		return &FAM.getResult<AssumptionAnalysis>(F);
		};

		const auto &TTI = FAM.getResult<TargetIRAnalysis>(OldF);
		Function *NewF = promoteArguments(&OldF, AARGetter, DTGetter, ACGetter,
		MaxElements, None, TTI, IsRecursive);
if (!NewF)		if (!NewF)
continue;		continue;
LocalChange = true;		LocalChange = true;

// Directly substitute the functions in the call graph. Note that this		// Directly substitute the functions in the call graph. Note that this
// requires the old function to be completely dead and completely		// requires the old function to be completely dead and completely
// replaced by the new function. It does no call graph updates, it merely		// replaced by the new function. It does no call graph updates, it merely
// swaps out the particular function mapped to a particular node in the		// swaps out the particular function mapped to a particular node in the
Show All 26 Lines

llvm/test/Transforms/ArgumentPromotion/attrs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
	; RUN: opt < %s -passes=argpromotion -S | FileCheck %s			; RUN: opt < %s -passes=argpromotion -S | FileCheck %s

	%struct.ss = type { i32, i64 }			%struct.ss = type { i32, i64 }

	; Don't drop 'byval' on %X here.
	define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b, i32* byval(i32) align 4 %X, i32 %i) nounwind {			define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b, i32* byval(i32) align 4 %X, i32 %i) nounwind {
	; CHECK-LABEL: define {{[^@]+}}@f			; CHECK-LABEL: define {{[^@]+}}@f
	; CHECK-SAME: (i32 [[B_0:%.*]], i64 [[B_1:%.*]], i32* byval(i32) align 4 [[X:%.*]], i32 [[I:%.*]]) #[[ATTR0:[0-9]+]] {			; CHECK-SAME: (i32 [[B_0:%.*]], i32 [[X:%.*]], i32 [[I:%.*]]) #[[ATTR0:[0-9]+]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B:%.*]] = alloca [[STRUCT_SS:%.*]], align 4			; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: store i32 [[B_0]], i32* [[DOT0]], align 4
	; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 1
	; CHECK-NEXT: store i64 [[B_1]], i64* [[DOT1]], align 4
	; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: [[TEMP1:%.*]] = load i32, i32* [[TEMP]], align 4
	; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[TEMP1]], 1
	; CHECK-NEXT: store i32 [[TEMP2]], i32* [[TEMP]], align 4
	; CHECK-NEXT: store i32 0, i32* [[X]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:

	%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0			%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
	%temp1 = load i32, i32* %temp, align 4			%temp1 = load i32, i32* %temp, align 4
	%temp2 = add i32 %temp1, 1			%temp2 = add i32 %temp1, 1
	store i32 %temp2, i32* %temp, align 4			store i32 %temp2, i32* %temp, align 4

	store i32 0, i32* %X			store i32 0, i32* %X
	ret void			ret void
	}			}

	; Also make sure we don't drop the call zeroext attribute.			; Also make sure we don't drop the call zeroext attribute.
	define i32 @test(i32* %X) {			define i32 @test(i32* %X) {
	; CHECK-LABEL: define {{[^@]+}}@test			; CHECK-LABEL: define {{[^@]+}}@test
	; CHECK-SAME: (i32* [[X:%.*]]) {			; CHECK-SAME: (i32* [[X:%.*]]) {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 8			; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 8
	; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
	; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8			; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8
	; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
	; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4			; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4
	; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
	; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 4			; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 4
	; CHECK-NEXT: [[S_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: [[X_VAL:%.*]] = load i32, i32* [[X]], align 4
	; CHECK-NEXT: [[S_1_VAL:%.*]] = load i64, i64* [[S_1]], align 4			; CHECK-NEXT: call void @f(i32 [[S_0_VAL]], i32 [[X_VAL]], i32 zeroext 0)
	; CHECK-NEXT: call void @f(i32 [[S_0_VAL]], i64 [[S_1_VAL]], i32* byval(i32) align 4 [[X]], i32 zeroext 0)
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%S = alloca %struct.ss			%S = alloca %struct.ss
	%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0			%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
	store i32 1, i32* %temp1, align 8			store i32 1, i32* %temp1, align 8
	%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1			%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
	store i64 2, i64* %temp4, align 4			store i64 2, i64* %temp4, align 4

	call void @f( %struct.ss* byval(%struct.ss) align 4 %S, i32* byval(i32) align 4 %X, i32 zeroext 0)			call void @f( %struct.ss* byval(%struct.ss) align 4 %S, i32* byval(i32) align 4 %X, i32 zeroext 0)

	ret i32 0			ret i32 0
	}			}

llvm/test/Transforms/ArgumentPromotion/byval-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
	; RUN: opt < %s -passes=argpromotion -S | FileCheck %s			; RUN: opt < %s -passes=argpromotion -S | FileCheck %s

	; Arg promotion eliminates the struct argument.			; Arg promotion eliminates the struct argument.
	; FIXME: We should eliminate the i32* argument.

	%struct.ss = type { i32, i64 }			%struct.ss = type { i32, i64 }

	define internal void @f(%struct.ss* byval(%struct.ss) align 8 %b, i32* byval(i32) align 4 %X) nounwind {			define internal void @f(%struct.ss* byval(%struct.ss) align 8 %b, i32* byval(i32) align 4 %X) nounwind {
	; CHECK-LABEL: define {{[^@]+}}@f			; CHECK-LABEL: define {{[^@]+}}@f
	; CHECK-SAME: (i32 [[B_0:%.*]], i64 [[B_1:%.*]], i32* byval(i32) align 4 [[X:%.*]]) #[[ATTR0:[0-9]+]] {			; CHECK-SAME: (i32 [[B_0:%.*]], i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B:%.*]] = alloca [[STRUCT_SS:%.*]], align 8			; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: store i32 [[B_0]], i32* [[DOT0]], align 8
	; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 1
	; CHECK-NEXT: store i64 [[B_1]], i64* [[DOT1]], align 4
	; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: [[TEMP1:%.*]] = load i32, i32* [[TEMP]], align 4
	; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[TEMP1]], 1
	; CHECK-NEXT: store i32 [[TEMP2]], i32* [[TEMP]], align 4
	; CHECK-NEXT: store i32 0, i32* [[X]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0			%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
	%temp1 = load i32, i32* %temp, align 4			%temp1 = load i32, i32* %temp, align 4
	%temp2 = add i32 %temp1, 1			%temp2 = add i32 %temp1, 1
	store i32 %temp2, i32* %temp, align 4			store i32 %temp2, i32* %temp, align 4

	store i32 0, i32* %X			store i32 0, i32* %X
	ret void			ret void
	}			}

	define i32 @test(i32* %X) {			define i32 @test(i32* %X) {
	; CHECK-LABEL: define {{[^@]+}}@test			; CHECK-LABEL: define {{[^@]+}}@test
	; CHECK-SAME: (i32* [[X:%.*]]) {			; CHECK-SAME: (i32* [[X:%.*]]) {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 8			; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 8
	; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
	; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8			; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8
	; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
	; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4			; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4
	; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
	; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 8			; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 4
	; CHECK-NEXT: [[S_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: [[X_VAL:%.*]] = load i32, i32* [[X]], align 4
	; CHECK-NEXT: [[S_1_VAL:%.*]] = load i64, i64* [[S_1]], align 4			; CHECK-NEXT: call void @f(i32 [[S_0_VAL]], i32 [[X_VAL]])
	; CHECK-NEXT: call void @f(i32 [[S_0_VAL]], i64 [[S_1_VAL]], i32* byval(i32) align 4 [[X]])
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%S = alloca %struct.ss, align 8			%S = alloca %struct.ss, align 8
	%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0			%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
	store i32 1, i32* %temp1, align 8			store i32 1, i32* %temp1, align 8
	%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1			%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
	store i64 2, i64* %temp4, align 4			store i64 2, i64* %temp4, align 4
	call void @f( %struct.ss* byval(%struct.ss) align 8 %S, i32* byval(i32) align 4 %X)			call void @f( %struct.ss* byval(%struct.ss) align 8 %S, i32* byval(i32) align 4 %X)
	ret i32 0			ret i32 0
	}			}

llvm/test/Transforms/ArgumentPromotion/byval-through-pointer-promotion.ll

This file was moved to llvm/test/Transforms/ArgumentPromotion/byval-with-padding.ll.

llvm/test/Transforms/ArgumentPromotion/byval-with-padding.ll

This file was moved from llvm/test/Transforms/ArgumentPromotion/byval-through-pointer-promotion.ll.

The contents of this file were not changed.

llvm/test/Transforms/ArgumentPromotion/byval.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
	; RUN: opt < %s -passes=argpromotion -S | FileCheck %s			; RUN: opt < %s -passes=argpromotion -S | FileCheck %s

	target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

	%struct.ss = type { i32, i64 }			%struct.ss = type { i32, i64 }

	define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b) nounwind {			define internal void @f(%struct.ss* byval(%struct.ss) align 4 %b) nounwind {
	; CHECK-LABEL: define {{[^@]+}}@f			; CHECK-LABEL: define {{[^@]+}}@f
	; CHECK-SAME: (i32 [[B_0:%.*]], i64 [[B_1:%.*]]) #[[ATTR0:[0-9]+]] {			; CHECK-SAME: (i32 [[B_0:%.*]]) #[[ATTR0:[0-9]+]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B:%.*]] = alloca [[STRUCT_SS:%.*]], align 4			; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: store i32 [[B_0]], i32* [[DOT0]], align 4
	; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 1
	; CHECK-NEXT: store i64 [[B_1]], i64* [[DOT1]], align 4
	; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: [[TEMP1:%.*]] = load i32, i32* [[TEMP]], align 4
	; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[TEMP1]], 1
	; CHECK-NEXT: store i32 [[TEMP2]], i32* [[TEMP]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0			%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
	%temp1 = load i32, i32* %temp, align 4			%temp1 = load i32, i32* %temp, align 4
	%temp2 = add i32 %temp1, 1			%temp2 = add i32 %temp1, 1
	store i32 %temp2, i32* %temp, align 4			store i32 %temp2, i32* %temp, align 4
	ret void			ret void
	}			}


	define internal void @g(%struct.ss* byval(%struct.ss) align 32 %b) nounwind {			define internal void @g(%struct.ss* byval(%struct.ss) align 32 %b) nounwind {
	; CHECK-LABEL: define {{[^@]+}}@g			; CHECK-LABEL: define {{[^@]+}}@g
	; CHECK-SAME: (i32 [[B_0:%.*]], i64 [[B_1:%.*]]) #[[ATTR0]] {			; CHECK-SAME: (i32 [[B_0:%.*]]) #[[ATTR0]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B:%.*]] = alloca [[STRUCT_SS:%.*]], align 32			; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: store i32 [[B_0]], i32* [[DOT0]], align 32
	; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 1
	; CHECK-NEXT: store i64 [[B_1]], i64* [[DOT1]], align 4
	; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
	; CHECK-NEXT: [[TEMP1:%.*]] = load i32, i32* [[TEMP]], align 4
	; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[TEMP1]], 1
	; CHECK-NEXT: store i32 [[TEMP2]], i32* [[TEMP]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0			%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
	%temp1 = load i32, i32* %temp, align 4			%temp1 = load i32, i32* %temp, align 4
	%temp2 = add i32 %temp1, 1			%temp2 = add i32 %temp1, 1
	store i32 %temp2, i32* %temp, align 4			store i32 %temp2, i32* %temp, align 4
	ret void			ret void
	Show All 17 Lines
	entry:			entry:
	%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0			%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
	%temp1 = load i32, i32* %temp, align 4			%temp1 = load i32, i32* %temp, align 4
	%temp2 = add i32 %temp1, 1			%temp2 = add i32 %temp1, 1
	store i32 %temp2, i32* %temp, align 4			store i32 %temp2, i32* %temp, align 4
	ret void			ret void
	}			}

				; Transform even if an argument is written to and then is loaded from.
				define internal void @k(%struct.ss* byval(%struct.ss) align 4 %b) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@k
				; CHECK-SAME: (i32 [[B_0:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
				%temp1 = load i32, i32* %temp, align 4
				%temp2 = add i32 %temp1, 1
				store i32 %temp2, i32* %temp, align 4
				%temp3 = load i32, i32* %temp, align 4
				ret void
				}

				; Transform even if a store instruction is the single user.
				define internal void @l(%struct.ss* byval(%struct.ss) align 4 %b) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@l
				; CHECK-SAME: (i32 [[B_0:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
				store i32 1, i32* %temp, align 4
				ret void
				}

				; Transform all the arguments creating the required number of 'alloca's and
				; then optimize them out.
				define internal void @m(%struct.ss* byval(%struct.ss) align 4 %b, %struct.ss* byval(%struct.ss) align 4 %c) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@m
				; CHECK-SAME: (i32 [[B_0:%.*]], i32 [[C_0:%.*]], i64 [[C_1:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[B_0]], 1
				; CHECK-NEXT: [[TEMP6:%.*]] = add i64 [[C_1]], 1
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
				%temp1 = load i32, i32* %temp, align 4
				%temp2 = add i32 %temp1, 1
				store i32 %temp2, i32* %temp, align 4

				%temp3 = getelementptr %struct.ss, %struct.ss* %c, i32 0, i32 0
				store i32 %temp2, i32* %temp3, align 4

				%temp4 = getelementptr %struct.ss, %struct.ss* %c, i32 0, i32 1
				%temp5 = load i64, i64* %temp4, align 8
				%temp6 = add i64 %temp5, 1
				store i64 %temp6, i64* %temp4, align 8

				ret void
				}

	define i32 @main() nounwind {			define i32 @main() nounwind {
	; CHECK-LABEL: define {{[^@]+}}@main			; CHECK-LABEL: define {{[^@]+}}@main
	; CHECK-SAME: () #[[ATTR0]] {			; CHECK-SAME: () #[[ATTR0]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 32			; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 32
	; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
	; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8			; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 8
	; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
	; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4			; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4
	; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: [[S_0_0_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
	; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 4			; CHECK-NEXT: [[S_0_0_0_VAL:%.*]] = load i32, i32* [[S_0_0_0]], align 4
	; CHECK-NEXT: [[S_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1			; CHECK-NEXT: call void @f(i32 [[S_0_0_0_VAL]])
	; CHECK-NEXT: [[S_1_VAL:%.*]] = load i64, i64* [[S_1]], align 4			; CHECK-NEXT: [[S_1_0_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
	; CHECK-NEXT: call void @f(i32 [[S_0_VAL]], i64 [[S_1_VAL]])			; CHECK-NEXT: [[S_1_0_0_VAL:%.*]] = load i32, i32* [[S_1_0_0]], align 4
	; CHECK-NEXT: [[S_01:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0			; CHECK-NEXT: call void @g(i32 [[S_1_0_0_VAL]])
	; CHECK-NEXT: [[S_01_VAL:%.*]] = load i32, i32* [[S_01]], align 32
	; CHECK-NEXT: [[S_12:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
	; CHECK-NEXT: [[S_12_VAL:%.*]] = load i64, i64* [[S_12]], align 4
	; CHECK-NEXT: call void @g(i32 [[S_01_VAL]], i64 [[S_12_VAL]])
	; CHECK-NEXT: call void @h(%struct.ss* byval([[STRUCT_SS]]) [[S]])			; CHECK-NEXT: call void @h(%struct.ss* byval([[STRUCT_SS]]) [[S]])
				; CHECK-NEXT: [[S_2_0_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
				; CHECK-NEXT: [[S_2_0_0_VAL:%.*]] = load i32, i32* [[S_2_0_0]], align 4
				; CHECK-NEXT: call void @k(i32 [[S_2_0_0_VAL]])
				; CHECK-NEXT: [[S_3_0_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
				; CHECK-NEXT: [[S_3_0_0_VAL:%.*]] = load i32, i32* [[S_3_0_0]], align 4
				; CHECK-NEXT: call void @l(i32 [[S_3_0_0_VAL]])
				; CHECK-NEXT: [[S_4_0_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
				; CHECK-NEXT: [[S_4_0_0_VAL:%.*]] = load i32, i32* [[S_4_0_0]], align 4
				; CHECK-NEXT: [[S_4_1_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
				; CHECK-NEXT: [[S_4_1_0_VAL:%.*]] = load i32, i32* [[S_4_1_0]], align 4
				; CHECK-NEXT: [[S_4_1_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 1
				; CHECK-NEXT: [[S_4_1_1_VAL:%.*]] = load i64, i64* [[S_4_1_1]], align 8
				; CHECK-NEXT: call void @m(i32 [[S_4_0_0_VAL]], i32 [[S_4_1_0_VAL]], i64 [[S_4_1_1_VAL]])
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%S = alloca %struct.ss, align 32			%S = alloca %struct.ss, align 32
	%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0			%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
	store i32 1, i32* %temp1, align 8			store i32 1, i32* %temp1, align 8
	%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1			%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
	store i64 2, i64* %temp4, align 4			store i64 2, i64* %temp4, align 4
	call void @f(%struct.ss* byval(%struct.ss) align 4 %S) nounwind			call void @f(%struct.ss* byval(%struct.ss) align 4 %S) nounwind
	call void @g(%struct.ss* byval(%struct.ss) align 32 %S) nounwind			call void @g(%struct.ss* byval(%struct.ss) align 32 %S) nounwind
	call void @h(%struct.ss* byval(%struct.ss) %S) nounwind			call void @h(%struct.ss* byval(%struct.ss) %S) nounwind
				call void @k(%struct.ss* byval(%struct.ss) align 4 %S) nounwind
				call void @l(%struct.ss* byval(%struct.ss) align 4 %S) nounwind
				call void @m(%struct.ss* byval(%struct.ss) align 4 %S, %struct.ss* byval(%struct.ss) align 4 %S) nounwind
	ret i32 0			ret i32 0
	}			}
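
Reading aid for the byval.ll expectations above (not part of the patch): a minimal C++ analogy, assuming that a by-value struct parameter is an acceptable model of `byval` semantics. `SS`, `kByVal`, `kPromoted`, and `checkK` are hypothetical names. Unlike @k, both functions return the reloaded value so that the effect can be observed. The sketch only illustrates why a store into the callee's private copy is invisible to the caller, which is why passing just the accessed field gives the same result; it is not how the pass itself is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors %struct.ss = type { i32, i64 } from the test file.
struct SS {
  std::int32_t a;
  std::int64_t b;
};

// Analogy for the original @k: the callee works on a private copy, so the
// store below never reaches the caller's object.
std::int32_t kByVal(SS b) {
  std::int32_t t = b.a + 1; // load the first field and add 1
  b.a = t;                  // store into the private copy
  return b.a;               // load again after the store
}

// Analogy for the promoted @k: only the accessed field is passed. The local
// 'slot' stands in for the per-part alloca that is later folded away.
std::int32_t kPromoted(std::int32_t b_0) {
  std::int32_t slot = b_0;
  slot = slot + 1;
  return slot;
}

// Caller-side check that both forms agree and the caller's struct is intact.
void checkK() {
  SS s{1, 2};
  assert(kByVal(s) == kPromoted(s.a));
  assert(s.a == 1);
}
```

Calling `checkK()` exercises both variants on the same values that @main stores into %S.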

llvm/test/Transforms/ArgumentPromotion/dbg.ll

Show All 11 Lines	;
%1 = load i32*, i32** %X, align 8		%1 = load i32*, i32** %X, align 8
%2 = load i32, i32* %1, align 8		%2 = load i32, i32* %1, align 8
call void @sink(i32 %2)		call void @sink(i32 %2)
ret void		ret void
}		}

%struct.pair = type { i32, i32 }		%struct.pair = type { i32, i32 }

		; Do not promote because there is a store of the pointer %P itself. Even if %P
		; had been promoted as a byval argument, the result would have been not
		; optimizable for SROA.
define internal void @test_byval(%struct.pair* byval(%struct.pair) align 4 %P) {		define internal void @test_byval(%struct.pair* byval(%struct.pair) align 4 %P) {
; CHECK-LABEL: define {{[^@]+}}@test_byval		; CHECK-LABEL: define {{[^@]+}}@test_byval
; CHECK-SAME: (i32 [[P_0:%.*]], i32 [[P_1:%.*]]) {		; CHECK-SAME: ([[STRUCT_PAIR:%.*]]* byval([[STRUCT_PAIR]]) align 4 [[P:%.*]]) {
; CHECK-NEXT: [[P:%.*]] = alloca [[STRUCT_PAIR:%.*]], align 4
; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_PAIR]], [[STRUCT_PAIR]]* [[P]], i32 0, i32 0
; CHECK-NEXT: store i32 [[P_0]], i32* [[DOT0]], align 4
; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_PAIR]], [[STRUCT_PAIR]]* [[P]], i32 0, i32 1
; CHECK-NEXT: store i32 [[P_1]], i32* [[DOT1]], align 4
; CHECK-NEXT: [[SINK:%.*]] = alloca i32*, align 8		; CHECK-NEXT: [[SINK:%.*]] = alloca i32*, align 8
; CHECK-NEXT: [[DOT2:%.*]] = getelementptr [[STRUCT_PAIR]], [[STRUCT_PAIR]]* [[P]], i32 0, i32 0		; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_PAIR]], [[STRUCT_PAIR]]* [[P]], i32 0, i32 0
; CHECK-NEXT: store i32* [[DOT2]], i32** [[SINK]], align 8		; CHECK-NEXT: store i32* [[TEMP]], i32** [[SINK]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%1 = alloca i32*, align 8		%1 = alloca i32*, align 8
%2 = getelementptr %struct.pair, %struct.pair* %P, i32 0, i32 0		%2 = getelementptr %struct.pair, %struct.pair* %P, i32 0, i32 0
store i32* %2, i32** %1, align 8 ; to protect from "usual" promotion		store i32* %2, i32** %1, align 8 ; to protect from promotion
ret void		ret void
}		}

define void @caller(i32** %Y, %struct.pair* %P) {		define void @caller(i32** %Y, %struct.pair* %P) {
; CHECK-LABEL: define {{[^@]+}}@caller		; CHECK-LABEL: define {{[^@]+}}@caller
; CHECK-SAME: (i32** [[Y:%.*]], %struct.pair* [[P:%.*]]) {		; CHECK-SAME: (i32** [[Y:%.*]], %struct.pair* [[P:%.*]]) {
; CHECK-NEXT: [[Y_VAL:%.*]] = load i32*, i32** [[Y]], align 8, !dbg [[DBG4:![0-9]+]]		; CHECK-NEXT: [[Y_VAL:%.*]] = load i32*, i32** [[Y]], align 8, !dbg [[DBG4:![0-9]+]]
; CHECK-NEXT: [[Y_VAL_VAL:%.*]] = load i32, i32* [[Y_VAL]], align 8, !dbg [[DBG4]]		; CHECK-NEXT: [[Y_VAL_VAL:%.*]] = load i32, i32* [[Y_VAL]], align 8, !dbg [[DBG4]]
; CHECK-NEXT: call void @test(i32 [[Y_VAL_VAL]]), !dbg [[DBG4]]		; CHECK-NEXT: call void @test(i32 [[Y_VAL_VAL]]), !dbg [[DBG4]]
; CHECK-NEXT: [[P_0:%.*]] = getelementptr [[STRUCT_PAIR:%.*]], %struct.pair* [[P]], i32 0, i32 0, !dbg [[DBG5:![0-9]+]]		; CHECK-NEXT: call void @test_byval([[STRUCT_PAIR]]* byval([[STRUCT_PAIR]]) align 4 [[P]]), !dbg [[DBG5:![0-9]+]]
; CHECK-NEXT: [[P_0_VAL:%.*]] = load i32, i32* [[P_0]], align 4, !dbg [[DBG5]]
; CHECK-NEXT: [[P_1:%.*]] = getelementptr [[STRUCT_PAIR]], %struct.pair* [[P]], i32 0, i32 1, !dbg [[DBG5]]
; CHECK-NEXT: [[P_1_VAL:%.*]] = load i32, i32* [[P_1]], align 4, !dbg [[DBG5]]
; CHECK-NEXT: call void @test_byval(i32 [[P_0_VAL]], i32 [[P_1_VAL]]), !dbg [[DBG5]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
call void @test(i32** %Y), !dbg !1		call void @test(i32** %Y), !dbg !1

call void @test_byval(%struct.pair* byval(%struct.pair) align 4 %P), !dbg !6		call void @test_byval(%struct.pair* byval(%struct.pair) align 4 %P), !dbg !6
ret void		ret void
}		}

Show All 10 Lines

llvm/test/Transforms/ArgumentPromotion/fp80.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
	; RUN: opt < %s -passes=argpromotion -S | FileCheck %s			; RUN: opt < %s -passes=argpromotion -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%union.u = type { x86_fp80 }			%union.u = type { x86_fp80 }
	%struct.s = type { double, i16, i8, [5 x i8] }			%struct.s = type { double, i16, i8, [5 x i8] }

	@b = internal global %struct.s { double 3.14, i16 9439, i8 25, [5 x i8] undef }, align 16			@b = internal global %struct.s { double 3.14, i16 9439, i8 25, [5 x i8] undef }, align 16

	%struct.Foo = type { i32, i64 }			%struct.Foo = type { i32, i64 }
	@a = internal global %struct.Foo { i32 1, i64 2 }, align 8			@a = internal global %struct.Foo { i32 1, i64 2 }, align 8

	define void @run() {			define void @run() {
	; CHECK-LABEL: define {{[^@]+}}@run() {			; CHECK-LABEL: define {{[^@]+}}@run() {
	; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast %union.u* bitcast (%struct.s* @b to %union.u*) to i8*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast %union.u* bitcast (%struct.s* @b to %union.u*) to i8*
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 10			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 10
	; CHECK-NEXT: [[DOTVAL:%.*]] = load i8, i8* [[TMP1]], align 1			; CHECK-NEXT: [[DOTVAL:%.*]] = load i8, i8* [[TMP1]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call i8 @UseLongDoubleUnsafely(i8 [[DOTVAL]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call i8 @UseLongDoubleUnsafely(i8 [[DOTVAL]])
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[UNION_U:%.*]], %union.u* bitcast (%struct.s* @b to %union.u*), i32 0, i32 0			; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[UNION_U:%.*]], %union.u* bitcast (%struct.s* @b to %union.u*), i64 0, i32 0
	; CHECK-NEXT: [[DOT0_VAL:%.*]] = load x86_fp80, x86_fp80* [[DOT0]], align 16			; CHECK-NEXT: [[DOT0_VAL:%.*]] = load x86_fp80, x86_fp80* [[DOT0]], align 16
	; CHECK-NEXT: [[TMP3:%.*]] = tail call x86_fp80 @UseLongDoubleSafely(x86_fp80 [[DOT0_VAL]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call x86_fp80 @UseLongDoubleSafely(x86_fp80 [[DOT0_VAL]])
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast %struct.Foo* @a to i64*			; CHECK-NEXT: [[TMP4:%.*]] = tail call x86_fp80 @UseLongDoubleSafelyNoPromotion(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))
	; CHECK-NEXT: [[A_VAL:%.*]] = load i64, i64* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = bitcast %struct.Foo* @a to i64*
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @AccessPaddingOfStruct(i64 [[A_VAL]])			; CHECK-NEXT: [[A_VAL:%.*]] = load i64, i64* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = call i64 @CaptureAStruct(%struct.Foo* byval([[STRUCT_FOO:%.*]]) @a)			; CHECK-NEXT: [[TMP6:%.*]] = call i64 @AccessPaddingOfStruct(i64 [[A_VAL]])
				; CHECK-NEXT: [[TMP7:%.*]] = call i64 @CaptureAStruct(%struct.Foo* byval([[STRUCT_FOO:%.*]]) @a)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:
	tail call i8 @UseLongDoubleUnsafely(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))			tail call i8 @UseLongDoubleUnsafely(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))
	tail call x86_fp80 @UseLongDoubleSafely(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))			tail call x86_fp80 @UseLongDoubleSafely(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))
				tail call x86_fp80 @UseLongDoubleSafelyNoPromotion(%union.u* byval(%union.u) align 16 bitcast (%struct.s* @b to %union.u*))
	call i64 @AccessPaddingOfStruct(%struct.Foo* byval(%struct.Foo) @a)			call i64 @AccessPaddingOfStruct(%struct.Foo* byval(%struct.Foo) @a)
	call i64 @CaptureAStruct(%struct.Foo* byval(%struct.Foo) @a)			call i64 @CaptureAStruct(%struct.Foo* byval(%struct.Foo) @a)
	ret void			ret void
	}			}

	define internal i8 @UseLongDoubleUnsafely(%union.u* byval(%union.u) align 16 %arg) {			define internal i8 @UseLongDoubleUnsafely(%union.u* byval(%union.u) align 16 %arg) {
	; CHECK-LABEL: define {{[^@]+}}@UseLongDoubleUnsafely			; CHECK-LABEL: define {{[^@]+}}@UseLongDoubleUnsafely
	; CHECK-SAME: (i8 [[ARG_10_VAL:%.*]]) {			; CHECK-SAME: (i8 [[ARG_0_VAL:%.*]]) {
	; CHECK-NEXT: entry:			; CHECK-NEXT: ret i8 [[ARG_0_VAL]]
	; CHECK-NEXT: ret i8 [[ARG_10_VAL]]
	;			;
	entry:
	%bitcast = bitcast %union.u* %arg to %struct.s*			%bitcast = bitcast %union.u* %arg to %struct.s*
	%gep = getelementptr inbounds %struct.s, %struct.s* %bitcast, i64 0, i32 2			%gep = getelementptr inbounds %struct.s, %struct.s* %bitcast, i64 0, i32 2
	%result = load i8, i8* %gep			%result = load i8, i8* %gep
	ret i8 %result			ret i8 %result
	}			}

	define internal x86_fp80 @UseLongDoubleSafely(%union.u* byval(%union.u) align 16 %arg) {			define internal x86_fp80 @UseLongDoubleSafely(%union.u* byval(%union.u) align 16 %arg) {
	; CHECK-LABEL: define {{[^@]+}}@UseLongDoubleSafely			; CHECK-LABEL: define {{[^@]+}}@UseLongDoubleSafely
	; CHECK-SAME: (x86_fp80 [[ARG_0:%.*]]) {			; CHECK-SAME: (x86_fp80 [[ARG_0_VAL:%.*]]) {
	; CHECK-NEXT: [[ARG:%.*]] = alloca [[UNION_U:%.*]], align 16			; CHECK-NEXT: ret x86_fp80 [[ARG_0_VAL]]
	; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[UNION_U]], [[UNION_U]]* [[ARG]], i32 0, i32 0			;
	; CHECK-NEXT: store x86_fp80 [[ARG_0]], x86_fp80* [[DOT0]], align 16			%gep = getelementptr inbounds %union.u, %union.u* %arg, i64 0, i32 0
				%fp80 = load x86_fp80, x86_fp80* %gep
				ret x86_fp80 %fp80
				}

				define internal x86_fp80 @UseLongDoubleSafelyNoPromotion(%union.u* byval(%union.u) align 16 %arg) {
				; CHECK-LABEL: define {{[^@]+}}@UseLongDoubleSafelyNoPromotion
				; CHECK-SAME: ([[UNION_U]]* byval([[UNION_U]]) align 16 [[ARG:%.*]]) {
	; CHECK-NEXT: [[GEP:%.]] = getelementptr inbounds [[UNION_U]], [[UNION_U]] [[ARG]], i64 0, i32 0			; CHECK-NEXT: [[GEP:%.]] = getelementptr inbounds [[UNION_U]], [[UNION_U]] [[ARG]], i64 0, i32 0
	; CHECK-NEXT: [[IDX_P:%.*]] = alloca i64, align 8			; CHECK-NEXT: [[TMP_IDX:%.*]] = alloca i64, align 8
	; CHECK-NEXT: store i64 0, i64* [[IDX_P]], align 8			; CHECK-NEXT: store i64 0, i64* [[TMP_IDX]], align 8
	; CHECK-NEXT: [[IDX:%.*]] = load i64, i64* [[IDX_P]], align 8			; CHECK-NEXT: [[IDX:%.*]] = load i64, i64* [[TMP_IDX]], align 8
	; CHECK-NEXT: [[GEP_IDX:%.*]] = getelementptr inbounds [[UNION_U]], [[UNION_U]]* [[ARG]], i64 [[IDX]], i32 0			; CHECK-NEXT: [[GEP_IDX:%.*]] = getelementptr inbounds [[UNION_U]], [[UNION_U]]* [[ARG]], i64 [[IDX]], i32 0
	; CHECK-NEXT: [[FP80:%.*]] = load x86_fp80, x86_fp80* [[GEP]], align 16			; CHECK-NEXT: [[FP80:%.*]] = load x86_fp80, x86_fp80* [[GEP]]
	; CHECK-NEXT: ret x86_fp80 [[FP80]]			; CHECK-NEXT: ret x86_fp80 [[FP80]]
	;			;
	%gep = getelementptr inbounds %union.u, %union.u* %arg, i64 0, i32 0			%gep = getelementptr inbounds %union.u, %union.u* %arg, i64 0, i32 0
	%idx_slot = alloca i64, align 8			%idx_slot = alloca i64, align 8
	store i64 0, i64* %idx_slot, align 8			store i64 0, i64* %idx_slot, align 8
	%idx = load i64, i64* %idx_slot, align 8			%idx = load i64, i64* %idx_slot, align 8
	%gep_idx = getelementptr inbounds %union.u, %union.u* %arg, i64 %idx, i32 0 ; to protect from "usual" promotion			%gep_idx = getelementptr inbounds %union.u, %union.u* %arg, i64 %idx, i32 0 ; to protect from promotion
	%fp80 = load x86_fp80, x86_fp80* %gep			%fp80 = load x86_fp80, x86_fp80* %gep
	ret x86_fp80 %fp80			ret x86_fp80 %fp80
	}			}

	define internal i64 @AccessPaddingOfStruct(%struct.Foo* byval(%struct.Foo) %a) {			define internal i64 @AccessPaddingOfStruct(%struct.Foo* byval(%struct.Foo) %a) {
	; CHECK-LABEL: define {{[^@]+}}@AccessPaddingOfStruct			; CHECK-LABEL: define {{[^@]+}}@AccessPaddingOfStruct
	; CHECK-SAME: (i64 [[A_0_VAL:%.*]]) {			; CHECK-SAME: (i64 [[A_0_VAL:%.*]]) {
	; CHECK-NEXT: ret i64 [[A_0_VAL]]			; CHECK-NEXT: ret i64 [[A_0_VAL]]

llvm/test/Transforms/ArgumentPromotion/metadata.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
; RUN: opt < %s -passes=argpromotion -S \| FileCheck %s		; RUN: opt < %s -passes=argpromotion -S \| FileCheck %s

declare void @use.i32(i32)		declare void @use.i32(i32)
declare void @use.p32(i32*)		declare void @use.p32(i32*)

define internal void @callee(i32* %p1, i32** %p2, i32** %p3, i32** %p4, i32** %p5, i32** %p6) {		define internal void @callee(i32* %p1, i32** %p2, i32** %p3, i32** %p4, i32** %p5, i32** %p6) {
; CHECK-LABEL: define {{[^@]+}}@callee		; CHECK-LABEL: define {{[^@]+}}@callee
; CHECK-SAME: (i32 [[P1_0_VAL:%.*]], i32* [[P2_0_VAL:%.*]], i32* [[P3_0_VAL:%.*]], i32* [[P4_0_VAL:%.*]], i32* [[P5_0_VAL:%.*]], i32* [[P6_0_VAL:%.*]]) {		; CHECK-SAME: (i32 [[P1_0_VAL:%.*]], i32* [[P2_0_VAL:%.*]], i32* [[P3_0_VAL:%.*]], i32* [[P4_0_VAL:%.*]], i32* [[P5_0_VAL:%.*]], i32* [[P6_0_VAL:%.*]]) {
		; CHECK-NEXT: [[IS_NOT_NULL:%.*]] = icmp ne i32* [[P2_0_VAL]], null
		; CHECK-NEXT: call void @llvm.assume(i1 [[IS_NOT_NULL]])
; CHECK-NEXT: call void @use.i32(i32 [[P1_0_VAL]])		; CHECK-NEXT: call void @use.i32(i32 [[P1_0_VAL]])
; CHECK-NEXT: call void @use.p32(i32* [[P2_0_VAL]])		; CHECK-NEXT: call void @use.p32(i32* [[P2_0_VAL]])
; CHECK-NEXT: call void @use.p32(i32* [[P3_0_VAL]])		; CHECK-NEXT: call void @use.p32(i32* [[P3_0_VAL]])
; CHECK-NEXT: call void @use.p32(i32* [[P4_0_VAL]])		; CHECK-NEXT: call void @use.p32(i32* [[P4_0_VAL]])
; CHECK-NEXT: call void @use.p32(i32* [[P5_0_VAL]])		; CHECK-NEXT: call void @use.p32(i32* [[P5_0_VAL]])
; CHECK-NEXT: call void @use.p32(i32* [[P6_0_VAL]])		; CHECK-NEXT: call void @use.p32(i32* [[P6_0_VAL]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
		;
ret void		ret void
}		}

define internal i32* @callee_conditional(i1 %c, i32** dereferenceable(8) align 8 %p) {		define internal i32* @callee_conditional(i1 %c, i32** dereferenceable(8) align 8 %p) {
; CHECK-LABEL: define {{[^@]+}}@callee_conditional		; CHECK-LABEL: define {{[^@]+}}@callee_conditional
; CHECK-SAME: (i1 [[C:%.*]], i32* [[P_0_VAL:%.*]]) {		; CHECK-SAME: (i1 [[C:%.*]], i32* [[P_0_VAL:%.*]]) {
; CHECK-NEXT: br i1 [[C]], label [[IF:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C]], label [[IF:%.*]], label [[ELSE:%.*]]
; CHECK: if:		; CHECK: if:
		; CHECK-NEXT: [[IS_NOT_NULL:%.*]] = icmp ne i32* [[P_0_VAL]], null
		; CHECK-NEXT: call void @llvm.assume(i1 [[IS_NOT_NULL]])
; CHECK-NEXT: ret i32* [[P_0_VAL]]		; CHECK-NEXT: ret i32* [[P_0_VAL]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: ret i32* null		; CHECK-NEXT: ret i32* null
;		;
br i1 %c, label %if, label %else		br i1 %c, label %if, label %else

if:		if:
%v = load i32*, i32** %p, !nonnull !1		%v = load i32*, i32** %p, !nonnull !1

llvm/test/Transforms/ArgumentPromotion/store-after-load.ll

This file was added.

				; RUN: opt < %s -passes=argpromotion -S \| FileCheck %s

				; Store instructions are allowed users for byval arguments only.
				define internal void @callee(i32* %arg) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@callee
				; CHECK-SAME: (i32* [[ARG:%.*]]) #[[ATTR0:[0-9]+]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TEMP:%.*]] = load i32, i32* [[ARG]], align 4
				; CHECK-NEXT: [[SUM:%.*]] = add i32 [[TEMP]], 1
				; CHECK-NEXT: store i32 [[SUM]], i32* [[ARG]], align 4
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = load i32, i32* %arg, align 4
				%sum = add i32 %temp, 1
				store i32 %sum, i32* %arg, align 4
				ret void
				}

				define i32 @caller(i32* %arg) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@caller
				; CHECK-SAME: (i32* [[ARG:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: call void @callee(i32* [[ARG]]) #[[ATTR0]]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				call void @callee(i32* %arg) nounwind
				ret i32 0
				}

llvm/test/Transforms/ArgumentPromotion/store-into-inself.ll

This file was added.

				; RUN: opt < %s -passes=argpromotion -S \| FileCheck %s

				target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

				%struct.ss = type { i32, i64 }

				define internal void @f(ptr byval(ptr) align 4 %p) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@f
				; CHECK-SAME: (ptr byval(ptr) align 4 [[P:%.*]]) #[[ATTR0:[0-9]+]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: store ptr [[P]], ptr [[P]]
				; CHECK-NEXT: ret void
				;
				entry:
				store ptr %p, ptr %p
				ret void
				}

				define internal void @g(ptr byval(ptr) align 4 %p) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@g
				; CHECK-SAME: (ptr byval(ptr) align 4 [[P:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[P1:%.*]] = getelementptr i8, ptr [[P]], i64 4
				; CHECK-NEXT: store ptr [[P]], ptr [[P1]]
				; CHECK-NEXT: ret void
				;
				entry:
				%p1 = getelementptr i8, ptr %p, i64 4
				store ptr %p, ptr %p1
				ret void
				}

				define internal void @h(ptr byval(ptr) align 4 %p) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@h
				; CHECK-SAME: (ptr byval(ptr) align 4 [[P:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[P1:%.*]] = getelementptr i8, ptr [[P]], i64 4
				; CHECK-NEXT: store ptr [[P1]], ptr [[P]]
				; CHECK-NEXT: ret void
				;
				entry:
				%p1 = getelementptr i8, ptr %p, i64 4
				store ptr %p1, ptr %p
				ret void
				}

				define internal void @k(ptr byval(ptr) align 4 %p) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@k
				; CHECK-SAME: (ptr byval(ptr) align 4 [[P:%.*]]) #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X:%.*]] = load ptr, ptr [[P]]
				; CHECK-NEXT: store ptr [[P]], ptr [[X]]
				; CHECK-NEXT: ret void
				;
				entry:
				%x = load ptr, ptr %p
				store ptr %p, ptr %x
				ret void
				}

				define internal void @l(ptr byval(ptr) align 4 %p) nounwind {
				; CHECK-LABEL: define {{[^@]+}}@l
				; CHECK-SAME: () #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret void
				;
				entry:
				%x = load ptr, ptr %p
				store ptr %x, ptr %p
				ret void
				}

				define i32 @main() nounwind {
				; CHECK-LABEL: define {{[^@]+}}@main
				; CHECK-SAME: () #[[ATTR0]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 32
				; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], ptr [[S]], i32 0, i32 0
				; CHECK-NEXT: store i32 1, ptr [[TEMP1]], align 4
				; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], ptr [[S]], i32 0, i32 1
				; CHECK-NEXT: store i64 2, ptr [[TEMP4]], align 8
				; CHECK-NEXT: call void @f(ptr byval(ptr) align 4 [[S]]) #[[ATTR0]]
				; CHECK-NEXT: call void @g(ptr byval(ptr) align 4 [[S]]) #[[ATTR0]]
				; CHECK-NEXT: call void @h(ptr byval(ptr) align 4 [[S]]) #[[ATTR0]]
				; CHECK-NEXT: call void @k(ptr byval(ptr) align 4 [[S]]) #[[ATTR0]]
				; CHECK-NEXT: [[S_VAL:%.*]] = load ptr, ptr [[S]], align 8
				; CHECK-NEXT: call void @l() #[[ATTR0]]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%S = alloca %struct.ss, align 32
				%temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
				store i32 1, i32* %temp1, align 4
				%temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
				store i64 2, i64* %temp4, align 8
				call void @f(ptr byval(ptr) align 4 %S) nounwind
				call void @g(ptr byval(ptr) align 4 %S) nounwind
				call void @h(ptr byval(ptr) align 4 %S) nounwind
				call void @k(ptr byval(ptr) align 4 %S) nounwind
				call void @l(ptr byval(ptr) align 4 %S) nounwind
				ret i32 0
				}

Diff 440571
