This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion] Unify byval promotion with non-byval
ClosedPublic

Authored by psamolysov on May 12 2022, 10:58 AM.

Details

Summary

It makes sense to handle byval promotion in the same way as non-byval
but also allowing store instructions. However, these should
use the same checks as the load instructions do, i.e. be part of the
ArgsToPromote collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted alloca instructions. To optimize
these new allocas out, the PromoteMemToReg function from
Transforms/Utils/PromoteMemoryToRegister.cpp file is invoked after
promotion.

In order to let the PromoteMemToReg promote as many allocas as it
is possible, there should be no GEPs from the allocas. To
eliminate the GEPs, its own alloca is generated for every argument
part because a single alloca for the whole argument (that significantly
simplifies the code of the pass though) unfortunately cannot be used.

The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676

Diff Detail

Event Timeline

psamolysov created this revision.May 12 2022, 10:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2022, 10:58 AM
psamolysov requested review of this revision.May 12 2022, 10:58 AM

for the byval case, a store can change the value loaded by a later load, so it's not completely dead in that regard

@aeubanks Thank you for the comment. I'm not sure that I really get what "loaded by a later load" means, if it means any load instruction after the store in the same function, this case won't be optimized:

define internal void @k(%struct.ss* byval(%struct.ss) align 4 %b) nounwind  {
entry:
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  %temp1 = load i32, i32* %temp, align 4
  %temp2 = add i32 %temp1, 1
  store i32 %temp2, i32* %temp, align 4
  %temp3 = load i32, i32* %temp, align 4
  ret void
}

because the check AAR.canInstructionRangeModRef (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L658) will return true that means the pointer was invalidated. If the load instruction takes place in another basic block, the AAR.canBasicBlockModify(*TranspBB, Loc) check will save us.

But I've found the new unified approach has a disadvantage from this point of view: when the original byval promotion restores the argument's structure in the callee and remains its users the same, the new approach just does nothing. So, when the original byval promotion gives room for the SROA optimization, the new scheme does not. The example above will be optimized into a single line with the -sroa pass after the "old" -argpromotion one:

define internal void @k(i32 %b.0, i64 %b.1) #0 {
entry:
  %temp2 = add i32 %b.0, 1
  ret void
}

and will not be optimized at all with the -sroa pass after the "new" -argpromotion one.

Currently I have no idea how to restore the optimization room for SROA in that case.

Reformat the source code because the pre-merge run on Linux filed with the following error message:

ERROR   git-clang-format returned an non-zero exit code 1
nikic added a comment.May 13 2022, 1:43 AM

@psamolysov What I had in mind is something along the following lines:

  • For byval, allow stores -- however, these should still use the same checks as loads, i.e. be part of ArgParts collection. We need to make sure that different loads/stores don't have any partial overlaps.
  • For byval, drop the check for interfering mods.
  • Change promotion to work by rewriting accesses to an alloca (as is currently done for byval promotion) and then call promoteMemoryToRegister(). Our earlier checks have ensured that promoteMemoryToRegister() will succeed. (For the non-byval case, we don't really need alloca+mem2reg, but it would help with unifying the implementations.)

@nikic Thank you for detailed description of the idea. I'm not sure that I get the idea correctly, if to analyse your comment by points:

  1. I believe this is implemented in the patch.
  1. If stores are allowed (IsStoresAllowed in the code), the findArgParts function should disable the AAR.canInstructionRangeModRef and AAR.canBasicBlockModify(*TranspBB, Loc) checks because the pointers may be invalidated.
  1. This makes the most difficult thing for me. As I get your point, all accesses (load/stores) to the arguments must be replaced with accesses to new allocas with the corresponding types, so instead of just eliminate loads we should for every promotable pointer argument (regardless whether it is byval or not) create an alloca, add a store of the new value argument into this alloca, keep all the allowed stores and loads (they will work with this alloca) and then call the 'promoteMemoryToRegister' function to do a part of mem2reg's work.

Is my understanding correct?

nikic added a comment.May 13 2022, 2:15 AM

@nikic Thank you for detailed description of the idea. I'm not sure that I get the idea correctly, if to analyse your comment by points:

  1. I believe this is implemented in the patch.

Not quite: We should set IsStoresAllowed == true for all byval+align arguments (independently of "densely packed" and "padding access" predicates), and stores shouldn't be unconditionally allowed, but go through HandleLoad (well, a slight generalization that works on both load&store, but basically does the same thing).

  1. If stores are allowed (IsStoresAllowed in the code), the findArgParts function should disable the AAR.canInstructionRangeModRef and AAR.canBasicBlockModify(*TranspBB, Loc) checks because the pointers may be invalidated.

Correct.

  1. This makes the most difficult thing for me. As I get your point, all accesses (load/stores) to the arguments must be replaced with accesses to new allocas with the corresponding types, so instead of just eliminate loads we should for every promotable pointer argument (regardless whether it is byval or not) create an alloca, add a store of the new value argument into this alloca, keep all the allowed stores and loads (they will work with this alloca) and then call the 'promoteMemoryToRegister' function to do a part of mem2reg's work.

Yeah, exactly. Basically do what byval promotion currently does, just based on ArgParts and not byval structure, and add a mem2reg call at the end.

@nikic Thank you very much, it is crystal clear. I'm going to implement the suggested plan.

psamolysov added a comment.EditedMay 16 2022, 4:27 AM

@nikic If you have time, could you make it clear why the OffsetAndArgPart alias is defined as std::pair<int64_t, ArgPart>, so why int64_t is used for Offset, not uint64_t? All the LLVM's API for work with offsets, APInt (is created from an uint64_t value), StructLayout::getElementOffset (returns uint64_t), etc use unsigned values. Do we really use negative offsets and if not, can the offset part of the pair be replaced with uint64_t? Thank you.

nikic added a comment.May 16 2022, 5:58 AM

@nikic If you have time, could you make it clear why the OffsetAndArgPart alias is defined as std::pair<int64_t, ArgPart>, so why int64_t is used for Offset, not uint64_t? All the LLVM's API for work with offsets, APInt (is created from an uint64_t value), StructLayout::getElementOffset (returns uint64_t), etc use unsigned values. Do we really use negative offsets and if not, can the offset part of the pair be replaced with uint64_t? Thank you.

This is an int64_t, because GEP offsets in LLVM are signed. This is only supported "because we can" though, so if it makes things complicated it could be dropped (this would require bailing out of the transform for negative offsets, not just changing the type). Note that for byval, an access at negative offset is UB, so we're free to transform it even if it makes no sense in that case.

@nikic Thank you for the good explanation. Currently I try to use the createByteGEP function to emit GEPs and the signed nature of the offset creates no problems (I believe there is some singed to unsigned promotion in the function calls inside the createByteGEP one, but I have no warnings).

psamolysov added a comment.EditedMay 16 2022, 7:24 AM

Unfortunately, PromoteMemToReg optimizes almost nothing. If there is a GEP from an alloca, and this GEP is used by store, for example, the isAllocaPromotable function returns false because the if (!onlyUsedByLifetimeMarkersOrDroppableInsts(GEPI)) (which allows intrinsics as GEP users only) check fails. I always use store instructions to write the new argument values into the alloca, so this check will always fail for structures and will pass for primitives only. I've tried to run the mem2reg pass using the opt utility and get the same result, my IR generated by the ArgumentPromotion pass looks as not optimized. The SROA pass works fine, but I'm not sure is this a good idea to run SROA directly from another pass (the PromoteMemToReg function creates an instance of the PromoteMem2Reg class and calls its run member function but this class is not a pass).

nikic added a comment.May 16 2022, 8:13 AM

Unfortunately, PromoteMemToReg optimizes almost nothing. If there is a GEP from an alloca, and this GEP is used by store, for example, the isAllocaPromotable function returns false because the if (!onlyUsedByLifetimeMarkersOrDroppableInsts(GEPI)) (which allows intrinsics as GEP users only) check fails. I always use store instructions to write the new argument values into the alloca, so this check will always fail for structures and will pass for primitives only. I've tried to run the mem2reg pass using the opt utility and get the same result, my IR generated by the ArgumentPromotion pass looks as not optimized. The SROA pass works fine, but I'm not sure is this a good idea to run SROA directly from another pass (the PromoteMemToReg function creates an instance of the PromoteMem2Reg class and calls its run member function but this class is not a pass).

When rewriting to an alloca, I believe that you should end up with only direct loads and stores to the alloca, without any remaining GEPs. There should be one alloca for each ArgPart, not one alloca for all parts. In that case PromoteMemToReg should be able to promote it.

Hm, I thought about this, having one alloca for every ArgPart looks as a solution, one thing to overcome is how to re-target all the old load/store instructions from the old GEPs from a large argument to the new allocas. I was in doubt how to do this but I believe the same approach with dead instructions detection and elimination from non-byval promotion could work.

ormris removed a subscriber: ormris.May 16 2022, 11:08 AM

I've implemented the approach proposed by @nikic (thank you again and again!) What I see, after PromoteMemToReg, some new inserted function arguments might become unused (so, there are no users for the argument after promotion). Theoretically, we can move the code that generates GEPs and loads in the callers after the code that makes promotion in the callee and take those unused argument into account (so, just not to generate the instructions for such arguments) but it requires to generate a new (more new than already generated NF) definition of the callee without the unused arguments. I tried and it makes the code of the pass very difficult to read and understand, so I believe this is a job for another pass in IPO (and maybe this pass already exists) in pair with DCE.

psamolysov edited the summary of this revision. (Show Details)

Implement the approach proposed by @nikic

Reformat the code to let the git-clang-format check pass.

psamolysov added a comment.EditedMay 18 2022, 8:56 AM

Unfortunately, a CallGraphSCCPass pass (the legacy ArgPromotion pass is a child of this class) cannot use a function analysis pass, DominatorTreeWrapperPass for example. It is interesting, the compiled with the new version of the ArgumentPromotion pass opt tool crashes by this reason only during a run of the Polly tests (you can see the crash in the pre-merge checks on Windows).

A workaround can be the following: convert the legacy ArgPromotion pass into a module pass and use the <llvm/ADT/SCCIterator.h>, the solution is described on the StackOverflow: https://stackoverflow.com/questions/30059622/using-dominatortreewrapperpass-in-callgraphsccpass This approach has a few disadvantages: the code of the CallGraphSCCPass::getAnalysisUsage(AnalysisUsage &) method should be copied into the ArgPromotion::getAnalysisUsage (2 lines are there though) and skipSCC(SCC) won't be available anymore as well as CallGraphSCCPass::doInitialization() but this is an empty method.

-I'm going to implement this workaround to check if this works at all. This has no impact on the ArgumentPromotionPass pass that is used by the new pass manager.-

Fortunately, this is not required, the DominatorTree must be calculated for the new function, cached DominatorTree, even if we are able to get it in the pass, just won't work.

Create a DominatorTree for a new function

New function requires its own dominator tree for the call to the
PromoteMem2Reg function. The dominator tree calculated for an old
function is not actual after the argument promotion and using the
tree may lead to UB inside the PromoteMem2Reg.

I'm in doubt what to do for the Assumption Cache (AC). Currently I use
the AC pre-built for the old function. Our pass does nothing to deal with
the @llvm.assume intrinsics, so I believe the cache should be actual and
for the new function too. Do you agree?

Colleagues the patch is ready for review. Could you have a look again? Thank you.

psamolysov added inline comments.May 23 2022, 1:48 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621

I'm not aware about speculative stores. Should we handle the stores in the exactly same way as loads here, so should the parameter be set to false?

Fix a typo in the comment to the not used anymore isDenselyPacked static member function.

psamolysov added subscribers: eli.friedman, chandlerc.

Unfortunately, it looks like @nikic and @aeubanks have no time to review the patch (or maybe I'm getting something wrong and I should have added the nicks via @ in my comments during ping request sending, I'm sorry if you, colleagues, just didn't catch my requests). @chandlerc Could you as a code owner and one of the pass's author have a look at the patch? @jdoerfert @eli.friedman Or you? Thanks a lot in advance.

@nikic @aeubanks Colleagues, ping, could you have a look at the changes?

nikic added a comment.Jun 23 2022, 6:40 AM

Sorry for the long delay here. This looks pretty nice!

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
323

SmallVector<AllocaInst *>

327

I doubt this code is performance-critical enough to need this accumulate + reserve :)

367

Can drop the false argument, non-volatile store is the default.

374–380

This comment looks a bit out of place, as this closure doesn't remove any dead instructions itself.

382

Mild preference for OffsetToAlloca.lookup() here, which makes it clear that we don't intend to modify the map (which operator[] can do).

398

It's possible to reuse one IRBuilder instance using IRB.SetInsertPoint().

400

Can omit empty name, it's the default.

414

Volatile operations can't be promoted, can assert that !SI->isVolatile().

419

Can we assert(isAllocaPromotable()) here? I think that by design, we should only produce promotable allocas.

486–498

The "the" can be dropped here.

490

IsStoresAllowed -> AreStoresAllowed

539

I would replaced this with "accessed as" to cover both. Otherwise it sounds like it's either both loads or both stores with conflicting types, but it can also be a mix.

573

else if

621

I think the parameter should be false here, because the store is not guaranteed to executed, and it would be confusing otherwise.

The relevant part here is that we're not going to speculate any stores, so we don't care about the alignment at all -- we can explicitly skip the alignment code in HandleEndUser for stores to make this clear.

627

What happens if the we're storing the argument into itself? store ptr %arg, ptr %arg or similar. I don't think your code handles this correct right now?

It might be best to walk over Uses rather than Users, so we can check which operand the use is on.

653

I think it's safe to drop these TODOs -- this is not really compatible with the new promotion approach that uses separate allocas (we'd have to reconstruct the result from multiple allocas, which makes little sense).

aeubanks added inline comments.Jun 23 2022, 10:29 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432

creating the DominatorTree should go through something like
function_ref<DominatorTree &(Function &F)> DTGetter, see AARGetter.

but we run SROA (including mem2reg) right after argpromotion, is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? then we wouldn't need to worry about AssumptionCache/DominatorTree here since they're purely used for mem2reg

psamolysov marked 14 inline comments as done.Jun 24 2022, 3:15 AM

@nikic Thank you very much for the review and comments, I've addressed almost all of them and asked a pair of questions about alignment check for loads in the HandleEndUser lambda. Could you have a look and give your answers?

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
327

I agree, removed this accumulate and reserve.

374–380

Thank you. I've moved the comment just before the dead code elimination loop.

398

I wasn't aware about this method, thanks!

419

I've added the assertion for isAllocaPromotable.

431–432

About running mem2reg in the pass. I tried to remove the mem2reg and run a few tests with opt -O3 (and adding the noinline attribute for the callee before), and see that the almost of the code of the callees is optimized out, so the idea just to remove the mem2reg call from the pass looks suitable.

@nikic what is your opinion? Should we call the mem2reg inside the argument promotion pass or let the compilation pipeline call it?

P.S. I'm also planning to add the Dead Argument Elimination pass into the pipeline after landing this patch because mem2reg after new argument promotion leaves unused arguments.

431–432

@aeubanks Thank you for the suggestion about DTGetter.
Unfortunately, I cannot get why the DTGetter should be used. As I get, AARGetter is a wrapper over the FAM.getResult<AAManager>(F) for the new PM and is just an instance of the LegacyAARGetter class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.

In our case, the DominatorTree is just built again for a new function after the argument promotion and doesn't depend on the used pass manager.

539

Ups... I've forgotten that ArgParts collected all the promotable parts for loads as well as for stores and a mix is possible.

621

Thank you for the suggestion. I've changed the argument's value to true and fix the alignment check in the HandleEndUser in order to let it work for loads only. Also, I've edited the comment before the check to make it clear that only loads are checked.

Two questions, please. Currently Part can be either a load or a store previously saved in the ArgParts map by the same offset. Should the condition that the MustExecInstr in the seen before ArgPart is a load (or at least no a store) also be added? Also, when the Part.Alignment field is recalculated - Part.Alignment = std::max(Part.Alignment, I->getAlign());, should we do this whenever I is a load only or in any case?

Thank you.

627

Good catch! This is correct: when I tried store ptr %arg, ptr %arg, an access violation error in the llvm::Instruction::eraseFromParent() function occurred because in the dead instruction elimination loop in the doPromotion function, the store instruction handles twice.

Thank you for the idea! I've rewritten the walk to use Uses instead of Users and to check that the value the store is the user of is not the store's value operand. If so, this store is an "unknown user" because we can store something into the pointer but not the pointer's value into somewhere. A LIT test was also added.

I tried to add a similar check for the load instructions (if the load's pointer operand is exactly the value the load is the user of) and remove the check in HandleEndUser that Ptr is equal to Arg. Unfortunately, this makes no sense because before the walking, there is a loop where we try to handle *every* load instruction in the entry BB for *every* argument and this check actually checks whether the current load instruction is related to the current argument. So, the check is still required.

653

I believe this comment was added because it wasn't clear for me whether stores can be speculative. Thank you for the answer, the situation is clear for now and I've removed my TODO comment as well as the previous one.

psamolysov marked 6 inline comments as done.

Address almost the all reviewer's comments.

nikic added inline comments.Jun 24 2022, 4:00 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432

I'm okay with not doing the promotion here. We should probably add mem2reg to the tests though, to make sure we generate trivially promotable code.

599

Put U->getUser() into a variable, as it's used so much?

618

I think this works, but the robust way to check this is U->getOperandNo() == SI->getPointerOperandIndex(). (We will separately visit the use in the pointer and value operands, and only consider the pointer operand a known use, while the value operand will fall through to the bailout below.)

619

And with the previous change, I believe you can change this to if (!*HandleEndUser like for LoadInst, as we show now be guaranteed that this is store based-on the argument.

621

Hm, so I guess it's not quite true that we don't speculate stores, in the sense that we do insert a store to store the argument into the alloca. Of course, as this store gets promoted later, the actual alignment doesn't really matter. It's probably still best to produce correct alignment for it.

So I think the safest thing to do here is just not special case the store case at all -- or would that cause any regressions in tests?

nikic added inline comments.Jun 24 2022, 5:05 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621

Though we could also explicitly set the align of the alloca to the same value, in which case those would always match. Probably a good idea to do that anyway.

psamolysov marked 6 inline comments as done.Jun 24 2022, 5:24 AM
psamolysov added inline comments.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432

I've started to remove the promotion (mem2reg) from the pass and found the following problem: the chained.ll test is not worked as it is designed now, so to run the argpromotion pass and then mem2reg is not the same as run the mem2reg in the argpromotion pass because the pass runs mem2reg after every promotion attempt while the pass manager run passes one by one.

If we consider the chained.ll case, after the argpromotion pass without mem2reg call inside the pass, the generated code is the following:

define internal i32 @test(i32* %x.0.val) {
entry:
  %x.0.allc = alloca i32*, align 8
  store i32* %x.0.val, i32** %x.0.allc, align 8
  %y = load i32*, i32** %x.0.allc, align 8
  %z = load i32, i32* %y, align 4
  ret i32 %z
}

define i32 @caller() {
entry:
  %G2.val = load i32*, i32** @G2, align 8
  %x = call i32 @test(i32* %G2.val)
  ret i32 %x
}

And the pass cannot promote the pointer %x.0.val again because there is a store:

ArgPromotion of i32* %x.0.val failed: unknown user   store i32* %x.0.val, i32** %x.0.allc, align 8

and the byval attribute is not present so stores aren't allowed.

If we run mem2reg from the pass, the alloca and corresponding stores will be promoted and the argpromotion pass gets a chance to do another iteration.

So that to run mem2reg from the pass makes sense.

599

OK, thank you for the suggestion.

618

I've applied the suggestion. Thanks.

621

No, it adds no regressions, so I've just remove the check that the instruction is a load and use the same check for load as well as store instructions and reformulated the comment a bit to cover both cases.

psamolysov marked 2 inline comments as done.

Apply suggestions from the code review.

The condition that the instruction is load was removed from the alignment check in the HandleEndUser lambda.

psamolysov added inline comments.Jun 24 2022, 5:29 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
621

set the align of the alloca to the same value...

Should the value be Pair.second.Alignment as for the store into the alloca instruction on line 365?

nikic added inline comments.Jun 24 2022, 6:04 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432

Ah, good point. So we do need to run promotion here.

Unfortunately, I cannot get why the DTGetter should be used. As I get, AARGetter is a wrapper over the FAM.getResult<AAManager>(F) for the new PM and is just an instance of the LegacyAARGetter class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.

The general motivation is to cache the DT, so the next pass using it won't have to recompute it. But I don't know what complications the legacy pass manager introduces here. I didn't see any compile-time impact from your current code (though I expect this is mainly because argument promotion triggers rarely.)

621

Yeah, that's what I had in mind.

psamolysov marked an inline comment as done.Jun 24 2022, 7:37 AM
psamolysov added inline comments.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
431–432

When I tried to reuse the DominatorTree from the legacy pass manager, I got problems on some tests from polly.

It looks like my problem was in that I built the Dominator Tree for the old function and tried to reuse it to promote the allocas in the new one. If the Dominator Tree is lazy built for the new function using the DTGetter, everything works for the new pass manager (but, as I remember, it worked even when I tried to pass the DT for the old function, I believe it worked because we actually don't change CFG). For the legacy PM, the situation is more dramatic because the ArgPromotion is a CallGraphSCCPass pass and not a FunctionPass one and this code: getAnalysis<DominatorTreeWrapperPass>(F).getDomTree() doesn't work as it is expected: the following error occurs on the Legacy PM:

Unable to schedule 'Dominator Tree Construction' required by 'Promote 'by reference' arguments to scalars'
Unable to schedule pass

As a workaround, I just create a new DominatorTree instance from the DTGetter created from the legacy pass. The idea is stolen from the lib/Analysis/MustExecute.cpp file. Unfortunately, this requires to store all the created DominatorTree instances while the pass is running.

621

Done.

psamolysov marked an inline comment as done.

Implement the proposed solution with DTGetter to reuse DominatorTrees from the pass manager. This works fine on the new PM but not on the legacy one. As a workaround, a new DominatorTree is created for every new generated function whenever the Argument Promotion Pass is used through the legacy pass manager.

nikic added a comment.Jun 24 2022, 8:18 AM

We should be fetching AssumptionCache for the new function rather than the old function as well. I think it is working out in practice because there won't be any assumes, but if it were queried, it would be on the wrong function.

For the legacy pass manager, I would like to propose this innovative solution: D128536 IMHO, at this point, if the legacy pass manager is causing issues, we should be fixing them by dropping support for it.

psamolysov added a comment.EditedJun 24 2022, 8:36 AM

@nikic I believe this is a very good idea just to drop the legacy PM support in ArgPromotion. Thank you for the patch, once your patch has been landed, I'll just leave the single DTGetter.

We should be fetching AssumptionCache for the new function rather than the old function as well.

For AssumptionCache there should be no problem with the legacy PM because the following construct is supported:

getAnalysis<AssumptionCacheTracker>().getAssumptionCache(*OldF)

I've added the ACGetter lambdas to the modern pass as well as the legacy one and I'm ready to remove the getter for the legacy pass as soon as the pass has been removed.

Fetch AssumptionCache for the new function rather than the old one.

nikic added a comment.Jun 27 2022, 2:52 AM

Okay, D128536 has landed, can you please rebase this patch?

Rebase the patch on main, remove all the stuff related to the legacy pass manager.

psamolysov added a comment.EditedJun 27 2022, 3:07 AM

@nikic thank you for landing D128536. The patch has been rebased.

nikic accepted this revision.Jun 27 2022, 3:29 AM

LGTM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402

Just realized that we can probably just do a LI->setOperand(0, GetAlloca(Ptr)) here and don't really need to create a new instruction and RAUW.

632

The first line of this comment can be kept.

This revision is now accepted and ready to land.Jun 27 2022, 3:29 AM
psamolysov marked an inline comment as done.

Apply the suggestion about doing a LI->setOperand(0, GetAlloca(Ptr)) and not really creating a new instruction and RAUW.

psamolysov added inline comments.Jun 27 2022, 4:04 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402

Replaced with LI->setOperand(LoadInst::getPointerOperandIndex(), GetAlloca(Ptr)) Thank you very much for the suggestion. I've applied it for store instructions too.

One thing: when the LI was replaced with OffsetToArg[Offset.getSExtValue()] in the old code or with the NewLI in the patch, the instruction's metadata was not copied. Currently, when we are actually don't replace the instruction, the metadata is reused and in the metadata.ll test, the following code is generated:

%1 = icmp ne i32* %p2.0.val, null
call void @llvm.assume(i1 %1)

for this line of code:

%v2 = load i32*, i32** %p2, !nonnull !1

I've updated the test to take this behavior into account, but is this correct and what is expected?

nikic accepted this revision.Jun 27 2022, 4:16 AM
nikic added inline comments.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
402

Yes, this is exactly the effect I wanted to see :)

@nikic Thank you for accepting. I've got the commit access and able to land patches. I'm going to land the patch in a day or two.

This revision was landed with ongoing or failed builds.Jun 28 2022, 5:23 AM
This revision was automatically updated to reflect the committed changes.