This is an archive of the discontinued LLVM Phabricator instance.

[IPRA] Do not collect register usage information on functions that can be derefined
Needs ReviewPublic

Authored by kbarton on Apr 4 2018, 7:49 PM.

Details

Summary

Do not collect register usage information to be used by IPRA on functions that can be derefined.
If it is possible for a function to be replaced at link time with another version of the function, that has been optimized differently, we cannot assume the reg mask will be correct. For a more indepth explanation of de-refinement, see https://reviews.llvm.org/D18634.

This patch checks the isDefinitionExact method on the function to ensure the function cannot be redefined. If the definition is not exact, do not refine the register usage information during the collection phase of IPRA.
The isSafeForNoCSROpt method is also updated to ensure the function cannot be redefined. This is necessary because the isSafeForNoCSROpt relies on IPRA information, and if IPRA cannot be performed on a function, it consequently is not safe for NoCSROpt.

Diff Detail

Event Timeline

kbarton created this revision.Apr 4 2018, 7:49 PM

This seems perfectly fine. Maybe the test case can be slightly improved. Adding the originator of this code to the review.

llvm/test/CodeGen/PowerPC/ipra-odr.ll
2

Can we make the checks stronger with -print-regusage?

12

I imagine this stray b character will cause test case failures.

Change looks good to me. But please fix the new test case as suggested by @nemanjai
In this new test do you see improvement in register allocation due to IPRA? If test-case does not show any benefit due to IPRA then it is not suitable test for this patch.
Also I hope this patch is also tested with other architecture which have LIT based test-cases for IPRA. It is very unlikely that such test-case will have function with linkage type which return true for isInterposableLinkage() but if such case is there then this change may break such tests.

jonpa added a subscriber: jonpa.Apr 29 2018, 11:46 PM

ping!

I recently tried IPRA on SystemZ, and tracked down a benchmark crash to this very same issue, see https://bugs.llvm.org/show_bug.cgi?id=37284. I also verified that this patch seems to handle my test case.

jonpa added a comment.May 1 2018, 4:59 AM

With this patch, any called function which is not "definition exact" keeps the unmodified regmask on the call instruction, which assumes that the callee saved registers are saved if modified by callee.

I therefore think the check !F.hasLocalLinkage() in isSafeForNoCSROpt() should be replaced with !F.isDefinitionExact(), since the former check only covers a subset of the second one.

Otherwise, the saving of the CSRs might be skipped without this being reflected in the call regmask in caller, right?

jonpa added a subscriber: uweigand.May 1 2018, 5:00 AM

With this patch, any called function which is not "definition exact" keeps the unmodified regmask on the call instruction, which assumes that the callee saved registers are saved if modified by callee.

I therefore think the check !F.hasLocalLinkage() in isSafeForNoCSROpt() should be replaced with !F.isDefinitionExact(), since the former check only covers a subset of the second one.

Otherwise, the saving of the CSRs might be skipped without this being reflected in the call regmask in caller, right?

Yes this needs to be done. Sorry I missed in previous review that isSafeForNoCSROpt() is also used in determineCalleeSaves()

Change looks good to me. But please fix the new test case as suggested by @nemanjai
In this new test do you see improvement in register allocation due to IPRA? If test-case does not show any benefit due to IPRA then it is not suitable test for this patch.
Also I hope this patch is also tested with other architecture which have LIT based test-cases for IPRA. It is very unlikely that such test-case will have function with linkage type which return true for isInterposableLinkage() but if such case is there then this change may break such tests.

This test doesn't show an improvement due to IPRA, because it is testing to ensure IPRA is not run on the bar function because it is marked as weak_odr. Without this fix, IPRA would cause the initialization of r3 and r4 between the calls to bar (CHECKS on line 19 and 20 of the test case) to be removed because bar does not clobber those registers. I have other tests that show the benefit of IPRA that I will include in the patch to enable this on PowerPC.

Yes, this patch has been tested with all of the lit tests, including SystemZ and X86 and there were no failures.

With this patch, any called function which is not "definition exact" keeps the unmodified regmask on the call instruction, which assumes that the callee saved registers are saved if modified by callee.

I therefore think the check !F.hasLocalLinkage() in isSafeForNoCSROpt() should be replaced with !F.isDefinitionExact(), since the former check only covers a subset of the second one.

Otherwise, the saving of the CSRs might be skipped without this being reflected in the call regmask in caller, right?

llvm/test/CodeGen/PowerPC/ipra-odr.ll
2

I've updated the testcase with -print-regusage. Will upload a new diff soon.

12

Yes, I don't know how that got there. Fixed.

kbarton updated this revision to Diff 150244.Jun 6 2018, 7:16 PM
kbarton edited the summary of this revision. (Show Details)

Added the isDefinitionExact check to isSafeForNoCSROpt method.
Updated the test case to use -print-regusage to check for exact register clobbers computed by IPRA.

@kbarton can you please validate below tablel?

LinkageTypeNoCSROptAllowed before this changeNoCSROptAllowed after this change
WeakAnynot allowednot allowed
LinkOncenot allowednot allowed
Commonnot allowednot allowed
ExternalWeaknot allowednot allowed
Externalnot allowedallowed
Appendingnot allowedallowed (but this linkage does not apply to function)
Internalallowedallowed
Privateallowedallowed
WeakODRnot allowednot allowed
LinkOnceODRnot allowednot allowed
Available Externallynot allowednot allowed

Function with external linkage should not be allowed for noCSR opt. For the attached test-case created for external linkage type I think your change allows noCSROpt foo function as for foo I see following value in debugger.
(lldb) p F.isDefinitionExact()
(bool) $1 = true
(lldb) p F.hasLocalLinkage()
(bool) $2 = false

In Attached files external.ipra.s.1 is with your change and external.ipra.s is TOT.

@vivekvpandya You're right - the change will allow noCSROpt to run on functions with external linkage, which is not correct.
I think what we really want here is the isSafeForNoCSROpt to check for the existing conditions (i.e., prior to this patch, when it looks for local linkage) and also whether IPRA was successful for this function. I added the isDefinitionExact() call, to try and replicate the necessary conditions for IPRA, but now I'm wondering if it would be better to query the PRUI and make sure it contains a valid RegMask for the given function. Does this seem reasonable to you?

@vivekvpandya You're right - the change will allow noCSROpt to run on functions with external linkage, which is not correct.
I think what we really want here is the isSafeForNoCSROpt to check for the existing conditions (i.e., prior to this patch, when it looks for local linkage) and also whether IPRA was successful for this function. I added the isDefinitionExact() call, to try and replicate the necessary conditions for IPRA, but now I'm wondering if it would be better to query the PRUI and make sure it contains a valid RegMask for the given function. Does this seem reasonable to you?

It should work but I think we can discuss properly if you update this patch as per your expectation.
I assume that you will return early in RegUsageInfoCollector::runOnMachineFunction() if definition is not exact. So we don't have regmask in PRUI for given function.
Now while inserting prologue/epilogue along with isSafeForNoCSROpt() you would like to check if due to above mention condition no regmask is present in PRUI for given function?

Also I realised a potential optimization problem with return early in RegUsageInfoCollector::runOnMachineFunction() for any function without inserting a regmask (even default given as per calling convention for function).
For example there is call sequence like A->B
now as per calling convention for B R1, R2, R3 are not callee saved, but RegUsageInfoCollector detects that it only uses R1, R2 so on TOT version every function will have regmask in PRUI so that RegUsageInforPropogate() can use this data in caller function. In this case A can use the fact that R3 is free to be used. Now with early return due to any condition will make A use default calling convention for B i.e R3 should not be used without spill/restore around call site of B.

Thanks @vivekvpandya for getting back to me.
I'll work on updating the patch now. I think we're on the same page with the potential problems.
Essentially, there seems to be some assumptions that when EnableIIPRA is set, it will work properly and add additional information that can be used. Now, with the early exit due to isDefinitionExact, that is not necessarily the case. So, I'm trying to put a check in to validate that IPRA was successful before using the information. This way, if we extend the checks in IPRA in the future, we don't need to continually replicate them in other places that rely on IPRA being successful.

Regarding the prologue/epilogue inserter, I don't immediately see any use of the PRUI collected by IPRA. I'll look closer though, as I might have missed something. But, essentially if anyone tries to use the regmask in PRUI, and that mask is NULL, they must assume IPRA was not successful and fallback on the original calling conventions for saving registers across calls.

MatzeB added inline comments.Jul 13 2018, 12:44 PM
llvm/include/llvm/CodeGen/TargetFrameLowering.h
338

I believe this should rather be
F.hasAvailableExternallyLinkage() || F.isInterposable() || F.hasAddressTaken() || !F.hasFnAttribute(Attribute::NoRecurse).

MatzeB added inline comments.Jul 13 2018, 12:47 PM
llvm/include/llvm/CodeGen/TargetFrameLowering.h
338

And we probably should also abort for isLinkOnceLinkage() (though I'm currently not sure if that linkage is even possible when the symbol doesn't have an external linkage...)

MatzeB added inline comments.Jul 13 2018, 12:49 PM
llvm/lib/CodeGen/RegUsageInfoCollector.cpp
104–105

Maybe add a comment that this is a shortcut as the result won't get used?

kbarton added inline comments.Jul 17 2018, 10:42 AM
llvm/include/llvm/CodeGen/TargetFrameLowering.h
338

The implementation of isDefinitionExact calls mayBeDerefined, which checks for both hasAvailableExternallyLinkage, and isInterposable. It also checks for many other linkage types as well.

Was your suggestion to relax the check to make this work on the other linkage types that would currently be prevented by the call to isDefinitonExact? If not, I would prefer to keep the call to isDefinitionExact as I think it's more concise.

llvm/lib/CodeGen/RegUsageInfoCollector.cpp
104–105

I've been talking with Vivek on how we can change this. There is a subtle handshake between IPRA and isSafeForNoCSROpt that I'd like to make more explicit. I'll be posting another patch soon, as soon as I've had a chance to redo this. If that doesn't get accepted, then yes, I'll definitely add a comment here.

kbarton updated this revision to Diff 156989.Jul 24 2018, 2:52 AM

Created a new function (isSafeForIPRA) to consistently check whether IPRA can be performed. This check is used in RegUsageInfoCollector as well as isSafeForNoCSROpt. I also went back to the original checks in isSafeForNoCSROpt to check for LocalLinkage.

Coge change looks good but I have few questions on lit test cases.

  1. in llvm/test/CodeGen/PowerPC/ipra-odr.ll IPRA is not called on bar() then it should follow calling convention so in foo() why registers are getting restored between calls to bar() because without IPRA bar() is expected to save callee-saved-registers as per ABI.
  2. llvm/test/CodeGen/X86/ipra-external-linkage.ll same question as above applies. As foo() will save callee-saved-registers as per ABI why around foo() call in bar() it should save/restore callee-saved-register.
llvm/test/CodeGen/X86/ipra-external-linkage.ll
9

For function definition that has external linkage IPRA is applicable but LLVM can't optimize it to have NoCSROpt.
So check is correct but comment needs to be accurate.

Is this patch dead? Does @kbarton or perhaps @jonpa have any interest in seeing this through so IPRA can be enabled on PPC/SystemZ?

Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2019, 6:03 AM
Herald added a subscriber: jsji. · View Herald Transcript

Is this patch dead? Does @kbarton or perhaps @jonpa have any interest in seeing this through so IPRA can be enabled on PPC/SystemZ?

Thanks for the ping - this had fallen off my radar.
I'd like to get this finalized and committed. I'll post a follow-up patch in the next couple days after I've done my context switch and remembered what the problems were :)

arsenm added a subscriber: arsenm.Apr 11 2019, 9:45 AM
arsenm added inline comments.
llvm/include/llvm/CodeGen/TargetFrameLowering.h
353–356

Just return the bool directly?

Just to note there is related commit https://reviews.llvm.org/rL366557.

sanjoy resigned from this revision.Jan 29 2022, 5:31 PM