Page MenuHomePhabricator

[llvm] Teach whole program devirtualization about relative vtables
Needs ReviewPublic

Authored by leonardchan on Sep 20 2022, 2:54 PM.

Details

Summary

Prior to this patch, WPD was not acting on relative-vtables in C++. This involves teaching WPD about two things:

  • llvm.load.relative which is how relative-vtables are indexed (instead of GEP)
  • dso_local_equivalent which is used in the vtable itself when taking the offset between a virtual function and vtable
  • Update llvm/test/ThinLTO/X86/devirt.ll to use opaque pointers and add equivalent tests for RV

Diff Detail

Event Timeline

leonardchan created this revision.Sep 20 2022, 2:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2022, 2:54 PM
leonardchan requested review of this revision.Sep 20 2022, 2:54 PM

Feel free to add other reviewers you think may be appropriate.

In regards to testing, my initial idea was to have equivalent tests for relative vtables for each existing test in llvm/test/Transforms/WholeProgramDevirt/, but it feels like there's a lot of overlap between the tests I already have in this patch. Lemme know if I should go ahead and make copies for each test or if there's any specific tests I should add here.

tejohnson added a subscriber: pcc.Sep 30 2022, 8:20 AM

@pcc is probably the best reviewer for this since it looks like he originally introduced the llvm.load.relative intrinsic for relative vtables and will have good familiarity with that and WPD.

Does this rely on D134687 for correctness?

Feel free to add other reviewers you think may be appropriate.

In regards to testing, my initial idea was to have equivalent tests for relative vtables for each existing test in llvm/test/Transforms/WholeProgramDevirt/, but it feels like there's a lot of overlap between the tests I already have in this patch. Lemme know if I should go ahead and make copies for each test or if there's any specific tests I should add here.

The tests you have added are all testing I think pure regular LTO. Given that your changes to findLoadCallsAtConstantOffset which is called from findDevirtualizableCallsForTypeTest, this should also allow your new WPD to kick in for split module aka hybrid Regular + Thin LTO, as it is called when summarizing type tests during summary building here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp#L187
That is sufficient for that mode's WPD since it only relies on the type tests getting ThinLTO summaries, as the vtables themselves will be in the regular LTO split module.

So please also add tests for this mode. A good way would be in a test like llvm/test/ThinLTO/X86/devirt.ll. Note that this tests both split module aka hybrid LTO as well as pure index only ThinLTO. I added a comment below about why the latter will not yet work with your changes and what would be needed there (that relies on vtables being summarized as well). So unless you add that support in this patch too, you could just add the checking in for the new WPD for the split module case (see %t.o), and add a TODO in for the non-split pure ThinLTO case (see %t2.o).

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1006

It looks like this relies on support added to getPointerAtOffset for Swift relative vtables in D107645, and there is a comment in this routine that says "(Swift-specific) relative-pointer support starts here":
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/TypeMetadataUtils.cpp#L162

Should this comment be removed, or does the code need to be adjusted to not be Swift-specific in some way?

Additionally, it would be nice to support this WPD for non-split (index only) ThinLTO. To do so it looks like the equivalent handling to the "swift-only" changes in getPointerAtOffset would need to be added when we compute the vtable summaries in findFuncPointers here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp#L523-L559

along with the change you are making here to look through DSOLocalEquivalent. Can you add a TODO there if you don't want to add that support in this patch?

leonardchan edited the summary of this revision. (Show Details)
leonardchan added a reviewer: pcc.
leonardchan removed a subscriber: pcc.

@pcc is probably the best reviewer for this since it looks like he originally introduced the llvm.load.relative intrinsic for relative vtables and will have good familiarity with that and WPD.

Does this rely on D134687 for correctness?

Yes, for getting LTO working on RV end-to-end. Although since it's just the frontend, it can be landed independently of any llvm changes. My understanding is the frontend just needs to emit the correct metadata which will be consumed by WPD via this patch.

Feel free to add other reviewers you think may be appropriate.

In regards to testing, my initial idea was to have equivalent tests for relative vtables for each existing test in llvm/test/Transforms/WholeProgramDevirt/, but it feels like there's a lot of overlap between the tests I already have in this patch. Lemme know if I should go ahead and make copies for each test or if there's any specific tests I should add here.

The tests you have added are all testing I think pure regular LTO. Given that your changes to findLoadCallsAtConstantOffset which is called from findDevirtualizableCallsForTypeTest, this should also allow your new WPD to kick in for split module aka hybrid Regular + Thin LTO, as it is called when summarizing type tests during summary building here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp#L187
That is sufficient for that mode's WPD since it only relies on the type tests getting ThinLTO summaries, as the vtables themselves will be in the regular LTO split module.

So please also add tests for this mode. A good way would be in a test like llvm/test/ThinLTO/X86/devirt.ll. Note that this tests both split module aka hybrid LTO as well as pure index only ThinLTO. I added a comment below about why the latter will not yet work with your changes and what would be needed there (that relies on vtables being summarized as well). So unless you add that support in this patch too, you could just add the checking in for the new WPD for the split module case (see %t.o), and add a TODO in for the non-split pure ThinLTO case (see %t2.o).

I think this updated diff should also cover the split-module case(?)

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1006

It looks like this relies on support added to getPointerAtOffset for Swift relative vtables in D107645, and there is a comment in this routine that says "(Swift-specific) relative-pointer support starts here":
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/TypeMetadataUtils.cpp#L162

Should this comment be removed, or does the code need to be adjusted to not be Swift-specific in some way?

Ah, it looks like it might be more correct to instead handle DSOLocalEquivalent in getPointerAtOffset. I suppose this could also allow non-swift relative vtables to be accounted for in GlobalDCE. Yeah I think the comment and function should be adjusted since those code bits are used by more than swift vtables.

Additionally, it would be nice to support this WPD for non-split (index only) ThinLTO. To do so it looks like the equivalent handling to the "swift-only" changes in getPointerAtOffset would need to be added when we compute the vtable summaries in findFuncPointers here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp#L523-L559

along with the change you are making here to look through DSOLocalEquivalent. Can you add a TODO there if you don't want to add that support in this patch?

I *think* the new diff adds the proper support for this in that function.

Sorry for the slow review. Thanks for adding support for index-only WPD, and adding a test for both that and the split-module (hybrid) LTO case! I have a few questions below.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
619

Do we also need to look through DSOLocalEquivalent in this function, as is done in getPointerAtOffset, or why not?

622

I don't completely understand what is meant by "find the function if we know the function".

627

Do we need to do any checking or handling of LHSOffset and RHSOffset?

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1386

What provoked needing this change, and do we (or why not) need a similar check for all the other WPD optimizations that process the CallSites?

leonardchan marked an inline comment as done.Wed, Jan 18, 4:47 PM
leonardchan added inline comments.
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
619

Not needed here since IsConstantOffsetFromGlobal already handles DSOLocalEquivalent to get the offset and global.

622

Sorry, I don't even know what I typed. It should instead say something like: "If this constant can be reduced to the offset between a function and a global, then we know this is a valid virtual function if the RHS is the original vtable we're scanning through."

627

In the context of findFuncPointers, I don't think so. LHS/RHSOffset just return the byte offset of either LHS/RHS from their respective globals (like via a GEP). IsConstantOffsetFromGlobal is just a nifty function for getting the underlying global in these cases. The relevant bits here are confirming RHS refers to the original vtable and we continue to track the function via LHS.

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1386

Hmm I distinctly remember running into a segfault in CB.getCaller() because this CB no longer had a parent, but I can't seem to reproduce this anymore in a local build, so I guess it can be removed

leonardchan updated this revision to Diff 490334.
leonardchan marked an inline comment as done.
pcc added inline comments.Thu, Jan 19, 6:31 PM
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
619

This is just stripping Trunc, right? You could put this below this line instead of introducing stripCasts:

if (CE->getOpcode() == Instruction::Trunc)
  CE = CE->getOperand(0);
627

I think Teresa's concern was that if the offsets don't match this could end up being loaded as function + 1 or something, which would not be callable. But I suppose we could just say that because calling e.g. function + 1 is UB we can ignore that possibility and pretend that function was being called instead. So we don't need to check the offset. It may be worth adding a comment about that though.

llvm/lib/Analysis/TypeMetadataUtils.cpp
140

I suppose it is the caller who knows whether it is appropriate to strip the DSOLocalEquivalent. But it seems that the existing callers should be fine with it so I don't mind adding this here.

It's somewhat suboptimal that this function handles absolute and relative pointers in the same function, but I suppose this is justifiable for the "UB" reasons that also justify ignoring the offset above. It may have been better to split out the "destructuring" part of this function from the pointer-identifying part (which could be done in two functions, one for absolute pointers and another for relative). That would make the semantics clearer at least. But you didn't add the relative pointer support to this function so I don't mind if we don't clean this up for now.

leonardchan marked 2 inline comments as done.
leonardchan added inline comments.
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
619

Yeah, removed.

627

Oh I see. Hmm specifically for RV, I think LHSOffset should effectively always be zero since the RHS should essentially be a pointer to the virtual function. Since this is a new addition to lto for vtables, I imagine it's better to be conservative here and add the LHSOffset check. Likewise for RHS, the offset should point to somewhere in the vtable, so we can also check here that the RHSOffset is at the very least within the vtable.

llvm/lib/Analysis/TypeMetadataUtils.cpp
140

Added a TODO comment for this for now.

leonardchan added a comment.EditedWed, Feb 1, 12:34 PM

ping, any more comments?