This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Implement index-based WPD
ClosedPublic

Authored by tejohnson on Nov 30 2018, 4:25 PM.

Details

Summary

This patch adds support to the WholeProgramDevirt pass to perform
index-based WPD, which is invoked from ThinLTO during the thin link.

The ThinLTO backend (WPD import phase) behaves the same regardless of
whether the WPD decisions were made with the index-based or (the
existing) IR-based analysis.

Depends on D54815.

Diff Detail

Event Timeline

tejohnson created this revision.Nov 30 2018, 4:25 PM
pcc added a comment.Jan 27 2019, 3:41 PM

Does this correctly handle the case where the only implementation is internal? I didn't see any code handling that. In the existing implementation this is done here: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/WholeProgramDevirt.cpp#814

In D55153#1373111, @pcc wrote:

Does this correctly handle the case where the only implementation is internal? I didn't see any code handling that. In the existing implementation this is done here: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/WholeProgramDevirt.cpp#814

This is a good catch. I suspect I didn't hit it since I am marking all the added call edges as Hot, which means we are highly likely to import them, which will result in them getting promoted. But if we were to ever not import it would cause an issue. Let me see if I can confirm this, and the fix should be straightforward (ensure they are all treated as exported if we have references on type tests in other modules so we mark them as promoted in the index).

In D55153#1373111, @pcc wrote:

Does this correctly handle the case where the only implementation is internal? I didn't see any code handling that. In the existing implementation this is done here: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/WholeProgramDevirt.cpp#814

This is a good catch. I suspect I didn't hit it since I am marking all the added call edges as Hot, which means we are highly likely to import them, which will result in them getting promoted. But if we were to ever not import it would cause an issue. Let me see if I can confirm this, and the fix should be straightforward (ensure they are all treated as exported if we have references on type tests in other modules so we mark them as promoted in the index).

There were actually a few different issues I discovered while fixing this and writing an additional test.

First of all, we need to make note of any devirtualized targets that have type test users in a different module so that they are not subsequently internalized (since the designation of being exported is based on the GlobalResolution's Partition info which was set up based on linker resolution info). To do this we simply pass in the ExportedGUIDs set from the client that we accumulate this info into and update it based on WPD results. This also ensures we promote any local devirtualized targets used in another module.

Second of all, when the devirtualized implementation is internal and it is used by devirtualized call in another module it must be promoted (ensured by the above change), and we need to note the promoted name in the WPDRes SingleImplName. We can simply record the global promoted name in the WPDRes when we detect that we are adding a cross-module devirtualiztion.

Third, when the devirtualized implementation is internal, and all devirtualized uses are in the same module, it still might need to be promoted if it is subsequently exported by cross-module importing. The simplest and most efficient way to handle this that I came up with is to simply record the information necessary to quickly locate the corresponding WPDRes structure for each local devirt target's ValueInfo in the case where it is not yet exported by a devirtualization. Then after cross module importing, we simply check if any of these VIs has been exported by importing, and update the recorded SingleImplName.

All of these cases are checked with an additional test (devirt2.ll) that I have added.

Finally, the new devirt2.ll also checks the existing hybrid WPD for completeness. Since in that case the local targets are promoted and renamed during module splitting, and the name is based on a hash, I added some handling to llvm-lto2.cpp to enable matching of the provided resolutions based on the unpromoted name.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2019, 7:53 AM
tejohnson updated this revision to Diff 185318.Feb 5 2019, 8:29 AM

Handle exported symbols correctly

pcc added a comment.Apr 26 2019, 5:58 PM

Could you please revert the changes from D54815 out of this change so that it is easier to read?

In D55153#1481171, @pcc wrote:

Could you please revert the changes from D54815 out of this change so that it is easier to read?

Yep, sorry, looks like that got messed up when I merged in some other changes awhile back. I'm working on merging in the changes I made this morning and last week to the underlying index patch, and then will update this to contain only the delta. Hopefully it will get updated later this afternoon.

In D55153#1481171, @pcc wrote:

Could you please revert the changes from D54815 out of this change so that it is easier to read?

Yep, sorry, looks like that got messed up when I merged in some other changes awhile back. I'm working on merging in the changes I made this morning and last week to the underlying index patch, and then will update this to contain only the delta. Hopefully it will get updated later this afternoon.

I have rebased on top of the recent changes to D54815, and updated the patch so it reflects the diffs on top of that patch.

In D55153#1481171, @pcc wrote:

Could you please revert the changes from D54815 out of this change so that it is easier to read?

Yep, sorry, looks like that got messed up when I merged in some other changes awhile back. I'm working on merging in the changes I made this morning and last week to the underlying index patch, and then will update this to contain only the delta. Hopefully it will get updated later this afternoon.

I have rebased on top of the recent changes to D54815, and updated the patch so it reflects the diffs on top of that patch.

ping, @pcc can you take a look?

pcc added inline comments.Jun 28 2019, 7:06 PM
lib/Transforms/IPO/WholeProgramDevirt.cpp
845

Can't the vtable be linkonce_odr if the key function is inline?

tejohnson marked an inline comment as done.Jul 8 2019, 8:27 PM
tejohnson added inline comments.
lib/Transforms/IPO/WholeProgramDevirt.cpp
845

Good point. I have changed this to assert that all copies are ODR linkage if there is more than one. I also added this case to the new test.

tejohnson updated this revision to Diff 208585.Jul 8 2019, 8:28 PM

Address comments.

Ping - @pcc any more comments?

pcc accepted this revision.Jul 31 2019, 2:25 PM

LGTM

lib/LTO/LTO.cpp
45

nit: sort

test/ThinLTO/X86/devirt2.ll
200 ↗(On Diff #208585)

Is -DAG supported with -COUNT? (Doesn't look like it: http://llvm-cs.pcc.me.uk/lib/Support/FileCheck.cpp#1046 )
I would just write this out twice.

This revision is now accepted and ready to land.Jul 31 2019, 2:25 PM
tejohnson marked 2 inline comments as done.Aug 1 2019, 2:17 PM
tejohnson updated this revision to Diff 212905.Aug 1 2019, 2:18 PM

Address comments

This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Aug 2 2019, 7:27 AM

This test is failing for me:

Exit Code: 1

Command Output (stderr):
--
/home/jayfoad2/llvm-debug/bin/opt: /home/jayfoad2/git/llvm-project/llvm/test/ThinLTO/X86/Inputs/devirt3.ll: error: Could not open input file: No such file or directory

--

********************
Testing Time: 2.22s
********************
Failing Tests (1):
    LLVM :: ThinLTO/X86/devirt2.ll

  Expected Passes    : 90
  Unexpected Failures: 1

Did you forget to commit devirt3.ll? Or is it a typo for devirt2.ll?

This test is failing for me:

Exit Code: 1

Command Output (stderr):
--
/home/jayfoad2/llvm-debug/bin/opt: /home/jayfoad2/git/llvm-project/llvm/test/ThinLTO/X86/Inputs/devirt3.ll: error: Could not open input file: No such file or directory

--

********************
Testing Time: 2.22s
********************
Failing Tests (1):
    LLVM :: ThinLTO/X86/devirt2.ll

  Expected Passes    : 90
  Unexpected Failures: 1

Did you forget to commit devirt3.ll? Or is it a typo for devirt2.ll?

Sorry about that, yes it should have been devirt2.ll. I fixed this in r367680 and r367682 (there were 2 instances and I only caught the first one initially).

aganea added a subscriber: aganea.Apr 14 2020, 8:47 AM
saudi added a subscriber: saudi.Apr 14 2020, 2:38 PM

When switching from clang 9 to clang 10, we encountered a freeze in one of our games built with thin-LTO.
git bisect pointed to this commit.
Unfortunately, it is a pretty large project, and I couldn't repro that behavior in other -smaller- contexts.

Behavior:
In a call to a virtual function, a direct call is made to an address containing a single instruction : a short jump to that same instruction, causing an infinite loop:

00007FF72C7DA80C 5B                   pop         rbx  
00007FF72C7DA80D 5F                   pop         rdi  
00007FF72C7DA80E 5E                   pop         rsi  
00007FF72C7DA80F C3                   ret  
--- [path_to_file.h]
00007FF72C7DA810 EB FE                jmp         00007FF72C7DA810      # <- Infinite loop here
--- No source file -------------------------------------------------------------
00007FF72C7DA812 CC                   int         3  
00007FF72C7DA813 CC                   int         3  
00007FF72C7DA814 CC                   int         3

I noticed, looking into the output map file, that several other methods of that same class had that same address.

Our project is built clang-cl, with options -flto=thin and -Xclang -fwhole-program-vtables, andl links with pre-compiled libs that are built with MSVC.
Without -fwhole-program-vtables the bug disappears.

Notes:

  • We tried applying the patch from D77421, without any luck.
  • Our project links with pre-compiled libs (that were built without LTO). Would -fwhole-program-vtables be invalid to use in our case?
  • The bug also occurs on the master branch.
  • I tried with an assert-enabled build, no asserts were raised
  • The class hierarchy has several virtual methods inlined; the others are built without LTO in a separate library.

At this point, I would like to create a simple repro, but I'm running out of ideas, as I'm not very familiar with LTO detail. Our repro involves a pretty big compilation (resulting .exe file is about 100MB, link time ~8 min).

I would be interested in suggestions about how to extract helpful information, for example:

  • specific command line arguments to clang-cl.exe or lld-link.exe that would give details about devirtualization;
  • a significant place to put a breakpoint in lld's code;
  • command lines to run on the thin-lto generated object files;
  • ... ?

Thanks!

A couple of immediate questions/ideas below:

When switching from clang 9 to clang 10, we encountered a freeze in one of our games built with thin-LTO.
git bisect pointed to this commit.
Unfortunately, it is a pretty large project, and I couldn't repro that behavior in other -smaller- contexts.

Behavior:
In a call to a virtual function, a direct call is made to an address containing a single instruction : a short jump to that same instruction, causing an infinite loop:

00007FF72C7DA80C 5B                   pop         rbx  
00007FF72C7DA80D 5F                   pop         rdi  
00007FF72C7DA80E 5E                   pop         rsi  
00007FF72C7DA80F C3                   ret  
--- [path_to_file.h]
00007FF72C7DA810 EB FE                jmp         00007FF72C7DA810      # <- Infinite loop here
--- No source file -------------------------------------------------------------
00007FF72C7DA812 CC                   int         3  
00007FF72C7DA813 CC                   int         3  
00007FF72C7DA814 CC                   int         3

I noticed, looking into the output map file, that several other methods of that same class had that same address.

Do you know what the original virtual call was, and whether the devirtualization seems legitimate given the class hierarchy?

Our project is built clang-cl, with options -flto=thin and -Xclang -fwhole-program-vtables, andl links with pre-compiled libs that are built with MSVC.
Without -fwhole-program-vtables the bug disappears.

Are you building with -fno-split-lto-unit? If not, you are getting hybrid regular/Thin LTO devirtualization, and this patch doesn't really apply. By default with -fwhole-program-vtables and -flto=thin, the LTO units are split into regular and thin portions, and all vtables are in the regular portion. This patch simply extended WPD to work without splitting, hence ThinLTO "index only" - i.e. no regular LTO and no IR, and only kicks in with -fno-split-lto-unit on top of the other options. The original hybrid ThinLTO WPD was implemented by @pcc, who may have some ideas as well.

Notes:

  • We tried applying the patch from D77421, without any luck.
  • Our project links with pre-compiled libs (that were built without LTO). Would -fwhole-program-vtables be invalid to use in our case?

It depends. Are you also compiling with -fvisibility=hidden? I believe -fwhole-program-vtables should be safe on its own, but often is combined with -fvisibility=hidden to maximize the devirtualization candidates. Setting the visibility to hidden can be dangerous if you don't have full whole program scope for LTO. I.e. if any of those pre-compiled libs inherits from the classes in the LTO objects.

  • The bug also occurs on the master branch.
  • I tried with an assert-enabled build, no asserts were raised
  • The class hierarchy has several virtual methods inlined; the others are built without LTO in a separate library.

At this point, I would like to create a simple repro, but I'm running out of ideas, as I'm not very familiar with LTO detail. Our repro involves a pretty big compilation (resulting .exe file is about 100MB, link time ~8 min).

I would be interested in suggestions about how to extract helpful information, for example:

  • specific command line arguments to clang-cl.exe or lld-link.exe that would give details about devirtualization;

Try enabling pass remarks in the LTO pipeline. For lld this would be -Wl,-mllvm,-pass-remarks=wholeprogramdevirt

  • a significant place to put a breakpoint in lld's code;

This sort of depends on what you are looking for. I would start with the pass remarks and figure out what is the bad devirtualization. The WPD code is in llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp.

  • command lines to run on the thin-lto generated object files;
  • ... ?

Thanks!

Our project is built clang-cl, with options -flto=thin and -Xclang -fwhole-program-vtables, andl links with pre-compiled libs that are built with MSVC.
Without -fwhole-program-vtables the bug disappears.

Are you building with -fno-split-lto-unit? If not, you are getting hybrid regular/Thin LTO devirtualization, and this patch doesn't really apply. By default with -fwhole-program-vtables and -flto=thin, the LTO units are split into regular and thin portions, and all vtables are in the regular portion. This patch simply extended WPD to work without splitting, hence ThinLTO "index only" - i.e. no regular LTO and no IR, and only kicks in with -fno-split-lto-unit on top of the other options. The original hybrid ThinLTO WPD was implemented by @pcc, who may have some ideas as well.

I was not using -fno-split-lto-unit option; however it was unwantedly infered by the use of -Xclang before -fwhole_program_vtables : the driver didn't see the -fwhole_program_vtables and hence didn't activate -fsplit-lto-unit on cc1 argument.
This peculiar setup seems to have been hiding my problem until this patch.
Removing the -Xclang, I could repro the bug on previous commits.
I'm investigating that now.

Thank you for your time!