Page MenuHomePhabricator

[ThinLTO][Legacy] Compute PreservedGUID based on IRName in Symtab
ClosedPublic

Authored by steven_wu on Jul 28 2020, 2:37 PM.

Details

Summary

Instead of computing GUID based on some assumption about symbol mangling
rule from IRName to symbol name, lookup the IRName from all the symtabs
from all the input files to see if there are any matching symbols entry
provides the IRName for GUID computation.

rdar://65853754

Diff Detail

Event Timeline

steven_wu created this revision.Jul 28 2020, 2:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 2:37 PM
steven_wu requested review of this revision.Jul 28 2020, 2:37 PM

This is basically a fundamental issues in GUID for machO format, which I am not sure if we can fix the underlying issue. That requires computing GUID based on mangled name, which can be more expensive and I am not sure if it will break the module summary for compatibility if we switch algorithm. So I fixed the export/preserve list only because that is the only thing I can think of that will surface as a bug. Otherwise, the mismatch GUID will just be a missed optimization opportunity.

Not very familiar with MachO. Can you clarify the issue - is it the case that some modules will use the mangled names and some won't? Otherwise it seems like it could use either all mangled or all unmangled.

Not very familiar with MachO. Can you clarify the issue - is it the case that some modules will use the mangled names and some won't? Otherwise it seems like it could use either all mangled or all unmangled.

The issue is that for macho, for a symbol in the object file (symbol table seen by the linker), let's say "_symbol". It can be either. "symbol" or "\01_symbol" in the bitcode file. When user/linker says it want to preserve "_symbol", you don't know what the GUID for that symbol is because it can be both.

Now I think about the patch, it is also not correct that it will preserve "__symbol". Is there any reason why GUID is calculated based on getGlobalIdentifier()? getGlobalIdentifier() drops the \01 prefix so "\01_symbol" is the same GUID is "_symbol".

Not very familiar with MachO. Can you clarify the issue - is it the case that some modules will use the mangled names and some won't? Otherwise it seems like it could use either all mangled or all unmangled.

The issue is that for macho, for a symbol in the object file (symbol table seen by the linker), let's say "_symbol". It can be either. "symbol" or "\01_symbol" in the bitcode file. When user/linker says it want to preserve "_symbol", you don't know what the GUID for that symbol is because it can be both.

I see - so this is an issue with the way the user has specified the name through some kind of option to say it needs to be preserved.

Now I think about the patch, it is also not correct that it will preserve "__symbol". Is there any reason why GUID is calculated based on getGlobalIdentifier()? getGlobalIdentifier() drops the \01 prefix so "\01_symbol" is the same GUID is "_symbol".

It is computed based on getGlobalIdentifier because we need to make sure local names are globalized, and we by design want to ensure the global identifier scheme is the same as that used by PGO (for indirect call promotion). Looks like the code to strip out the '\01' has been there from before ThinLTO (for the PGO handling).

llvm/lib/LTO/ThinLTOCodeGenerator.cpp
292

The code doesn't seem to match the new comment. Should it be also preserving "\01_$NAME"?

Not very familiar with MachO. Can you clarify the issue - is it the case that some modules will use the mangled names and some won't? Otherwise it seems like it could use either all mangled or all unmangled.

The issue is that for macho, for a symbol in the object file (symbol table seen by the linker), let's say "_symbol". It can be either. "symbol" or "\01_symbol" in the bitcode file. When user/linker says it want to preserve "_symbol", you don't know what the GUID for that symbol is because it can be both.

I see - so this is an issue with the way the user has specified the name through some kind of option to say it needs to be preserved.

The deeper issue is that thinLTO doesn't know "symbol" and "\01_symbol" is the same symbol so it will not optimize them but neither will fullLTO I guess.

Now I think about the patch, it is also not correct that it will preserve "__symbol". Is there any reason why GUID is calculated based on getGlobalIdentifier()? getGlobalIdentifier() drops the \01 prefix so "\01_symbol" is the same GUID is "_symbol".

It is computed based on getGlobalIdentifier because we need to make sure local names are globalized, and we by design want to ensure the global identifier scheme is the same as that used by PGO (for indirect call promotion). Looks like the code to strip out the '\01' has been there from before ThinLTO (for the PGO handling).

I don't know what is the best way to deal with this because for macho, "_symbol" and "\01_symbol" are two different symbols. Maybe drop \01_ for macho is a better option.

Not very familiar with MachO. Can you clarify the issue - is it the case that some modules will use the mangled names and some won't? Otherwise it seems like it could use either all mangled or all unmangled.

The issue is that for macho, for a symbol in the object file (symbol table seen by the linker), let's say "_symbol". It can be either. "symbol" or "\01_symbol" in the bitcode file. When user/linker says it want to preserve "_symbol", you don't know what the GUID for that symbol is because it can be both.

I see - so this is an issue with the way the user has specified the name through some kind of option to say it needs to be preserved.

The deeper issue is that thinLTO doesn't know "symbol" and "\01_symbol" is the same symbol so it will not optimize them but neither will fullLTO I guess.

As long as they are treated the same for GUID computation they would be treated the same by ThinLTO. But in GlobalValue::getGlobalIdentifier it looks like only the "\1" is being stripped, so wouldn't that mean that "_symbol" and "\1_symbol" are being treated the same? Is that correct?

Now I think about the patch, it is also not correct that it will preserve "__symbol". Is there any reason why GUID is calculated based on getGlobalIdentifier()? getGlobalIdentifier() drops the \01 prefix so "\01_symbol" is the same GUID is "_symbol".

It is computed based on getGlobalIdentifier because we need to make sure local names are globalized, and we by design want to ensure the global identifier scheme is the same as that used by PGO (for indirect call promotion). Looks like the code to strip out the '\01' has been there from before ThinLTO (for the PGO handling).

I don't know what is the best way to deal with this because for macho, "_symbol" and "\01_symbol" are two different symbols. Maybe drop \01_ for macho is a better option.

It sounds like it from what you are describing, but again I have no familiarity here. BTW the handling here to strip off the "_" for MachO was added in D19103. Maybe @mehdi_amini remembers.

So getGlobalIdentifier removes the leading \01 because it only indicates that the symbol is already mangled, but on the other hand does not apply the platform mangling when there isn't a \01? It seems inconsistent isn't it? Could getGlobalIdentifier mangle the symbol when there isn't the leading \01?

So getGlobalIdentifier removes the leading \01 because it only indicates that the symbol is already mangled, but on the other hand does not apply the platform mangling when there isn't a \01? It seems inconsistent isn't it? Could getGlobalIdentifier mangle the symbol when there isn't the leading \01?

That is the other option. I am not sure if what the impact of that will be, especially on PGO. Let me try that.

llvm/lib/LTO/ThinLTOCodeGenerator.cpp
292

It does. The GUID for "\01_$NAME" is computed from "_$NAME" as indicated in getGlobalIdentifier()

steven_wu updated this revision to Diff 283442.Aug 5 2020, 4:47 PM

Change how the GUID is computed in ThinLTOCodeGenerator.

It is not feasible to change GUID to computed from mangled name since for all the places
that needs to compute GUID, it might not have access to DataLayout and etc. to enable the
mangler.

Instead, do conversion from symbol name to IRName based on the symbol info in the InputFile.

tejohnson added inline comments.Aug 7 2020, 12:45 PM
llvm/test/ThinLTO/X86/mangled_symbol.ll
9

Is the output meant be the exact same in the REGULAR and INTERNALIZE cases? I'm confused as to what this is testing. Comments in the test would help.

steven_wu updated this revision to Diff 284013.Aug 7 2020, 1:26 PM

Address review feedback.

Fix the test and add comments.

steven_wu retitled this revision from [ThinLTO][MachO] Preserve both possible GUIDs from exported list to [ThinLTO][Legacy] Compute PreservedGUID based on IRName in Symtab.Aug 7 2020, 1:29 PM
steven_wu edited the summary of this revision. (Show Details)

It looks ok to me with the big caveat that I don't have a good understanding of MachO mangling and the difference between the symbol name and IRName. @mehdi_amini can you review?

Also added a followup that can potentially deal with the conflict between "_$NAME" and "\01_$NAME". Not sure if that is going to affect some other platforms or not so I put them in a separate review: https://reviews.llvm.org/D85564

This revision is now accepted and ready to land.Aug 26 2020, 9:34 AM
This revision was landed with ongoing or failed builds.Aug 26 2020, 10:15 AM
This revision was automatically updated to reflect the committed changes.
dmajor added a subscriber: dmajor.Aug 28 2020, 11:40 AM

Hi, we're seeing an assertion failure in one of the Firefox LTO bots:

char llvm::StringRef::operator[](size_t) const: Assertion `Index < Length && "Invalid index!"' failed.

I'm still working on finishing the bisection but I have a hunch that it may be this change. Unfortunately I can't seem to get a backtrace out of this particular bot.

As a guess, could it be the operator[] in getGlobalIdentifier at https://github.com/llvm/llvm-project/blob/4cb016cd2d8467c572b2e5c5d34f376ee79e4ac1/llvm/lib/IR/Globals.cpp#L140 ? It looks like there was previously a check for Name.size() > 0 that isn't checked anymore.

Hi, we're seeing an assertion failure in one of the Firefox LTO bots:

char llvm::StringRef::operator[](size_t) const: Assertion `Index < Length && "Invalid index!"' failed.

I'm still working on finishing the bisection but I have a hunch that it may be this change. Unfortunately I can't seem to get a backtrace out of this particular bot.

As a guess, could it be the operator[] in getGlobalIdentifier at https://github.com/llvm/llvm-project/blob/4cb016cd2d8467c572b2e5c5d34f376ee79e4ac1/llvm/lib/IR/Globals.cpp#L140 ? It looks like there was previously a check for Name.size() > 0 that isn't checked anymore.

It would be good to have a test case for this. The only way that can happen is either:

  • Firefox pass a preserved symbol that is empty string and thus that symbol has IRName of empty string
  • Somehow the IRName associated with a non-empty symbol is empty string.

I can attempt a fix but I am not sure if any of the case above can happen.

Yeah, I wish I could give a nice test case, but it's a huge build and the affected configuration needs a lot of setup so I can't even replicate it locally at the moment.

If you have proposed patches I could send them off for a dry run relatively easily though.

steven_wu added a comment.EditedAug 28 2020, 2:33 PM

Yeah, I wish I could give a nice test case, but it's a huge build and the affected configuration needs a lot of setup so I can't even replicate it locally at the moment.

If you have proposed patches I could send them off for a dry run relatively easily though.

Maybe try this?

diff --git a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
index 4adc9a22a7b2..14dae848b362 100644
--- a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
+++ b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
@@ -276,7 +276,7 @@ static void computeGUIDPreservedSymbols(const lto::InputFile &File,
   // Iterate the symbols in the input file and if the input has preserved symbol
   // compute the GUID for the symbol.
   for (const auto &Sym : File.symbols()) {
-    if (PreservedSymbols.count(Sym.getName()))
+    if (PreservedSymbols.count(Sym.getName()) && !Sym.getIRName().empty())
       GUIDs.insert(GlobalValue::getGUID(GlobalValue::getGlobalIdentifier(
           Sym.getIRName(), GlobalValue::ExternalLinkage, "")));
   }

Note IRName should really not be empty. Even this fixes the problem, the root cause might be somewhere else.

Our builds are back to green with that change. I agree that the root cause is likely elsewhere, but getting to the bottom of it would unfortunately take more time than I am able to commit.

Our builds are back to green with that change. I agree that the root cause is likely elsewhere, but getting to the bottom of it would unfortunately take more time than I am able to commit.

I committed in 97866b8de81ce71cf9ae9e50feb450335b0537a. It will still be really helpful to get a reproducer to understand how it happened.