Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.
Details
- Reviewers
nickdesaulniers pcc tejohnson kees eugenis - Commits
- rG7ce1c4da7726: ThinLTO: Fix inline assembly references to static functions with CFI
rG700d07f8ce6f: ThinLTO: Fix inline assembly references to static functions with CFI
rG8e3b5cb39eef: ThinLTO: Fix inline assembly references to static functions with CFI
rGe3d24b45b8f8: ThinLTO: Fix inline assembly references to static functions with CFI
rG4474958d3a97: ThinLTO: Fix inline assembly references to static functions with CFI
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
93 | Note that I'm adding the alias to llvm.compiler.used because it's otherwise removed during optimization. Is there are better way to accomplish this? |
Looks like this breaks check-llvm on mac, see http://45.33.8.238/mac/32814/summary.html
Please take a look and revert for now if it takes a while to fix.
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
71–93 | Can you avoid making a copy of the OldName by doing the appendToCompilerUsed BEFORE making the dangling reference via ExportGV.setName(NewName);? |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
71–93 | No, I have to rename the existing function before I can create an alias with the same name, and as ExportGV.setName() invalidates Name, I need to create a copy first. |
Hi, this caused compiler crash: "Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"'" on chromium build https://ci.chromium.org/ui/p/chromium/builders/try/linux-official/151/overview.
Here is the error message:
While deleting: void ()* % Use still stuck around after Def is destroyed:i8* bitcast (void ()* <badref> to i8*) While deleting: void ()* % Use still stuck around after Def is destroyed:i8* bitcast (voidld.lld: /usr/local/google/home/zequanwu/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed. ()* <badref> to i8*)
Moved the alias creation to module level inline assembly to avoid issues with LowerTypeTestsModule, based on pcc's suggestion.
Change LGTM, but I don't understand why the following tests are modified:
- llvm/test/ThinLTO/X86/devirt2.ll
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
This is needed to fix two missing symbol resolution errors that are caused by the aliases we added.
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
And for these, we need to specify a target triple to use module inline assembly. According to pcc, there shouldn't be a real-world situation where the triple is missing, but these two tests don't currently specify one.
I'm curious if this will lead to breakages with LTO in general? I suppose not, since it's llvm-lto2 that needs the explicit list of symbols that can be linked against.
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
And for these, we need to specify a target triple to use module inline assembly. According to pcc, there shouldn't be a real-world situation where the triple is missing, but these two tests don't currently specify one.
Neither of these tests use module inline assembly though, AFAICT?
No, this won't break LTO.
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
- llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
And for these, we need to specify a target triple to use module inline assembly. According to pcc, there shouldn't be a real-world situation where the triple is missing, but these two tests don't currently specify one.
Neither of these tests use module inline assembly though, AFAICT?
The tests don't, but we add module inline assembly to create the aliases, which means that IR with type metadata will need a target triple for ThinLTO.
The tests that now specify a target triple also need ; REQUIRES: x86-registered-target or they will obviously fail. I'll upload another revision.
This patch broke the sanitizer-windows bot: https://lab.llvm.org/buildbot/#/builders/127/builds/14257
Failed Tests (4): cfi-devirt-lld-thinlto-x86_64 :: anon-namespace.cpp cfi-devirt-lld-thinlto-x86_64 :: simple-pass.cpp cfi-standalone-lld-thinlto-x86_64 :: anon-namespace.cpp cfi-standalone-lld-thinlto-x86_64 :: simple-pass.cpp
Please revert or fix.
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
90 | Is there more information about "promotion aliases with x86_64-pc-windows-msvc" from D106392? Can we quote these only for msvc target triples? Can we add a comment about the quoting be necessary for those targets? It's still not clear to me how tests using explicit -linux-gnu triples could fail on -mscv hosts. |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
90 | Also, I think the quoting hurts the readability of the generated asm. Maybe that doesn't matter for LTO, but I'd be curious if we could do such escaping only when necessary? Perhaps that's only when targeting -msvc triples? |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
90 | I was going to say, the proper way to do this is what MCSymbol::print does: That code doesn't seem exhaustive, but at least it escapes quotes. We can't call MC from here due to library layering. The LLVM IR readability is poor, but inline asm in IR is already hard to read. I wouldn't worry about that, I'd mainly worry about the output of -S. Quoting always seems fine to me, I guess. |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
90 |
Yes, the -msvc targets use Visual C++ compatible name mangling, which requires quotes when referred to in inline assembly. Here's a trivial reproducer: $ cat test.cpp static void a(void) {} void* b(void) { return (void *)a; } $ clang -flto=thin -fvisibility=default -fsanitize=cfi -c test.cpp $ clang -flto=thin -fvisibility=default -fsanitize=cfi -target x86_64-pc-windows-msvc -c test.cpp Either SourceMgr should be available UNREACHABLE executed at llvm-project/llvm/lib/MC/MCContext.cpp:913! ... Here's the inline assembly generated when we compile the above example for the -msvc target: .set ?a@@YAXXZ,?a@@YAXXZ.d7b56b39ccc53bc7515ae1b2533f1e3d
I assume we could limit this only to -msvc targets, but I feel like that would unnecessarily complicate the code as there's no harm in quoting the names always.
These tests don't specify a -linux-gnu triple. They are C++ and end up using the default target, which in this case is -msvc. |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
90 |
Is it actually possible to have function names that contain quotes? If so, I suppose we need to do something similar here and escape any quotes in the names.
The promotion happens only when we write bitcode, so the aliases won't yet exist in the -S output. |
As we only care about fixing inline assembly references, mangled names are
not that important in the first place. This version skips any functions
that have unusual characters in their names that would otherwise require
quotes, which includes any functions with MSVC compatible name mangling.
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
45 | Something like? return any_of(Name, [](const char &C) { return isAlnum(C) || C == '_' || C == '.'; } or maybe we need !none_of(...)? (not sure if characters of a string can be enumerated this way) |
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
45 | Since we want to ensure all the characters in the string match the predicate, I believe it would make sense to use all_of() instead. However, I don't see the point of introducing additional complexity to such a trivial function to shave off a couple of lines of code. While it might not actually matter here, using all_of() also seems to generate ~5x as many instructions to execute: https://godbolt.org/z/Pndfxj6rM If you feel like the current version is too long, I can drop one line by changing the loop to use: if (!isAlnum(C) && C != '_' && C != '.') return false; I initially wanted to keep the test identical to MCAsmInfo::isAcceptableChar() to make it easier to see it actually matches, but since it's no longer identical, I suppose that doesn't matter. Thoughts? |
I think the new approach of skipping non C-ish identifier names is reasonable. Looks good to me, but wait for the more active reviewers to stamp it.
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
45 |
Hard to argue with that. Might be nice to add some test coverage of this code with an mscv target triple though. |
Mentioning it here in case others run into the same thing: We bisected a 7x (!) binary size regression to this. Details at https://bugs.chromium.org/p/chromium/issues/detail?id=1261715
What about MCAsmInfoXCOFF::isAcceptableChar? I guess we could be targeting a XCOFF object file format with LTO?