This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Prevent exporting of locals used/defined in module level asm
ClosedPublic

Authored by tejohnson on Oct 31 2016, 6:07 AM.

Details

Summary

This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.

All global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.

Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.

This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).

Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.

Fixes PR30610.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 76391.Oct 31 2016, 6:07 AM
tejohnson retitled this revision from to [ThinLTO] Prevent exporting of locals used/defined in module level asm.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
mehdi_amini edited edge metadata.Nov 1 2016, 1:13 PM

All global values defined in normal IR and used in module level asm should be included on either the llvm.used or llvm.compiler.used global.

Can you clarify where this requirement comes from?

All global values defined in normal IR and used in module level asm should be included on either the llvm.used or llvm.compiler.used global.

Can you clarify where this requirement comes from?

For inline asm, according to http://llvm.org/docs/LangRef.html#the-llvm-used-global-variable:
"This is commonly used to represent references from inline asms and other things the compiler cannot “see”, and corresponds to “attribute((used))” in GNU C."

And for module level asm, I got this info from you =). It was via the conversation we had on the Chromium build issue that discussed this fix:

On Thu, Oct 27, 2016 at 8:08 AM, Mehdi Amini <mehdi.amini@apple.com> wrote:

On Oct 27, 2016, at 6:51 AM, Teresa Johnson <tejohnson@google.com> wrote:

On Tue, Oct 25, 2016 at 8:31 PM, Mehdi Amini <mehdi.amini@apple.com> wrote:

On Oct 25, 2016, at 11:06 AM, Teresa Johnson <tejohnson@google.com> wrote:

I have a draft set of fixes in https://reviews.llvm.org/D25951 (WIP, not for review). I've done some testing using the regression tests and one I added for module asm, but it needs cleanup and more testing. Feel free to give it a try.

I'm going to split it up and send various parts for review separately - it was a bit more involved than I expected.

Also, I have attempted to handle the case where a local value is used in module level asm. E.g.:


@b = internal global i32 1, align 4

module asm "\09.text"
module asm "\09.type\09foo,@function"
module asm "foo:"
module asm "\09movl b, %eax"
module asm "\09ret "
module asm "\09.size\09foo, .-foo"
module asm “"

The frontend is responsible to add @b to llvm.compiler_used.

Ok, adding the llvm.compiler.used containing @b fixes the opt -O2 build of this case. This will simplify the patch as I don't need to look for uses within module asm blocks (just defs). I can simply flag all values within both llvm.used and llvm.compiler.used as non-renamable.

For inline asm we assume the uses are added to llvm.used, not llvm.compiler.used - I guess this is because they are user inserted? I do see in CGObjCMac.cpp where there are a bunch of calls to add to the compiler used, but that seems to be when CGObjCMac is synthesizing module assembly. In the Chromium case the module level asm is generated from file scope asm() calls - I assume it is therefore the user's responsibility to mark any uses within those with “attribute((used))”, just as one would for inline asm (asm added within a function), right?

Right.


Mehdi

The attribute((used)) goes beyond "what the compiler can see", it also prevents linker dead-strip.

The point for references from inline ASM only applies to static globals (chromium build). It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do.

The attribute((used)) goes beyond "what the compiler can see", it also prevents linker dead-strip.

The point for references from inline ASM only applies to static globals (chromium build). It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do.

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

include/llvm/IR/ModuleSummaryIndex.h
426 ↗(On Diff #76391)

I'm not convinced by this API: the extra parameter corresponds exactly to what is usually exposed as a separate API (like std::vector::get vs std::vector::operator[] for example)

lib/Analysis/ModuleSummaryAnalysis.cpp
208 ↗(On Diff #76391)

Can you clarify what is specific here that wouldn't make us add here the "compiler_used" variables?

243 ↗(On Diff #76391)

s/locak/local/

276 ↗(On Diff #76391)

Early exit

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

Right for llvm.used, but I was referring to static (local linkage) values in the llvm.compiler.used which is added in this patch, since you mentioned: "It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do."

I read from that that any statics would have been placed by the FE in the llvm.compiler.used set, or is that not a correct interpretation? Going back to the email discussion I included in my earlier response, it seems that it must be true that "The frontend is responsible to add [local ref] @b to llvm.compiler_used", in which case the handling in this current patch that looks for locals in the llvm.compiler.used is correct, right?

include/llvm/IR/ModuleSummaryIndex.h
426 ↗(On Diff #76391)

I removed this change and modified the use sites to instead use the existing findGlobalValueSummaryList.

lib/Analysis/ModuleSummaryAnalysis.cpp
208 ↗(On Diff #76391)

I updated the comment above this block to address this hopefully.

243 ↗(On Diff #76391)

fixed

276 ↗(On Diff #76391)

fixed

tejohnson updated this revision to Diff 77203.Nov 8 2016, 9:07 AM
tejohnson edited edge metadata.

Addressed review comments.

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

Right for llvm.used, but I was referring to static (local linkage) values in the llvm.compiler.used which is added in this patch, since you mentioned: "It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do."

I read from that that any statics would have been placed by the FE in the llvm.compiler.used set, or is that not a correct interpretation? Going back to the email discussion I included in my earlier response, it seems that it must be true that "The frontend is responsible to add [local ref] @b to llvm.compiler_used", in which case the handling in this current patch that looks for locals in the llvm.compiler.used is correct, right?

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

Right for llvm.used, but I was referring to static (local linkage) values in the llvm.compiler.used which is added in this patch, since you mentioned: "It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do."

I read from that that any statics would have been placed by the FE in the llvm.compiler.used set, or is that not a correct interpretation? Going back to the email discussion I included in my earlier response, it seems that it must be true that "The frontend is responsible to add [local ref] @b to llvm.compiler_used", in which case the handling in this current patch that looks for locals in the llvm.compiler.used is correct, right?

To be more precise: the FE can put uses from inline ASM to static symbol in either llvm.used or llvm.compiler_used (i.e. the latter should be "enough").
(for clang, this is done by the user who usually in C++ uses __attribute__((used)), which translates into llvm.used)

lib/Analysis/ModuleSummaryAnalysis.cpp
256 ↗(On Diff #77203)

To be more clear with the above, this is the snippet I asked about why is it done here and not above around line 212.

208 ↗(On Diff #76391)

My question was the opposite: why *only* llvm.used here, and not *also* llvm.compiler_used.

You wrote here "We collect them here because ...." and the same reasons should apply to "compiler_used".

mehdi_amini added inline comments.Nov 8 2016, 9:26 AM
test/ThinLTO/X86/module_asm2.ll
73 ↗(On Diff #77203)

Can you add a comment to this file explaining what you're testing, maybe with each of the CHECK groups.

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

Right for llvm.used, but I was referring to static (local linkage) values in the llvm.compiler.used which is added in this patch, since you mentioned: "It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do."

I read from that that any statics would have been placed by the FE in the llvm.compiler.used set, or is that not a correct interpretation? Going back to the email discussion I included in my earlier response, it seems that it must be true that "The frontend is responsible to add [local ref] @b to llvm.compiler_used", in which case the handling in this current patch that looks for locals in the llvm.compiler.used is correct, right?

To be more precise: the FE can put uses from inline ASM to static symbol in either llvm.used or llvm.compiler_used (i.e. the latter should be "enough").

Only module-level asm, not "inline asm calls", which are opaque to the compiler and require a user attribute like:

(for clang, this is done by the user who usually in C++ uses __attribute__((used)), which translates into llvm.used)

Let me know if I have addressed your concerns with my other responses below.

lib/Analysis/ModuleSummaryAnalysis.cpp
256 ↗(On Diff #77203)

Because as I noted in the comment added around the earlier handling, only the llvm.used set locals may be referenced by functions that make inline asm calls, and must be marked explicitly by the user via attribute((used)), which end up in llvm.used only. We only need to collect the llvm.compiler.used here to mark those as NoRename.

208 ↗(On Diff #76391)

I thought I did answer that: "Uses in inline asm must be marked in the source with “attribute((used))”, which causes them to be included in the llvm.used set." As you noted also, this causes those to go into the llvm.used (not llvm.compiler.used). I will make an explicit note here about these not being in the llvm.compiler.used set, and therefore not needing to affect functions that make inline asm calls.

test/ThinLTO/X86/module_asm2.ll
73 ↗(On Diff #77203)

Will do.

Then the assumption works for this fix since it is only looking for static globals (i.e. local linkage), right? I can just update the description and possibly the code comment if that is the case

The attribute((used)) will makes use of the @llvm.used, which you already handled before this patch right?

Right for llvm.used, but I was referring to static (local linkage) values in the llvm.compiler.used which is added in this patch, since you mentioned: "It does not apply to non-static I believe, and this is why are currently parsing the module inline ASM at LTO time and updating the llvm.compiler_used, because this is not something the FE is required to do."

I read from that that any statics would have been placed by the FE in the llvm.compiler.used set, or is that not a correct interpretation? Going back to the email discussion I included in my earlier response, it seems that it must be true that "The frontend is responsible to add [local ref] @b to llvm.compiler_used", in which case the handling in this current patch that looks for locals in the llvm.compiler.used is correct, right?

To be more precise: the FE can put uses from inline ASM to static symbol in either llvm.used or llvm.compiler_used (i.e. the latter should be "enough").

Only module-level asm, not "inline asm calls", which are opaque to the compiler and require a user attribute like:

This is not clear to my why?
It seems perfectly legit to me that a FE would generate inline ASM calls to a local function and add it to llvm.compiler_used and not llvm.used.

tejohnson added inline comments.Nov 8 2016, 9:38 AM
lib/Analysis/ModuleSummaryAnalysis.cpp
256 ↗(On Diff #77203)

only the llvm.used set locals may be referenced by functions that make inline asm calls

That should be "only the llvm.used set locals may be referenced from within inline asm calls, and need to be handled by the inline-asm-calling functions above."

tejohnson updated this revision to Diff 77227.Nov 8 2016, 11:50 AM

Address more review comments

It seems perfectly legit to me that a FE would generate inline ASM calls to a local function and add it to llvm.compiler_used and not llvm.used.

As discussed on IRC, I now have the same handling for inline asm and module level asm.

LGTM, thanks.
(Ideally we should have a test that verifies that the encoding in the bitcode itself is as expected with llvm-bcanalyzer, but such tests are more difficult to maintain).

lib/Analysis/ModuleSummaryAnalysis.cpp
286 ↗(On Diff #77227)

Use isa<> instead of dyn_cast<> if you don't use the result.

mehdi_amini accepted this revision.Nov 8 2016, 1:15 PM
mehdi_amini edited edge metadata.

(See also one nit inline)

This revision is now accepted and ready to land.Nov 8 2016, 1:15 PM

(See also one nit inline)

Got it. Thanks!

lib/Analysis/ModuleSummaryAnalysis.cpp
286 ↗(On Diff #77227)

Fixed.

This revision was automatically updated to reflect the committed changes.