This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Prevent importing of "llvm.used" values
ClosedPublic

Authored by tejohnson on Apr 11 2016, 12:13 PM.

Details

Summary

This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.

See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html

As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 53303.Apr 11 2016, 12:13 PM
tejohnson retitled this revision from to [ThinLTO] Prevent importing of "llvm.used" values.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added subscribers: llvm-commits, pcc.
mehdi_amini added inline comments.Apr 11 2016, 12:26 PM
include/llvm/IR/Module.h
757 ↗(On Diff #53303)

Why was this code-motion needed?

lib/Analysis/ModuleSummaryAnalysis.cpp
133 ↗(On Diff #53303)

Is it a "big hammer" fix?
I'd expect to prevent renaming/importing this specific GV, nothing more.
I see here the same kind of issue as D18298.

lib/Bitcode/Writer/BitcodeWriter.cpp
2892 ↗(On Diff #53303)

Looks like you can go ahead and commit this right now

Weird - I didn't get an email with the review comments. I just noticed them because I opened the patch in Phab to ping it.

include/llvm/IR/Module.h
757 ↗(On Diff #53303)

So that it can be used in lib/Analysis/ModuleSummaryAnalysis.cpp

lib/Analysis/ModuleSummaryAnalysis.cpp
133 ↗(On Diff #53303)

Yes, this is a big hammer fix, which will eventually need something more refined. The problem with simply trying to prevent renaming/importing of these specific GVs is the same as the one I noted in D18298: you also need to prevent importing of any references to these GVs (not just the GV themselves), since it is the importing of any reference to them that provokes the need to promote and rename. It is doable, but this is a simple workaround for what should be an uncommon case, especially given that we are currently doing aggressive promotion/renaming in some of the ThinLTO clients. It provides an immediate fix for a correctness problem that can be refined later.

lib/Bitcode/Writer/BitcodeWriter.cpp
2892 ↗(On Diff #53303)

I can do that.

mehdi_amini edited edge metadata.Apr 18 2016, 1:30 PM

Also, could we promote and rename while keeping a private alias with the old name for these globals?

include/llvm/IR/Module.h
757 ↗(On Diff #53303)

Oh and you can't introduce a dependency from Analysis to TransformsUtil

In D18986#404324, @joker.eph wrote:

Also, could we promote and rename while keeping a private alias with the old name for these globals?

No, because we could import inline assembly from two different modules, each containing a reference to a local 'myvar', and the two imported uses within the imported inline assembly need to each see the correct corresponding promoted global.

include/llvm/IR/Module.h
757 ↗(On Diff #53303)

Right, didn't spell that out but that is the root of it. Also, as I noted in the patch description, there are several other places across LLVM that have copies of the same functionality and could use this same function if it is moved somewhere accessible.

In D18986#404324, @joker.eph wrote:

Also, could we promote and rename while keeping a private alias with the old name for these globals?

No, because we could import inline assembly from two different modules, each containing a reference to a local 'myvar', and the two imported uses within the imported inline assembly need to each see the correct corresponding promoted global.

There are two things:

  1. "llvm.used" symbols (that are referred *from* inline assembly)
  2. Functions that contains inline assembly themselves.

I was referring to the first one while you are mentioning the second ones right?

In D18986#404415, @joker.eph wrote:
In D18986#404324, @joker.eph wrote:

Also, could we promote and rename while keeping a private alias with the old name for these globals?

No, because we could import inline assembly from two different modules, each containing a reference to a local 'myvar', and the two imported uses within the imported inline assembly need to each see the correct corresponding promoted global.

There are two things:

  1. "llvm.used" symbols (that are referred *from* inline assembly)
  2. Functions that contains inline assembly themselves.

I was referring to the first one while you are mentioning the second ones right?

I'm talking about the first one, with regards to the symbol that is renamed/promoted. The functions containing inline assembly matter just in the sense that it is through importing them that you provoke the renaming of the symbols used in the inline assembly (and referenced on the llvm.used).

E.g. if you look at the test case I added here, let's say there was a second file inlineasm2.ll that looks like:

@myvar = internal constant i8 1, align 1
@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

define void @bar(i64* %v) #0 {
entry:

%v.addr = alloca i64*, align 8
store i64* %v, i64** %v.addr, align 8
%0 = load i64*, i64** %v.addr, align 8
call void asm sideeffect "movzbl     myvar(%rip), %eax\0A\09movq %rax, $0\0A\09", "=*m,~{eax},~{dirflag},~{fpsr},~{flags}"(i64* %0) #1
ret void

}

I.e. this is the same as Input/inlineasm.ll, except that we have a function bar instead of foo (inline assembly is the same, and each uses a local variable @myvar). If we import bar() into the main module as well (assume main() is changed to call both foo() and bar()), then we need to promote/rename both copies of myvar. The imported inline assembly copies need to refer to the correct renamed/promoted copy of myvar. It isn't possible to set up a private alias in main()'s module from myvar to both of the renamed/promoted myvars. Peter had some ideas (see the link in the patch summary) for some new assembly directives to handle this, but that is a longer-term fix that is quite a bit more extensive.

In D18986#404415, @joker.eph wrote:
In D18986#404324, @joker.eph wrote:

Also, could we promote and rename while keeping a private alias with the old name for these globals?

No, because we could import inline assembly from two different modules, each containing a reference to a local 'myvar', and the two imported uses within the imported inline assembly need to each see the correct corresponding promoted global.

There are two things:

  1. "llvm.used" symbols (that are referred *from* inline assembly)
  2. Functions that contains inline assembly themselves.

I was referring to the first one while you are mentioning the second ones right?

I'm talking about the first one, with regards to the symbol that is renamed/promoted. The functions containing inline assembly matter just in the sense that it is through importing them that you provoke the renaming of the symbols used in the inline assembly (and referenced on the llvm.used).

E.g. if you look at the test case I added here, let's say there was a second file inlineasm2.ll that looks like:

@myvar = internal constant i8 1, align 1
@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

define void @bar(i64* %v) #0 {
entry:

%v.addr = alloca i64*, align 8
store i64* %v, i64** %v.addr, align 8
%0 = load i64*, i64** %v.addr, align 8
call void asm sideeffect "movzbl     myvar(%rip), %eax\0A\09movq %rax, $0\0A\09", "=*m,~{eax},~{dirflag},~{fpsr},~{flags}"(i64* %0) #1
ret void

}

I.e. this is the same as Input/inlineasm.ll, except that we have a function bar instead of foo (inline assembly is the same, and each uses a local variable @myvar). If we import bar() into the main module as well (assume main() is changed to call both foo() and bar()), then we need to promote/rename both copies of myvar. The imported inline assembly copies need to refer to the correct renamed/promoted copy of myvar. It isn't possible to set up a private alias in main()'s module from myvar to both of the renamed/promoted myvars. Peter had some ideas (see the link in the patch summary) for some new assembly directives to handle this, but that is a longer-term fix that is quite a bit more extensive.

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

In D18986#404453, @joker.eph wrote:
In D18986#404415, @joker.eph wrote:
In D18986#404324, @joker.eph wrote:

Also, could we promote and rename while keeping a private alias with the old name for these globals?

No, because we could import inline assembly from two different modules, each containing a reference to a local 'myvar', and the two imported uses within the imported inline assembly need to each see the correct corresponding promoted global.

There are two things:

  1. "llvm.used" symbols (that are referred *from* inline assembly)
  2. Functions that contains inline assembly themselves.

I was referring to the first one while you are mentioning the second ones right?

I'm talking about the first one, with regards to the symbol that is renamed/promoted. The functions containing inline assembly matter just in the sense that it is through importing them that you provoke the renaming of the symbols used in the inline assembly (and referenced on the llvm.used).

E.g. if you look at the test case I added here, let's say there was a second file inlineasm2.ll that looks like:

@myvar = internal constant i8 1, align 1
@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

define void @bar(i64* %v) #0 {
entry:

%v.addr = alloca i64*, align 8
store i64* %v, i64** %v.addr, align 8
%0 = load i64*, i64** %v.addr, align 8
call void asm sideeffect "movzbl     myvar(%rip), %eax\0A\09movq %rax, $0\0A\09", "=*m,~{eax},~{dirflag},~{fpsr},~{flags}"(i64* %0) #1
ret void

}

I.e. this is the same as Input/inlineasm.ll, except that we have a function bar instead of foo (inline assembly is the same, and each uses a local variable @myvar). If we import bar() into the main module as well (assume main() is changed to call both foo() and bar()), then we need to promote/rename both copies of myvar. The imported inline assembly copies need to refer to the correct renamed/promoted copy of myvar. It isn't possible to set up a private alias in main()'s module from myvar to both of the renamed/promoted myvars. Peter had some ideas (see the link in the patch summary) for some new assembly directives to handle this, but that is a longer-term fix that is quite a bit more extensive.

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

I'm not following. When you said:

  1. "llvm.used" symbols (that are referred *from* inline assembly)

I assumed you meant the variable like @myvar which is on the llvm.used, and is referenced in the inline asm:

@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

...

call void asm sideeffect "movzbl     myvar(%rip), ...

(In the example there is no function referenced by inline asm, just a variable.)

It is the llvm.used symbol myvar which cannot be correctly handled with a private alias, unless I am missing something. Maybe you could show what you are proposing using this example (i.e. the example in the patch augmented with an import to the additional module containing bar() shown above).

mehdi_amini added inline comments.Apr 18 2016, 3:57 PM
include/llvm/IR/Module.h
752 ↗(On Diff #53303)

(Remove brief).

lib/Transforms/Utils/FunctionImportUtils.cpp
219 ↗(On Diff #53303)

I feel we should check GlobalsToImport here, the client has the ability to ask for promotion for specific symbols right?

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

I'm not following. When you said:

  1. "llvm.used" symbols (that are referred *from* inline assembly)

I assumed you meant the variable like @myvar which is on the llvm.used, and is referenced in the inline asm:

@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

...

call void asm sideeffect "movzbl     myvar(%rip), ...

(In the example there is no function referenced by inline asm, just a variable.)

It is the llvm.used symbol myvar which cannot be correctly handled with a private alias, unless I am missing something. Maybe you could show what you are proposing using this example (i.e. the example in the patch augmented with an import to the additional module containing bar() shown above).

Let say myvar was instead myfunc and was referenced from the inline asm. And then let say you are trying to import myfunc into another module and you need to promote it. You should be able to rename myfunc() to myfunc.1234() and create a local alias myfunc = alias myfunc.1234().
Makes sense?

In D18986#404576, @joker.eph wrote:

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

I'm not following. When you said:

  1. "llvm.used" symbols (that are referred *from* inline assembly)

I assumed you meant the variable like @myvar which is on the llvm.used, and is referenced in the inline asm:

@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

...

call void asm sideeffect "movzbl     myvar(%rip), ...

(In the example there is no function referenced by inline asm, just a variable.)

It is the llvm.used symbol myvar which cannot be correctly handled with a private alias, unless I am missing something. Maybe you could show what you are proposing using this example (i.e. the example in the patch augmented with an import to the additional module containing bar() shown above).

Let say myvar was instead myfunc and was referenced from the inline asm. And then let say you are trying to import myfunc into another module and you need to promote it. You should be able to rename myfunc() to myfunc.1234() and create a local alias myfunc = alias myfunc.1234().
Makes sense?

I'm not sure local functions are fundamentally different. Note if we import a local function we don't really need to promote it (we can use the imported copy). But let's say a reference to local function myfunc is imported from module1.o, it will need to be promoted/renamed. We could have a different local function myfunc that we import a reference to from module2.o, also requiring a promotion/rename. You can't create a single myfunc alias to both of them.

include/llvm/IR/Module.h
752 ↗(On Diff #53303)

Will do.

lib/Transforms/Utils/FunctionImportUtils.cpp
219 ↗(On Diff #53303)

But as I mentioned earlier, the problem comes into play with local symbols that are not in the GlobalsToImport set (but are rather referenced by a function in the GlobalsToImport set, requiring the promotion).

In D18986#404576, @joker.eph wrote:

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

I'm not following. When you said:

  1. "llvm.used" symbols (that are referred *from* inline assembly)

I assumed you meant the variable like @myvar which is on the llvm.used, and is referenced in the inline asm:

@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

...

call void asm sideeffect "movzbl     myvar(%rip), ...

(In the example there is no function referenced by inline asm, just a variable.)

It is the llvm.used symbol myvar which cannot be correctly handled with a private alias, unless I am missing something. Maybe you could show what you are proposing using this example (i.e. the example in the patch augmented with an import to the additional module containing bar() shown above).

Let say myvar was instead myfunc and was referenced from the inline asm. And then let say you are trying to import myfunc into another module and you need to promote it. You should be able to rename myfunc() to myfunc.1234() and create a local alias myfunc = alias myfunc.1234().
Makes sense?

I'm not sure local functions are fundamentally different. Note if we import a local function we don't really need to promote it (we can use the imported copy). But let's say a reference to local function myfunc is imported from module1.o,

Yes this is what I was after.

it will need to be promoted/renamed. We could have a different local function myfunc that we import a reference to from module2.o, also requiring a promotion/rename. You can't create a single myfunc alias to both of them.

I'm lost... The aliases lies in the *source* module and are internal. Let me examplify:

ModuleA: defines foo(), which calls barB() and barC()
ModuleB: defines barB() which references myLocalFunc() (which is also in llvm.used)
ModuleC: defines barC() which references myLocalFunc() (which is also in llvm.used)

Now to import barB() and barC() into ModuleA, you transform this way:

ModuleA: defines foo(), import barB() and barC(), these are referencing myLocalFunc.1234() and myLocalFunc.4567() (promoted in each source module B and C)
ModuleB: defines barB() and myLocalFunc.1234() (promoted). And with barB() that references this promoted version. We also create an internal alias myLocalFunc() = myLocalFunc.1234() (the alias is also in llvm.used)
ModuleC: defines barC() and myLocalFunc.4567() (promoted). And with barC() that references this promoted version. We also create an internal alias myLocalFunc() = myLocalFunc.4567() (the alias is also in llvm.used)

mehdi_amini added inline comments.Apr 18 2016, 4:57 PM
lib/Transforms/Utils/FunctionImportUtils.cpp
219 ↗(On Diff #53303)

GlobalsToImport should be renamed into GlobalsToPromote indeed, this is how it is intended to be used right? You already gets wrong result if you don't fill it with referenced symbols or is there some magic I missed? (I fill the "export lists" with the referenced symbols for this purpose).

In D18986#404590, @joker.eph wrote:
In D18986#404576, @joker.eph wrote:

If you import bar() it mean you are not considering my 1) but the 2) above (this is a function that *contains* inline asm and not a function that is *referenced* by inline asm).

I'm not following. When you said:

  1. "llvm.used" symbols (that are referred *from* inline assembly)

I assumed you meant the variable like @myvar which is on the llvm.used, and is referenced in the inline asm:

@llvm.used = appending global [1 x i8*] [i8* @myvar], section "llvm.metadata"

...

call void asm sideeffect "movzbl     myvar(%rip), ...

(In the example there is no function referenced by inline asm, just a variable.)

It is the llvm.used symbol myvar which cannot be correctly handled with a private alias, unless I am missing something. Maybe you could show what you are proposing using this example (i.e. the example in the patch augmented with an import to the additional module containing bar() shown above).

Let say myvar was instead myfunc and was referenced from the inline asm. And then let say you are trying to import myfunc into another module and you need to promote it. You should be able to rename myfunc() to myfunc.1234() and create a local alias myfunc = alias myfunc.1234().
Makes sense?

I'm not sure local functions are fundamentally different. Note if we import a local function we don't really need to promote it (we can use the imported copy). But let's say a reference to local function myfunc is imported from module1.o,

Yes this is what I was after.

it will need to be promoted/renamed. We could have a different local function myfunc that we import a reference to from module2.o, also requiring a promotion/rename. You can't create a single myfunc alias to both of them.

I'm lost... The aliases lies in the *source* module and are internal. Let me examplify:

ModuleA: defines foo(), which calls barB() and barC()
ModuleB: defines barB() which references myLocalFunc() (which is also in llvm.used)
ModuleC: defines barC() which references myLocalFunc() (which is also in llvm.used)

Now to import barB() and barC() into ModuleA, you transform this way:

ModuleA: defines foo(), import barB() and barC(), these are referencing myLocalFunc.1234() and myLocalFunc.4567() (promoted in each source module B and C)
ModuleB: defines barB() and myLocalFunc.1234() (promoted). And with barB() that references this promoted version. We also create an internal alias myLocalFunc() = myLocalFunc.1234() (the alias is also in llvm.used)
ModuleC: defines barC() and myLocalFunc.4567() (promoted). And with barC() that references this promoted version. We also create an internal alias myLocalFunc() = myLocalFunc.4567() (the alias is also in llvm.used)

Ah, now I understand our disconnect. What you describe above will work in ModuleB and ModuleC after promotion. The issue is in ModuleA, which now has references to myLocalFunc.1234 and myLocalFunc.4567, if those are referenced in inline assembly within the imported barB and barC. (Or using the example from this patch, the renamed myvar references in the imported inline assembly)

Yes this was my tentative distinction 1) and 2) above, i.e. promoting a function present in llvm.used vs importing a function that contains inline ASM.
We should be able to always promote AFAICT, but not import the one that contains Inline ASM if it has a reference to a local function.

In D18986#404640, @joker.eph wrote:

Yes this was my tentative distinction 1) and 2) above, i.e. promoting a function present in llvm.used vs importing a function that contains inline ASM.

They are related: 1) (local value present on the llvm.used) means that 2) (importing a function with inline asm) has to be conservative as the inline asm may contain a reference to that local value.

We should be able to always promote AFAICT, but not import the one that contains Inline ASM if it has a reference to a local function.

Except that you don't know which inline asm references which local function or variable from the llvm.used. For instance, the source code used to generate the example in this patch is:

static const unsigned char attribute((used)) attribute ((aligned (1))) myvar = 1;
void foo(unsigned long int *v) {

__asm__ volatile("movzbl     myvar(%%rip), %%eax\n\t"
                 "movq %%rax, %0\n\t"
                     : "=*m" (*v)
  :
  : "%eax"
  );

}

The inline asm string "movzbl myvar(..." is an opaque blob that the compiler doesn't AFAIK do any analysis on to decide what is a global value - the reason why the myvar ends up in llvm.used is only because of the attribute((used)) on the myvar def, not because the compiler detects the reference within an inline assembly string.

But what I could do, which is a bit smaller of a hammer, is to prevent importing of any function that has any inline asm in it, if the module also contains an llvm.used with local values. However, I thought that inline assembly would be rare enough to make this simpler workaround reasonable, especially since a longer term solution will be needed for the general issue on the LTO side (where this workaround won't help). I also have some concerns about getting it right if the decisions are more fine grained: e.g. if we decide to prevent importing for just those functions with inline assembly, do we suppress the importing by not emitting their summaries (might cause some complications if their linkage needs to be adjusted e.g. if they themselves are local and referenced by a different function that is imported thus requiring their promotion), or should they have summaries but mark them as no import somehow (in that case do their summaries need to conservatively reference all values in the llvm.used sets?), etc...haven't entirely thought through all the issues with implementing that approach.

I am not entirely sure what other cases will use llvm.used besides inline assembly, are there other cases where we need to be conservative in the presence of one, because we won't have visibility into where the uses actually are or be able to rename them?

In D18986#404640, @joker.eph wrote:

Yes this was my tentative distinction 1) and 2) above, i.e. promoting a function present in llvm.used vs importing a function that contains inline ASM.

They are related: 1) (local value present on the llvm.used) means that 2) (importing a function with inline asm) has to be conservative as the inline asm may contain a reference to that local value.

I think functions in 1) can be totally decoupled, because as shown in my example previously you can always promote/import these regardless of functions in 2), because you can use a private alias.

We should be able to always promote AFAICT, but not import the one that contains Inline ASM if it has a reference to a local function.

Except that you don't know which inline asm references which local function or variable from the llvm.used.

How does it matter? I thought we cleared it with my example earlier: you can still always promote with a private alias, as long as you never import a function with inline ASM it should be fine.

For instance, the source code used to generate the example in this patch is:

static const unsigned char attribute((used)) attribute ((aligned (1))) myvar = 1;
void foo(unsigned long int *v) {

__asm__ volatile("movzbl     myvar(%%rip), %%eax\n\t"
                 "movq %%rax, %0\n\t"
                     : "=*m" (*v)
  :
  : "%eax"
  );

}

The inline asm string "movzbl myvar(..." is an opaque blob that the compiler doesn't AFAIK do any analysis on to decide what is a global value - the reason why the myvar ends up in llvm.used is only because of the attribute((used)) on the myvar def, not because the compiler detects the reference within an inline assembly string.

See CollectAsmUndefinedRefs in D19103 for the analysis performed to parse the assembly and update llvm.used.
But that's not relevant to the point I was making (see below).

But what I could do, which is a bit smaller of a hammer, is to prevent importing of any function that has any inline asm in it, if the module also contains an llvm.used with local values.

That was my point from the beginning: there is no issues with importing the function in llvm.used, it only an issue for the functions that defines the inline ASM.

However, I thought that inline assembly would be rare enough to make this simpler workaround reasonable, especially since a longer term solution will be needed for the general issue on the LTO side (where this workaround won't help). I also have some concerns about getting it right if the decisions are more fine grained: e.g. if we decide to prevent importing for just those functions with inline assembly, do we suppress the importing by not emitting their summaries (might cause some complications if their linkage needs to be adjusted e.g. if they themselves are local and referenced by a different function that is imported thus requiring their promotion), or should they have summaries but mark them as no import somehow (in that case do their summaries need to conservatively reference all values in the llvm.used sets?), etc...haven't entirely thought through all the issues with implementing that approach.

The big hammer seems reasonable to me on the very short term.
Preventing the import of these function can be done using a "flag" in the summary (part of the flags we already talked about, i.e. optnone, optsize, etc.).

I am not entirely sure what other cases will use llvm.used besides inline assembly, are there other cases where we need to be conservative in the presence of one, because we won't have visibility into where the uses actually are or be able to rename them?

I don't know other cases for llvm.user. We also have llvm.compiler_used that needs to be preserved and can't be renamed, but they shouldn't be private either (?). I haven't thought much about these.

In D18986#404818, @joker.eph wrote:
In D18986#404640, @joker.eph wrote:

Yes this was my tentative distinction 1) and 2) above, i.e. promoting a function present in llvm.used vs importing a function that contains inline ASM.

They are related: 1) (local value present on the llvm.used) means that 2) (importing a function with inline asm) has to be conservative as the inline asm may contain a reference to that local value.

I think functions in 1) can be totally decoupled, because as shown in my example previously you can always promote/import these regardless of functions in 2), because you can use a private alias.

The private alias only works in the exporting module, not when there are conflicts on the importing side requiring renaming, which is what is shown in my follow on example.

We should be able to always promote AFAICT, but not import the one that contains Inline ASM if it has a reference to a local function.

Except that you don't know which inline asm references which local function or variable from the llvm.used.

How does it matter? I thought we cleared it with my example earlier: you can still always promote with a private alias, as long as you never import a function with inline ASM it should be fine.

I think we are saying essentially the same thing here - the issue is on the importing side, you can't import functions with inline assembly (although it really should only be an issue if there is an llvm.used indicating that there are uses within the inline assembly).

For instance, the source code used to generate the example in this patch is:

static const unsigned char attribute((used)) attribute ((aligned (1))) myvar = 1;
void foo(unsigned long int *v) {

__asm__ volatile("movzbl     myvar(%%rip), %%eax\n\t"
                 "movq %%rax, %0\n\t"
                     : "=*m" (*v)
  :
  : "%eax"
  );

}

The inline asm string "movzbl myvar(..." is an opaque blob that the compiler doesn't AFAIK do any analysis on to decide what is a global value - the reason why the myvar ends up in llvm.used is only because of the attribute((used)) on the myvar def, not because the compiler detects the reference within an inline assembly string.

See CollectAsmUndefinedRefs in D19103 for the analysis performed to parse the assembly and update llvm.used.
But that's not relevant to the point I was making (see below).

Interesting, I didn't realize there was some parsing/introspection of the inline assembly beyond the register assignment.

But what I could do, which is a bit smaller of a hammer, is to prevent importing of any function that has any inline asm in it, if the module also contains an llvm.used with local values.

That was my point from the beginning: there is no issues with importing the function in llvm.used, it only an issue for the functions that defines the inline ASM.

However, I thought that inline assembly would be rare enough to make this simpler workaround reasonable, especially since a longer term solution will be needed for the general issue on the LTO side (where this workaround won't help). I also have some concerns about getting it right if the decisions are more fine grained: e.g. if we decide to prevent importing for just those functions with inline assembly, do we suppress the importing by not emitting their summaries (might cause some complications if their linkage needs to be adjusted e.g. if they themselves are local and referenced by a different function that is imported thus requiring their promotion), or should they have summaries but mark them as no import somehow (in that case do their summaries need to conservatively reference all values in the llvm.used sets?), etc...haven't entirely thought through all the issues with implementing that approach.

The big hammer seems reasonable to me on the very short term.
Preventing the import of these function can be done using a "flag" in the summary (part of the flags we already talked about, i.e. optnone, optsize, etc.).

I am not entirely sure what other cases will use llvm.used besides inline assembly, are there other cases where we need to be conservative in the presence of one, because we won't have visibility into where the uses actually are or be able to rename them?

I don't know other cases for llvm.user. We also have llvm.compiler_used that needs to be preserved and can't be renamed, but they shouldn't be private either (?). I haven't thought much about these.

In D18986#404818, @joker.eph wrote:
In D18986#404640, @joker.eph wrote:

[...]

The inline asm string "movzbl myvar(..." is an opaque blob that the compiler doesn't AFAIK do any analysis on to decide what is a global value - the reason why the myvar ends up in llvm.used is only because of the attribute((used)) on the myvar def, not because the compiler detects the reference within an inline assembly string.

See CollectAsmUndefinedRefs in D19103 for the analysis performed to parse the assembly and update llvm.used.
But that's not relevant to the point I was making (see below).

Interesting, I didn't realize there was some parsing/introspection of the inline assembly beyond the register assignment.

I looked at this functionality and it doesn't handle this type of inline assembly. Confirmed by walking through the similar handling in IRObjectFile on which CollectAsmUndefinedRefs was modeled. It only handles "Module-level assembly" as described here: http://llvm.org/docs/LangRef.html#module-level-inline-assembly.

So if we instead want to use a finer grained approach here, I think we need to conservatively reject importing for any function containing inline assembly if there is any llvm.used.

[...]

The big hammer seems reasonable to me on the very short term.

This is my preference given what I think needs to be done to get this to work on a finer grained level vs how uncommon I think the inline assembly is in practice. It can be refined if we find important cases from modules with inline assembly.

Preventing the import of these function can be done using a "flag" in the summary (part of the flags we already talked about, i.e. optnone, optsize, etc.).

To get the finer-grained solution to work, we need to add this indication to the summary, and prevent importing of those functions. Then on the exporting side we need to add an alias to the old name for anything in the llvm.used set (even if we don't export any functions with inline assembly, a reference in another function could be exported, requiring the promotion).

lib/Transforms/Utils/FunctionImportUtils.cpp
219 ↗(On Diff #53303)

No, the supplied GlobalsToImport is only used to distinguish what is being imported as a definition vs only a declaration - it is used to select the proper linkage type (e.g. available_externally vs external). It isn't used to drive promotion. And in fact FunctionImporter::importFunctions only places the imported GVs in this set. And the set isn't used when running this on the exporting module (see paragraph after next).

When importing, we promote any referenced local value, which makes sense since those are exactly the ones we need to promote since they are now being referenced from another module (at least until we implement an optimization that decides when it is better to import a local as a definition and leave it local, but that is orthogonal to this issue).

On the exporting side, which is where the code added here by the patch is handling, we don't pass in a GlobalsToImport set, since it isn't relevant. In order to minimize promotions on the exporting side, we need to know what references are being exported, as you mention here. That should be passed in as a separate argument. At that point, we could intersect the values on the llvm.used set with those in the supplied exported references set as a sanity check that none of the llvm.used values ended up exported.

Note that if we want to do a finer grained approach to handling inline asm, preventing importing of just those functions containing the inline assembly, we will either have to implement the support to pass in the exported symbols here, or add an alias to the original name for anything in llvm.used. In the former case the reference graph would have to be augmented to add references from all functions with inline assembly to all the llvm.used values (see my other note here on why CollectAsmUndefinedRefs can't be used to get a more accurate indication on which inline assembly uses which values).

mehdi_amini accepted this revision.Apr 19 2016, 2:11 PM
mehdi_amini edited edge metadata.

I looked at this functionality and it doesn't handle this type of inline assembly. Confirmed by walking through the similar handling in IRObjectFile on which CollectAsmUndefinedRefs was modeled. It only handles "Module-level assembly" as described here: http://llvm.org/docs/LangRef.html#module-level-inline-assembly.

OK

So if we instead want to use a finer grained approach here, I think we need to conservatively reject importing for any function containing inline assembly if there is any llvm.used.

If this is what I am after.

[...]

The big hammer seems reasonable to me on the very short term.

This is my preference given what I think needs to be done to get this to work on a finer grained level vs how uncommon I think the inline assembly is in practice. It can be refined if we find important cases from modules with inline assembly.

Sure.
I just noticed that I didn't approve the rev, I thought I did yesterday *before* all this discussion. To be clear I was not saying we need the fine grain now, I was just clarifying that it is "big hammer" and what can be done later.

Preventing the import of these function can be done using a "flag" in the summary (part of the flags we already talked about, i.e. optnone, optsize, etc.).

To get the finer-grained solution to work, we need to add this indication to the summary, and prevent importing of those functions. Then on the exporting side we need to add an alias to the old name for anything in the llvm.used set (even if we don't export any functions with inline assembly, a reference in another function could be exported, requiring the promotion).

Yes, this is what I had in mind.

What needs to be added also is that there is an issue to import *into* a Module with llvm-used in some cases because of name collision:

Module A: internal foo() (<- referenced in llvm.used
Module B: void foo()

And now you want to import foo() from B into A?
Another case to forbid...

This revision is now accepted and ready to land.Apr 19 2016, 2:11 PM
In D18986#405716, @joker.eph wrote:

I just noticed that I didn't approve the rev, I thought I did yesterday *before* all this discussion. To be clear I was not saying we need the fine grain now, I was just clarifying that it is "big hammer" and what can be done later.

Great. thanks.

Preventing the import of these function can be done using a "flag" in the summary (part of the flags we already talked about, i.e. optnone, optsize, etc.).

To get the finer-grained solution to work, we need to add this indication to the summary, and prevent importing of those functions. Then on the exporting side we need to add an alias to the old name for anything in the llvm.used set (even if we don't export any functions with inline assembly, a reference in another function could be exported, requiring the promotion).

Yes, this is what I had in mind.

What needs to be added also is that there is an issue to import *into* a Module with llvm-used in some cases because of name collision:

Module A: internal foo() (<- referenced in llvm.used
Module B: void foo()

And now you want to import foo() from B into A?
Another case to forbid...

The current patch prevents importing into A because A will not have a summary index. But with a finer grain approach we will need to explicitly prevent this.

I will add a FIXME comment to the code before I submit to summarize the changes needed for a finer grain fix.

mehdi_amini added inline comments.Apr 19 2016, 6:42 PM
lib/Transforms/Utils/FunctionImportUtils.cpp
219 ↗(On Diff #53303)

Reminds me that I started splitting FunctionImportGlobalProcessing into two part: one to perform the linkage changes required before importing, and the other doing the promotion.

This revision was automatically updated to reflect the committed changes.