Page MenuHomePhabricator

Add switch to use "source_filename" instead of a hash ID for globally promoted local
ClosedPublic

Authored by void on Jun 29 2022, 4:18 PM.

Details

Summary

During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.

Instead of using the module hash, we can use the "source_filename" if it
exists to generate a unique identifier that doesn't change due to LTO
shenanigans.

Diff Detail

Event Timeline

void created this revision.Jun 29 2022, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 4:18 PM
void requested review of this revision.Jun 29 2022, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 4:18 PM

During LTO a local promoted to a global gets a unique suffix based partially on the Module's ID.

The hash is actually a hash of the full module IR, computed when we write out the bitcode before the ThinLTO link, saved in the bitcode file, then read into a table indexed by module Id.

This has a problem, because the Module ID is not the same as the original Module ID.

Can you clarify what the different module IDs you are comparing here?

void added a comment.Jun 30 2022, 2:32 PM

During LTO a local promoted to a global gets a unique suffix based partially on the Module's ID.

The hash is actually a hash of the full module IR, computed when we write out the bitcode before the ThinLTO link, saved in the bitcode file, then read into a table indexed by module Id.

This has a problem, because the Module ID is not the same as the original Module ID.

Can you clarify what the different module IDs you are comparing here?

I may have a misunderstanding of the module ID. I thought it was based on the name of the Module and not a hash of the full IR. However, the result is the same for the tool we're using. Here're the full details:

  • We build the Linux kernel with LTO and PGO enabled.
  • Later on, we need to generate a "live-patch" for the kernel (one that'll be applied to the kernel without rebooting it).
  • The mechanism we use to generate the live-patch does a large number of checks to determine what changed between the pre and post patched object files.
  • Changes in certain sections aren't valid.
    • So if a patched local function "foo" is promoted to a global "foo.llvm.1234567890abcdef", and that function is called from one of the sections where changes are invalid, changing its promoted name (to say "foo.llvm.fedcba0987654321") because of a different Module ID hash causes validation to fail.

So I wanted to avoid this validation muck by not relying upon the Module ID hash, which can apparently change. :-)

void added a comment.Mon, Jul 11, 3:15 PM

Friendly ping?

void added a comment.Sun, Jul 17, 11:47 PM

Another ping. :-)

During LTO a local promoted to a global gets a unique suffix based partially on the Module's ID.

The hash is actually a hash of the full module IR, computed when we write out the bitcode before the ThinLTO link, saved in the bitcode file, then read into a table indexed by module Id.

This has a problem, because the Module ID is not the same as the original Module ID.

Can you clarify what the different module IDs you are comparing here?

I may have a misunderstanding of the module ID. I thought it was based on the name of the Module and not a hash of the full IR. However, the result is the same for the tool we're using. Here're the full details:

  • We build the Linux kernel with LTO and PGO enabled.
  • Later on, we need to generate a "live-patch" for the kernel (one that'll be applied to the kernel without rebooting it).
  • The mechanism we use to generate the live-patch does a large number of checks to determine what changed between the pre and post patched object files.
  • Changes in certain sections aren't valid.
    • So if a patched local function "foo" is promoted to a global "foo.llvm.1234567890abcdef", and that function is called from one of the sections where changes are invalid, changing its promoted name (to say "foo.llvm.fedcba0987654321") because of a different Module ID hash causes validation to fail.

So I wanted to avoid this validation muck by not relying upon the Module ID hash, which can apparently change. :-)

Sorry for the slow reply, I've been out all of July due to vacation followed by illness.

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

Using the source path this way will fix this particular instance, however, you have to be careful and ensure that all source files are fully and uniquely qualified by their path when they are compiled. The kernel build may do this already, but it is somewhat dangerous to assume this in general so we'd have to document this clearly.

Change also needs a test. But I'm concerned overall as described earlier that this isn't a full fix for potential issues due to LTO changes.

void added a comment.Mon, Jul 18, 11:51 AM

During LTO a local promoted to a global gets a unique suffix based partially on the Module's ID.

The hash is actually a hash of the full module IR, computed when we write out the bitcode before the ThinLTO link, saved in the bitcode file, then read into a table indexed by module Id.

This has a problem, because the Module ID is not the same as the original Module ID.

Can you clarify what the different module IDs you are comparing here?

I may have a misunderstanding of the module ID. I thought it was based on the name of the Module and not a hash of the full IR. However, the result is the same for the tool we're using. Here're the full details:

  • We build the Linux kernel with LTO and PGO enabled.
  • Later on, we need to generate a "live-patch" for the kernel (one that'll be applied to the kernel without rebooting it).
  • The mechanism we use to generate the live-patch does a large number of checks to determine what changed between the pre and post patched object files.
  • Changes in certain sections aren't valid.
    • So if a patched local function "foo" is promoted to a global "foo.llvm.1234567890abcdef", and that function is called from one of the sections where changes are invalid, changing its promoted name (to say "foo.llvm.fedcba0987654321") because of a different Module ID hash causes validation to fail.

So I wanted to avoid this validation muck by not relying upon the Module ID hash, which can apparently change. :-)

Sorry for the slow reply, I've been out all of July due to vacation followed by illness.

Doh! I hope you're feeling better!

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

Using the source path this way will fix this particular instance, however, you have to be careful and ensure that all source files are fully and uniquely qualified by their path when they are compiled. The kernel build may do this already, but it is somewhat dangerous to assume this in general so we'd have to document this clearly.

Right. That's why I wanted to do this with a command line switch. This is of course a very unique use of LTO, and I don't expect anyone to use this unless they have a uniquely qualified patch for the source_filename. I can make that more explicit in the comments.

Change also needs a test. But I'm concerned overall as described earlier that this isn't a full fix for potential issues due to LTO changes.

I understand. And this change is definitely on the "hacky" side of things, but I'm not sure of a better way to ensure that module A (original) and module A' (patched) will produce the same suffix for a promoted local object.

void updated this revision to Diff 445610.Mon, Jul 18, 1:18 PM

Add comment about flag use.

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

void added a comment.Mon, Jul 18, 1:38 PM

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

That's not quite the situation.

Let's say I have module A and I patch function foo() and get module B. A local function in A, named bar(), is promoted during its compilation to bar.llvm.1234(). After patching, B will also promote bar() to a global, but give it a different suffix, say bar.llvm.abcd(). A separate tool comes along to determine what's changed between the A's ELF file and B's ELF file. It sees that bar.llvm.1234() is now bar.llvm.abcd() and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

That's not quite the situation.

Let's say I have module A and I patch function foo() and get module B. A local function in A, named bar(), is promoted during its compilation to bar.llvm.1234(). After patching, B will also promote bar() to a global, but give it a different suffix, say bar.llvm.abcd(). A separate tool comes along to determine what's changed between the A's ELF file and B's ELF file. It sees that bar.llvm.1234() is now bar.llvm.abcd() and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)

Sure, I believe I understand what's happening in your situation. But I think you and I are using the terms "module A" and "module B" differently. Here your module B is the new version of module A after patching. I was referring to 2 modules from totally different source files. What I was thinking of is a case where there are 2 source files A.cpp and B.cpp that become modules A and B:

A.cpp
static void bar() {...}
void foo() { bar(); }

B.cpp
void caller() { foo(); }

I was assuming you had a situation where with ThinLTO we had imported and inlined foo into module B, requiring bar to be promoted in module A, say to bar.llvm.1 (and leading to a call to the promoted bar.llvm.1 in module B). Then if something in A.cpp got patched, let's just say bar itself got patched, then you build a new module A object where bar is promoted with a different suffix, say to bar.llvm.2. And now you have changes to both modules A and B.

But you could just as easily change the importing decisions into module B due to a change to bar. E.g. if bar was made smaller, we might decide to import and inline it as well into module B. It sounds like you would handle this situation by including the objects for both module A and B in the patch. I was just wondering if the former case could be handled similarly. I.e. include the new objects for both modules A and B in the patch since there were changes to both as a result of the change in A.cpp.

If the change is in a section that isn't allowed to be patched, then that seems like it could cause an issue for the other example I gave above too (where we decided to go ahead and import/inline bar as well). Is that correct?

void added a comment.EditedMon, Jul 18, 5:38 PM

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

That's not quite the situation.

Let's say I have module A and I patch function foo() and get module B. A local function in A, named bar(), is promoted during its compilation to bar.llvm.1234(). After patching, B will also promote bar() to a global, but give it a different suffix, say bar.llvm.abcd(). A separate tool comes along to determine what's changed between the A's ELF file and B's ELF file. It sees that bar.llvm.1234() is now bar.llvm.abcd() and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)

Sure, I believe I understand what's happening in your situation. But I think you and I are using the terms "module A" and "module B" differently. Here your module B is the new version of module A after patching. I was referring to 2 modules from totally different source files. What I was thinking of is a case where there are 2 source files A.cpp and B.cpp that become modules A and B:

A.cpp
static void bar() {...}
void foo() { bar(); }

B.cpp
void caller() { foo(); }

I was assuming you had a situation where with ThinLTO we had imported and inlined foo into module B, requiring bar to be promoted in module A, say to bar.llvm.1 (and leading to a call to the promoted bar.llvm.1 in module B). Then if something in A.cpp got patched, let's just say bar itself got patched, then you build a new module A object where bar is promoted with a different suffix, say to bar.llvm.2. And now you have changes to both modules A and B.

But you could just as easily change the importing decisions into module B due to a change to bar. E.g. if bar was made smaller, we might decide to import and inline it as well into module B. It sounds like you would handle this situation by including the objects for both module A and B in the patch. I was just wondering if the former case could be handled similarly. I.e. include the new objects for both modules A and B in the patch since there were changes to both as a result of the change in A.cpp.

If the change is in a section that isn't allowed to be patched, then that seems like it could cause an issue for the other example I gave above too (where we decided to go ahead and import/inline bar as well). Is that correct?

Yes, that could happen, but it's a separate issue than what I'm trying to solve here (i.e. it's not a compiler issue). :-)

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

That's not quite the situation.

Let's say I have module A and I patch function foo() and get module B. A local function in A, named bar(), is promoted during its compilation to bar.llvm.1234(). After patching, B will also promote bar() to a global, but give it a different suffix, say bar.llvm.abcd(). A separate tool comes along to determine what's changed between the A's ELF file and B's ELF file. It sees that bar.llvm.1234() is now bar.llvm.abcd() and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)

Sure, I believe I understand what's happening in your situation. But I think you and I are using the terms "module A" and "module B" differently. Here your module B is the new version of module A after patching. I was referring to 2 modules from totally different source files. What I was thinking of is a case where there are 2 source files A.cpp and B.cpp that become modules A and B:

A.cpp
static void bar() {...}
void foo() { bar(); }

B.cpp
void caller() { foo(); }

I was assuming you had a situation where with ThinLTO we had imported and inlined foo into module B, requiring bar to be promoted in module A, say to bar.llvm.1 (and leading to a call to the promoted bar.llvm.1 in module B). Then if something in A.cpp got patched, let's just say bar itself got patched, then you build a new module A object where bar is promoted with a different suffix, say to bar.llvm.2. And now you have changes to both modules A and B.

But you could just as easily change the importing decisions into module B due to a change to bar. E.g. if bar was made smaller, we might decide to import and inline it as well into module B. It sounds like you would handle this situation by including the objects for both module A and B in the patch. I was just wondering if the former case could be handled similarly. I.e. include the new objects for both modules A and B in the patch since there were changes to both as a result of the change in A.cpp.

If the change is in a section that isn't allowed to be patched, then that seems like it could cause an issue for the other example I gave above too (where we decided to go ahead and import/inline bar as well). Is that correct?

Yes, that could happen, but it's a separate issue than what I'm trying to solve here (i.e. it's not a compiler issue). :-)

To me this doesn't seem like a separate issue, just a different incarnation of the same problem - with LTO optimizations a patch to one source file can have effects on the modules for other source files. Actually let me back up a minute - you mentioned earlier that object files are being patched. What does this mean for the kernel in the context of ThinLTO? Unless you are doing distributed ThinLTO I guess you are patching at the binary level? In which case the module boundaries have been erased, so it is sort of irrelevant I guess whether the other affected code was originally in a different source file or not. Also, what is meant by a section that isn't allowed to be patched? Sorry for all the questions - I'm asking all this because I'm trying to understand in general how you prevent changes to code in one place from affecting code in these unpatchable sections (even without ThinLTO, if they came from the same source file, i.e. you could get different inlining). My concern is that this fix just addresses one small aspect of a potentially larger issue you could run into. Today the change in the unpatchable section is just a different suffix on the callee, but down the line it could be that we decided to inline that callsite completely due to the code changes.

void added a comment.Tue, Jul 19, 3:02 PM

Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.

Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?

The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.

If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?

That's not quite the situation.

Let's say I have module A and I patch function foo() and get module B. A local function in A, named bar(), is promoted during its compilation to bar.llvm.1234(). After patching, B will also promote bar() to a global, but give it a different suffix, say bar.llvm.abcd(). A separate tool comes along to determine what's changed between the A's ELF file and B's ELF file. It sees that bar.llvm.1234() is now bar.llvm.abcd() and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)

Sure, I believe I understand what's happening in your situation. But I think you and I are using the terms "module A" and "module B" differently. Here your module B is the new version of module A after patching. I was referring to 2 modules from totally different source files. What I was thinking of is a case where there are 2 source files A.cpp and B.cpp that become modules A and B:

A.cpp
static void bar() {...}
void foo() { bar(); }

B.cpp
void caller() { foo(); }

I was assuming you had a situation where with ThinLTO we had imported and inlined foo into module B, requiring bar to be promoted in module A, say to bar.llvm.1 (and leading to a call to the promoted bar.llvm.1 in module B). Then if something in A.cpp got patched, let's just say bar itself got patched, then you build a new module A object where bar is promoted with a different suffix, say to bar.llvm.2. And now you have changes to both modules A and B.

But you could just as easily change the importing decisions into module B due to a change to bar. E.g. if bar was made smaller, we might decide to import and inline it as well into module B. It sounds like you would handle this situation by including the objects for both module A and B in the patch. I was just wondering if the former case could be handled similarly. I.e. include the new objects for both modules A and B in the patch since there were changes to both as a result of the change in A.cpp.

If the change is in a section that isn't allowed to be patched, then that seems like it could cause an issue for the other example I gave above too (where we decided to go ahead and import/inline bar as well). Is that correct?

Yes, that could happen, but it's a separate issue than what I'm trying to solve here (i.e. it's not a compiler issue). :-)

To me this doesn't seem like a separate issue, just a different incarnation of the same problem - with LTO optimizations a patch to one source file can have effects on the modules for other source files. Actually let me back up a minute - you mentioned earlier that object files are being patched. What does this mean for the kernel in the context of ThinLTO? Unless you are doing distributed ThinLTO I guess you are patching at the binary level? In which case the module boundaries have been erased, so it is sort of irrelevant I guess whether the other affected code was originally in a different source file or not. Also, what is meant by a section that isn't allowed to be patched? Sorry for all the questions - I'm asking all this because I'm trying to understand in general how you prevent changes to code in one place from affecting code in these unpatchable sections (even without ThinLTO, if they came from the same source file, i.e. you could get different inlining). My concern is that this fix just addresses one small aspect of a potentially larger issue you could run into. Today the change in the unpatchable section is just a different suffix on the callee, but down the line it could be that we decided to inline that callsite completely due to the code changes.

I should have explained better. I sometimes forget that not everyone has the same context that I have when describing a problem. :-) The following goes into detail, some of which you know of course, but I wanted to be thorough.

Here's the situation we're in. We're able to patch the kernel without rebooting the system. It's called "live patching." It's done by generating a module with the patched functions, and modifying the beginning of the original functions to jump to the patched code instead. (Data objects can also be patched, but that's more complicated.) When the live patching module is loaded, the kernel is able to wait until no thread has the patched function in its call stack before suspending the system and applying the patch.

Now a change in a patched function could theoretically allow it to become inlinable when before it wasn't. That will propagate the patch through more functions than expected. The result is that each function it was inlined into will now be considered "patched" and the unpatched functions will be modified to jump to the patched version of the functions, etc. This is the situation I think you're describing, and it's expected, though not ideal. But there may be mitigations we could use to avoid it (like using the noinline attribute on the original patched function).

Some sections shouldn't be patched for whatever reason. Code in those sections could theoretically call one of the patched functions *not* residing in the unpatchable section, but that's okay, as those sections marked as "unpatchable" tend to be things like sections with initialization code (and so isn't useful because that code has already been executed and possible those sections reclaimed from memory).

What I'm trying to accomplish with this however isn't related to what I just described. Instead I'm trying to avoid a situation which is tricking the live patching tools into thinking that a change exists where it doesn't actually exist (i.e. the change is purely cosmetic). In particular, when a local object is promoted to a global, it uses the module ID hash. But that hash changes between the patched and unpatched versions, causing the object names to change. Normally this won't be an issue. However, the live patching tools look at the changed name and determines that it's part of the patch we're applying.

So let's say that in some section that we don't allow a change in, the original code looks something like this in the ELF file:

  section .text
bar.llvm.1:
  ...

  section .init.text
foo:
  ...
  call    bar.llvm.1
  ...

In the patched code, it's changed into:

  section .text
bar.llvm.2:
  ...

  section .init.text
foo:
  ...
  call    bar.llvm.2
  ...

Now, whether bar() was part of the actual patch or not isn't really an issue. It's the change of bar()'s name that causes the live patching tool to see foo() and say, "Oops! We don't support changing foo(), because it's in the .init.text section." When it really wasn't changed. (It's slightly confusing, because we could change bar() with the patch, but we're not changing the generated code for foo(), which is all the live patching tool cares about in this context.) To solve this issue, I want the compiler to use a suffix for a promoted object that won't change based on changes in a Module.

If it comes to pass that changing bar() causes it to be inlinable into foo(), then foo()'s generated code does change and we'll have to address that in some other fashion (probably with noinline). But that's changing the patch rather than what the compiler's doing.

The following goes into detail, some of which you know of course, but I wanted to be thorough.

Thanks for going into all the details. I had understood most of this before but it helped connect the dots. This was the main part I was missing:

If it comes to pass that changing bar() causes it to be inlinable into foo(), then foo()'s generated code does change and we'll have to address that in some other fashion (probably with noinline). But that's changing the patch rather than what the compiler's doing.

So essentially, if changes percolate into other functions in sections that you are allowed to patch, you just expand the patch to include those. But if they percolate into unpatchable sections, other than the type of changes being handled here you will need to modify the code manually to prevent changes.

Unfortunately, I think that you will have limited success with that approach. E.g. by marking a function 'noinline' you will potentially block other inlines that happed for that function in the original version, creating other differences that could affect your unpatchable section (e.g. let's say bar was previously already inlined in another callsite in foo or some other function in the unpatchable section, now you have a change at that callsite from blocking all inlining of bar). There are other whole program optimizations that will be harder to adjust with manual source changes too.

I'm ok with approving this patch though if it helps you make forward progress. Can you add a test? One other suggestion in the code below too.

I do think you will run into other issues down the line that are related but not so easy to work around. One thing I can think of is to prevent unpatchable sections from taking part in ThinLTO in the first place - which today is only possible afaik if they are in separate files that can be compiled to native code without ThinLTO. It gives a performance cost but would avoid the potential patching issues due to changes of this nature.

llvm/lib/Transforms/Utils/FunctionImportUtils.cpp
114

Can you replace all of this with something simpler that replaces all non-alphanumeric chars, see example at https://github.com/llvm/llvm-project/blob/d71128d97df9986e98c3d2b56e5629f5e36e4c45/llvm/lib/ObjCopy/ELF/ELFObject.cpp#L1287-L1289

void updated this revision to Diff 449096.Mon, Aug 1, 12:36 PM

Improve string substitution logic.

TBD: testcase.

void added a comment.Mon, Aug 1, 4:54 PM

I'm ok with approving this patch though if it helps you make forward progress. Can you add a test? One other suggestion in the code below too.

I'm not sure how to create a testcase for this option. There doesn't seem to be anyway to trigger this code via the tools...

I'm ok with approving this patch though if it helps you make forward progress. Can you add a test? One other suggestion in the code below too.

I'm not sure how to create a testcase for this option. There doesn't seem to be anyway to trigger this code via the tools...

You should be able to use llvm-lto or llvm-lto2. For example, see this test case that uses llvm-lto to test promotion: llvm/test/ThinLTO/X86/local_name_conflict.ll. There are some other examples in that directory with llvm-lto2 that look for promotions (they usually don't check the hash, but rather for the ".llvm." name to confirm promotion happened). The 2 tools use different LTO apis but the code you modified is exercised with both, so either tool will work for your purposes. You should probably create a simple example that results in importing of a local symbol reference and compare the hashes with and without your new option.

void updated this revision to Diff 449410.Tue, Aug 2, 1:35 PM

Add testcase.

void added a comment.Tue, Aug 2, 1:35 PM

I'm ok with approving this patch though if it helps you make forward progress. Can you add a test? One other suggestion in the code below too.

I'm not sure how to create a testcase for this option. There doesn't seem to be anyway to trigger this code via the tools...

You should be able to use llvm-lto or llvm-lto2. For example, see this test case that uses llvm-lto to test promotion: llvm/test/ThinLTO/X86/local_name_conflict.ll. There are some other examples in that directory with llvm-lto2 that look for promotions (they usually don't check the hash, but rather for the ".llvm." name to confirm promotion happened). The 2 tools use different LTO apis but the code you modified is exercised with both, so either tool will work for your purposes. You should probably create a simple example that results in importing of a local symbol reference and compare the hashes with and without your new option.

Ah! Thank you. Done.

void updated this revision to Diff 449770.Wed, Aug 3, 1:28 PM

Remove debugging statements.

Can you simplify the test (see comment there) and also update the summary to reflect that we usually use a module hash (not a Module ID hash) - what's changing in your case is the hash of the module's bitcode.

llvm/test/ThinLTO/X86/promote-local-name.ll
21

I don't think there is a need for this testing here (this is behavior tested elsewhere). You should be able to remove promote-local-name-1.ll completely from this test.

void updated this revision to Diff 449800.Wed, Aug 3, 3:22 PM

Remove unnecessary test check and improve commit message.

void retitled this revision from Add switch to use "source_file_name" instead of Module ID for globally promoted local to Add switch to use "source_filename" instead of a hash ID for globally promoted local.Wed, Aug 3, 3:22 PM
void edited the summary of this revision. (Show Details)
void added inline comments.
llvm/test/ThinLTO/X86/promote-local-name.ll
21

I removed the extra check, but when I tried to remove one of the .ll files it fails. (I low-key don't know what the llvm-lto tool with all of its options does.) Can I just leave it as is?

I think the summary could give a clearer description of the problem (I'd also use "module hash" vs "hash ID" to be more explicit as to what the hash is). I think instead of:

During LTO a local promoted to a global gets a unique suffix based
partially on a hash ID. This has a problem, because the hash ID is not
the same as the original hash ID. So any tool that's validating changes
will see a change when there isn't one (i.e. the names of the promoted
locals are changed, but it doesn't reflect a real change).

Maybe something like this?:

During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.

Instead of using the module hash, ...

llvm/test/ThinLTO/X86/promote-local-name.ll
21

Hmm, that's odd. If you remove that .ll file and remove references to %t1.bc I'm not sure why you would be getting failures. Looks like no symbols from that file are used elsewhere.

void updated this revision to Diff 449825.Wed, Aug 3, 4:29 PM

Update test case and commit message.

void edited the summary of this revision. (Show Details)Wed, Aug 3, 4:29 PM

I think the summary could give a clearer description of the problem (I'd also use "module hash" vs "hash ID" to be more explicit as to what the hash is). I think instead of:

During LTO a local promoted to a global gets a unique suffix based
partially on a hash ID. This has a problem, because the hash ID is not
the same as the original hash ID. So any tool that's validating changes
will see a change when there isn't one (i.e. the names of the promoted
locals are changed, but it doesn't reflect a real change).

Maybe something like this?:

During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.

Instead of using the module hash, ...

Looks good to me. :-)

llvm/test/ThinLTO/X86/promote-local-name.ll
21

Ah! I was doing it wrong. Fixed.

This revision is now accepted and ready to land.Wed, Aug 3, 4:36 PM
This revision was landed with ongoing or failed builds.Wed, Aug 3, 4:42 PM
This revision was automatically updated to reflect the committed changes.