Page MenuHomePhabricator

[Bitcode] materialize Functions early when BlockAddress taken
ClosedPublic

Authored by nickdesaulniers on Mar 1 2022, 3:16 PM.

Details

Summary

IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:

ld.lld: error: Never resolved function from blockaddress ...

based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

for function f in worklist:
  materialize f
  splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather then the Function they are scoped in. We then
materialize those functions when materializing f from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but we can
at least now we can avoid this ordering related issue with blockaddresses.

Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1215

Diff Detail

Unit TestsFailed

TimeTest
60,090 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c
60,040 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c
60,040 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vloxseg.c
60,050 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vluxseg.c

Event Timeline

nickdesaulniers created this revision.Mar 1 2022, 3:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 3:16 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
nickdesaulniers requested review of this revision.Mar 1 2022, 3:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 3:16 PM
nickdesaulniers edited the summary of this revision. (Show Details)Mar 1 2022, 3:19 PM
  • update comment in test

Thanks for working on this; it's pretty hairy.

This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.

Another way to solve this would be to mark the function somehow (either intrusively, somehow, or by adding it to a set / map). Can you contrast with that approach?

llvm/lib/Linker/IRMover.cpp
1522–1526

Can materialize() add things to the Worklist? If so, I don't think the ranged-based for is safe.

I also worry if it's sufficient; can the worklist loop below add new functions to the worklist, which haven't been materialized ahead of time?

Thanks for working on this; it's pretty hairy.

This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.

Another way to solve this would be to mark the function somehow (either intrusively, somehow, or by adding it to a set / map). Can you contrast with that approach?

BitcodeReader could maintain a set of all functions it's imported, so that it could do the right thing here (rather than assume a function being empty means not yet parsed);
https://github.com/llvm/llvm-project/blob/1e16272ba793e5a6e7308898ecf6ef0dc99e2ad3/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L3037-L3066

Thanks for working on this; it's pretty hairy.

This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.

Another way to solve this would be to mark the function somehow (either intrusively, somehow, or by adding it to a set / map). Can you contrast with that approach?

BitcodeReader could maintain a set of all functions it's imported, so that it could do the right thing here (rather than assume a function being empty means not yet parsed);
https://github.com/llvm/llvm-project/blob/1e16272ba793e5a6e7308898ecf6ef0dc99e2ad3/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L3037-L3066

Can it also check isMaterializable() to differentiate? Is the "materializable" bit reset after materializing, allowing us to differentiate between "empty becuase it's materializable" and "empty because content got moved away"?

Not entirely sure this'll work, because of the cycle case (testcase @c and @d)... but maybe if you know it's been moved, there's a way to look up the location of the new definition? (is the valuemapper available?)

(If this isn't going to materialize anything *extra*, it seems fine to materialize in an earlier loop (your patch).)

Another potential concern: what happens with weak linkage? I.e., what if you have:

define void @sink(i8*)
define linkonce_odr void @haslabel() {
  br label %label
label:
  ret void
}
define void @usesblockaddress() {
  call void @sink(i8* blockaddress(@haslabel, %label))
  ret void
}

What if this TU's @haslabel is not selected by the linker? (Maybe this is just a failed link, since there's no guarantee the other @haslabel still has a basic block, or that it means the same thing, since it could be optimized away if its TU didn't have a blockaddress pointing at it.)

Thanks for working on this; it's pretty hairy.

This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.

Another way to solve this would be to mark the function somehow (either intrusively, somehow, or by adding it to a set / map). Can you contrast with that approach?

BitcodeReader could maintain a set of all functions it's imported, so that it could do the right thing here (rather than assume a function being empty means not yet parsed);
https://github.com/llvm/llvm-project/blob/1e16272ba793e5a6e7308898ecf6ef0dc99e2ad3/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L3037-L3066

Can it also check isMaterializable() to differentiate? Is the "materializable" bit reset after materializing, allowing us to differentiate between "empty becuase it's materializable" and "empty because content got moved away"?

Playing with this idea; I don't think maintaining a set of parsed functions will work; for the @x/@y case, @x has already been moved; there's no way to find the %label in @x. (Dumping the ir prints call void @f(i8* blockaddress(@x, <badref>)) since IRMover has spliced out the body of @x, we can't find the Value %label).

Not entirely sure this'll work, because of the cycle case (testcase @c and @d)... but maybe if you know it's been moved, there's a way to look up the location of the new definition? (is the valuemapper available?)

The value mapper is not available to the bitcode reader, AFAICT. The bitcode reader doesn't know about value mapping; IRMover does. IRMover is using BitcodeReader to materialize functions then splice+map values from one source module to another destination module.

Maybe passing an optional mapper into bitcode reader than querying that is the way to go? That feels odd, since the mapper is (IIUC) mapping Values from one Module to another, while BitcodeReader AFAICT is only concerned ATM with one single Module. I'm not sure that the mapper will handle mapping a Value from the destination module (since I suspect it only works with values from the source module as inputs).

(If this isn't going to materialize anything *extra*, it seems fine to materialize in an earlier loop (your patch).)

I think your early concern "can the worklist loop below add new functions to the worklist, which haven't been materialized ahead of time" is accurate; I think more unmaterialized functions can be. Items are added to the Worklist via IRLinker::maybeAdd(). IRMover calls this in its constructor, but also via a callback declared in IRLinker::shouldLink. IRLinker::AddLazyFor is a callback, which comes from IRMover::move. This callback is only defined to do anything in ModuleLinker::run. ModuleLinker::addLazyFor seems like it is adding items to the worklist (IIUC) (possibly multiple for COMDAT). (FWIW, there are 2 other callers of IRMover::move that both use empty callbacks that don't appear to add more values to the Worklist: LTO::linkRegularLTO and FunctionImporter::importFunctions. So ModuleLinker::run is the case in which unmaterialized functions MIGHT be added to the Worklist, IIUC).

Another potential concern: what happens with weak linkage? I.e., what if you have:

declare void @sink(i8*)
define linkonce_odr void @haslabel() {
  br label %label
label:
  ret void
}
define void @usesblockaddress() {
  call void @sink(i8* blockaddress(@haslabel, %label))
  ret void
}

What if this TU's @haslabel is not selected by the linker? (Maybe this is just a failed link, since there's no guarantee the other @haslabel still has a basic block, or that it means the same thing, since it could be optimized away if its TU didn't have a blockaddress pointing at it.)

The above test case currently runs through llvm-link just fine, regardless of my patch.

I'm having a difficult time figuring out how to solve the issue in the face of new functions being added to the worklist during mapping, without either passing the value map down to the bitcode reader, or maybe it would be better to add a callback that can be used to have the IRLinker tell the bitcode reader block address parsing what to do and do the mapping if the function has been materialized?

Another potential concern: what happens with weak linkage? I.e., what if you have:

declare void @sink(i8*)
define linkonce_odr void @haslabel() {
  br label %label
label:
  ret void
}
define void @usesblockaddress() {
  call void @sink(i8* blockaddress(@haslabel, %label))
  ret void
}

What if this TU's @haslabel is not selected by the linker? (Maybe this is just a failed link, since there's no guarantee the other @haslabel still has a basic block, or that it means the same thing, since it could be optimized away if its TU didn't have a blockaddress pointing at it.)

The above test case currently runs through llvm-link just fine, regardless of my patch.

Sorry, I guess I wasn't clear. That's not a complete test case. Complete testcase needs two files so that @haslabel gets replaced by another TU. For example:

% cat a.ll
define weak_odr void @haslabel() {
  ret void
}
% cat b.ll
declare void @sink(i8*)
define linkonce_odr void @haslabel() {
  br label %label
label:
  ret void
}

define void @usesblockaddress() {
  call void @sink(i8* blockaddress(@haslabel, %label))
  ret void
}

Result of linking a.ll and b.ll looks to me like it produces a bogus blockaddress:

% ./staging/build/bin/llvm-link a.ll b.ll -S
; ModuleID = 'llvm-link'
source_filename = "llvm-link"

define weak_odr void @haslabel() {
  ret void
}

define void @usesblockaddress() {
  call void @sink(i8* inttoptr (i32 1 to i8*))
  ret void
}

declare void @sink(i8*)

Maybe it should error instead... or maybe taking a blockaddress to a weak function shouldn't be allowed at all?

I'm having a difficult time figuring out how to solve the issue in the face of new functions being added to the worklist during mapping, without either passing the value map down to the bitcode reader, or maybe it would be better to add a callback that can be used to have the IRLinker tell the bitcode reader block address parsing what to do and do the mapping if the function has been materialized?

I think the callback option would work. One concern (with either solution) is we need to be sure that the function we point the blockaddress at actually came from this bitcode file (otherwise, I think it could lead to hard-to-reason about miscompiles).

A related idea: when the IRMover moves content from one function to another, leave behind a Function* in the moved-from function that the bitcodereader knows how to find (not sure exactly where to store it, but I think the place to do the work is in IRLinker::linkFunctionBody(), something like Src.recordMovedToFunction(Dst)). Unlike ValueMapper, this would not have a mapping in the "function replaced" case where it seems more likely to lead to miscompiles.

But maybe your callback option is better. I'm not sure.

dexonsmith added inline comments.Mar 4 2022, 3:02 PM
llvm/test/Linker/blockaddress.ll
4

BTW, it'd be better to have positive tests here than the CHECK-NOT. You can use llvm-link -S to emit the IR, and then check that each blockaddress is correct (rather than checking that no errors were emitted).

nickdesaulniers marked an inline comment as done.
  • update test to check for good blockaddresses, as per @dexonsmith

Result of linking a.ll and b.ll looks to me like it produces a bogus blockaddress:

That's weird, but note that that case does not lazily add to the running worklist. available_externally linkage seems to, but test cases like mine but marked available_externally already seem to work regardless of my patch (and I'm happy to add tests for that to prove that).

Otherwise I don't think we've found a concrete test case for which my approach does not work.

Result of linking a.ll and b.ll looks to me like it produces a bogus blockaddress:

A slightly simpler test, take my @x + @y test and mark @x linkonce_odr. Though I'd argue that's a different case also involving blockaddresses which is broken, and my patch doesn't fix, but that seems slightly orthogonal to the bug I'm solving here, which is a concrete LTO failure observed in the wild Never resolved function from blockaddress.

nickdesaulniers added inline comments.Mar 7 2022, 2:36 PM
llvm/lib/Linker/IRMover.cpp
1522–1526

Can materialize() add things to the Worklist? If so, I don't think the ranged-based for is safe.

I don't think so (though please do triple check). IRLinker::maybeAdd is the only method that adds to the worklist. That is only called from IRLinker's constructor and IRLinker::shouldLink. While IRLinker::shouldLink is called from IRLinker::materialize, the method I'm calling here is GlobalValue::materialize (which through some indirection is a call to BitcodeReader::materialize. While these methods are all called materialize, BitcodeReader doesn't (AFAICT) have any reference to IRLinker's private Worklist member.

To triple check the other calls of IRLinker::shouldLink, I see IRLinker::linkAppendingVarProto and IRLinker::linkGlobalValueProto. The latter may call the former. The former is only called from IRLinker::materialize.

So check my work, but it looks to me like the answer to Can materialize() add things to the Worklist? is "No; IRLinker::materialize can, but I'm not calling that; I'm calling GlobalValue::materialize". So I don't think there's an iterator invalidation bug here as implied by the question.

Working on a reply to the second question...stay tuned.

nickdesaulniers edited reviewers, added: rickyz; removed: ricky.Mar 7 2022, 3:14 PM
llvm/lib/Linker/IRMover.cpp
1522–1526

I also worry if it's sufficient; can the worklist loop below add new functions to the worklist, which haven't been materialized ahead of time?

Here's a test case that fails, with the same error I'm trying to fix here.

declare void @f(i8*)

; 1. @x gets parsed + moved. References to @x's %label can no longer be
;    created.
define void @x() {
  br label %label

label:
  ret void
}

; 2. @y not parsed; will be parsed lazily if referenced.
define linkonce_odr void @y() {
  br label %label

label:
; 4. when lazy parsing, @x's %label is gone, resulting in a badref.
  call void @f(i8* blockaddress(@x, %label))
  ret void
}

; 3. @z parsed, call to @y adds @y to worklist.
define void @z() {
  call void @y()
  ret void
}

$ llvm-as z.ll
$ llvm-link -S z.bc
error: Never resolved function from blockaddress (Producer: 'LLVM15.0.0git' Reader: 'LLVM 15.0.0git')

Let me see if I can more thoroughly separate the parsing from the linking/remapping, then I'll add the above test case to this patch.

nickdesaulniers planned changes to this revision.Mar 8 2022, 10:07 AM
nickdesaulniers edited the summary of this revision. (Show Details)
  • handle functions added lazily to the worklist
nickdesaulniers planned changes to this revision.Mar 9 2022, 11:25 AM
nickdesaulniers added inline comments.
llvm/lib/Linker/IRMover.cpp
1522–1526

I also worry if it's sufficient; can the worklist loop below add new functions to the worklist, which haven't been materialized ahead of time?

Diff 414166 now appears to handle this. I have a TODO on the existing call to GlobalValue::materialize() that I'd like to remove. That causes some tests to fail, so marking this "Changes Planned" until I have more time to sort that out. But PTAL with this approach.

dexonsmith added inline comments.Mar 10 2022, 3:43 PM
llvm/lib/Linker/IRMover.cpp
1522–1526

Can materialize() add things to the Worklist? If so, I don't think the ranged-based for is safe.

I don't think so (though please do triple check).

Is there a way to add an assertion to that effect? If so, it'd be great to do that as a prep commit (do a local bootstrap with assertions on, see if it survives the pre-commit bots, etc.) before we starting to rely on the fact. This is also important to prevent that behaviour from showing up later. As you described, the logic here is entangled. I've been surprised before by various co-recursive things in the valuemapper and irlinker.

1604–1612

I worry this could cause a big compile-time hit for ThinLTO, depending on the exact code paths it uses.

Here's part of a testcase for situation I'm worried about:

; f1.ll
define weak_odr void @f1() {
  ret void
}

; f2.ll
define linkonce_odr void @f1() {
  ret void
}
define void @f2() {
  call @f1()
  ret void
}

The scenario (more scaffolding needed):

  • Linker decides to take @f1 from f1.ll and @f2 from f2.ll.
  • The worklist for f2.bc contains just @f2, and @f1 will (never) be materialized from f2.bc.
  • IIUC, this patch changes it so that @f1 is eagerly materialized from f2.bc even though it will be ignored.

In the general case, this would effectively defeat lazy-loading of templates and inline functions.

@tejohnson , do you know how to set up the above scenario for ThinLTO?

tejohnson added inline comments.Mar 11 2022, 9:11 AM
llvm/lib/Linker/IRMover.cpp
1604–1612

Yes this is a good catch. But I think where it affects issues is a slight change from what you have described above. We don't use the IRMover to load the destination module for each ThinLTO backend, so it wouldn't affect f2.ll when we are importing into it - f2.ll will be parsed here: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#1088) in the thread for its ThinLTO backend.

However, where it will affect things is when we are importing f2 into another module, when the copy of f1 was selected from f1.ll. We can use llvm-lto2 which is passed explicit linker resolutions to simulate.

; f1.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define weak_odr void @f1() {
  ret void
}

; f2.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define linkonce_odr void @f1() {
  ret void
}
define void @f2() {
  call void @f1()
  ret void
}

; f3.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

declare void @f2()
define void @f3() {
  call void @f2()
  ret void
}
$ opt -module-summary f1.ll -o f1.o
$ opt -module-summary f2.ll -o f2.o
$ opt -module-summary f2.ll -o f2.o
$ llvm-lto2 run f1.o f2.o f3.o -save-temps -o f -r=f1.o,f1,plx -r=f2.o,f1, -r=f2.o,f2,plx -r=f3.o,f2, -r=f3.o,f3,plx -debug-only=function-import

The end of the debug output shows what we import from where into f3.o:

Is importing function 2072045998141807037 f1 from f1.ll
Not importing function 2072045998141807037 f1 from f2.ll
Is importing function 8471399308421654326 f2 from f2.ll
Imported 2 functions for Module f3.o

So we should not materialize f1 from f2.o when importing in f3.o's ThinLTO backend thread.

However, as noted in one of the comments earlier, the function importer has an empty AddLazyFor callback, so we will never add new values to materialize during the value mapping. I wonder if we key off of this somehow to not attempt to add new functions to materialize here?

Thinking (more), I think the materializing-too-much explosion will be a problem for use cases other than LTO, and not just for weak functions.

Consider a JIT, which initially just materializes main and codegens with callback stubs for any calls. When a stub is called the JIT materializes the called function, patches the stub, and continues, pulling in functions one at a time. This pre-materialization logic would cause the JIT to eagerly materialize everything that main transitively references.

That would be a huge regression. This needs to be significantly more narrow.

I've also changed my opinion from the comment in https://github.com/llvm/llvm-project/issues/52787#issuecomment-1010314787, where I said:

This could work, but I think it should be possible to fix BitcodeReader/IRMover to do the right thing without adding backedges.

[...]

If we did need something new serialized, maybe it could be limited to a bit per basic block saying "is this basic block referenced by a blockaddress outside this function?" (instead of a backedge, just a bit saying there is an edge), and then add ValueHandles for those basic blocks when materializing them to track what they get RAUW'd to.

But I'd much prefer to avoid having the function content in bitcode depend on what references it. Seems like the IRMover and BitcodeReader should be able to track this somehow. E.g., maybe the caller of IRMover should tell BitcodeReader that the basic blocks were moved to another function.

I've realized through this review (thanks @nickdesaulniers and @tejohnson for all iteration!) that:

  • A function being blockaddress-able ties it tightly to anything that points at it.
  • The only way to avoid a materialization explosion is to record backedges.

(Sorry for the poor guidance!)

Here's what I suggest:

  • Add to bitcode a way to say that materializing a function requires materializing another one. Maybe this could be a new record in the FUNCTION_BLOCK that's arbitrary size, only emitted if there are non-zero of them, which lists the functions that need to be materialized when this one is.
  • Use this feature to add edges in both directions every time one function has a blockaddress to another one. Probably not too hard; during value enumeration of function bodies, create a map of which functions need to be materialized with each other (based on observed blockaddresses), then use that to generate the records during bitcode emission.
  • Change the lazy materializer to use this information somehow (maybe this is the hard part).

WDYT?

  • The only way to avoid a materialization explosion is to record backedges.

Unless I can make @tejohnson 's suggestion work regarding making IRLinker::AddLazyFor optional, then at that point I agree.

(Sorry for the poor guidance!)

No, no! I appreciate the feedback; it's helpful to have folks engage on this issue.

Here's what I suggest:

  • Add to bitcode a way to say that materializing a function requires materializing another one. Maybe this could be a new record in the FUNCTION_BLOCK that's arbitrary size, only emitted if there are non-zero of them, which lists the functions that need to be materialized when this one is.
  • Use this feature to add edges in both directions every time one function has a blockaddress to another one. Probably not too hard; during value enumeration of function bodies, create a map of which functions need to be materialized with each other (based on observed blockaddresses), then use that to generate the records during bitcode emission.
  • Change the lazy materializer to use this information somehow (maybe this is the hard part).

WDYT?

I can give it a shot.

llvm/lib/Linker/IRMover.cpp
1604–1612

So we should not materialize f1 from f2.o when importing in f3.o's ThinLTO backend thread.

Ah, thanks for the test case. My current approach (Diff 414166) does materialize f1 from f2.o, so that would be a regression.

However, as noted in one of the comments earlier, the function importer has an empty AddLazyFor callback, so we will never add new values to materialize during the value mapping. I wonder if we key off of this somehow to not attempt to add new functions to materialize here?

So if we made IRLinker::AddLazyFor an Optional function reference (or function pointer) then we could easily tell+guarantee that no new functions could be added lazily to IRLinker::Worklist. Then we could go back to an earlier+simpler diff https://reviews.llvm.org/D120781?id=412277 just wrapped in one conditional check that AddLazyFor was not available. That might be the smallest possible fix that might be possible to get into the clang-14 release.

Let me play with that approach a little before pursuing @dexonsmith 's recommendation to encode "backedges" (references) to functions in the bitcode, and let's see what that looks like if it works.

  • The only way to avoid a materialization explosion is to record backedges.

Unless I can make @tejohnson 's suggestion work regarding making IRLinker::AddLazyFor optional, then at that point I agree.

Is that just going to fix ThinLTO, or will it also fix other lazy-materialization use cases as well (like the potential JIT case I described)?

@lhames, do you know of concrete JIT uses cases that rely on lazy materialization for performance? Is there someone we could add to the review to help get a testcase to check, if @nickdesaulniers is able to get the AddLazyFor fix working for ThinLTO?

Another potential concern: what happens with weak linkage? I.e., what if you have:

Thinking about this over the weekend; I don't think we should allow blockaddresses to reference available_externally linkage functions if the blockaddress' Function differs from the Instruction's Function (for the Instruction with the blockaddress Constant operand).

For example:

; y.ll
declare void @f(i8*)

define available_externally void @x() {
  br label %label

label:
  call void @y()
  ret void
}

define void @y() {
  ; Note: blockaddress refers to @x in fn @y.
  call void @f(i8* blockaddress(@x, %label))
  ret void
}
$ llc -o - y.ll
	.text
	.file	"y.ll"
	.globl	y                               # -- Begin function y
	.p2align	4, 0x90
	.type	y,@function
y:                                      # @y
	.cfi_startproc
# %bb.0:
	pushq	%rax
	.cfi_def_cfa_offset 16
	movl	$.Ltmp0, %edi ; oops! where is .Ltmp0?
	callq	f@PLT
	popq	%rax
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	y, .Lfunc_end0-y
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits
$ opt -verify -S y.ll -o /dev/null
$ echo $?
0

Perhaps I should think harder about additional linkages that are added lazily to IRLinkers's Worklist (like linkonce_odr and perhaps weak). If all of the cases that could be added lazily are made invalid IR, with new verifier checks added, we could update IPSCCP to not sink such blockaddress Constants that would produce such invalid IR. Let me see if a quick verifier check can find any existing test cases I'm not thinking of where that might not work.

llvm/lib/Linker/IRMover.cpp
1604–1612

So if we made IRLinker::AddLazyFor an Optional function reference (or function pointer) then we could easily tell+guarantee that no new functions could be added lazily to IRLinker::Worklist. Then we could go back to an earlier+simpler diff https://reviews.llvm.org/D120781?id=412277 just wrapped in one conditional check that AddLazyFor was not available. That might be the smallest possible fix that might be possible to get into the clang-14 release.

That might work for thinLTO, but it doesn't for the simple test case using llvm-link added in this patch; IRLinker::AddLazyFor is defined. :( https://reviews.llvm.org/D121630 is still a worthwhile cleanup IMO.

I'll start looking into bitcode modifications.

Here's what I suggest:

  • Add to bitcode a way to say that materializing a function requires materializing another one. Maybe this could be a new record in the FUNCTION_BLOCK that's arbitrary size, only emitted if there are non-zero of them, which lists the functions that need to be materialized when this one is.
  • Use this feature to add edges in both directions every time one function has a blockaddress to another one. Probably not too hard; during value enumeration of function bodies, create a map of which functions need to be materialized with each other (based on observed blockaddresses), then use that to generate the records during bitcode emission.
  • Change the lazy materializer to use this information somehow (maybe this is the hard part).

WDYT?

One potential issue, consider the following case:

declare void @f(i8*)

define void @x() {
  br label %label

label:
  ret void
}

define linkonce_odr void @y() {
  br label %label

label:
  call void @f(i8* blockaddress(@x, %label))
  ret void
}

So @y contains a blockaddress that refers to @x, but because of its linkage AND lack of reference from any function w/ external linkage, @y is never emitted (let alone materialized). I think in this case, if we were to record that @x had a "backedge" from @y, and thus materialized @y as a result of materializing @x, we'd wind up "over-materializing" again. This is because BitcodeReader doesn't know about IRMover::Worklist and the complex linkage type handling, and lazy additions to the worklist.

One potential issue, consider the following case:

declare void @f(i8*)

define void @x() {
  br label %label

label:
  ret void
}

define linkonce_odr void @y() {
  br label %label

label:
  call void @f(i8* blockaddress(@x, %label))
  ret void
}

So @y contains a blockaddress that refers to @x, but because of its linkage AND lack of reference from any function w/ external linkage, @y is never emitted (let alone materialized). I think in this case, if we were to record that @x had a "backedge" from @y, and thus materialized @y as a result of materializing @x, we'd wind up "over-materializing" again. This is because BitcodeReader doesn't know about IRMover::Worklist and the complex linkage type handling, and lazy additions to the worklist.

I think over-materializing here is probably okay, as a penalty for @x having exposed an internal pointer via blockaddress.

Given that blockaddress is a rarely used feature, I think the goals for this patch should be (in rough priority order):

  1. Avoid compile-time regressions when not using blockaddress. Limit any compile-time regressions to IR intrinsically tied to the feature.
  2. Get blockaddress working for these (new) cases.
  3. Avoid unnecessary compile-time regressions for IR intrinsically tied to the feature.

I think over-materializing @x falls into category (3).

dexonsmith added a comment.EditedMar 15 2022, 12:32 PM

That case with @x reminded me of my concern about miscompiles due to blockaddress to a de-refinable function.

I was wondering if we could make those illegal in the verifier. In particular, reject a reference blockaddress(@f, ...) if:

  • it's not a self-reference (it appears outside the body of @f) AND
  • @f is de-refinable (linkage is available_externally or {weak,linkonce}{,_odr}) AND
  • the reference is NOT in the same comdat as @f.

But I think we probably can't disallow it. Clang will generate IR that violates that for:

void sink(...);
inline void f1() {
  static void *Label = &&label;
  sink(Label);
label:
  return;
}
void f2() { f1(); }

Maybe it'd be desirable to change Clang to put Label in the same comdat as f1(), but that won't solve object formats (like Mach-O) that don't have comdats.

For this case, the linker probably *in practice* probably selects them from the same object file.

Here's what I suggest:

  • Add to bitcode a way to say that materializing a function requires materializing another one. Maybe this could be a new record in the FUNCTION_BLOCK that's arbitrary size, only emitted if there are non-zero of them, which lists the functions that need to be materialized when this one is.
  • Use this feature to add edges in both directions every time one function has a blockaddress to another one. Probably not too hard; during value enumeration of function bodies, create a map of which functions need to be materialized with each other (based on observed blockaddresses), then use that to generate the records during bitcode emission.
  • Change the lazy materializer to use this information somehow (maybe this is the hard part).

WDYT?

So I think when BitcodeWriter is serializing a function, it can check whether each of the function's basicblocks has its address taken. If so, it can build the set of GlobalValues that are Users of that BlockAddress. I'm curious if this should be part of the record or an Abbreviation? (Not sure I fully understand what an abbreviation is, despite just reading https://llvm.org/docs/BitCodeFormat.html#abbreviations). It seems like adding a record is unconditional though? So not sure about making it optional/only emitted if necessary?

Then, when BitcodeReader is parsing function records (parseFunctionRecord), it can read back these sub-records (or abbreviations). When parsing a function body (lazily), it can check these sub-records and call DeferredFunctionInfo.find, findFunctionInStream, Stream.JumpToBit, parseFunctionBody (basically the steps of BitcodeReader::materialize).

So I think when BitcodeWriter is serializing a function, it can check whether each of the function's basicblocks has its address taken. If so, it can build the set of GlobalValues that are Users of that BlockAddress. I'm curious if this should be part of the record or an Abbreviation? (Not sure I fully understand what an abbreviation is, despite just reading https://llvm.org/docs/BitCodeFormat.html#abbreviations). It seems like adding a record is unconditional though? So not sure about making it optional/only emitted if necessary?

You'd either want to add a new record or add something to an existing record.

  • An abbreviation stores "meta-information", describing the encoding of any record that refers to it.
  • A record stores "content". It can optionally refer to any already-emitted abbreviation, customizing the record's encoding to match the abbreviation. This doesn't affect the record's content when parsed; it just changes the binary representation.
  • If you change the number/order/content of fields in a record, it's possible the abbreviation it was using will no longer apply. You might need to update it, or use a different one.

The other concept is a block, which is a scope into which you can put abbreviations, records, and other blocks. Blocks are nested, and are guaranteed to be 4B-aligned (records are not even 1B-aligned IIRC; they can start on any bit). You can remember where a block is and go back to read it later.

Then, when BitcodeReader is parsing function records (parseFunctionRecord), it can read back these sub-records (or abbreviations). When parsing a function body (lazily), it can check these sub-records and call DeferredFunctionInfo.find, findFunctionInStream, Stream.JumpToBit, parseFunctionBody (basically the steps of BitcodeReader::materialize).

Instead of adding a new field to a function record, I'd suggest adding a new record to FUNCTION_BLOCK, which contains the instructions and basic blocks. Then you're tying together:

  • Loading basic blocks of a function
  • Pulling up dependents that might refer directly to the function's basic blocks

I'd suggest creating a new record kind for this.

  • rework to use new bitcode function code record
nickdesaulniers retitled this revision from [IRLinker] materialize Functions before moving any to [Bitcode] materialize Functions early when BlockAddress taken.Apr 7 2022, 2:58 PM
nickdesaulniers edited the summary of this revision. (Show Details)
nickdesaulniers planned changes to this revision.Apr 7 2022, 3:02 PM

Looks like I accidentally dropped my tests. Let me re-add those.

Also, this is incomplete; there's some tests failing where the "use list" (???) seems to be incorrect; some blockaddresses are being deserialized with the basic block name changed to a number from a string that I don't yet understand.

there's some tests failing where the "use list" (???) seems to be incorrect

I suspect these are "use-list order" tests, which confirm that we can round-trip to bitcode without changing the use-list order.

Quick summary:

  • Every Value has an intrusive linked list of its Uses -- i.e., where it's used as an operand.
  • Some algorithms walk a Value's use-list, and their output can be influenced by the order that the Uses are listed.
  • This order is deterministic, but arbitrary somewhat arbitrary.
  • Bitcode serialization has an optional feature to serialize / recover this use-list order. This is important for reproducing some (rare) bugs, or and for guaranteeing that serializing/deserializing has no side effects.
  • To minimize overhead, the algorithm only stores diffs vs. the "natural" use-lists you'd get by running materializeAll(). (Side note: this feature is optional because the storage overhead is a bit too high. If we could guarantee that no one looked at use-lists of pure constants like i1 0, ptr null, and i32 0, it would likely be cheap enough to store all the time.)

(Incidentally, that feature is the only reason I know anything about blockaddress...)

Anyway, if you stop calling materialize() from inside parseFunctionBody(), you'll stop changing the use-list order, and these tests should stop bothering you.

some blockaddresses are being deserialized with the basic block name changed to a number from a string that I don't yet understand.

I'm guessing it's not safe to call materialize() from parseFunctionBody(); it's maybe not re-entrant, and something is getting corrupted?

Have a look at how blockaddress forward references are handled; if you follow a similar path for these discovered back-edges the corruption might go away.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
4324–4343 ↗(On Diff #421343)

I think it'd be better to save these in a list to iterate through after finishing materializing whatever's already "scheduled".

I think the right place to run the code is probably in materializeForwardReferencedFunctions(), which is already doing stuff for block addresses (in the other direction!). This is called from two places:

  • BitcodeModule::getModuleImpl(), which only calls it if lazy-loading -- i.e., if NOT calling materializeAll(). I am guessing that lazy-loading needs this because block addresses can be referenced by something that isn't lazily loaded (not sure what that is; maybe globals or some metadata?).
  • BitcodeReader::materialize(), at the end.
ychen added a subscriber: ychen.Apr 7 2022, 4:34 PM
nickdesaulniers marked 3 inline comments as done.
  • materialize later, re-add previous test, add llvm-bcanalyzer test; all tests now green

there's some tests failing where the "use list" (???) seems to be incorrect

I suspect these are "use-list order" tests, which confirm that we can round-trip to bitcode without changing the use-list order.

Tremendous and informative feedback! Indeed, your suggestion worked. PTAL.

dexonsmith accepted this revision.Apr 11 2022, 10:50 AM

LGTM, with one nitpick inline!

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3398–3399 ↗(On Diff #421574)

Nit: IIRC, LLVM style usually adds {} braces to outer control structures whenever an inner one has them.

This revision is now accepted and ready to land.Apr 11 2022, 10:50 AM
  • fix style nit
nickdesaulniers marked an inline comment as done.Apr 12 2022, 11:31 AM
This revision was landed with ongoing or failed builds.Apr 13 2022, 2:31 PM
This revision was automatically updated to reflect the committed changes.