Page MenuHomePhabricator

[Bitcode] Include indirect users of BlockAddresses in bitcode
ClosedPublic

Authored by twd2 on May 3 2022, 2:19 PM.

Details

Summary

The original fix (commit 23ec5782c3cc) of https://github.com/llvm/llvm-project/issues/52787 only adds Functions that have Instructions that directly use BlockAddresses into the bitcode (FUNC_CODE_BLOCKADDR_USERS).

However, in either @rickyz's original reproducing code:

void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}
...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...

or the xfs and overlayfs in the Linux kernel, BlockAddresses (e.g., i8* blockaddress(@repro, %label)) may first compose ConstantExprs (e.g., i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)) and then used by Instructions. This case is not handled by the original fix.

This patch adds *indirect* users of BlockAddresses, i.e., the Instructions using some Constants which further use the BlockAddresses, into the bitcode as well, by doing depth-first searches.

Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken")

Diff Detail

Event Timeline

twd2 created this revision.May 3 2022, 2:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 2:19 PM
twd2 requested review of this revision.May 3 2022, 2:19 PM
twd2 added a comment.EditedMay 3 2022, 2:34 PM

FYI, before the patch:

$ cat repro.c
void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}
$ clang -O2 -flto -c repro.c -o repro.i
$ llvm-bcanalyzer -dump repro.i | grep USER                         
$ ld.lld repro.i
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM15.0.0git' Reader: 'LLVM 15.0.0git')

After the patch:

$ clang -O2 -flto -c repro.c -o repro.i
$ llvm-bcanalyzer -dump repro.i | grep USER                         
    <BLOCKADDR_USERS op0=1/>
                      1        28                    BLOCKADDR_USERS
$ ld.lld repro.i
ld.lld: warning: cannot find entry symbol _start; not setting start address

Thanks for the patch! Indeed, the users _might_ be Constants rather than just Instructions. One thing to think about is that you can declare global Constants. I used the dyn_cast<Instruction> to check that this was not the case. I think you'll need to differentiate that, too. Meaning, if the user is a Constant, are the Constant's users Instructions (or not if simply GlobalVar).

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3405–3408

Can we simply add additional dyn_cast here for ConstantData, ConstantAggregate, and ConstantExpr rather than two new data structures and loops?

That said, I guess Constant's are unique to a module, so not revisiting them is nice.

3416–3417

Consider adding test coverage for 3 different types of Users.

twd2 added inline comments.May 4 2022, 1:54 PM
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3416–3417

Thanks for the comment!

I just find that ConstantDatas take no operand, so U will never be a ConstantData. I'll remove this condition and add tests for the remaining 2 types.

twd2 updated this revision to Diff 427131.May 4 2022, 2:02 PM

Remove the ConstantData condition and add tests for the remaining 2 types. This update also adds a test for nested users of BlockAddresses.

twd2 marked an inline comment as done.May 4 2022, 2:02 PM

Please add a test for:

@foo = global i8* blockaddress(@repro, %label)

define void @repro() {
  br label %label

label:
  ret void
}

That this DOES NOT emit a BLOCKADDR_USERS. This is the main case I care about, and I think we can be do less work with the queue in such cases. Differentiating when the use is an Instruction or not is important, because we only need to emit BLOCKADDR_USERS for Instructions, not globals.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3405–3422

Can we use a SmallVector (push_back, pop_back_val) instead?

twd2 added a comment.May 4 2022, 2:28 PM

Thanks for the patch! Indeed, the users _might_ be Constants rather than just Instructions. One thing to think about is that you can declare global Constants. I used the dyn_cast<Instruction> to check that this was not the case. I think you'll need to differentiate that, too. Meaning, if the user is a Constant, are the Constant's users Instructions (or not if simply GlobalVar).

The users of Constants are not only Instructions and GlobalValues, but also other Constants.
For example, in rickyz's original reproducing code, the only user of i8* blockaddress(@repro, %label) is a ConstantExpr, i64 ptrtoint (i8* blockaddress(@repro, %label) to i64).
This ConstantExpr will not pass the dyn_cast<Instruction> check, and no Instruction will be encoded in the bitcode.
Further, the only user of i64 ptrtoint (... to i64) is also a ConstantExpr, i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1).
Finally, the only user of i64 add (..., i64 1) is the tail call ... Instruction in Function @fun.
We should find this Instruction and add the Function into the bitcode.

So, we might have to do a searching loop to first identify what Constants (directly or recursively) use the given BlockAddress, and what Instructions (Functions) use these Constants, as my patch proposes.

twd2 updated this revision to Diff 427143.May 4 2022, 2:44 PM

Add the GlobalValue test mentioned by @nickdesaulniers

twd2 added a comment.May 4 2022, 2:50 PM

Thanks for the patch! Indeed, the users _might_ be Constants rather than just Instructions. One thing to think about is that you can declare global Constants. I used the dyn_cast<Instruction> to check that this was not the case. I think you'll need to differentiate that, too. Meaning, if the user is a Constant, are the Constant's users Instructions (or not if simply GlobalVar).

The users of Constants are not only Instructions and GlobalValues, but also other Constants.
For example, in rickyz's original reproducing code, the only user of i8* blockaddress(@repro, %label) is a ConstantExpr, i64 ptrtoint (i8* blockaddress(@repro, %label) to i64).
This ConstantExpr will not pass the dyn_cast<Instruction> check, and no Instruction will be encoded in the bitcode.
Further, the only user of i64 ptrtoint (... to i64) is also a ConstantExpr, i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1).
Finally, the only user of i64 add (..., i64 1) is the tail call ... Instruction in Function @fun.
We should find this Instruction and add the Function into the bitcode.

So, we might have to do a searching loop to first identify what Constants (directly or recursively) use the given BlockAddress, and what Instructions (Functions) use these Constants, as my patch proposes.

This may visit some Constants that are not used by Instructions (e.g., used by GlobalValues) at the end, but I'm afraid that we have to visit them before getting the ultimate users.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3405–3422

Sure, I believe it's better!

twd2 updated this revision to Diff 427153.May 4 2022, 3:09 PM
twd2 edited the summary of this revision. (Show Details)

Use SmallVector instead of deque

twd2 marked an inline comment as done.May 4 2022, 3:09 PM

At a high-level, the design for writing bitcode is a two-pass traversal algorithm:

  • First pass in ValueEnumerator indexes everything
  • Second pass in BitcodeWriter writes things

It feels like this new logic could be put into the first pass in ValueEnumerator. (Probably the previous patch, which this builds on, should have done that as well, to avoid iterating over users() in BitcodeWriter, but I guess I didn't notice!)

I think you add some bookkeeping to ValueEnumerator::EnumerateValue(). It currently has:

if (const Constant *C = dyn_cast<Constant>(V)) {
  if (isa<GlobalValue>(C)) {
    // Initializers for globals are handled explicitly elsewhere.
  } else if (C->getNumOperands()) {
    for (User::const_op_iterator I = C->op_begin(), E = C->op_end();
         I != E; ++I)
      if (!isa<BasicBlock>(*I)) // Don't enumerate BB operand to BlockAddress.
        EnumerateValue(*I);

you could change this to something like:

if (const Constant *C = dyn_cast<Constant>(V)) {
  if (isa<GlobalValue>(C)) {
    // Initializers for globals are handled explicitly elsewhere.
  } else if (C->getNumOperands()) {
    SmallSetVector<const BasicBlock *, 4> FoundBBs;
    for (User::const_op_iterator I = C->op_begin(), E = C->op_end();
         I != E; ++I) {
      // Don't enumerate BB operand to BlockAddress, but do track them.
      if (const BasicBlock *BB = dyn_cast<BasicBlock>(&*I)) {
        FoundBBs.insert(BB);
        continue;
      }
      EnumerateValue(*I);
      auto I = BlockAddressesForConstant.find(&cast<Constant>(*I));
      if (I != BlockAddressesForConstant())
        FoundBBs.insert(I->begin(), I->end());
    }
    if (!BlockAddresses.empty())
      BlockAddressesForConstant[C].assign(FoundBBs.begin(), FoundBBs.end());

where ValueEnumerator has a new thing:

SmallDenseMap<const Constant *, SmallVector<const BasicBlock *, 0>> BlockAddressesForConstant;

Then you can change incorporateFunction, which currently has:

// Add all function-level constants to the value table.
for (const BasicBlock &BB : F) {
  for (const Instruction &I : BB) {
    for (const Use &OI : I.operands()) {
      if ((isa<Constant>(OI) && !isa<GlobalValue>(OI)) || isa<InlineAsm>(OI))
        EnumerateValue(OI);
    }
    if (auto *SVI = dyn_cast<ShuffleVectorInst>(&I))
      EnumerateValue(SVI->getShuffleMaskForBitcode());
  }
  BasicBlocks.push_back(&BB);
  ValueMap[&BB] = BasicBlocks.size();
}

to be:

// Add all function-level constants to the value table.
SmallSetVector<const BasicBlock *> FoundBlockAddresses;
for (const BasicBlock &BB : F) {
  for (const Instruction &I : BB) {
    for (const Use &OI : I.operands()) {
      if (isa<GlobalValue>(OI))
        continue;
      auto *C = dyn_cast<Constant>(OI);
      if (!C && !isa<InlineAsm>(OI))
        continue;
      EnumerateValue(OI);
      if (!C)
        continue;
      auto BBs = BlockAddressesForConstant.find(C);
      if (BBs != BlockAddressesForConstant.end())
        FoundBlockAddresses.append(BBs->begin(), BBs->end());
    }
    if (auto *SVI = dyn_cast<ShuffleVectorInst>(&I))
      EnumerateValue(SVI->getShuffleMaskForBitcode());
  }
  BasicBlocks.push_back(&BB);
  ValueMap[&BB] = BasicBlocks.size();
}

if (!FoundBlockAddresses.empty()) {
  // Filter out ones that are local.
  SmallVector<const BasicBlock *> ForeignBlockAddresses;
  llvm::copy_if(FoundBlockAddresses, [&](const BasicBlock *BB) { return BB->getOwner() != &F; },
                std::back_inserter(ForeignBlockAddresses));
  if (!ForeignBlockAddresses.empty())
    FunctionBlockAddresses[&F] = std::move(ForeignBlockAddresses);
}

where ValueEnumerator has:

SmallDenseMap<const Function *, SmallVector<const BasicBlock *, 0>> FunctionBlockAddresses;

Then BitcodeWriter has no need to walk upwards through users(). It can just look at FunctionBlockAddresses.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
3095 ↗(On Diff #427143)

Nit: please commit any unrelated whitespace changes in separate NFC patches.

twd2 added a comment.EditedMay 5 2022, 3:43 PM

At a high-level, the design for writing bitcode is a two-pass traversal algorithm:

  • First pass in ValueEnumerator indexes everything
  • Second pass in BitcodeWriter writes things

It feels like this new logic could be put into the first pass in ValueEnumerator. (Probably the previous patch, which this builds on, should have done that as well, to avoid iterating over users() in BitcodeWriter, but I guess I didn't notice!)

I just tried your advice to update this patch (not uploaded yet) but found the existing logic is

for each function
    incorporateFunction, which enumerates function-local constants, including BlockAddresses, and we can maintain the users of referred foreign functions here
    writeFunction, which emits BLOCKADDR_USERS records and the function-local constants

However, for every function, before emitting BLOCKADDR_USERS records, we should expect that we have visited all the constants, including function-local ones, and thus we have collected all the users of the blockaddresses in the function.

Can we enumerate (and thus emit) all the constants at the beginning, or add a traversing loop dedicated to collecting the users of blockaddresses?
The latter will touch all the constants one more time.

twd2 updated this revision to Diff 427555.May 6 2022, 1:29 AM

drop unrelated whitespace changes

At a high-level, the design for writing bitcode is a two-pass traversal algorithm:

  • First pass in ValueEnumerator indexes everything
  • Second pass in BitcodeWriter writes things

It feels like this new logic could be put into the first pass in ValueEnumerator. (Probably the previous patch, which this builds on, should have done that as well, to avoid iterating over users() in BitcodeWriter, but I guess I didn't notice!)

I just tried your advice to update this patch (not uploaded yet) but found the existing logic is

for each function
    incorporateFunction, which enumerates function-local constants, including BlockAddresses, and we can maintain the users of referred foreign functions here
    writeFunction, which emits BLOCKADDR_USERS records and the function-local constants

However, for every function, before emitting BLOCKADDR_USERS records, we should expect that we have visited all the constants, including function-local ones, and thus we have collected all the users of the blockaddresses in the function.

Can we enumerate (and thus emit) all the constants at the beginning, or add a traversing loop dedicated to collecting the users of blockaddresses?
The latter will touch all the constants one more time.

Ah, sorry for the misdirection; I'd forgotten that incorporate+write are in an inner loop. Enumerating all function-local constants ahead of time would have side effects on the bitcode output (effectively changing numbering to be global). Traversing all constants an extra time just to index blockaddress (without enumerating) would penalize serialization time in the common case, where there are no/few blockaddresses.

I suggest for now that you go ahead with "traverse users()" logic for fixing the bug (maybe it'd be better to move the logic to ValueEnumerator, but I'll leave that up to you).

(A more invasive change would be to index in ValueEnumerator as I suggested, and write a placeholder record in functions that export BlockAddress that can be patched up later somehow after indexing is complete (the patch could point into a new bitcode block). This would probably add more complexity than it'd be worth though...)

nickdesaulniers accepted this revision.May 10 2022, 10:15 AM
This revision is now accepted and ready to land.May 10 2022, 10:15 AM
twd2 added a comment.May 10 2022, 1:44 PM

(A more invasive change would be to index in ValueEnumerator as I suggested, and write a placeholder record in functions that export BlockAddress that can be patched up later somehow after indexing is complete (the patch could point into a new bitcode block). This would probably add more complexity than it'd be worth though...)

Thanks for the explanation. I just leave the current state here: https://reviews.llvm.org/differential/diff/428488/
Maybe we can do this later.

twd2 added a comment.May 10 2022, 1:57 PM

Thank you all for reviewing.

Could you please commit this patch on my behalf? Wende Tan <twd2.me@gmail.com>

And, a small question: should we use SmallSetVector instead in SmallPtrSet<Function *, 4> BlockAddressUsers; to make sure that the order of users encoded in the bitcode is independent of the internal algorithm of SmallPtrSet?

Thank you all for reviewing.

You're welcome. Thanks for the patch!

Could you please commit this patch on my behalf? Wende Tan <twd2.me@gmail.com>

Sure, I can take care of that.

And, a small question: should we use SmallSetVector instead in SmallPtrSet<Function *, 4> BlockAddressUsers; to make sure that the order of users encoded in the bitcode is independent of the internal algorithm of SmallPtrSet?

We don't iterate BlockAddressUsersVisited; I think the code is fine as is. Am I missing something?

twd2 added a comment.May 10 2022, 2:03 PM

Thank you all for reviewing.

You're welcome. Thanks for the patch!

Could you please commit this patch on my behalf? Wende Tan <twd2.me@gmail.com>

Sure, I can take care of that.

And, a small question: should we use SmallSetVector instead in SmallPtrSet<Function *, 4> BlockAddressUsers; to make sure that the order of users encoded in the bitcode is independent of the internal algorithm of SmallPtrSet?

We don't iterate BlockAddressUsersVisited; I think the code is fine as is. Am I missing something?

It's BlockAddressUsers in Line 3368, your patch added it.

And, a small question: should we use SmallSetVector instead in SmallPtrSet<Function *, 4> BlockAddressUsers; to make sure that the order of users encoded in the bitcode is independent of the internal algorithm of SmallPtrSet?

We don't iterate BlockAddressUsersVisited; I think the code is fine as is. Am I missing something?

It's BlockAddressUsers in Line 3368, your patch added it.

Ah! Sorry, I mixed up which variable you were referring to. Yeah, looks like it should be. Care to fix that in this patch?

twd2 updated this revision to Diff 428501.May 10 2022, 2:55 PM

Use SmallSetVector instead in SmallPtrSet<Function *, 4> BlockAddressUsers; to make sure that the order of users encoded in the bitcode is independent of the internal algorithm of SmallPtrSet.

twd2 added a comment.May 10 2022, 2:56 PM

Ah! Sorry, I mixed up which variable you were referring to. Yeah, looks like it should be. Care to fix that in this patch?

Yes, I just updated the diff. If everything is ok, I think we can commit this patch :)

This revision was landed with ongoing or failed builds.May 10 2022, 4:04 PM
This revision was automatically updated to reflect the committed changes.