This is an archive of the discontinued LLVM Phabricator instance.

[INLINER] allow inlining of blockaddresses if sole uses are callbrs
ClosedPublic

Authored by nickdesaulniers on Feb 14 2019, 3:44 PM.

Details

Summary

It was supposed that Ref LazyCallGraph::Edge's were being inserted by
inlining, but that doesn't seem to be the case. Instead, it seems that
there was no test for a blockaddress Constant in an instruction that
referenced the function that contained the instruction. Ex:

define void @f() {
  %1 = alloca i8*, align 8
2:
  store i8* blockaddress(@f, %2), i8** %1, align 8
  ret void
}

When iterating blockaddresses, do not add the function they refer to
back to the worklist if the blockaddress is referring to the contained
function (as opposed to an external function).

Because blockaddress has sligtly different semantics than GNU C's
address of labels, there are 3 cases that can occur with blockaddress,
where only 1 can happen in GNU C due to C's scoping rules:

  • blockaddress is within the function it refers to (possible in GNU C).
  • blockaddress is within a different function than the one it refers to

(not possible in GNU C).

  • blockaddress is used in to declare a global (not possible in GNU C).

The second case is tested in:

$ ./llvm/build/unittests/Analysis/AnalysisTests \
  --gtest_filter=LazyCallGraphTest.HandleBlockAddress

This patch adjusts the iteration of blockaddresses in
LazyCallGraph::visitReferences to not revisit the blockaddresses
function in the first case.

The Linux kernel contains code that's not semantically valid at -O0;
specifically code passed to asm goto. It requires that asm goto be
inline-able. This patch conservatively does not attempt to handle the
more general case of inlining blockaddresses that have non-callbr users
(pr/39560).

https://bugs.llvm.org/show_bug.cgi?id=39560
https://bugs.llvm.org/show_bug.cgi?id=40722
https://github.com/ClangBuiltLinux/linux/issues/6
https://reviews.llvm.org/rL212077

Event Timeline

Posting code for now. I still need to fix the crash in the existing test and add some more tests, but looking for early feedback.

nickdesaulniers edited the summary of this revision. (Show Details)Feb 14 2019, 3:45 PM

I'm concerned that "nocapture" is not sufficient to describe the necessary property here.

Two examples I've thought of:

  1. Suppose the function is recursive. It could pass a blockaddress to itself, which wouldn't actually capture the address because the copy goes away before the function returns.
  2. Suppose the CallInst is not in the same function as the basic block. Similar to above, it could pass the blockaddress the function. Or maybe it do something else like an equality comparison against another copy of the blockaddress. (I think nocapture doesn't actually forbid checking if two nocapture pointers are equal, although the documentation isn't really clear on that.)
llvm/lib/IR/BasicBlock.cpp
447 ↗(On Diff #186933)

This is backwards; you need to return true if the use is not a CallInst.

jyknight added inline comments.Feb 15 2019, 10:31 AM
llvm/include/llvm/IR/BasicBlock.h
393–395 ↗(On Diff #186933)

Calls are not the only thing which may allow the value to escape. A store to externally-accessible memory will too, for example. The right implementation of this predicate is going to involve calling into something like PointerMayBeCaptured, *but* that's not setup to deal properly with constants at the moment, only SSA values within a function.

This should be something like:
Returns true if there are any uses of the address of this basic block that may escape the function. At the moment, this is a very conservative approximation, and treats ANY use except as a "noescape" parameter to a call as potentially escaping.

llvm/lib/IR/BasicBlock.cpp
445 ↗(On Diff #186933)

This will need to use a worklist loop, to handle a few kinds of uses. Something like this:

  1. If the use isa<Constant> && !isa<GlobalValue> (most particularly to handle a pointer bitcast): add all the uses of _that_ to the worklist to check.
  2. Call with nocapture attribute -> OK, keep checking.
  3. Anything else -> potentially escapes, return true.

I'm concerned that "nocapture" is not sufficient to describe the necessary property here.

Let's continue this thread the bug instead.

  • improve method comment and check the non-global constant case
nickdesaulniers marked 3 inline comments as done.Feb 15 2019, 3:10 PM
nickdesaulniers added inline comments.
llvm/lib/IR/BasicBlock.cpp
445 ↗(On Diff #186933)

Ok, I tried to codify this description (thanks for it, and the help so far). It seems to work for my simplistic test case, but I'm still reading through the LangRef furiously while trying to write up a few test cases.

Transforms/Inline/blockaddress.ll still crashes, but maybe I can look into that last; once I have some added unit tests.

nickdesaulniers marked an inline comment as done.
  • add two very basic test cases, comment out broken test for now, fix indentation
chandlerc requested changes to this revision.Feb 18 2019, 9:39 PM
chandlerc added a subscriber: chandlerc.
chandlerc added inline comments.
llvm/include/llvm/IR/BasicBlock.h
397 ↗(On Diff #187108)

The issue is not escape, but capture if I understand correctly?

We already have an analysis that answers this question generically: PointerMayBeCaptured. I don't think you need to add a new API.

llvm/test/Transforms/Inline/blockaddress.ll
7 ↗(On Diff #187108)

?

60 ↗(On Diff #187108)

You don't actually check that the post inline code is correct though? Could you add precise checks for the relevant *contents* of the function?

I'd also use CHECK-LABEL based patterns within each function as we do in other (modern) tests.

I'll actually be surprised if this works. It is possible that the blockaddress constant expression will somehow be rewritten by the code that actually does the inlining, but honestly, I wouldn't expect it to be rewritten naively. I would expect to need to change the inliner itself (as opposed to the cost) to teach it how to rewrite the blockaddress constant expressions that are used to reference the cloned basic blocks. I wouldn't expect the current implementation to walk past an operand that is a constant expr, but I've not checked it.

This revision now requires changes to proceed.Feb 18 2019, 9:39 PM
nickdesaulniers added inline comments.Mar 6 2019, 4:42 PM
llvm/test/Transforms/Inline/blockaddress.ll
60 ↗(On Diff #187108)

IIUC, llvm/lib/Transforms/Utils/CloneFunction.cpp#PruningFunctionCloner::CloneBlock() looks like it is able to handle this correctly:

// It is only legal to clone a function if a block address within that
// function is never referenced outside of the function.  Given that, we
// want to map block addresses from the old function to block addresses in
// the clone. (This is different from the generic ValueMapper
// implementation, which generates an invalid blockaddress when
// cloning a function.)
//
// Note that we don't need to fix the mapping for unreachable blocks;
// the default mapping there is safe.
if (BB->hasAddressTaken()) {
  Constant *OldBBAddr = BlockAddress::get(const_cast<Function*>(OldFunc),
                                          const_cast<BasicBlock*>(BB));
  VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB);
}
...

I'm currently wrestling with CGSCC, and understanding ref to call edge promotion. Local patch (based off attached patch in https://bugs.llvm.org/show_bug.cgi?id=40722) is breaking the invariant that "No function transformations should introduce *new* ref edges!" in llvm/lib/Analysis/CGSCCPassManager.cpp#llvm::updateCGAndAnalysisManagerForFunctionPass().

  • rewrite approach to allow inlining of blockaddresses to proceed.
  • add base test case.
  • undo one missed part from v1
nickdesaulniers planned changes to this revision.Mar 6 2019, 4:48 PM
E5ten added a subscriber: E5ten.Mar 13 2019, 7:10 AM
  • undo one missed part from v1
  • - [CGSCCPassManager] don't walk references from blockaddress'
  • limit inlining blockaddresses to ones that are only used by callbr
nickdesaulniers planned changes to this revision.Mar 27 2019, 1:54 PM
nickdesaulniers retitled this revision from [INLINER] allow inlining of address taken blocks to [INLINER] allow inlining of blockaddresses if sole uses are callbrs.Mar 27 2019, 1:56 PM
nickdesaulniers edited the summary of this revision. (Show Details)
nickdesaulniers added a subscriber: void.

Note that with:

  1. https://reviews.llvm.org/D60208 ("[X86] Extend boolean arguments to

inline-asm according to getBooleanType")

  1. https://reviews.llvm.org/D58260 ("[INLINER] allow inlining of

blockaddresses if sole uses are callbrs")

  1. https://reviews.llvm.org/D56571 ("[RFC prototype] Implementation of

asm-goto support in clang")

I can compile a mainline x86 defconfig Linux kernel and boot it in QEMU.

@chandlerc is the change to llvm/lib/Analysis/CGSCCPassManager.cpp correct? Otherwise the block in llvm/include/llvm/Analysis/LazyCallGraph.h LazyCallGraph::visitReferences is suspicious:

if (BlockAddress *BA = dyn_cast<BlockAddress>(C)) {
  // The blockaddress constant expression is a weird special case, we
  // can't generically walk its operands the way we do for all other
  // constants.
  if (Visited.insert(BA->getFunction()).second)
    Worklist.push_back(BA->getFunction());
  continue;
}

I suspect that maybe the function shouldn't be added to the worklist if the BA refers to the function it resides in?

llvm/unittests/Analysis/LazyCallGraphTest.cpp has a test case HandleBlockAddress ($ ./llvm/build/unittests/Analysis/AnalysisTests --gtest_filter=LazyCallGraphTest.HandleBlockAddress) that tests block addresses that refer to blocks outside of their own function (that's a case I don't think can be done in C even with GNU C extensions).

TODO: I need to rebase the test case. <label>: are no longer generated.

that's a case I don't think can be done in C even with GNU C extensions

It can't be written explicitly in C, sure... but optimizations like IPSCCP can propagate blockaddress constants to other functions.

nickdesaulniers added a comment.EditedApr 23 2019, 12:56 PM

@chandlerc alternatively, if I drop the posted changed to llvm/lib/Analysis/CGSCCPassManager.cpp and instead do:

diff --git a/llvm/include/llvm/Analysis/LazyCallGraph.h b/llvm/include/llvm/Analysis/La
zyCallGraph.h
index 328654763b59..62e062a30718 100644
--- a/llvm/include/llvm/Analysis/LazyCallGraph.h
+++ b/llvm/include/llvm/Analysis/LazyCallGraph.h
@@ -1082,12 +1082,21 @@ public:
         continue;
       }
 
+      // The blockaddress constant expression is a weird special case, we can't
+      // generically walk its operands the way we do for all other constants.
       if (BlockAddress *BA = dyn_cast<BlockAddress>(C)) {
-        // The blockaddress constant expression is a weird special case, we
-        // can't generically walk its operands the way we do for all other
-        // constants.
+        bool AllUsersAreSameFn = true;
+        for (User *U : BA->users())
+          if (Instruction *I = dyn_cast<Instruction>(U))
+            if (I->getFunction() != BA->getFunction()) {
+              AllUsersAreSameFn = false;
+              break;
+            }
         if (Visited.insert(BA->getFunction()).second)
-          Worklist.push_back(BA->getFunction());
+          // If the blockaddress' function it refers to is a function it's not
+          // defined within, visit the other function.
+          if (!AllUsersAreSameFn)
+            Worklist.push_back(BA->getFunction());
         continue;
       }

This also works. I *think* it's correct that we don't want to re-visit the function the blockaddress refers to, if it refers to the same function the blockaddress is within. LazyCallGraphTest.HandleBlockAddress tests what happens with the blockaddress refers to a block outside the scope of the current function.

  • looks like <label>: is no longer emitted
  • move modified iteration logic into LazyCallGraph::visitReferences
  • undo cmake change
  • actually undo cmake changes
void added inline comments.Apr 25 2019, 6:20 PM
llvm/lib/Analysis/InlineCost.cpp
2087

nit: "Disallow inlining of blockaddresses which are used by non-callbr instructions."

void added a comment.EditedApr 26 2019, 12:30 PM

that's a case I don't think can be done in C even with GNU C extensions

It can't be written explicitly in C, sure... but optimizations like IPSCCP can propagate blockaddress constants to other functions.

But should it? Here's what indirectbr has to say:

This implies that jumps to labels defined in other functions have undefined behavior as well.

That implies that a blockaddress that's propagated to other functions aren't used by anything other than arithmetic and comparison instructions.

From the discussion, it seems people are expecting too much from the blockaddress. If it's stored in memory or passed to another function, it can then be used only in an arithmetic/comparison instruction. The only situation where it can be used in a branching instruction is if it's within the original function. callbr relaxes this for functions being inlined, but there's still a restriction that after the inlining is finished, the blockaddress refers to a block _inside_ the inlinee.

But should it?

It seems like a reasonable optimization, depending on the code structure.

That implies that a blockaddress that's propagated to other functions aren't used by anything other than arithmetic and comparison instructions.

The important thing is that it can also be returned back to the original function, either directly, or indirectly through memory.

void added a comment.Apr 29 2019, 1:37 PM

But should it?

It seems like a reasonable optimization, depending on the code structure.

That implies that a blockaddress that's propagated to other functions aren't used by anything other than arithmetic and comparison instructions.

The important thing is that it can also be returned back to the original function, either directly, or indirectly through memory.

Right. My point is that it's equivalent to passing a pointer around, except that the pointer cannot be dereferenced anywhere except the originating function. (For the purposes of what Nick's trying to do, "originating function" includes code that's inlined into a function.) As long as IPSCCP and the like don't violate that constraint, passing the pointer around isn't going to cause semantic changes to the program. (A function that tries to jump to a block address that's not within that function is already undefined and still should be.)

Perhaps I'm missing something though...

  • update for Bill's comment, improve commit message, add AnalysisTest
nickdesaulniers edited the summary of this revision. (Show Details)Apr 29 2019, 2:12 PM
nickdesaulniers edited the summary of this revision. (Show Details)
nickdesaulniers edited the summary of this revision. (Show Details)
nickdesaulniers marked 2 inline comments as done.Apr 29 2019, 2:14 PM
nickdesaulniers added inline comments.
llvm/include/llvm/Analysis/LazyCallGraph.h
1094

Is there a better way to express SomeConstant->getFunction()? I guess not, since the constant can appear in multiple functions? Is there a better way to do what I'm trying to do here?

nickdesaulniers edited the summary of this revision. (Show Details)Apr 29 2019, 2:15 PM
srhines added inline comments.Apr 30 2019, 12:00 AM
llvm/include/llvm/Analysis/LazyCallGraph.h
1094

I don't think there is a much better way to detect this, but BA->getFunction() will always return the same result here, so abstracting that out (e.g. FunctionContainingBlockAddress) would make this more obvious.

One other (possibly silly) question. Is it possible for a use of a blockaddress to be not contained in an Instruction (i.e. as part of the definition of some other global constant)? I don't think that makes sense in the IR, so perhaps it is forbidden some other way, but I just wanted to double check.

1096–1099

"If the blockaddress is used outside of the function it is defined within ..."

Right. My point is that it's equivalent to passing a pointer around, except that the pointer cannot be dereferenced anywhere except the originating function. (For the purposes of what Nick's trying to do, "originating function" includes code that's inlined into a function.) As long as IPSCCP and the like don't violate that constraint, passing the pointer around isn't going to cause semantic changes to the program. (A function that tries to jump to a block address that's not within that function is already undefined and still should be.)

Oh, you're saying that if we're inlining a function that contains an indirectbr, but we aren't inlining the indirectbr, we can allow inlining as long as we as the blockaddress still points to the original function. That's abstractly reasonable, but it probably breaks the Linux kernel's _THIS_IP_ macro (and therefore, I'm guessing, isn't what gcc does).

llvm/include/llvm/Analysis/LazyCallGraph.h
1094

Is it possible for a use of a blockaddress to be not contained in an Instruction

Yes; C testcase:

void f() { static void*p = &&Z; goto *p; Z:return; }

I'm not sure why we aren't just ignoring blockaddress constants here; they shouldn't really be relevant to the call graph. You can't get the address of a function from a blockaddress.

void added a comment.Apr 30 2019, 1:10 PM

Right. My point is that it's equivalent to passing a pointer around, except that the pointer cannot be dereferenced anywhere except the originating function. (For the purposes of what Nick's trying to do, "originating function" includes code that's inlined into a function.) As long as IPSCCP and the like don't violate that constraint, passing the pointer around isn't going to cause semantic changes to the program. (A function that tries to jump to a block address that's not within that function is already undefined and still should be.)

Oh, you're saying that if we're inlining a function that contains an indirectbr, but we aren't inlining the indirectbr, we can allow inlining as long as we as the blockaddress still points to the original function. That's abstractly reasonable, but it probably breaks the Linux kernel's _THIS_IP_ macro (and therefore, I'm guessing, isn't what gcc does).

Here's pseudo code that might explain what I'm getting at. Like I said, I might be missing something, but I think inlining a function that uses blockaddresses should be a fairly straight-forward process.

label label_array = []

function foo()
  label_array.append(_x)
  label_array.append(_y)

  // some code that calculates v

  mux(label_array[v])

  // maybe more code

  goto label_array[v]

_x: // do something fancy

_y: // do cleanup


function bar()
  foo()

In foo, even though the values of _x and _y escape the function, they can't be modified outside of foo (or, really, inside the function). If something outside of foo modifies the values in label_array, it's either a bug in the compiler or the user explicitly doing something bad. That's what we have today, I believe.

Now if we inline foo into bar, it should be an easy task of modifying the now inlined indirectbr to point to the new _x and _y:

function bar()
  label_array.append(_x_new)
  label_array.append(_y_new)

  // some code that calculates v

  mux(label_array[v])

  // maybe more code

  goto label_array[v]

_x_new: // do something fancy

_y_new: // do cleanup

There's the obvious issue that label_array now has different entries depending on the inlining, but we have that already with regular inlining that modifies a global array.

nickdesaulniers marked 2 inline comments as done.May 2 2019, 1:41 PM
nickdesaulniers added inline comments.
llvm/include/llvm/Analysis/LazyCallGraph.h
1094

Yes; C testcase:

Godbolt Link

I'm not sure why we aren't just ignoring blockaddress constants here; they shouldn't really be relevant to the call graph.

I think it's because the IR allows you to refer to blocks scoped to different functions, so if you're iterating the call graph, it's possible that you have a reference from one function to another via blockaddresses.

1096–1099

Not just that, because if g() contains a blockaddress that refers to a block within f().

nickdesaulniers marked an inline comment as done.May 2 2019, 1:41 PM

I think you mentioned that there is already a test case that checks a global initialized with blockaddress? If I've mis-remembered, we should definitely add one and check that it *doesn't* inline (today). We can update it if/when we add support. If it is possible to add one that directly works w/ callbr, that'd be great.

FWIW, I largely agree with Bill that we *should* be able to do this in way more cases. I'd love to see patches doing that. I think the most difficult aspect will just be writing all the test cases for all the different patterns so that we're confident we're doing the right updates in each case (or detecting when we can't and blocking). But I don't think you need to do that to land this patch. =D I'm mostly hoping Bill or you works on it as a follow-up.

llvm/include/llvm/Analysis/LazyCallGraph.h
1088–1100

This can be written more simply with a predicate:

if (llvm::any_of(BA->users(),
                 [&](User *U) {
                   if (Instruction *I = dyn_cast<Instruction>(U))
                     return I->getFunction() != BA->getFunction();

                   return false;
                 }))
  Worklist.push_back(BA->getFunction());

continue;
llvm/lib/Analysis/InlineCost.cpp
1835

Add a FIXME that we should probably continue relaxing this along the lines suggested by Bill?

  • simplify w/ llvm::any_of
  • add fixme
nickdesaulniers marked 2 inline comments as done.May 16 2019, 4:22 PM

I think you mentioned that there is already a test case that checks a global initialized with blockaddress? If I've mis-remembered, we should definitely add one and check that it *doesn't* inline (today). We can update it if/when we add support.

Yes; I believe that llvm/test/Transforms/Inline/blockaddress.ll covers this case.

If it is possible to add one that directly works w/ callbr, that'd be great.

TODO(Nick)

  • add global blockaddress + callbr test
nickdesaulniers marked an inline comment as done.May 17 2019, 11:28 AM
nickdesaulniers added inline comments.
llvm/test/Transforms/Inline/blockaddress.ll
53 ↗(On Diff #200071)

If it is possible to add one that directly works w/ callbr, that'd be great.

@chandlerc is this what you had in mind? Either way, ready for rereview. :)

chandlerc accepted this revision.May 17 2019, 2:31 PM

One edge case left I think, but otherwise this LGTM. Happy for you to submit with the suggested change and a test case.

llvm/include/llvm/Analysis/LazyCallGraph.h
1101

My clang-format spidey sense tingles here...

Also, come to think of it, we should check the Visited set first even though it means hashing it twice. And I think we can use early exit to make this more clear.

if (Visited.count(...))
  continue;
if (all_of(..., [](...) {
      if (Instruction *I = ...)
        return I->getFunction() == BA->getFunction();
      return false; // CHANGE HERE
    }))
  continue;

Visited.inesrt(...);
Worklist.push_back(...);

I think this will be a lot easier to understand.

I've also suggested a change in behavior. If we have a *non* instruction user, I think we should conservatively walk through it. We'd need to recurse trhough it here otherwise.

It'd be good to add a test case along thes elines where we round trip through an extra global or something?

This revision is now accepted and ready to land.May 17 2019, 2:31 PM
nickdesaulniers marked an inline comment as done.May 17 2019, 2:50 PM
nickdesaulniers added inline comments.
llvm/include/llvm/Analysis/LazyCallGraph.h
1101

we should check the Visited
if (Visited.count(...))

Just triple checking that you meant Visited.insert (I don't mean to be pedantic, I'm just unclear)?

nickdesaulniers marked an inline comment as done.May 17 2019, 2:51 PM
nickdesaulniers added inline comments.
llvm/include/llvm/Analysis/LazyCallGraph.h
1101

Oh, nvm I see count then insert later. Ok sorry for the noise.

nickdesaulniers marked an inline comment as done.May 17 2019, 2:53 PM
nickdesaulniers added inline comments.
llvm/include/llvm/Analysis/LazyCallGraph.h
1101

return false; // CHANGE HERE
If we have a *non* instruction user, I think we should conservatively walk through it. We'd need to recurse trhough it here otherwise.

Doesn't hurt.

It'd be good to add a test case along thes elines where we round trip through an extra global or something?

Maybe noob question but what's an example of a Use of a Constant that's not an Instruction?

  • prefer all_of == to anyof !=
void added inline comments.May 17 2019, 4:27 PM
llvm/include/llvm/Analysis/LazyCallGraph.h
1101

Values can be used by Global*s and probably other objects.

nickdesaulniers marked an inline comment as done.
  • additional test case w/ global aggregate
nickdesaulniers marked an inline comment as done.May 17 2019, 4:58 PM
nickdesaulniers added inline comments.
llvm/test/Transforms/Inline/blockaddress.ll
91 ↗(On Diff #200121)

It'd be good to add a test case along thes elines where we round trip through an extra global or something?

Values can be used by Global*s and probably other objects.

Is this just-added-testcase (from L91 to EOF) what was expected?

chandlerc accepted this revision.May 18 2019, 2:05 AM

LGTM

llvm/include/llvm/Analysis/LazyCallGraph.h
1101

Yeah. Globals, other constants, etc.

This revision was automatically updated to reflect the committed changes.

Thanks for the code reviews.