This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Don't emit lifetime intrinsics for some local variables
ClosedPublic

Authored by vitalybuka on Sep 16 2016, 5:48 PM.

Details

Summary

Current generation of lifetime intrinsics does not handle cases like:

{
  char x;
l1:
  bar(&x, 1);
}
goto l1;

We will get code like this:

  %x = alloca i8, align 1
  call void @llvm.lifetime.start(i64 1, i8* nonnull %x)
  br label %l1
l1:
  %call = call i32 @bar(i8* nonnull %x, i32 1)
  call void @llvm.lifetime.end(i64 1, i8* nonnull %x)
  br label %l1

So the second time bar was called for x which is marked as dead.
Lifetime markers here are misleading so it's better to remove them at all.
This type of bypasses are rare, e.g. code detects just 8 functions building
clang (2329 targets).

PR28267

Event Timeline

vitalybuka updated this revision to Diff 71719.Sep 16 2016, 5:48 PM
vitalybuka retitled this revision from to [CodeGen] Don't emit lifetime intrinsics for some local variables.
vitalybuka updated this object.
vitalybuka added a reviewer: eugenis.
vitalybuka added a subscriber: cfe-commits.
vitalybuka updated this object.Sep 16 2016, 5:49 PM
vitalybuka updated this object.Sep 16 2016, 5:51 PM
vitalybuka added a subscriber: kcc.
eugenis added inline comments.Sep 19 2016, 10:21 AM
lib/CodeGen/VarBypassDetector.h
51

rename to smth like StartFunction()?
add some API documentation.

Can you add some test cases?

It looks like the test case was removed when this patch we rebased.

vitalybuka marked an inline comment as done.Sep 19 2016, 11:27 AM

The patch was split in two and I moved the test into the wrong one. I'll fix this.

recovered test

Do we want to remove lifetime intrinsics when we aren't doing the asan-use-after-scope check? Since this isn't a mis-compile caused by inaccurate lifetime intrinsics, I was wondering whether we should do this only when asan-use-after-scope is on to minimize the impact on compile time.

Intrinsics are invalid there, code is generated in such way that variable is being accessed after lifetime.end.
I suspect that optimizer can make invalid code because of this, but I can't reproduce.
So I think it's safer to remove them at all and don't wait for miscompile reports.

Still if performance is greater concern than potential miscompiles, I can limit this only to asan-use-after-scope.

Do we want to remove lifetime intrinsics when we aren't doing the asan-use-after-scope check? Since this isn't a mis-compile caused by inaccurate lifetime intrinsics, I was wondering whether we should do this only when asan-use-after-scope is on to minimize the impact on compile time.

This doesn't sound right. Given the example in the description, we are accessing the memory location after end has been called: this seems like a real miscompile. It would appear unsafe to only do this for asan.

This doesn't sound right. Given the example in the description, we are accessing the memory location after end has been called: this seems like a real miscompile. It would appear unsafe to only do this for asan.

My impression was that this wasn't a miscompile, but I'm not so sure now. Do you have a concrete example where any of the optimization passes miscompile the code like that shown in PR28267 because of missing or misplaced lifetime intrinsics? I spent some time looking at how StackColoring (which is the primary user of this intrinsic) transforms the code in PR28267, and it didn't look like this would cause any miscompile (it seemed like it was able to compute the lifetime interval for %tmp correctly). I'm not sure whether other optimization passes are handling it correctly though.

vitalybuka added a comment.EditedSep 21 2016, 5:11 PM

Miscompile.
Here assert fails without the patch.

int* p1;
int* p2;

int use2() {
  assert(p1 != p2 || !"reuse");
  return p1 == p2;
}

void f3(int cond) {
  {
    int tmp[1024];
    p1 = tmp;
    goto l2;
  l1:
    int tmp2[1024];
    p2 = tmp2;
    exit(use2());
  }
 l2:
  goto l1;
}

Thank you for the great example! I can now see this patch does fix mis-compiles.

There are probably other lifetime bugs you'll see when the code being compiled includes gotos that jump past variable declarations, including the one here: http://lists.llvm.org/pipermail/cfe-dev/2016-July/050066.html. Do you think you can extend the approach taken in this patch to prevent mis-compiles for those too?

Also, rather than removing the intrinsics altogether, have you considered making changes to IRGen to insert them to different locations or insert extra lifetime.starts? In your example, if I insert lifetime.start for "tmp" at the beginning of label "l1", it doesn't assert. I made the same changes for the example I sent to cfe-dev, but it didn't work because DSE removed the store to "i" (if I disable DSE, I see the expected result).

Thank you for the great example! I can now see this patch does fix mis-compiles.

There are probably other lifetime bugs you'll see when the code being compiled includes gotos that jump past variable declarations, including the one here: http://lists.llvm.org/pipermail/cfe-dev/2016-July/050066.html. Do you think you can extend the approach taken in this patch to prevent mis-compiles for those too?

This probably can be extended to this case, but I'd prefer to do this in separate patch later.

Also, rather than removing the intrinsics altogether, have you considered making changes to IRGen to insert them to different locations or insert extra lifetime.starts?

I can see how to insert starts, e.g. on every label which bypass declaration, but I am not sure where to put ends.
Probably it's possible, but patch will be significantly more complicated. I'd prefer to do so only when needed.
This is infrequent usecase, so it's probably not worth of additional complexity.

I can see how to insert starts, e.g. on every label which bypass declaration, but I am not sure where to put ends.
Probably it's possible, but patch will be significantly more complicated. I'd prefer to do so only when needed.
This is infrequent usecase, so it's probably not worth of additional complexity.

I think the right long term solution is to have IRGen emit tighter lifetime ranges, but I don't have a solution that I know would always work correctly. That might require rethinking the design of lifetime intrinsics.

rsmith edited edge metadata.Sep 26 2016, 6:43 PM

Before we start with heroics here, we should consider whether the LLVM intrinsics are actually specified the right way. The current specification does the wrong thing for even trivial cases, such as a variable declared within a loop, so there's some impedance mismatch between the specification and how Clang uses the intrinsics, even with this patch applied. Can we get some clarity on how these intrinsics are *actually* supposed to work? (Is it permitted to have multiple start+end regions for the same alloca, or do we need to suppress them inside loops too? What happens if we have multiple starts but no ends?)

vitalybuka added a comment.EditedSep 26 2016, 10:13 PM

My assumption is that "start" makes access valid, and "end" makes access invalid, up to the next "start".
I see no problems with loops and multiple regions, as soon as access is happening between "start" and "end". Loops always call "start" for nested alloca on each iteration and call "end" on iteration cleanup. And this makes sense as it's a new variable every iteration, just stored in the same alloca. For my application I assume that variable is accessible right after the first start, and invalid after the first "end". So no problems with multiple starts.
I see no cases other then this "goto" issues where clang behaves differently.

This patch addresses issues were the last intrinsic before access was "end":
call start, ... call end, ...access
or only the end: entry, ... call end, ... access

vitalybuka updated this revision to Diff 72909.Sep 28 2016, 3:02 PM
vitalybuka edited edge metadata.

rebase

eugenis edited edge metadata.Sep 28 2016, 4:35 PM

My assumption is that "start" makes access valid, and "end" makes access invalid, up to the next "start".

That's also my understanding, but LangRef does not say anything about llvm.lifetime.start cancelling the effects of llvm.lifetime.end.

llvm.lifetime.end:
Any stores into the memory object following this intrinsic may be removed as dead.

I'm concerned about the complexity of this approach; it's hard for me to be confident that BuildScopeInformation is correct and will remain correct in the presence of future changes to the AST, and if it's not, we'll potentially silently miscompile.

A simpler but less precise approach would be to disable lifetime markers for all variables for which it is possible to jump into their scope (that is, in C++, variables of suitably-trivial types with no explicit initializer, and in C, variables of non-variably-modified type) whose scopes contain any kind of label. I'd have a lot more confidence that such an approach would be -- and would remain -- correct.

I'd also like to see some tests for indirectbrs (computed goto) since they're the case where we would have the most trouble producing the correct set of lifetime markers.

On the other hand, for ASan's purposes, we really want a precise set of markers here. If we can determine the set of variables brought into scope by each goto/computed goto/switch branch that we emit, we can split the edge or (for an unsplittable edge) insert a conditional branch into the destination block to call the lifetime start intrinsic. This would require CodeGen to track the set of live variables and store it for each such branch and branch target so that we can refer to it when fixing up the relevant edge, but that doesn't seem prohibitively expensive, and it would avoid any cost in the common case of a function containing no branches and no branch targets.

rsmith added a comment.Oct 3 2016, 2:31 PM

OK, so it seems like all the other approaches we discussed have problems too. Let's move forward with this for now.

Is there some reasonable base set of functionality between this and JumpDiagnostics that could be factored out and shared?

lib/CodeGen/VarBypassDetector.cpp
46

Drop the BuildScopeInformation - here and in other documentation comments.

51–52

Comment seems to not be relevant in this copy of the code; we don't have any special case for block literals here.

83–87

This looks unreachable (our only callers are the recursive call below -- which already checked for these -- and cases that cannot have a labelled statement). If we did reach it, we'd do the wrong thing, because we'd have created an independent scope rather than reusing the parent's scope (x: int n; would not introduce a new scope into the parent for n). Maybe replace this case with llvm_unreachable, or move the while (true) loop below up to the top of this function and delete these cases.

Do we have the same issue in JumpDiagnostics?

106–109

Combine these two into a single if (auto *SC = dyn_cast<SwitchCase>(SubStmt)) case.

130

Maybe pass in S.second instead of GS here so that the callee doesn't need to look it up again.

lib/CodeGen/VarBypassDetector.h
40

We should do /something/ about these. We could track the set of address-taken labels when we walk the function, and assume that each indirect goto can jump to each address-taken label, or more simply just conservatively say that all variables are bypassed in a function that contains an indirect goto.

vitalybuka marked 5 inline comments as done.
vitalybuka edited edge metadata.
  • updated comments
  • indirect jumps
  • optimized Detect()
  • fixed comment
  • added test for indirect jumps
vitalybuka marked an inline comment as done.Oct 16 2016, 10:30 PM

Please take a look. Meanwhile, I will investigate performance footprint.

Is there some reasonable base set of functionality between this and JumpDiagnostics that could be factored out and shared?

I tried to do so from beginning but seems common part is only recursive traversal of AST. Sharing this will likely make both classes less readable.
In my taste it would be easier to maintain them separately.

lib/CodeGen/VarBypassDetector.cpp
51–52

I read this comment as explanation of "origParentScope : independentParentScope" and it's still needed and relevant.

83–87

Done here, but can't do for JumpDiagnostics. there are various BuildScopeInformation which can have label as child.

vitalybuka marked an inline comment as done.Oct 17 2016, 11:55 AM

Slowdown from this function is below: 0.05% and it's mostly just traversing AST.

rsmith accepted this revision.Oct 25 2016, 5:16 PM
rsmith edited edge metadata.
This revision is now accepted and ready to land.Oct 25 2016, 5:16 PM
This revision was automatically updated to reflect the committed changes.