This is an archive of the discontinued LLVM Phabricator instance.

[ExtendLifetimes][2/4] Implement fake.use intrinsic in LLVM to extend the lifetime of operands

Authored by StephenTozer on Aug 10 2023, 6:50 AM.



This patch is part of a set of patches that add an -fextend-lifetimes flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a new function attribute to disable the post-RA scheduler (which is applied by -fextend-lifetimes), a pragma to selectively disable -fextend-lifetimes, and an -fextend-this-ptr flag which functions as -fextend-lifetimes for this pointers only. All changes and tests in these patches were written by @wolfgangp, though I will be managing these reviews and addressing any comments raised. Discussion on the approach of this patch series as a whole should be directed to the 3rd patch, D157613, in order to simplify review.

This particular patch implements a new intrinsic instruction in LLVM, llvm.fake.use in IR and FAKE_USE in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call).

Diff Detail

Event Timeline

StephenTozer created this revision.Aug 10 2023, 6:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2023, 6:50 AM
StephenTozer requested review of this revision.Aug 10 2023, 6:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2023, 6:50 AM

I've had a look at all the code changes but stopped at the tests; looking broadly good, I've got a few questions / improvements though.


Could we say "is a no-op" or something to even more explicitly express that there's no visible effect of this intrinsic?


Obvious question: why not just make it single-argument then, or are all intrinsics variadic?


The word "inconsistencies" is a bit scary, would you know whether there are serious repercussions, or is this a pragmatic-ish "Escape all fake-used ptrs to stop them getting optimised out" approach?

A little explanation of the motivation for anyone unfamiliar with fake_use's stumbling onto this code would help too.


IMO: we should absolutely ram it home into the reader that this is not an observable change to the generated code and explain why we remove it. i.e., "As it's unused, removing the load has no effect. The loaded value has been kept alive for debugging by the fake use, but it will be available on the stack, it isn't necessary to put it in a register too".


(See comment on above hunk)


Can we use C++17 if (decl; cond) statement's to make this a single lexical scope?


Can we just do without this addition, and have a hard-error (the llvm_unreachable) for anyone trying to invoke this intrinsic? For this kind of thing, I believe in "fail fast, fail hard".


SmallVector to avoid the initial allocation


This loop is awkwardly quadratic, in that the outer loop iterates from the tail call to End, and opDefdByOrAfter iterates from the tail call to "Inst". Is there any way we can restructure it? Maybe if sdag uses DominatorTree we can use the local-ordering-number cache to do this quicky?


IIiiiiinteresting -- I'm slightly twitchy about this because this behaviour isn't symmetric with getNextNonDebugInstruction, and fake uses are supposed to be observable no-op uses so that we really _do_ suppress optimisations in favour of debuggability. Are you aware of any functional reasons why this change is needed?

(I'm also trying to suppress this loop for performance reasons in the DDD/RemoveDIs stack of patches).


clang-format on the indentation


This SROA stuff looks fine to me, but as it's @Orlando 's favourite pass, he might want a closer look.

Comments on tests; haven't looked at the python one yet.

36 ↗(On Diff #549015)

There also needs to be some kind of positive check, that fake.uses are preserved or present, or at least something that will cause a test failure if we read a stack trace and no output.

39 ↗(On Diff #549015)

Typed pointers are banned now, right?

1 ↗(On Diff #549015)

I get the feeling that this has never been run with arches like SPARC, RISC-V and other exotic stuff. It might be risky to have such a generic test; in the worst case scenario we can downgrade it to just being an x86 test instead.

6–12 ↗(On Diff #549015)

I feel like a implicit-check-not for fake_use would be a stronger test.


pls2put at the top of the file as that's the convention.


This should be an implicit check not, and there should be a positive check line somewhere to ensure some vaguely relevant output is produced.

10 ↗(On Diff #549015)

RUN at the top, positive CHECK line needed etc, as with various other files here.

1 ↗(On Diff #549015)

This is widely targeted; is there any further information about what pass provoked this test? Having a whole-O2 test, especially in the "CodeGen" dir, might seem like overkill.

StephenTozer added inline comments.Sep 7 2023, 7:14 AM

Not all intrinsics are variadic, but only variadic intrinsics can take a wildcard argument type (so you can have fake.use(i32), fake.use(f64), fake.use(ptr) etc).


Looking back at this, I'm not sure whether this is still an issue, as it seemed to trigger asserts in situations involving pointer bitcasts, which we don't see any more with opaque pointers; I'll do some further testing, but pull it out of the review for now until I'm confident it's necessary.


Yeah, this could easily be done with either a SmallSet or Instruction::comesBefore; I'm not sure which would have a better performance profile (SmallSet = O(1) but with more overhead, comesBefore = O(n) if order is invalidated but very little overhead after order achieved), but the latter sounds more promising and readable I think.


This should be symmetric with getNextNonDebugInstruction, so that does need changing; broad principle here is that we only really want the fake uses to be "uses", while for all operations (I think) where we want to find the next/previous instruction, we would want to ignore them. They need a position to mark the end of the extended lifetime, and they need to be treated as a use, but there's no value for example in having them block a peephole optimization.

1 ↗(On Diff #549015)

Hm, looks like this test was included by mistake!

Made some general fixes to tests that were broken in some capacity and addressed review comments.

jmorse accepted this revision.Sep 12 2023, 4:25 AM

LGTM, with the rider that I'm not sure what the policy is on putting the python test in there. It's fundamentally a debug-info quality test, which is slightly different to the functional tests that we have elsewhere. The failure mode would be where some optimisation decision changes, we get a non-perfect but acceptable debug-info output, and an optimisation writer is inconvenienced by a test that "can't be correctly updated", as it were.

IMO, we should just put a comment in the test file using the script explaining the purpose of the test (ensure that we get 100% coverage when lifetimes are extended) and what to do if the optimisation decisions break the test, i.e. ensure there's a block of code that gets optimised but not fully because of fake uses, that achieves full coverage.


Cool -- just to give the full context, we can reclaim about 0.3% of compile time in _non_ -g builds if we don't have all the branching here when moving forwards one instruction. That's 0.3% I'd like to spend on RemoveDIs ideally. I think at some point we'll need to measure the performance impact of those peephole optimisations being blocked and see whether it's worth the tradeoff.


Run line at top pls

This revision is now accepted and ready to land.Sep 12 2023, 4:25 AM

LGTM, with the rider that I'm not sure what the policy is on putting the python test in there. It's fundamentally a debug-info quality test, which is slightly different to the functional tests that we have elsewhere.

Yeah, that doesn't seem suitable - it's an integration test - might as well make it use actual source code and put it in cross-project-tests, I think?