
[lldb] Optimize expressions
Needs ReviewPublic

Authored by teemperor on May 17 2021, 7:15 AM.


Group Reviewers
Restricted Project

At the moment we codegen or interpret the IR exactly as it is generated by Clang.
This patch runs the default O2 optimization pipeline over the generated LLVM module
before we interpret or JIT it.

The motivation for this patch is twofold. First, LLDB contains an LLVM IR interpreter that
supports a very limited subset of LLVM instructions for the sake of maintainability. The
IR interpreter is intended as a fast first path to run the code provided by the user (fast in
comparison to JITing and injecting the code). It is also the only way LLDB can evaluate
expressions for targets where we can't JIT & inject code. Because the LLVM IR interpreter
only supports a limited set of instructions, most non-trivial Clang expressions can't be
interpreted, which really limits the expression evaluator's functionality when we can't
JIT code. By running the default -O2 passes over the IR before interpreting it,
LLVM will usually give us much more concise IR that the IR interpreter can more often
than not handle.

Second, sometimes users want to run expressions that are simple but might process quite
a bit of data. An example of that would be any algorithm that users can now invoke via
the std C++ module. Calling std::count_if with a relatively simple predicate on a large
vector currently takes far too long to be considered usable.

To give some stats on the first claim:

  • Currently we run about 7700 expressions in a test suite run (the actual number of expressions fluctuates a bit).
  • Around 5800 of these expressions are simple enough that the IR interpreter can interpret them right now (most of them just print simple variables)
  • This leaves us with around 1900 expressions that we currently can't interpret in the test suite.
  • After applying this patch the number of expressions we can't interpret drops to around 1300 (-33%).
  • The average expression evaluation time when running on my local machine doesn't change with this patch (around 50ms before and after).

The patch consists of two parts:

  • The generic (new) pass manager setup and run code.
  • Some new code that injects a function call at the end of our expressions.

The injected function call just exists to prevent the optimizer from removing the store to our
result variable. Right now we capture l-values by having a pointer that points to alloca'd memory.
The optimizer thinks this is a dead store (which it technically is, but we read the memory after the
call has finished) and removes it. By injecting a call to an undefined function that just takes the
address of the result, we can prevent any of these optimizations. The call is later removed
after the pass manager is done so that we can actually interpret/JIT the module again.


Event Timeline

teemperor created this revision.May 17 2021, 7:15 AM
teemperor requested review of this revision.May 17 2021, 7:15 AM
teemperor edited the summary of this revision. (Show Details)May 17 2021, 7:19 AM
teemperor added inline comments.

I do wish we had a better way to do the iterate-while-erase algorithm on IR nodes but apparently there is no such thing :(

vsk added a subscriber: vsk.May 17 2021, 8:53 AM
vsk added inline comments.

Why is it safe for lldb to take the address of "c"? After evaluation completes, is the state of the stack preserved?


Would marking __lldb_expr_result_ptr volatile make the call to __lldb_use_expr_result redundant?

teemperor added inline comments.May 17 2021, 9:45 AM

That's actually a good point, and I am writing up an RFC about the current behaviour for lldb-dev. The current behaviour of LLDB is that we end the function call, call all the destructors and so on, and then read the results from memory. So the memory we read is just the leftovers after the function call has returned. That's why a unique_ptr local variable in LLDB always looks like it owns no memory: we read it after it has already been destroyed. I'm actually thinking about whether/how we should fix that behaviour, as it seems rather fragile (but that's for the RFC).

So unless I'm missing something, we currently already rely on the stack being preserved to even get the correct results. The function call is just there to make sure the optimizer doesn't realize what we're up to.

Not sure about the volatile and if it would help. The __lldb_expr_result variable gets transformed by IRForTarget until it ends up being just a part of our argument struct IIRC, so I need to look into that before I know where we could sneak in a volatile.