This is an archive of the discontinued LLVM Phabricator instance.

[debuginfo][lsr] SCEV-based salvaging for LoopStrengthReduction
ClosedPublic

Authored by chrisjackson on Jun 30 2021, 9:45 AM.

Details

Summary

This patch extends salvaging of debuginfo in the Loop Strength Reduction (LSR)
pass by translating Scalar Evaluations (SCEV) into DIExpressions. The method is
as follows:

Cacheing Step
Pre-lsr, Examine the DVIs (dbg.value) in the loop and cache information on
those that are deemed salvageable (the location op has a SCEV that can be
translated toa DIExpression). The location op, SCEV for the location op and
the DIExpr are stored in a record. Currently only old-style single location
op DVI are salvageable, but the code is readily extensible to multiple ops
and generates multi-location DVI if necessary during the salvage step.
Salvaging of variadic debug values is the likely next iteration of this
patch, if accepted.

Obtain the Loop Induction Variable (IV)
The Loop class provides two methods to get the IV. Sometimes these fail to
return anything (even in the presence of a loop.body block with a PHI node
labelled 'lsr.iv'). When these methods fail, pick a SCEVable PHI node from
the loop header and use this as the IV. A similar PHI-node selection is
made in the current lsr salvage method DbgGatherEqualValues().

Induction Variable to iteration count DIExpr translation
The SCEV for the IV/PHI node is translated to a DIExpr that results in the
current iteratoun count.

Salvage
The cached DVI are examined. If the location has been marked as unavailable,
Generate an expression that enables the value to be recovered. The
expression uses the iteration count and the location op's SCEV that was
cached to calculate the value.

I have stripped the the existing salvaging as it is capable of recovering only a
subset of the source values recovered by this implementation. However, this more
general method relies on deriving the iteration count. So, in some cases a
longer DIExpression is produced compared to the exiting method. If this is
deemed unsatisfactory there are two options:

Restore the previous functionality and select that method of salvaging when
possible. This is probably not as messy as it sounds, as I have used similar
methods for cacheing data of the salvageable DVIs. Simply restore some of the
functionality to DbgGatherSalvagableDVI() and prioritise rewriting DVIs using
constant offsets of a phi node value.

I expect this is very little work. An important detail is that the new method
caches salvageable DVI slightly earlier in ReduceLoopStrength() than the
existing method.

Have a function that finds no-ops in the generated DIExpression once the
iteration count and source value DIExpressions are combined. This will be more
complicated than the first option, but may be able to simplify a greater
number of cases.
This is not as simple, but I'm sure simplifying stack-based arithmetic is well
understood and I expect implementations exist.

There are implementation limitations.

Support top-level SCEVs for values that are SCEVCommutativeExpr. A SCEV is
of the form {Start,+,stride}, a recurrence relation, with the start applied
at initialisation and stride applied per-iteration. In LLVM this is known as
a SCEVAddRecExpr However, some values cannot be fully expressed in this
form as they are dependant on multiple phis. In this case one or more
SCEVCommutativeExpr wrap the SCEVAddRecExpr e.g.
(({start,+stride} * %a * %b) + %c).
I did not encounter these expressions when examining values that had been
optimised out, some functionality to tranlate these SCEVs has been omitted.

Recovery of DIExpressions that had multiple location ops before
LSR. Note thatsalvaged DVI with multiple location ops can be generated. DVI
with a single location op and a non-empty DIExpression are salvageable.
To implement salavaging of variadic DVI the append(DIExpr*) function must be
altered to append referenced location Ops to the SCEVDbgBuilders Values
vector. If any of these are now 'undef', then generate an expression to
recover the value.

Translation of chain (non-affine) SCEVs. LLVM has a C++
implementation that evaluates a chain SCEV with the iteration number that
should prove useful.

Translation of nested SCEVAddRecs. These are SCEVs with a loop-variant start
value and are created by nested loops.

MinMax SCEVs have not been addressed. I intended to add a push() function for
these when one was encountered. However, not a single test in the lit
Transmforms suite appeared to generate any.

Diff Detail

Event Timeline

chrisjackson created this revision.Jun 30 2021, 9:45 AM
chrisjackson requested review of this revision.Jun 30 2021, 9:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 9:45 AM

@chrisjackson thanks for working on this!

(I've just started looking into this, so I've included a few nits for now.)

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5850

SCEVDbgValueBuilder instead?

5851

SCEVDbgBuilder() = default;

5917

const auto& ?

Updated with changes suggested by @djtodoro

I like this approach. It seems a lot more elaborate than my original patch from last fall. If any of my input is requested I will be on vacation for the three coming weeks so it likely wont happen until after that. Anyway thanks for working on this!

Correct me if i'm misremembering things, but weren't there something like that (SCEV-driven debuginfo salvaging/recovery) already,
and it had problems with holding to the dead IR, and was removed because of that?
Does have similar correctness concerns?

markus added a comment.Jul 9 2021, 1:47 AM

Correct me if i'm misremembering things, but weren't there something like that (SCEV-driven debuginfo salvaging/recovery) already,
and it had problems with holding to the dead IR, and was removed because of that?

There was and it was briefly removed before being re-landed again after the issues with holding on to dead IR were addressed (2a6782bb9f1d6).

Applied clang-format.

Prevent a couple of warnings that had escaped attention and correct a call to clear().

This change improves the selection of the IV, by using the PHi node inserted by ScalarEvolutionExpander, which is typically called "%lsr.iv" or similar. The decision to do this resulted from analysis of a build of clang3.4. While the previous method of using Loop->getInductionVariable()/getCanonicalInductionVariable() provided correct IVs, these tended to be relatively short lived and were often optimised away by subsequent passes. Some care is needed in using this PHi as ScalarEvolutionExpander holds Maps of handles to the new PHi node.

Replace passing of pointer to IV and clear() call with a WeakVH..

Adjusted new IV selection when multiple IVs are produced by ScalarEvolutionExpander.

Looks good in principle; I've added some nits, and there are a couple of things that need to be revised,

  • I think the DWARF stack operands go on in the wrong order for casts, plus it needs test coverage,
  • Fail gracefully with unrecognised casts rather than asserting.
  • InsertedIVs needs to be clear()'able from SCEVExpander

CC @reames and @fhahn who appear to frequently touch SCEV -- this patch exports a vector of induction variables from SCEVExpander, which we assume is OK as SCEVExpander already exports inserted instructions and similar. The induction variables are used to salvage variable locations when a Value is optimised out of a loop during strength reduction. CCing to make you aware of the API change.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5600

Performance-nit: could be a const reference to WeakVH instead of constructing a new one each time around the loop.

5927

Mega nit, I think the style convention is to pre-increment

5940–5941

Doesn't the DWARF stack work by operating on things already pushed onto the stack; and thus shouldn't the Inner value be pushed onto the stack before the convert operation is? (Like you do with "div" below).

Possibly I've missed something; but there should also definitely be a test that covers such casting, and it doesn't look like the tests cover anything that produces DW_OP_LLVM_convert.

5944–5947

Can we not use pushSCEV for Inner in all circumstances and not expand the if/else cases?

5959–5960

This probably wants to return false, rather than assert, better to drop debug-info than stop compiling.

(If the four casts listed cover all cast types, even better would be putting that in a call to assert, rather than using a conditional).

5966

clang-format? I suspect the empty line doesn't meet the style guidelines.

6132
6137–6138

Redundant given the above conditional?

6179–6180

clang-format

6228
jmorse added inline comments.Jul 23 2021, 4:44 AM
llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
196–198

The style in this class appears to be that data fields are private and at the top of the class declaration; public accessors near the bottom. IMO, to fit in with the style, this vector should move towards where InsertedValues is declared, with a public accessor that returns a const reference.

204–205

You're going to need to clear InsertedIVs here.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6236

if (!IV) continue;? Avoid extra indentation, same with the extra condition below.

llvm/test/Transforms/LoopStrengthReduce/debuginfo-scev-variadic-value-0.ll
62 ↗(On Diff #360727)

NB, we usually strip un-necessary attributes to avoid needless maintenance; see also @djtodoro 's meta-burn tool on the mailing list. (Probably not necessary for these tests as they're fairly small already).

After a rebase I observed additional DVIs that were single location op when cached (pre-LSR) but had multiple location ops post-LSR, when scev-salvaging was applied. Instead of skipping these, restore the pre-LSR information and attempt SCEV salvaging.

Thanks for the review @jmorse, I will address your comments next.

Apply changes in response to @jmorse's review. Additionally:

  • Make LSR IV vector private and add accessor.
  • Collapse shortened PushCast() into PushSCEV().
  • Fix formatting.
  • Fix a mistake in pushSCEV()'s call to pushArithmeticImpl().

A lit test covering pushCast will follow.

chrisjackson marked 15 inline comments as done.Jul 24 2021, 11:49 PM

Added a lit test that covers PushCast(), which translates SCEVCastExpr. Removed some unnecessary details from the existing tests and changed the test names to something more descriptive.

chrisjackson marked an inline comment as done.Jul 25 2021, 1:28 PM

Applied clang-format.

jmorse accepted this revision.Jul 26 2021, 4:37 AM

Thanks for the additional test; ideally the "switching dbg.value(!DIArgList..." back to single-value dbg.values would be covered by a test case too, just to ensure that this doesn't change in the future. LGTM with some nits.

llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
86
209

IIRC, this needs to be SmallVectorImpl to abstract the size of the underlying vector from the way of accessing it.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
2091

Similarly wants to be SmallVectorImpl to avoid relying on the vector size detail.

5964–5966

I don't quite get why we're willing to return-false if there are unrecognised operators; but assert false if there are unrecognised cast operators. Are these four classes the only derived classes from SCEVCastExpr, and the idea is to fire the assertion if someone adds another? If so, a comment to that effect would be good.

6175–6185

Ideally would want a test.

This revision is now accepted and ready to land.Jul 26 2021, 4:37 AM

Applied changes requested by @jmorse: changed accessors to use SmallVectorImpl and added a comment in pushCast().
The path covering the salvage of dbg.value that were changed from single location op to multi location op by LSR is covered already e.g. debuginfo-scev-salvage-1.ll. I'm not certain how to make this clear as all the modifications to the intrinsic occur within the pass.

Corrected a comment.

chrisjackson marked 4 inline comments as done.Jul 26 2021, 2:22 PM
This revision was landed with ongoing or failed builds.Jul 27 2021, 5:01 AM
This revision was automatically updated to reflect the committed changes.
  • Add 'REQUIRES' clause to lit tests to prevent failure on aarch64.
  • A failure on buildbot sanitixer-X86_64-linux indicates that SCEVUknown::getValue() can return nullptr. Account for this by returning false and abandoning the salvage attempt.

Added missing RESTRICT in lit test.

This causes crashes for me, I’ll provide reproduction samples shortly.

The release branch has been created, so whatever fixes/reverts are done has to be made similarly for the release branch, cc @tstellar.

The crash is reproducible with this reduced sample:

$ cat repro.c 
struct a { 
  char *b
} static c; 
d(());
e(struct a *f) { d(f->b); }
g() { 
  while (h())
    e(&c);
}
$ clang -target x86_64-linux-gnu -c -g -O2 repro.c
Abpostelnicu added a subscriber: Abpostelnicu.EditedJul 28 2021, 1:57 AM

I think this introduced a build crash on Firefox. I'm saying this because the build crash occurs in the function touched by this patch, and the yesterday build, that didn't have this patch worked just fine!

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thank you for doing this! At Mozilla we have a nightly build with clang-trunk for Firefox in order to test performance degradation or build failures with the latest clang trunk.

getVariableLocationOp() can sometimes return nullptr. Guard against this.

Remove erroneous assert.

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thanks! Can you make sure the 13.x release branch is fixed too? (I guess it's best to revert there too, and then possibly backport the reapplied version of the patch if deemed relevant?)

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thanks! Can you make sure the 13.x release branch is fixed too? (I guess it's best to revert there too, and then possibly backport the reapplied version of the patch if deemed relevant?)

Of course, I will revert the commit in 13.x release.

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thanks! Can you make sure the 13.x release branch is fixed too? (I guess it's best to revert there too, and then possibly backport the reapplied version of the patch if deemed relevant?)

The 13.x release branch has been reverted, as requested. I hope to land the updated commit in that branch too.

Thanks for the reports, I have reverted and will look into the crash ASAP.

Thanks! Can you make sure the 13.x release branch is fixed too? (I guess it's best to revert there too, and then possibly backport the reapplied version of the patch if deemed relevant?)

The 13.x release branch has been reverted, as requested.

Thanks!

I hope to land the updated commit in that branch too.

Please hold off of that for a little while, as the updated version still breaks things for me.

Unreduced repro: https://martin.st/temp/d3dx-preproc.c

clang -target aarch64-linux-gnu -w -c -g -O2 d3dx-preproc.c

(Chris is out today,)

Unreduced repro: https://martin.st/temp/d3dx-preproc.c

clang -target aarch64-linux-gnu -w -c -g -O2 d3dx-preproc.c

Thanks for the repo -- I've just pushed up rG2537120c8 as an obvious fix, none affine expressions are being fed to a function that doesn't expect them, and it's safe to return and not salvage variable locations in that case. I think it's just a case of an assertion being insufficiently filtered for.

If there's more / continued turbulence then we'll revert though, and we'll add test coverage for non-affine SCEVs being salvaged either way.

(Chris is out today,)

Unreduced repro: https://martin.st/temp/d3dx-preproc.c

clang -target aarch64-linux-gnu -w -c -g -O2 d3dx-preproc.c

Thanks for the repo -- I've just pushed up rG2537120c8 as an obvious fix, none affine expressions are being fed to a function that doesn't expect them, and it's safe to return and not salvage variable locations in that case. I think it's just a case of an assertion being insufficiently filtered for.

If there's more / continued turbulence then we'll revert though, and we'll add test coverage for non-affine SCEVs being salvaged either way.

Thanks for the quick fix on this matter! For now I think it seems like this issue is fixed wrt the code I test build regularly, but I'll know more exhaustively tomorrow.

Please backport whatever is necessary to fix this in the release/13.x branch (or file a bug and I can do it).

Please backport whatever is necessary to fix this in the release/13.x branch (or file a bug and I can do it).

AFAIK this was reverted in the release branch already, in 67d0736b14c7888e63af16dcc71129ce07383010, so the branch should be good wrt that. @chrisjackson expressed a desire to backport the working version of the commit though.

The current reapplied version of this in the main branch with the fix from rG2537120c8 seems to work fine for me now.

We've got a regression (assertion failure) with this change that didn't get fixed with 2537120c870c. I've opened PR 51329 for it. If this change gets backported to release/13.x, we'll need the fix for this assertion failure to also get backported.

Apologies for thre slow response, I currently have covid so I needed some time off. Thanks for reporting the failure I will attempt to address the assertion failure today. If I fix that then I am keen for this to be backported.

We've got a regression (assertion failure) with this change that didn't get fixed with 2537120c870c. I've opened PR 51329 for it. If this change gets backported to release/13.x, we'll need the fix for this assertion failure to also get backported.

Please see D107348 for a simple patch that provides a fix.

Apologies for thre slow response, I currently have covid so I needed some time off. Thanks for reporting the failure I will attempt to address the assertion failure today. If I fix that then I am keen for this to be backported.

We've got a regression (assertion failure) with this change that didn't get fixed with 2537120c870c. I've opened PR 51329 for it. If this change gets backported to release/13.x, we'll need the fix for this assertion failure to also get backported.

Please see D107348 for a simple patch that provides a fix.

Thanks for the quick turnaround. I've verified that 21ee38e24f98 fixes the original issues in our rolling tests as well as the reduced repro. I'm happy for this change to get backported along with 21ee38e24f98.

Apologies for thre slow response, I currently have covid so I needed some time off. Thanks for reporting the failure I will attempt to address the assertion failure today. If I fix that then I am keen for this to be backported.

We've got a regression (assertion failure) with this change that didn't get fixed with 2537120c870c. I've opened PR 51329 for it. If this change gets backported to release/13.x, we'll need the fix for this assertion failure to also get backported.

Please see D107348 for a simple patch that provides a fix.

Thanks for the quick turnaround. I've verified that 21ee38e24f98 fixes the original issues in our rolling tests as well as the reduced repro. I'm happy for this change to get backported along with 21ee38e24f98.

My pleasure. Thanks for verifying. Just to confirm, I will backport the required three patches to the 13.x release branch:

rG0ba8595287ea2203ef2250e2b0b41f284a055518
rG2537120c8
rG21ee38e24f9801a567306b2a88defacf6e589a8b

chrisjackson updated this revision to Diff 416852.EditedMar 21 2022, 1:43 AM

-EDIT-
Diff changed in error.

Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 1:43 AM
Herald added a subscriber: Groverkss. · View Herald Transcript

Restored diff. I accidentally updated this diff with a file intended for a separate issue.

bjope added a subscriber: bjope.Apr 26 2022, 2:16 AM
bjope added inline comments.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6024

Is this some kind of optimization on IR/MIR level?

Seems like this can result in DBG_VALUE with DW_OP_LLVM_arg in the DIExpression (i.e. neither having a DIArgList or using DBG_VALUE_LIST). But it also seem like that particular scenario doesn't seem to be handled by DwarfDebug. See https://github.com/llvm/llvm-project/issues/55097

Maybe such things should be expected by all passes? Or is this just a pseudo optimization introducing a special case that complicate logic in other passes?

Well, I haven't analysed things so closely so maybe this choice is needed for some other reason.

jmorse added inline comments.Apr 26 2022, 2:56 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6024

(Drive-by comment) IIRC this clause is here to optimise scenarios where the expression starts with DW_OP_LLVM_arg, 0, and where there are no other arguments to the expression, meaning it could be replaced by a plain dbg.value(%123, ..., !DIExpression(blah)), rather than a variadic dbg.value. (Maybe it's faulty, I haven't dug in either).

bjope added inline comments.Apr 26 2022, 3:03 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6024

Ok, I see now that the comment for setShortFinalExpression actually say "omit DW_OP_llvm_arg" and not "emit DW_OP_llvm_arg" as I read it the first time.

So maybe the case here is that there are several references to "DW_OP_LLVM_arg, 0" and not only at the expression start. And then we can't just skip adding a DIArgList.

StephenTozer added inline comments.Apr 26 2022, 3:24 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6024

I think I see the issue here, I brought up the same thing on the more recent LSR-salvage patch: This check is overly permissive. Specifically, the check Values.size() == 1 only checks that there is only one value in the value list, not that there is only one DW_OP_LLVM_arg in the expression; i.e. if the DIExpression is !(DW_OP_LLVM_arg 0, DW_OP_LLVM_arg 0, DW_OP_plus), it would still return true. Thus, when this function sees an expression like that, it assumes that it is safe to emit the short form and drops the first operator, producing dbg.value(%value, !123, !DIExpression(DW_OP_LLVM_arg 0, DW_OP_plus).

Also, Chris's new patch stack will fix this when it lands, and I'm just about to approve it - will that be suitable, or is a temp fix needed? (For a very quick and dirty temp fix, adding && std::count_if(expr_op_iterator(Expr.begin()), expr_op_iterator(Expr.end()), [](auto ExprOp) { return ExprOp.getOp() == llvm::dwarf::DW_OP_LLVM_arg; }) == 1 would be the simplest fix.

bjope added inline comments.Apr 26 2022, 4:06 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6024

Thank you for quick answers and explanations!

From my point of view I'm happy to use the workaround downstream until the other patches has landed.