Page MenuHomePhabricator

Fix a LSR debug invariance issue
Needs RevisionPublic

Authored by markus on Jul 13 2022, 5:39 AM.

Details

Summary

In response to https://github.com/llvm/llvm-project/issues/56030

To be a bit more precise: LSR debug salvaging (DbgGatherSalvagableDVI) will call ScalarEvolution::getSCEV on the operands of @llvm.dbg.value. When doing this the results (SCEV -> Value and Value -> SCEV) are cached by the SCEV framework.

Later on when generating IR for the transformed loop LSRInstance::Expand ends up in SCEVExpander::FindValueInExprValueMap where the cached %inc = add i32 %0, 1 is found for SCEV (1 + %0) and that IR gets reused.

For the case where there is no @llvm.dbg.value in the input then %inc is not queried, not cached and wont be found when expanding. This results in a debug invariance issue.

If this is indeed the problem then I see two possible solutions. Either having SCEV forget about the query (and remove it from the cache) or some interface to prevent SCEV caching it in the first place.

Diff Detail

Event Timeline

markus created this revision.Jul 13 2022, 5:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 5:39 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
markus requested review of this revision.Jul 13 2022, 5:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 5:39 AM

Thanks for hunting this down, caching layers affecting codegen sounds hard to debug.

I'm no SCEV expert, so a couple of possibly obvious questions:

  • Is it safe to store the SCEV* for a Value that's then forgotten?
  • Is there a risk that the presence of debug-info will now cause a SCEV to be forgotten that _should_ have been remembered, because of unconditionally forgetting a SCEV if it's used by debug-info?

Thanks for hunting this down, caching layers affecting codegen sounds hard to debug.

I'm no SCEV expert, so a couple of possibly obvious questions:

  • Is it safe to store the SCEV* for a Value that's then forgotten?
  • Is there a risk that the presence of debug-info will now cause a SCEV to be forgotten that _should_ have been remembered, because of unconditionally forgetting a SCEV if it's used by debug-info?

Unsure and possibly :) I am hoping that the other reviewers will bring some clarity to these questions.

For the first question about storing a SCEV I know there was a lot of discussion about that when I did the original patch some years ago (D87494 and D91711) but since then that was fixed and it has afterwards been completely rewritten so not sure what the status is now.

For the second point maybe ScalarEvolution::getExistingSCEV() could be used to see if this SCEV existed before analyzing the dbg.value or not so that we don't end up forgetting about values that we would not forget about hadn't it been for the dbg.value.

eopXD added a subscriber: eopXD.Jul 15 2022, 2:04 AM
eopXD added inline comments.
llvm/test/Transforms/LoopStrengthReduce/dbg-inv-0.ll
9

Nit: extra space between were observed.

markus updated this revision to Diff 444919.Jul 15 2022, 2:10 AM

Fix nit.

markus marked an inline comment as done.Jul 15 2022, 2:10 AM
eopXD added a comment.Jul 15 2022, 2:13 AM

May you explain a bit how the test case shows SE.forgetValue() is preserving debug invariance. Thank you.

May you explain a bit how the test case shows SE.forgetValue() is preserving debug invariance. Thank you.

Right. The test case intends to show that for the given input LSR will give the same output regardless if debug info is stripped before running the LSR pass or after running the LSR pass (i.e .the pass being debug invariant).

Before the addition of SE.forgetValue() this was not the case and the test failed.

As I understand it LSR uses SCEVExpander to generate code for some of the expressions used by the new loop and the problem lies in that that SCEVExpander will search among (and reuse) values that have already been analyzed by SCEV (i.e. SE.getSCEV(). Now having debug intrinsics in the input means that we may do SE.getSCEV() on values that we would not have done SE.getSCEV() on otherwise and this may result in SCEVExpander generating different code which is bad.

It is still up for debate if SE.forgetValue() is the right solution but the idea was that if we immediately forget about those values that were analyzed just because of the debug intrinsic they will not end up affecting the SCEVExpander.

eopXD accepted this revision.Jul 15 2022, 2:38 AM

I see, the approach and test case make sense to me now. Thank you for the explanation.

This revision is now accepted and ready to land.Jul 15 2022, 2:38 AM
eopXD added a subscriber: nikic.Jul 15 2022, 2:46 AM

Probably can have @nikic to give a look before landing this?

markus requested review of this revision.Jul 15 2022, 2:47 AM

I'd still like to get some feedback from the author of the LSR debug salvaging (@chrisjackson) before proceeding as at the moment it is a bit unclear if it actually is safe to do a SE.forgetValue() here.

I'd still like to get some feedback from the author of the LSR debug salvaging (@chrisjackson) before proceeding as at the moment it is a bit unclear if it actually is safe to do a SE.forgetValue() here.

Hello, apologies, this only just came to my attention so I'm looking at it now! Possibly my phab email is misconfigured - I'll check that out and fix it.

Thanks for working on this. This seems a very interesting case. I had naively assumed getSCEV() was benign and would not result in creating extra information for the compiler's code generation. Obviously it is a problem that it does. On first glance your solution looks good, but I'll try and review in more detail.

I'm looking at the forgetValue() implementation. I think that as long as the SCEV for the value is preserved and accessable in SalvageDVI() then all is well in terms of the salvaging method.

Thanks for hunting this down, caching layers affecting codegen sounds hard to debug.

I'm no SCEV expert, so a couple of possibly obvious questions:

  • Is it safe to store the SCEV* for a Value that's then forgotten?
  • Is there a risk that the presence of debug-info will now cause a SCEV to be forgotten that _should_ have been remembered, because of unconditionally forgetting a SCEV if it's used by debug-info?

I think the second point is the crux here. Perhaps there is a way to test if the SCEV already exists, then call forget conditionally.

markus added a comment.Aug 3 2022, 4:23 AM

I'm looking at the forgetValue() implementation. I think that as long as the SCEV for the value is preserved and accessable in SalvageDVI() then all is well in terms of the salvaging method.

Right, I think that forgetting a SCEV does not invalidate the SCEV itself it is just that if we were to ask for it again then we would get a new object (that would still be equivalent though not the same object pointer wise).

Thanks for hunting this down, caching layers affecting codegen sounds hard to debug.

I'm no SCEV expert, so a couple of possibly obvious questions:

  • Is it safe to store the SCEV* for a Value that's then forgotten?
  • Is there a risk that the presence of debug-info will now cause a SCEV to be forgotten that _should_ have been remembered, because of unconditionally forgetting a SCEV if it's used by debug-info?

I think the second point is the crux here. Perhaps there is a way to test if the SCEV already exists, then call forget conditionally.

A few comments up I wrote which I think does what you are requesting

For the second point maybe ScalarEvolution::getExistingSCEV() could be used to see if this SCEV existed before analyzing the dbg.value or not so that we don't end up forgetting about values that we would not forget about hadn't it been for the dbg.value.

If that makes sense I will update the patch.

I'm looking at the forgetValue() implementation. I think that as long as the SCEV for the value is preserved and accessable in SalvageDVI() then all is well in terms of the salvaging method.

Right, I think that forgetting a SCEV does not invalidate the SCEV itself it is just that if we were to ask for it again then we would get a new object (that would still be equivalent though not the same object pointer wise).

Thanks for hunting this down, caching layers affecting codegen sounds hard to debug.

I'm no SCEV expert, so a couple of possibly obvious questions:

  • Is it safe to store the SCEV* for a Value that's then forgotten?
  • Is there a risk that the presence of debug-info will now cause a SCEV to be forgotten that _should_ have been remembered, because of unconditionally forgetting a SCEV if it's used by debug-info?

I think the second point is the crux here. Perhaps there is a way to test if the SCEV already exists, then call forget conditionally.

A few comments up I wrote which I think does what you are requesting

For the second point maybe ScalarEvolution::getExistingSCEV() could be used to see if this SCEV existed before analyzing the dbg.value or not so that we don't end up forgetting about values that we would not forget about hadn't it been for the dbg.value.

If that makes sense I will update the patch.

Apologies, I overlooked your earlier response to that point. Your solution sounds good to me.

markus updated this revision to Diff 449653.Aug 3 2022, 6:39 AM

Not sure if this ended up so well as it appears that getExistingSCEV was a private method and I am uncertain if we can or should make it public. Either way if we decide to go this route then we probably should add explicit tests for this part as well (i.e. to make sure that the forgetting of SCEV does not produce another invariance issue).

Not sure if this ended up so well as it appears that getExistingSCEV was a private method and I am uncertain if we can or should make it public. Either way if we decide to go this route then we probably should add explicit tests for this part as well (i.e. to make sure that the forgetting of SCEV does not produce another invariance issue).

I similarly added a public method in rG116dc70 in an early SCEV-based salvaging patch. Understandably, I don't think it was envisaged at the time of implementation that SCEVs would be used for salvaging. I agree we should be careful.

fhahn added a comment.Aug 3 2022, 9:21 AM

I think I am missing some context here, but this looks a bit suspicious. I am not sure if exposing and checking whether a SCEV exists is the right way to go.

IIUC the main issue seems to be that we try to create SCEVs for all dbg intrinsics? Would it be possible to instead only get SCEV expressions for values we really need (that/s the induction variable I assume, for which there already should be SCEV expression)?

I think I am missing some context here, but this looks a bit suspicious. I am not sure if exposing and checking whether a SCEV exists is the right way to go.

Right. I am not so sure this is a good idea either.

The underlaying problem, as I understand it and see it, is that due to the caching in SE.ExprValueMap past calls to getSCEV will affect IR generated by future calls to SCEVExpander and this is not obvious. While it is a reasonable optimization to try and reuse IR during SCEVExpander based on what has already been analyzed I think this 'side-effect' of getSCEV is somewhat unfortunate in the context of debug intrinsics that are not supposed to affect code generation.

Another, also ugly, solution could be to add a flag to getSCEV so that caching can be inhibited for certain calls.

IIUC the main issue seems to be that we try to create SCEVs for all dbg intrinsics? Would it be possible to instead only get SCEV expressions for values we really need (that/s the induction variable I assume, for which there already should be SCEV expression)?

That is possibly a better solution but I do not have sufficient insight into the salvaging algorithm to comment on that (@chrisjackson)

I think I am missing some context here, but this looks a bit suspicious. I am not sure if exposing and checking whether a SCEV exists is the right way to go.

IIUC the main issue seems to be that we try to create SCEVs for all dbg intrinsics? Would it be possible to instead only get SCEV expressions for values we really need (that/s the induction variable I assume, for which there already should be SCEV expression)?

It is necessary to obtain SCEVs for all the values used in dbg.value within the loop, as LSR may optimise out these values and these are salvaged by generating a DWARF expression from the SCEV expression. For further explanation see:
D105207 and the EuroLLVM2022 talk on the method here :SCEV-Based debuginfo salvaging in Loop Strength Reduction.

fhahn added a comment.Aug 4 2022, 5:42 AM

I think I am missing some context here, but this looks a bit suspicious. I am not sure if exposing and checking whether a SCEV exists is the right way to go.

IIUC the main issue seems to be that we try to create SCEVs for all dbg intrinsics? Would it be possible to instead only get SCEV expressions for values we really need (that/s the induction variable I assume, for which there already should be SCEV expression)?

It is necessary to obtain SCEVs for all the values used in dbg.value within the loop, as LSR may optimise out these values and these are salvaged by generating a DWARF expression from the SCEV expression. For further explanation see:
D105207 and the EuroLLVM2022 talk on the method here :SCEV-Based debuginfo salvaging in Loop Strength Reduction.

Right, but wouldn't it be sufficient to collect all expressions LSR optimizes? For those, it will already have created SCEV expressions for. This might be more difficult, but I am wondering if it would be feasible at least in theory.

mkazantsev requested changes to this revision.Aug 5 2022, 1:25 AM

The connection between described problem (as far as I understood, the fact that debuginfo changes IR is treated as a problem) and the proposed fix is not clear to me. The patch absolutely needs a detailed explanation, why existing SCEVs is a problem, in the commit message.

Note that, in presence of potential poison values, SCEV as well as many transform passes should treat it correctly. Sometimes "correctly" means "very conservatively", and this is what most likely happened here. Look at this:

https://godbolt.org/z/sbshjW1nM
https://godbolt.org/z/3fM8Exq6x

I've annotated the function's argument as noundef, and now there is no difference b/w presenting and not-presenting dbgs is in no-wrap flags in IR. This flag is there because poison argument of a call is UB by specification (didn't really check for dbg.value, but should be).

The issue will be gone if you annotate incoming parameters as noundef. Does your frontent allow this for this function?

This revision now requires changes to proceed.Aug 5 2022, 1:25 AM

I think I am missing some context here, but this looks a bit suspicious. I am not sure if exposing and checking whether a SCEV exists is the right way to go.

IIUC the main issue seems to be that we try to create SCEVs for all dbg intrinsics? Would it be possible to instead only get SCEV expressions for values we really need (that/s the induction variable I assume, for which there already should be SCEV expression)?

It is necessary to obtain SCEVs for all the values used in dbg.value within the loop, as LSR may optimise out these values and these are salvaged by generating a DWARF expression from the SCEV expression. For further explanation see:
D105207 and the EuroLLVM2022 talk on the method here :SCEV-Based debuginfo salvaging in Loop Strength Reduction.

Right, but wouldn't it be sufficient to collect all expressions LSR optimizes? For those, it will already have created SCEV expressions for. This might be more difficult, but I am wondering if it would be feasible at least in theory.

Currently the information for the dbg.value within the loop is cached before LSR executes, with DbgGatherSalvagableDVI(). If the implementation was changed to cache information on only the values that LSR optimises out, then the cacheing code would have to be moved inside the core LSR pass, I believe. I think this would require a significant change in the scev-salvaging implementation and also modifying the LSRInstance interface and implementation. I'm not keen on that idea, but I can have a further think and see if there is a better way.

markus edited the summary of this revision. (Show Details)Aug 8 2022, 12:54 AM
markus updated this revision to Diff 450721.Aug 8 2022, 12:57 AM

Removed nsw flags from test to avoid confusion. AFAICT they have no relevance here.

markus added a comment.Aug 8 2022, 1:03 AM

The connection between described problem (as far as I understood, the fact that debuginfo changes IR is treated as a problem) and the proposed fix is not clear to me. The patch absolutely needs a detailed explanation, why existing SCEVs is a problem, in the commit message.

I have update the review summary.

Note that, in presence of potential poison values, SCEV as well as many transform passes should treat it correctly. Sometimes "correctly" means "very conservatively", and this is what most likely happened here. Look at this:

https://godbolt.org/z/sbshjW1nM
https://godbolt.org/z/3fM8Exq6x

I've annotated the function's argument as noundef, and now there is no difference b/w presenting and not-presenting dbgs is in no-wrap flags in IR. This flag is there because poison argument of a call is UB by specification (didn't really check for dbg.value, but should be).

The issue will be gone if you annotate incoming parameters as noundef. Does your frontent allow this for this function?

I'm afraid that I do not understand this part at all. AFAICT the no-wrap flags are unrelated.

chrisjackson accepted this revision.EditedAug 9 2022, 4:50 AM

I think this is a succinct solution and I don't perceive a problem with exposing the extra SCEV method, as I've done something similar myself in the past, as mentioned above. I would like to see a comment explaining the need to call forget(), otherwise lgtm.

markus updated this revision to Diff 451121.Aug 9 2022, 5:38 AM

Added code comments.

mkazantsev requested changes to this revision.EditedAug 11 2022, 2:08 AM

Did you consider that forgetValue in fact does not only forget the value you are aiming, but also all its users recursively? Not SCEV users (they likely don't exist), but instruction users.

Imagine that some of the users had no-wrap flags inferred from other queries. Now you forget it, and the flags get lost. Won't this introruce a similar issue in another place? It may also affect some loops into which these values are involved.

Generally, exposing API about existing SCEVs isn't a great idea, and I don't see enough justification in your case.

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

Anyways, why do you need SCEV at all to find undefs? Can we instead traverse through instructions and get rid of SCEV query at all?

This revision now requires changes to proceed.Aug 11 2022, 2:08 AM

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

That does sound like a very tempting alternative to be sure that these sort of issues do not crop up again. Not sure if it is possible though (@chrisjackson?)

Did you consider that forgetValue in fact does not only forget the value you are aiming, but also all its users recursively? Not SCEV users (they likely don't exist), but instruction users.

Imagine that some of the users had no-wrap flags inferred from other queries. Now you forget it, and the flags get lost. Won't this introruce a similar issue in another place? It may also affect some loops into which these values are involved.

Generally, exposing API about existing SCEVs isn't a great idea, and I don't see enough justification in your case.

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

Anyways, why do you need SCEV at all to find undefs? Can we instead traverse through instructions and get rid of SCEV query at all?

SCEV is not being used to find undefs. The SCEV for a location referenced in a dbg.value is recorded, and if that location is optimised away, then the SCEV is used to generate a DWARF expression that computes the value of the locations and allows the dbg.value to be retained instead of dropped.

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

That does sound like a very tempting alternative to be sure that these sort of issues do not crop up again. Not sure if it is possible though (@chrisjackson?)

I don't like the sound of duplicaring an entire ScalarEvolution, at first glance it seems expensive to me. I prefer a solution that avoids side-effects of obtaining the SCEV for a value.

I think from the discussion so far, I would personally think the best answer would be to take the alternative route of modifying getSCEV to be able to not add items to the map when a flag IsDebug or such is added. It may be slightly difficult since createSCEVIter modifies the map as it recursively traverses the Value's operands and uses the modifications to complete the requested SCEV. That problem is even worse for this solution however; suppose instead of %inc, the debug value referred to a value %m = mul i32 %inc 2; in that case, getScev(%m) would cache a SCEV expr for %inc along the way, and we wouldn't be calling forgetValue on it since it's not the value we requested! I'd say it's definitely better to have all the messiness confined to just getSCEV and make a slight interface modification personally. Though I'm also surprised that just adding something to the cache is modifying the output - I would have expected that the newly cached SCEV expr would be used only if necessary (i.e. we would otherwise need to generate instructions), and if it was necessary then SCEV would search for an expression regardless. Is this a bug in SCEV, or just a gap left for performance reasons?

Did you consider that forgetValue in fact does not only forget the value you are aiming, but also all its users recursively? Not SCEV users (they likely don't exist), but instruction users.

Ah, that rather sounds like a blocker -- if there's no way to limit what's forgotten to "just the things unique to this Value", it's going to be impossible to peel apart "what was important to debug-info" and "what was important to codegen". A quick look at the SCEV methods suggests that plumbing in an argument flag is going to be pretty invasive, ScalarEvolution::getSCEV fans out to a lot of functions with a lot of logic.

Stephen wrote:

Though I'm also surprised that just adding something to the cache is modifying the output

I imagine you might find a different way expanding expressions for Values, depending on which PHIs you look at first, which changes if you look at the dbg.values first.

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

I'm in two minds about this -- it solves all the problems raised (obviously), on the other hand I imagine it doubles the amount of work done, which isn't ideal. However, I think the reason why we're using ScalarEvolution in this way is because all the PHIs might get replaced during loop-strength-reduction, and we need to use ScalarEvolution's map of old-to-new PHIs to reconstruct the dbg.values operands -- is that right @chrisjackson ?

StephenTozer added a comment.EditedAug 11 2022, 9:49 AM

I imagine you might find a different way expanding expressions for Values, depending on which PHIs you look at first, which changes if you look at the dbg.values first.

I thought that at first and was wondering if it was just an order-determined thing, which is always possible but could potentially be avoided by using something other than insertion order; in this case though it looks as though in the absence of the debug value, it does choose to insert a new instruction (%4 = add i32 %0, 1) when a valid one already exists (%inc = add nsw i32 %0, 1). It's not particularly relevant, since the existing behaviour is what it is and the patch needs to take that into account, it's just a slight surprise-factor that calling a get function and/or adding an item to a cache causes an instruction to be removed, although in the end it's almost sure to be removed anyway.

Did you consider that forgetValue in fact does not only forget the value you are aiming, but also all its users recursively? Not SCEV users (they likely don't exist), but instruction users.

Ah, that rather sounds like a blocker -- if there's no way to limit what's forgotten to "just the things unique to this Value", it's going to be impossible to peel apart "what was important to debug-info" and "what was important to codegen". A quick look at the SCEV methods suggests that plumbing in an argument flag is going to be pretty invasive, ScalarEvolution::getSCEV fans out to a lot of functions with a lot of logic.

Stephen wrote:

Though I'm also surprised that just adding something to the cache is modifying the output

I imagine you might find a different way expanding expressions for Values, depending on which PHIs you look at first, which changes if you look at the dbg.values first.

The best alternative I can suggest is to create a new ScalarEvolution instance and use it for dbg value queries. It should be good enough if the only purpose of that is to find undefs.

I'm in two minds about this -- it solves all the problems raised (obviously), on the other hand I imagine it doubles the amount of work done, which isn't ideal. However, I think the reason why we're using ScalarEvolution in this way is because all the PHIs might get replaced during loop-strength-reduction, and we need to use ScalarEvolution's map of old-to-new PHIs to reconstruct the dbg.values operands -- is that right @chrisjackson ?

That's correct. LSR can rewrite PHIs and create new Induction Variables (IVs), eliminating the old ones. Combining the LSR IVs with the SCEVs obtained for optimised-out locations enables reconstruction of the values.