This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
-
ScalarEvolution.h
-
lib/
-
Analysis/
1/2
ScalarEvolution.cpp
-
Transforms/Scalar/
-
Scalar/
-
LoopStrengthReduce.cpp

Differential D91711

SCEV add function to see if SCEVUnknown is null
AbandonedPublic

Authored by markus on Nov 18 2020, 7:27 AM.

Download Raw Diff

Details

Reviewers

reames
nikic
mkazantsev
lebedev.ri
jmorse

Summary

I tried raising the question if a SCEV expression should be analyzable even if part of the underlaying IR has been removed on llvm-dev without much response.

http://lists.llvm.org/pipermail/llvm-dev/2020-November/146636.html

This review is to take it one step further to see if that gives some reaction.

So my, possibly controversial, suggestion is that yes SCEV expressions should be analyzable even though part of the IR that was analyzed to build it has later been removed. Obviously the IR that can be removed is only the part that SCEV has already successfully "analyzed through" so if any SCEVUnknown points at IR that has been removed (i.e. the value pointer has been nulled) then the SCEV expression can no longer be analyzed.

This review adds a function ScalarEvolution::hasNulledUnknown to determine if a SCEV expression can still be analyzed or not.

For background see the llvm-dev post above.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	370 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

markus created this revision.Nov 18 2020, 7:27 AM

Herald added subscribers: llvm-commits, javed.absar, hiraditya. · View Herald TranscriptNov 18 2020, 7:27 AM

markus requested review of this revision.Nov 18 2020, 7:27 AM

Harbormaster completed remote builds in B79293: Diff 306101.Nov 18 2020, 8:09 AM

markus added a reviewer: reames.Nov 18 2020, 11:40 PM

markus added reviewers: nikic, mkazantsev, lebedev.ri, jmorse.Nov 23 2020, 11:57 PM

nikic added inline comments.Nov 24 2020, 11:26 AM

llvm/lib/Analysis/ScalarEvolution.cpp
12469	This method is the same as `checkValidity()`. I can't really comment on whether exposing it publicly makes sense or not.

The idea of having SCEV pointing to deleted instructions scares me. Imagine what if we had SCEVAddRecs referencing the loops we've already deleted. There is a vast field for nasty bugs caused by UB and memory corruption. I'd rather expect that we fail some assertion if we try to optimize with SCEV in that state.

In D91711#2415349, @mkazantsev wrote:

The idea of having SCEV pointing to deleted instructions scares me. Imagine what if we had SCEVAddRecs referencing the loops we've already deleted. There is a vast field for nasty bugs caused by UB and memory corruption. I'd rather expect that we fail some assertion if we try to optimize with SCEV in that state.

I agree. Clearly such a SCEV would not be valid and should no longer be used for any purpose at all. However by calling checkValidity() a client could guard against such situations if the client knows that part of the IR may have been deleted during transformation. For the existing SCEV clients (except for the LSR debug salvaging use I did and mentioned in the llvm-dev post) they already know that IR is still around, and nothing needs to be done.

To me it seems that functionality is already in place and it is mostly a matter of clarification (and if favorable making checkValidity() public).

I think there are meaningful use cases such as https://reviews.llvm.org/D87494 but of course if a decision is taken in the opposite direction then that patch needs to be reverted. Either way is fine with me as long as it is well-founded.

llvm/lib/Analysis/ScalarEvolution.cpp
12469	That is very interesting, I had not noticed that before. I see that `checkValidity()` is only used by `getExistingSCEV()` which appears to be some kind of caching mechanism. Clearly one do not want to return a cached SCEV result if the expression has a SCEVUknown pointing at some Value that has been deleted.

Do we know at which point LSR deletes the code? To me it looks like we just forgot to call forgetLoop or smth like this somewhere.

In D91711#2415769, @mkazantsev wrote:

Do we know at which point LSR deletes the code? To me it looks like we just forgot to call forgetLoop or smth like this somewhere.

I have not looked closely where LSR removes the code but I am not sure how that would help.

The point with the debug salvaging we try to do in that pass is that we look at the llvm.dbg.value intrinsics and then compute a SCEV for the value it references. We store that SCEV. Then after the transformation we see if any of the intrinsics now reference undef (remember that debug intrinsics don’t count as real uses and hence predecessors may get deleted). If an intrinsic now references undef we look at its SCEV (that we stored previously). At this point that SCEV for sure represents some IR that has been deleted (well actually rewritten by LSR). The usual case at this point is that the SCEVUnknowns of the SCEV expression are still around. If this is the case we proceed to compare this SCEV of the old value to other SCEVs of things that are around after LSR (e.g. the phi-nodes of the loop header) to see if we can replace the undef reference of the intrinsic with any of these.

So the point is that we expect some of the IR that the SCEV was based on to have been deleted but only the IR that is already captured by the SCEV (i.e. not SCEVUnknown). Does that make any sense?

My understanding of this situation is that, whenever we delete an instruction that may be referenced by SCEV, we should call forgetValue for it. SCEV that contains references on deleted instructions is invalid. Faling to forget deleted values is a bug. I don't understand why llvm.dbg.value is somehow special about it. This patch tries to hide the consequences of the bug instead of fixing it.

Right, there is nothing special with llvm.dbg.value besides that it is a debug intrinsic so it is not allowed to affect optimizations and the fact that such intrinsic uses another Value may not inhibit transformation on that Value. I just wanted to point it out since the situation that predecessors of an instruction get removed without the instruction itself is a bit unusual (but otherwise unimportant for this discussion).

Ok, so whenever an instruction gets deleted we should call forgetValue on it to make sure that ScalarEvolution clears any SCEV based on it from its cache. That also suggest that I am not allowed to store any SCEV but must always call ScalarEvolution::getSCEV (at least if I start making modifications to the IR)?

So in order to do https://reviews.llvm.org/D87494 we would need to introduce a new representation that is a more lightweight version of the SCEV (lets call it MySCEV). Before LSR starts its transformation we build a MySCEV based on the SCEV for each intrinsic and store away the former.

Later after transformation we again build MySCEVs based on the SCEV of phi-nodes and then do the comparison in the MySCEV domain.

Sure that would work but it seems a bit cumbersome to basically replicate what we already have (although it would be much more lightweight than a full SCEV). At least with a new representation there would be less restrictions on how it could be used but I am not sure reviewers would be too happy with the duplication.

But I don’t know. I am fine with either way (or simply just reverting https://reviews.llvm.org/D87494) as long as a sufficient number of people think it is a good idea.

Skimming the review and discussion, I think this patch is moving in the wrong direction. The basic issue appears to be that a pass is holding on to the result of an analysis after mutating the IR without appropriately updating the cached information. That's a bug for any analysis, not just SCEV. Say you held a pointer to a Loop object and then broke the backedge in the IR; that's a bug. I will admit that the existing SCEV code appears to support some mutation of the IR, but replace all uses with is different from true deletion.

Purely from the discussion in this bug (i.e. I haven't studied the code of the other patch), it sounds like the debug info salvaging described is wrong and needs reverted.

If I understand what that patch is doing (as described here), you might be able to get the same effect by computing the set of equivalent values before rewriting, hang onto the IR Values with a ValueHandle to detect deletion, and then rewrite.

The other option would be just to treat debug info uses as first class uses during LSR rewriting.

+1, i'm also uncomfortable with both the direction and approach.

markus mentioned this in rG808fcfe59447: Revert "[DebugInfo] Improve dbg preservation in LSR.".Nov 26 2020, 11:55 PM

Thanks for the clarification. The offending patch has been reverted now (in 808fcfe5944755f0).

I will probably look into other options such the one described by trying to determine the SCEV equivalence before anything has been deleted.

markus abandoned this revision.Dec 1 2020, 7:51 AM

markus mentioned this in D87494: Improve LSR debug-info.Dec 1 2020, 8:19 AM

markus mentioned this in D129636: Fix a LSR debug invariance issue.Jul 13 2022, 6:20 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

Analysis/

ScalarEvolution.h

3 lines

lib/

Analysis/

ScalarEvolution.cpp

7 lines

Transforms/

Scalar/

LoopStrengthReduce.cpp

2 lines

Diff 306101

llvm/include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 1,026 Lines • ▼ Show 20 Lines	public:

/// Return true if elements that makes up the given SCEV properly dominate		/// Return true if elements that makes up the given SCEV properly dominate
/// the specified basic block.		/// the specified basic block.
bool properlyDominates(const SCEV S, const BasicBlock BB);		bool properlyDominates(const SCEV S, const BasicBlock BB);

/// Test whether the given SCEV has Op as a direct or indirect operand.		/// Test whether the given SCEV has Op as a direct or indirect operand.
bool hasOperand(const SCEV S, const SCEV Op) const;		bool hasOperand(const SCEV S, const SCEV Op) const;

		/// Test whether the given SCEV has a SCEVUnknown that has been nulled.
		bool hasNulledUnknown(const SCEV *S) const;

/// Return the size of an element read or written by Inst.		/// Return the size of an element read or written by Inst.
const SCEV getElementSize(Instruction Inst);		const SCEV getElementSize(Instruction Inst);

/// Compute the array dimensions Sizes from the set of Terms extracted from		/// Compute the array dimensions Sizes from the set of Terms extracted from
/// the memory access function of this SCEVAddRecExpr (second step of		/// the memory access function of this SCEVAddRecExpr (second step of
/// delinearization).		/// delinearization).
void findArrayDimensions(SmallVectorImpl<const SCEV *> &Terms,		void findArrayDimensions(SmallVectorImpl<const SCEV *> &Terms,
SmallVectorImpl<const SCEV *> &Sizes,		SmallVectorImpl<const SCEV *> &Sizes,
▲ Show 20 Lines • Show All 1,157 Lines • Show Last 20 Lines

llvm/lib/Analysis/ScalarEvolution.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 12,455 Lines • ▼ Show 20 Lines
	bool ScalarEvolution::properlyDominates(const SCEV S, const BasicBlock BB) {			bool ScalarEvolution::properlyDominates(const SCEV S, const BasicBlock BB) {
	return getBlockDisposition(S, BB) == ProperlyDominatesBlock;			return getBlockDisposition(S, BB) == ProperlyDominatesBlock;
	}			}

	bool ScalarEvolution::hasOperand(const SCEV S, const SCEV Op) const {			bool ScalarEvolution::hasOperand(const SCEV S, const SCEV Op) const {
	return SCEVExprContains(S, [&](const SCEV *Expr) { return Expr == Op; });			return SCEVExprContains(S, [&](const SCEV *Expr) { return Expr == Op; });
	}			}

				bool ScalarEvolution::hasNulledUnknown(const SCEV *S) const {
				return SCEVExprContains(S, [&](const SCEV *Expr) {
				auto U = dyn_cast<SCEVUnknown>(Expr);
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: 'auto U' can be declared as 'const auto U' [llvm-qualified-auto] not useful Lint: Pre-merge checks:* clang-tidy: warning: 'auto U' can be declared as 'const auto *U' [llvm-qualified-auto] [[https…
				return U && !U->getValPtr();
				});
				}
				nikicUnsubmitted Not Done Reply Inline Actions This method is the same as `checkValidity()`. I can't really comment on whether exposing it publicly makes sense or not. nikic: This method is the same as `checkValidity()`. I can't really comment on whether exposing it…
				markusAuthorUnsubmitted Done Reply Inline Actions That is very interesting, I had not noticed that before. I see that `checkValidity()` is only used by `getExistingSCEV()` which appears to be some kind of caching mechanism. Clearly one do not want to return a cached SCEV result if the expression has a SCEVUknown pointing at some Value that has been deleted. markus: That is very interesting, I had not noticed that before. I see that `checkValidity()` is only…

	bool ScalarEvolution::ExitLimit::hasOperand(const SCEV *S) const {			bool ScalarEvolution::ExitLimit::hasOperand(const SCEV *S) const {
	auto IsS = [&](const SCEV *X) { return S == X; };			auto IsS = [&](const SCEV *X) { return S == X; };
	auto ContainsS = [&](const SCEV *X) {			auto ContainsS = [&](const SCEV *X) {
	return !isa<SCEVCouldNotCompute>(X) && SCEVExprContains(X, IsS);			return !isa<SCEVCouldNotCompute>(X) && SCEVExprContains(X, IsS);
	};			};
	return ContainsS(ExactNotTaken) \|\| ContainsS(MaxNotTaken);			return ContainsS(ExactNotTaken) \|\| ContainsS(MaxNotTaken);
	}			}

	▲ Show 20 Lines • Show All 879 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

Show First 20 Lines • Show All 5,837 Lines • ▼ Show 20 Lines	#endif
if (Changed) {		if (Changed) {
for (auto &D : DbgValues) {		for (auto &D : DbgValues) {
auto DbgValue = std::get<DbgValueInst *>(D);		auto DbgValue = std::get<DbgValueInst *>(D);
auto DbgValueType = std::get<const Type *>(D);		auto DbgValueType = std::get<const Type *>(D);
auto DbgValueSCEV = std::get<const SCEV *>(D);		auto DbgValueSCEV = std::get<const SCEV *>(D);
auto DbgDIExpr = std::get<DIExpression *>(D);		auto DbgDIExpr = std::get<DIExpression *>(D);
if (!isa<UndefValue>(DbgValue->getVariableLocation()))		if (!isa<UndefValue>(DbgValue->getVariableLocation()))
continue;		continue;
		if (SE.hasNulledUnknown(DbgValueSCEV))
		continue;
for (PHINode &Phi : L->getHeader()->phis()) {		for (PHINode &Phi : L->getHeader()->phis()) {
if (DbgValueType != Phi.getType())		if (DbgValueType != Phi.getType())
continue;		continue;
if (!SE.isSCEVable(Phi.getType()))		if (!SE.isSCEVable(Phi.getType()))
continue;		continue;
auto PhiSCEV = SE.getSCEV(&Phi);		auto PhiSCEV = SE.getSCEV(&Phi);
if (Optional<APInt> Offset =		if (Optional<APInt> Offset =
SE.computeConstantDifference(DbgValueSCEV, PhiSCEV)) {		SE.computeConstantDifference(DbgValueSCEV, PhiSCEV)) {
▲ Show 20 Lines • Show All 64 Lines • Show Last 20 Lines