Page MenuHomePhabricator

SCEV add function to see if SCEVUnknown is null
Needs ReviewPublic

Authored by markus on Wed, Nov 18, 7:27 AM.

Details

Summary

I tried raising the question if a SCEV expression should be analyzable even if part of the underlaying IR has been removed on llvm-dev without much response.

http://lists.llvm.org/pipermail/llvm-dev/2020-November/146636.html

This review is to take it one step further to see if that gives some reaction.

So my, possibly controversial, suggestion is that yes SCEV expressions should be analyzable even though part of the IR that was analyzed to build it has later been removed. Obviously the IR that can be removed is only the part that SCEV has already successfully "analyzed through" so if any SCEVUnknown points at IR that has been removed (i.e. the value pointer has been nulled) then the SCEV expression can no longer be analyzed.

This review adds a function ScalarEvolution::hasNulledUnknown to determine if a SCEV expression can still be analyzed or not.

For background see the llvm-dev post above.

Diff Detail

Unit TestsFailed

TimeTest
370 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp

Event Timeline

markus created this revision.Wed, Nov 18, 7:27 AM
markus requested review of this revision.Wed, Nov 18, 7:27 AM
nikic added inline comments.Tue, Nov 24, 11:26 AM
llvm/lib/Analysis/ScalarEvolution.cpp
12469

This method is the same as checkValidity(). I can't really comment on whether exposing it publicly makes sense or not.

The idea of having SCEV pointing to deleted instructions scares me. Imagine what if we had SCEVAddRecs referencing the loops we've already deleted. There is a vast field for nasty bugs caused by UB and memory corruption. I'd rather expect that we fail some assertion if we try to optimize with SCEV in that state.

The idea of having SCEV pointing to deleted instructions scares me. Imagine what if we had SCEVAddRecs referencing the loops we've already deleted. There is a vast field for nasty bugs caused by UB and memory corruption. I'd rather expect that we fail some assertion if we try to optimize with SCEV in that state.

I agree. Clearly such a SCEV would not be valid and should no longer be used for any purpose at all. However by calling checkValidity() a client could guard against such situations if the client knows that part of the IR may have been deleted during transformation. For the existing SCEV clients (except for the LSR debug salvaging use I did and mentioned in the llvm-dev post) they already know that IR is still around, and nothing needs to be done.

To me it seems that functionality is already in place and it is mostly a matter of clarification (and if favorable making checkValidity() public).

I think there are meaningful use cases such as https://reviews.llvm.org/D87494 but of course if a decision is taken in the opposite direction then that patch needs to be reverted. Either way is fine with me as long as it is well-founded.

llvm/lib/Analysis/ScalarEvolution.cpp
12469

That is very interesting, I had not noticed that before. I see that checkValidity() is only used by getExistingSCEV() which appears to be some kind of caching mechanism.

Clearly one do not want to return a cached SCEV result if the expression has a SCEVUknown pointing at some Value that has been deleted.

Do we know at which point LSR deletes the code? To me it looks like we just forgot to call forgetLoop or smth like this somewhere.

Do we know at which point LSR deletes the code? To me it looks like we just forgot to call forgetLoop or smth like this somewhere.

I have not looked closely where LSR removes the code but I am not sure how that would help.

The point with the debug salvaging we try to do in that pass is that we look at the llvm.dbg.value intrinsics and then compute a SCEV for the value it references. We store that SCEV. Then after the transformation we see if any of the intrinsics now reference undef (remember that debug intrinsics don’t count as real uses and hence predecessors may get deleted). If an intrinsic now references undef we look at its SCEV (that we stored previously). At this point that SCEV for sure represents some IR that has been deleted (well actually rewritten by LSR). The usual case at this point is that the SCEVUnknowns of the SCEV expression are still around. If this is the case we proceed to compare this SCEV of the old value to other SCEVs of things that are around after LSR (e.g. the phi-nodes of the loop header) to see if we can replace the undef reference of the intrinsic with any of these.

So the point is that we expect some of the IR that the SCEV was based on to have been deleted but only the IR that is already captured by the SCEV (i.e. not SCEVUnknown). Does that make any sense?

mkazantsev added a comment.EditedWed, Nov 25, 11:39 PM

My understanding of this situation is that, whenever we delete an instruction that may be referenced by SCEV, we should call forgetValue for it. SCEV that contains references on deleted instructions is invalid. Faling to forget deleted values is a bug. I don't understand why llvm.dbg.value is somehow special about it. This patch tries to hide the consequences of the bug instead of fixing it.